

Next — Today I Learnt About Data Science
Yesterday, Microsoft announced the new Bing. It’s an exciting time for search. Speaking of AI, you should check out futuretools.io, a directory of free, freemium and paid AI tools.
Let’s dive in!
Five Stories
1. RMarkdown/Quarto Tips and Tricks
Indrajeet Patil
This website provides several tips and tricks for working with R Markdown and Quarto files. My favourite ones:
Indrajeet also posted an R function daily on Twitter for a year, which I always looked forward to.
2. Microsoft kickstarts the AI arms race
Casey Newton
Search hasn’t changed a lot in the past twenty years. It was a collection of links, and it is a collection of links. The collection may have improved, but it’s still a collection of links. Microsoft has partnered with OpenAI to bring an improved version of ChatGPT to the masses. The new version of Bing, powered by ChatGPT, will offer a new paradigm in search with rapid innovation. Microsoft executives were exultant in the launch event today as they demonstrated how the reimagined Bing could instantly generate results.
We’re walking closer and closer to a world organised by search.
The new Bing will be closely integrated with Edge, providing services such as summarising a PDF with a click, generating LinkedIn posts from prompts, converting a StackOverflow answer written in Python to R, and much more. Let’s see what Google’s Bard has to show us. (Google’s special event is today.)
3. Principal Components Analysis (in Python)
Steven Morse
Principal Components Analysis (PCA) is a staple of the data science world, but it is often misunderstood and underappreciated. With multiple definitions and approaches, it is easy to get lost in the jumble of information. In this post, Steven Morse takes a fresh look at PCA, exploring it in a way that is informative and easy to understand.
In the final section, he also shows how well it performs at image compression, which is really cool!
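To make the compression idea concrete, here is a minimal sketch of PCA-based compression using NumPy's SVD. It is not taken from Steven's post: the function name `pca_compress`, the helper `relative_error` and the synthetic "image" are all assumptions made for illustration, and a real grayscale image loaded as a 2-D array would take the place of the random data.

```python
import numpy as np

def pca_compress(image, k):
    """Rank-k reconstruction of a 2-D array using PCA computed via SVD."""
    mean = image.mean(axis=0)               # per-column mean
    centered = image - mean                 # PCA works on centred data
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:k]                     # top-k principal directions
    scores = centered @ components.T        # project each row onto them
    return scores @ components + mean       # map back to pixel space

def relative_error(image, k):
    """Reconstruction error relative to the size of the original array."""
    return np.linalg.norm(image - pca_compress(image, k)) / np.linalg.norm(image)

# Hypothetical stand-in for a grayscale image: low-rank structure plus a little noise
rng = np.random.default_rng(0)
image = rng.normal(size=(64, 5)) @ rng.normal(size=(5, 48))
image += 0.01 * rng.normal(size=image.shape)

for k in (1, 5, 48):
    print(k, relative_error(image, k))
```

Keeping only `k` components means storing `k` score columns and `k` direction vectors instead of every pixel, which is where the compression comes from. The error shrinks as `k` grows.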
4. Open source in pharma from five perspectives
Posit PBC.
The pharmaceutical industry uses R extensively. Posit joined hands with five pharmaceutical stakeholders to discuss investment in and long-term commitment to open-source software, particularly in R as a statistical platform.
Shifting to an Open-Source Backbone in Clinical Trials with Roche
Data Science Hangout with Eric Nantz (Eli Lilly)
Data Science Hangout with Mike Smith (Pfizer)
5. My stripper earnings per shift over four years
u/nerdydancing
Reddit is rightly called the front page of the internet. After seeing this post, you’d agree. u/nerdydancing tracked her earnings on each shift for four years. If any dataset promises a story behind each data point, it is probably this one.
Four Packages
MakeItTalk is a Python package developed by Adobe that converts a picture to an audio-led animated video. Marlene Mhangami showed us an example!
GPfit is an R package for fitting a computationally stable Gaussian process (GP) model to the output of a deterministic simulator.
ggdist is an R package for visualising uncertainty. It’s an extension of the ggplot2 world. Matt Dancho shows how to create raincloud plots with it.
graphlayouts is an R package that provides graph layout methods missing from igraph.
Three Jargons
GPT stands for Generative Pre-trained Transformer. It is a deep-learning model by OpenAI that is trained to generate human-like text by predicting the next word in a sequence of words, given a prompt.
Transformer is a deep-learning architecture introduced by Google researchers in 2017 for natural-language processing. It uses self-attention to process text and generate meaningful representations of words and phrases.
Self-attention is an ML technique that helps a model understand the relationships between different elements in a sequence. For each word in a sentence, it helps the model determine which of the other words matter most for that word’s meaning. The model does this by assigning a weight to every word in the sequence, indicating how much attention should be paid to it.
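For the curious, the weighting described above can be sketched in a few lines of NumPy. This is a simplified, single-head version of scaled dot-product self-attention. The random embeddings and the projection matrices `Wq`, `Wk` and `Wv` are placeholders chosen for illustration, whereas a real Transformer learns them during training.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how well each word matches every other word
    weights = softmax(scores, axis=-1)         # each row is one word's attention weights
    return weights @ V, weights                # weighted mix of values, plus the weights

# Hypothetical input: 4 "words", each represented by an 8-dimensional embedding
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

output, weights = self_attention(X, Wq, Wk, Wv)
print(weights.round(2))   # row i shows how much word i attends to each word
```

Each row of `weights` sums to 1, so it can be read as how one word splits its attention across the whole sentence.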
Two Tweets



One Meme
That’s a wrap!
I hope you enjoyed today’s letter. See you next week!
Harsh