Who are Twitter Blue users?

Next — Today I Learnt About Data Science | Issue #67

Harshvardhan

Apr 12, 2023

Hi there!

I have a puzzle for you:

How big would a wall clock have to be so that the tip of its minute hand moves a centimeter per second?

Give it a try. Answers at the end.

Five Stories

Who are Twitter Blue Users?

Harshvardhan

My recent blog post analyzes the trends and characteristics of Twitter Blue subscribers, based on data collected by Travis Brown. It shows that most subscribers are regular users with few followers, and that many of them have unsubscribed or been suspended. Elon Musk is the only one of the ten most popular accounts to have a Blue subscription.

In fact, Blue gained most of its subscribers within the first two weeks of launch. With the White House, the New York Times, and many other important accounts refusing to pay for the badge, it remains to be seen how Elon Musk handles this looming chaos. You can find the R code on GitHub.

R for Applied Epidemiology and Public Health

Edited by Neale Batra

The book is a guide for epidemiologists who want to use R, a programming language and software environment for statistical computing and graphics. It shows how to perform common epidemiological tasks with R code and examples. It also helps epidemiologists learn R skills and apply them to their work.

Junk Charts

Kaiser Fung

This blog is about data visualization and data analysis. The author reviews and critiques various charts and graphs from different sources, and offers suggestions for improvement.

Some cool ones:

Lay off bubbles (How bubble charts scale)
Some chart designs bring out more information than others (Why bump charts are better than a two-column bar plot for showing differences)
An interactive map that mostly works, except for the color scale (How not to use colours)

Storytelling in ggplot using rounded rectangles

Albert Rapp

This blog post is about how to create rounded rectangles in ggplot2. Albert shows two ways to achieve this effect: one using the ggchicklet package, and another using grobs, which are graphical objects that can be manipulated. (TIL about grobs.)

He also demonstrates how to improve a plot by highlighting words and adding text elements. Check it out!

National Geographic Society World Water Map

National Geographic

This story is about the global water crisis and how humans are using more water than the water cycle can provide. It is based on a model developed by Utrecht University that shows where and why water gaps arise, how climate change might worsen them, and how they might be managed.

The water gap, the difference between the demand for water and its supply, is observed globally but is unevenly distributed. Haryana, an agriculture-dependent state in India, has one of the highest water gaps.

India has had to pump more groundwater than any other country.
The bulk of it is for irrigation. In the arid northwestern states of Punjab and Haryana, thirsty rice and wheat are now the dominant crops, and wells the main source of irrigation water. The water table is sinking up to three meters a year.

Choosing between mass famine and groundwater depletion, the Indian government chose the latter.

Four Packages

Segment Anything is an open-source project for image segmentation by Meta AI. It includes a promptable model (SAM) and a huge dataset (SA-1B) with 1 billion masks. SAM can segment any object in any image using different types of prompts, such as clicks, boxes, and text. Blogpost. Github.

nanoGPT is the simplest, fastest method to train your own GPT. You will be able to train a GPT-2 model on your computer from scratch pretty quickly. It is great for experimenting with GPT models and learning how they are trained. Github.

openai is an R package for communicating with OpenAI’s API. There are functions for GPT with chatting capabilities, DALL-E for generating images, and Whisper for converting speech to text. Vignette. Github.

pillar is a package for styling columns of data, artfully using colour and Unicode characters to guide the eye. It powers the Tidyverse and RStudio print outputs. Vignette. Github.

Three Jargons

Endogeneity

Endogeneity is a term used in econometrics to describe a situation where an independent variable (or explanatory variable) is correlated with the error term in a regression model. This correlation may arise from omitted variables, measurement error, or simultaneity, leading to biased and inconsistent estimates.
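To see the omitted-variable case in action, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true effect of 2.0, the sample size, the seed, and the helper names ols_slope and simulate_omitted_variable. An unobserved variable u drives both x and y, so x ends up correlated with the error term and the simple regression slope drifts away from the true effect.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_omitted_variable(n=20000, true_beta=2.0, seed=67):
    """Generate (x, y) where an unobserved u affects both x and y."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]   # omitted variable
    x = [ui + rng.gauss(0, 1) for ui in u]    # x is partly driven by u
    # u is left out of the regression, so it sits inside the error term.
    y = [true_beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return x, y


x, y = simulate_omitted_variable()
naive_slope = ols_slope(x, y)
print(f"true effect: 2.00, OLS estimate: {naive_slope:.2f}")
```

Under these assumptions the bias equals cov(x, u) / var(x) = 1/2, so the estimate should settle near 2.5 rather than 2.0, and collecting more data does not fix it.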

Instrumental Variable (IV)

Instrumental Variable is an econometric technique used to address endogeneity issues in regression models. The method involves using an external variable, known as an instrument, which is correlated with the endogenous explanatory variable but uncorrelated with the error term.

The IV estimator replaces the endogenous variable with its predicted values from the first-stage regression of the endogenous variable on the instrument. This method helps to obtain consistent estimates of the causal effect of the endogenous variable on the dependent variable, assuming that the instrument satisfies the required conditions of relevance and exogeneity.
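The two-stage procedure described above can be sketched on simulated data. This is only an illustration under stated assumptions: one instrument z, one endogenous regressor x, a true effect of 2.0, and the hypothetical helper names fit_line and two_stage_least_squares. By construction z is relevant (it moves x) and exogenous (it is independent of the unobserved u).

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """IV point estimate with a single instrument z."""
    # Stage 1: regress the endogenous x on the instrument z.
    intercept, slope = fit_line(z, x)
    x_hat = [intercept + slope * zi for zi in z]
    # Stage 2: regress y on the predicted values of x.
    return fit_line(x_hat, y)[1]


rng = random.Random(67)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_estimate = fit_line(x, y)[1]
iv_estimate = two_stage_least_squares(z, x, y)
print(f"OLS: {ols_estimate:.2f}, IV: {iv_estimate:.2f}, true effect: 2.00")
```

The sketch only produces point estimates. Standard errors taken from the second-stage regression are not valid as they stand, so a dedicated IV routine is the better choice for real work.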

Granger Causality

Granger Causality is a statistical hypothesis test used to determine whether one time series can help forecast another. It is based on the idea that if a variable X Granger-causes a variable Y, then past values of X should contain information that helps predict Y beyond what past values of Y already provide. The test involves estimating two autoregressive models for Y, one with only lagged values of Y and one that also includes lagged values of X, comparing their prediction errors, and using a statistical test such as the F-test or the likelihood ratio test to determine whether including the lagged values of X significantly improves the prediction of Y.
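The comparison of prediction errors can be sketched with a single lag and an F statistic. This is a simplified illustration, not a full test: the lag length of one, the simulated series, and the helper names (solve_linear_system, residual_sum_of_squares, granger_f_statistic) are all assumptions, and the p-value lookup is left out.

```python
import random


def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def residual_sum_of_squares(rows, y):
    """Fit OLS through the normal equations and return the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = (sum(c * v for c, v in zip(coef, r)) for r in rows)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def granger_f_statistic(x, y):
    """F statistic for 'does lag-1 x help predict y beyond lag-1 y?'"""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = residual_sum_of_squares(restricted, target)
    rss_u = residual_sum_of_squares(unrestricted, target)
    n_obs, n_params, n_restrictions = len(target), 3, 1
    return ((rss_r - rss_u) / n_restrictions) / (rss_u / (n_obs - n_params))


# Simulated data in which x drives y with a one-period delay, not the reverse.
rng = random.Random(67)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

f_x_to_y = granger_f_statistic(x, y)
f_y_to_x = granger_f_statistic(y, x)
print(f"F (x -> y): {f_x_to_y:.1f}, F (y -> x): {f_y_to_x:.1f}")
```

A large F statistic in the x to y direction and a small one in the reverse direction is the pattern the test looks for. In practice the statistic is compared against an F distribution, and the lag length is chosen with more care than a fixed single lag.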

Two Tweets

https://twitter.com/rappa753/status/1644354374779695105

https://twitter.com/rappa753/status/1645441544630185985

(Apparently, Twitter has blocked Substack from embedding tweets into newsletters. Please just click the links. 🤷‍♂️)

One Meme

Bonus

This is an album that I’m listening to these days. It is blissfully instrumental and helps me focus and unwind. I first came across it in an interview with Daniel Ek, Spotify’s CEO.

Puzzle Answer

A wall clock would have to be approximately 1145.92 cm (or about 11.46 meters) in diameter for the tip of its minute hand to move at 1 centimeter per second. Solution in next issue!
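If you want to check the figure right away, the arithmetic fits in a few lines. This is a quick sketch that assumes the minute hand reaches all the way to the rim, so the clock's diameter is twice the hand's length; the variable names are made up for illustration.

```python
import math

tip_speed_cm_per_s = 1.0
seconds_per_revolution = 60 * 60  # the minute hand goes around once an hour

# Distance covered by the tip in one revolution is the circumference.
circumference_cm = tip_speed_cm_per_s * seconds_per_revolution
radius_cm = circumference_cm / (2 * math.pi)  # length of the minute hand
diameter_cm = 2 * radius_cm                   # assumes the hand reaches the rim

print(f"minute hand: {radius_cm:.2f} cm, clock diameter: {diameter_cm:.2f} cm")
```

The tip travels 3600 cm per hour, so the diameter works out to 3600 / π cm, which matches the answer above.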
