Hidden Patterns in Albanian Street Names

Next | Issue #62

Mar 08, 2023

Hi there!

Today’s stories will tell you about patterns in Albanian street names, how to use GitHub Copilot for R, college students’ opinions on ChatGPT, and more.

Let’s dive in.

Five Stories

How are businesses using ChatGPT?

OpenAI

Last week, OpenAI released an API for the model that powers ChatGPT, priced 10x cheaper than its existing GPT-3.5 models. They also released an API for Whisper, a speech-to-text engine designed for multilingual transcription, which was open-sourced in September 2022.

In this blog post, OpenAI demonstrates how different companies are using their tech to build interesting features. For example, Snap added My AI, a chatbot within the app. Quizlet has a fully adaptive tutor for teaching you literally anything. You can ask Instacart for lunch ideas and it’ll plan the grocery shopping for you. Notion AI can help you plan and execute. And more.

With the API cost so low, $0.002 per 1K tokens, I think many new apps will arrive with this capability.
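To get a feel for what $0.002 per 1K tokens means in practice, here is a rough back-of-the-envelope sketch. The workload numbers (requests per day, tokens per request) are made up for illustration and are not from OpenAI's post.

```python
# Price quoted above: $0.002 per 1,000 tokens.
PRICE_PER_1K_TOKENS_USD = 0.002


def estimate_cost_usd(total_tokens: int,
                      price_per_1k: float = PRICE_PER_1K_TOKENS_USD) -> float:
    """Return the approximate API cost in USD for a given number of tokens."""
    return total_tokens / 1000 * price_per_1k


# Hypothetical workload: both numbers below are assumptions, not real usage data.
requests_per_day = 10_000
tokens_per_request = 500  # prompt + response combined

daily_tokens = requests_per_day * tokens_per_request
daily_cost = estimate_cost_usd(daily_tokens)

print(f"{daily_tokens:,} tokens per day costs roughly ${daily_cost:.2f}")
```

Under these assumed numbers, five million tokens a day works out to about ten dollars, which is why small apps can afford to experiment.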

Read on

Hidden Patterns in Street Names [Part 2]

Dea Bardhoshi

I wrote about Dea’s analysis of Albanian street names a few weeks ago. In yet another curious piece, Dea looks at who the people are that get the privilege of having a street named after them.

The biggest groups seem to be politicians, writers and fighters. (These are based on her manual labelling; kudos for the industrious effort.) Which politicians? Those from the late 19th and early 20th centuries. Communist politicians are the fourth biggest category. Others in the Top-20 are photographers, researchers, and more.

Read on

GitHub Copilot for R

David Smith

David Smith gave a talk to the NYC Data Hackers on how to use Copilot for R in Visual Studio Code and how it works behind the scenes with OpenAI Codex and Azure OpenAI Service. He showed how to access OpenAI’s Codex and other models from R, and how to tidy datasets with the tidyverse. GitHub repo.

I’ve used Copilot for a long time with Python. Maybe it’s time I tried it for R!

College students aren’t excited about ChatGPT

Neal Freyman, Morning Brew

In a very small-sample study by Morning Brew and Generation Lab, journalists found that 40% of college students had never heard of ChatGPT. Of those who had heard of it, more than half (52%) had never tried it.

Most of those who use it (71%) do so for entertainment, while a significant portion (32%) use it for quick answers. Only 17% reported that they knew someone who had cheated using ChatGPT.

Of course, self-reported numbers don’t mean much, but maybe educators don’t have much to be scared of.

Read on

Data from satellites reveal the vast extent of fighting in Ukraine

The Economist

The Economist used satellite data from NASA and Sentinel-1 to track the extent and impact of the war in Ukraine, which has affected 14% of municipalities and damaged many buildings. By combining two satellite-based systems that detect fires and changes in building signals, journalists were able to map the war in Ukraine more comprehensively than social media sources.

The satellite data revealed that fighting was not limited to the front lines, but also occurred in areas far from the conflict zone. The data also showed that Ukraine increased its use of American rockets after June last year.

Read on

Four Packages

DataExplorer helps you explore and visualise your data. Its create_report() is an absolute blast. The function can generate basic statistics, data structure, missing data profile, distributions, correlations and PCA in a single R Markdown report. GitHub.

esquisse adds a drag-and-drop interface for creating plots in R. Simple Shiny app, hugely useful! GitHub.

calendR can create ready-to-print calendars with ggplot2. Quite handy. GitHub.

generativeart can create wonderful art based on mathematical formulations. Check its GitHub to see what it can do.

Three Jargons

Bucketing: A mechanism for grouping categorical data, especially when the number of categories is large, but the number of categories actually appearing in the data is comparatively small.

Cross-validation: A technique for assessing the performance of a machine learning model by splitting the data into multiple subsets and using some of them for training and some of them for testing.

Confounding: A situation where the relationship between a predictor and an outcome variable is distorted by the presence of another variable that affects both of them.
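To make the cross-validation definition above a little more concrete, here is a minimal sketch of k-fold splitting in plain Python. The helper names (k_fold_indices, cross_validate), the toy data, and the "predict the training mean" baseline are all made up for illustration; in practice you would likely reach for a library such as scikit-learn.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k roughly equal subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(ys, k=5):
    """Score a baseline that always predicts the mean of its training targets.

    Returns one mean absolute error per fold.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        prediction = mean(ys[j] for j in train_idx)           # "train"
        mae = mean(abs(ys[j] - prediction) for j in test_idx)  # "test"
        scores.append(mae)
    return scores


# Toy target values (assumed data, purely illustrative).
ys = [2 * x + 1 for x in range(20)]
scores = cross_validate(ys, k=5)
print("per-fold MAE:", [round(s, 2) for s in scores])
print("average MAE:", round(mean(scores), 2))
```

Every observation lands in the test set exactly once, so the average score uses all of the data without ever testing on points the model was fitted on.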

Two Tweets

terence fosstodon @researchremora

Confirmed hate crime incidents in New York City, 2019 to 2022. Inspired and made possible by @GilbertFontana and @R_Graph_Gallery. Data from NYPD. Hard to figure out what confirmed means but at least NYC made its data easily accessible. #ggplot2 adventures, an #rstats tale

9 line graphs showing the number of confirmed hate crimes in New York City from 2019 to 2022

From left to right (top row): Anti-Asian, Anti-Jewish, Anti-Gay
From left to right (middle row): Anti-Black, Anti-Hispanic, Anti-Religion
From left to right (bottom row): Anti-Transgender, Anti-White, Anti-Female

Anti-Asian hate crimes hit a peak of 36 in 2021, second to Anti-Jewish (50).

samia @samiasab90

Finally think I'm complete with this little mini-data viz project! Four charts to show Lebron James' career after breaking the all-time scoring record. Made with #RStats and #Python Code: github.com/samiaab1990/Da…

samia @samiasab90

An NBA shot chart for all szns of Lebron's career so far. Size = shot frequency, color = FG% vs league average. Inspired by the design of @kirkgoldsberry. Thanks to the ballR package for guiding the design of the chart! More plots to come soon 😊 #Python - cleaning #Rstats - viz https://t.co/mBuUrPuAex

One Meme

Bonus

Learn history by visually exploring maps at History Maps. Here’s the one for Indian History. Pretty cool!

Next — Today I Learnt About Data Science
