How Duolingo’s AI Learns What You Need to Learn
Next | Issue #59
Hi there!
Today, I will share several interesting stories: hidden patterns in Albanian street names, a dark RStudio theme, a tool to embed Python in R, how to make better bar plots, and of course, how Duolingo’s AI works. The Duolingo article was fascinating to read: it goes in depth while staying sufficiently general.
There is also a small poll at the end that I’d love for you to respond to. Dive in!
Five Stories
1. Hidden Patterns in Street Names
Dea Bardhoshi
Dea explores the gender distribution of street names in Tirana, Albania, using data from OpenStreetMap and hand-labelling (hats off for the effort!). She found that only 3.3% of the streets were named after women, and most of those women were foreign. She also used natural language processing techniques to analyze the most common words and topics in the street names and found that they reflected the history and culture of Albania.
Dea said she’ll be detailing more learnings in her upcoming newsletter, which I recommend. The Jupyter notebook and dataset are available on her GitHub.
2. night-owlish: An RStudio Theme
Mara Averick
This project adapts Night Owl, a popular VS Code theme created by @sdras, to other editors such as Ace and RStudio. The theme is easy on the eyes if you’re a dark-mode person. The line below installs and applies it in one go. (To install without applying, set apply = FALSE.)
rstudioapi::addTheme("https://raw.githubusercontent.com/batpigandme/night-owlish/master/rstheme/night-owlish.rstheme", apply = TRUE)
3. Embedding Python in R
Sage Bionetworks
Using Python from R is a common task for professional data scientists. This package provides a standalone Python installation for embedding in R, which keeps the system's Python installation separate from the one R uses. Quite handy for handling all those conflicting package versions.
4. How Duolingo’s AI Learns What You Need to Learn
Klinton Bicknell, Claire Brust and Burr Settles, Duolingo AI Team
Duolingo is a language-learning app that uses a gamelike approach with sophisticated AI systems to guide users through a curriculum that leads to language proficiency. One of the AI systems, called Birdbrain, uses algorithms based on decades of research in educational psychology and recent advances in machine learning to continuously improve the learner's experience.
The company's ambitions go beyond language learning: it recently launched apps covering childhood literacy and third-grade mathematics. Duolingo's founders were inspired by the 2-sigma problem identified by educational psychologist Benjamin Bloom, who found that students tutored one-on-one performed about two standard deviations better than those taught in a conventional classroom. They aimed to make an easy-to-use online language tutor that could approximate this supercharging effect of individual tutoring. To automate the three critical attributes of good tutors, Duolingo uses machine learning and other cutting-edge technologies to ensure expertise, keep learners engaged, and provide personalized lessons.
This is a fascinating read with enough technical detail to be satisfying. Do check it out.
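If you want a feel for how a system like this can personalize lessons, here is a minimal Python sketch of one well-known approach from educational measurement: a one-parameter item response model with an Elo-style update. Treat it as an illustration under my own assumptions, not as Duolingo's actual code. The function names, the step size k, and the 0.8 target success rate are all hypothetical.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Logistic (one-parameter IRT) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def elo_update(ability: float, difficulty: float, correct: bool, k: float = 0.1):
    """Nudge both estimates toward the observed outcome (Elo-style step).

    k is a hypothetical step size chosen for illustration.
    """
    surprise = (1.0 if correct else 0.0) - p_correct(ability, difficulty)
    # A surprising success raises the learner's ability estimate and
    # lowers the exercise's difficulty estimate, and vice versa.
    return ability + k * surprise, difficulty - k * surprise


def pick_exercise(ability: float, difficulties: dict, target: float = 0.8) -> str:
    """Return the exercise whose predicted success rate is closest to target.

    The target value is an assumption made for this sketch.
    """
    return min(
        difficulties,
        key=lambda name: abs(p_correct(ability, difficulties[name]) - target),
    )
```

After each answer you would call elo_update and then pick_exercise again, so the next exercise tracks the learner's current estimated level.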
5. Bar plot checklist
Albert Rapp
Bar plots are the plots I use most often. The modifications I commonly make: flipping the coordinate axes, arranging the bars in decreasing order, and sometimes changing colours to show deviation.
In this post, Albert covers a bunch of such techniques. You should bookmark it for the next time you create a bar plot in R.
Four Packages
rcrossref is the R interface to CrossRef’s API. GitHub.
ezsummary provides some convenient functions for wrangling data. GitHub.
sparkline provides jQuery-based sparklines, which can also be used in R Markdown documents. GitHub.
rticles provides R Markdown and LaTeX templates for various journals. You can see the list. GitHub.
Three Jargons
Convolutional Neural Network (CNN): A type of neural network commonly used for image recognition, which uses convolutional layers to detect features in the input image.
Ensemble Learning: A technique where multiple machine learning models are trained and combined to improve the system's overall performance.
Bias-Variance Tradeoff: A fundamental concept in machine learning. Models with high bias are too simple and underfit the data, while models with high variance are too sensitive to the training set and overfit it. The goal is a model that balances the two to minimize error on unseen data.
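To make the Ensemble Learning entry above concrete, here is a minimal Python sketch of its simplest form, majority voting. The three rule-based "models" are hypothetical toys invented for this example. Real ensembles combine trained models, but the combining step looks much the same.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the most models for input x."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three deliberately weak, hypothetical spam rules standing in for trained models.
def has_link(text):
    return "spam" if "http" in text else "ham"


def shouts(text):
    return "spam" if text.isupper() else "ham"


def mentions_prize(text):
    return "spam" if "prize" in text.lower() else "ham"


models = [has_link, shouts, mentions_prize]
```

Each rule is wrong on its own fairly often, but a message is only flagged when at least two of the three agree, which is the intuition behind combining models.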