Hi there!
In today’s letter we will talk about a new open-source text-to-audio generator, how Meta is trying to make Wikipedia more accurate with citations, map vs. for-loops in R, a demo of GPT, and more.
A lot of interesting content ahead, keep reading!
Five Stories
How AI could help make Wikipedia entries more accurate
Meta AI
Meta AI has developed an AI system that can automatically verify and suggest citations for Wikipedia articles using a large dataset of web pages and natural language understanding techniques. The system uses a web-scale retrieval library called Sphere to find relevant sources among millions of web pages, and an evidence-ranking model to compare the sources with the claims and rank them according to their likelihood of support.
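To make the retrieve-then-rank idea concrete, here is a toy sketch. It is not Meta's model: the real system uses a learned evidence-ranking model over Sphere, while this stand-in scores candidate passages by plain word overlap with the claim. The function names (`tokens`, `overlap_score`, `rank_sources`) and the sample strings are made up for illustration.

```python
def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def overlap_score(claim, passage):
    """Jaccard overlap between claim and passage words.

    A crude stand-in for a learned 'does this source support the claim?' score.
    """
    claim_words, passage_words = tokens(claim), tokens(passage)
    union = claim_words | passage_words
    return len(claim_words & passage_words) / len(union) if union else 0.0


def rank_sources(claim, passages):
    """Return candidate passages ordered from most to least overlapping."""
    return sorted(passages, key=lambda p: overlap_score(claim, p), reverse=True)


claim = "the bridge opened to traffic in 1937"
candidates = [
    "local bakery wins award for sourdough",
    "the bridge opened in 1937 after four years of construction",
]
best = rank_sources(claim, candidates)[0]
```

Here `best` ends up being the passage about the bridge, since it shares the most words with the claim. A real ranker would have to understand paraphrases and contradictions, which word overlap cannot do.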
It can help Wikipedia editors spot and fix citation issues at scale, improve the quality and reliability of Wikipedia, and advance AI research toward smarter systems that can reason about real-world knowledge with more complexity and nuance.
Observable Plot for Fast Interactive Data Visualizations in JavaScript
Allison Horst
This talk is about using JavaScript and Observable Plot for data visualization. Allison notes that JavaScript has become popular in the data world lately and compares the syntax and terminology of ggplot and Observable Plot to show their similarities. Observable Plot allows for easy interactivity and customization of plots through widgets and JavaScript cells, making it a powerful tool for data visualization.
If you’re looking for an example of Observable Plot, here is one on words known better by males than females, and vice versa.
Using functions instead of for-loops
Albert Rapp
To demonstrate the use of functions in R, Albert wrote an article explaining how to use the map() function from the {purrr} package to apply functions to nested data instead of writing for-loops. Using map() in R allows for more concise syntax and eliminates the need for bookkeeping code.
The map() function takes a list and a function as arguments. The function is then applied to each element of the list, and the results are collected in a single output. In R, the function can be declared using either function(x) or the shorthand \(x) syntax. He further compares them to the map functions in JavaScript.
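The same pattern exists outside R. Below is a small Python analogue (not code from Albert's article) that contrasts a for-loop with the built-in `map()` on nested data. The `scores_by_student` data and the `mean` helper are made up for illustration.

```python
# Nested data: each student has a list of scores (illustrative values).
scores_by_student = {"ana": [90, 85, 77], "ben": [60, 72], "cy": [88]}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# For-loop version: we create and grow the result list ourselves.
means_loop = []
for scores in scores_by_student.values():
    means_loop.append(mean(scores))

# map version: apply the function to each element and collect the results.
means_map = list(map(mean, scores_by_student.values()))

# An anonymous function plays the role of R's \(x) shorthand.
score_ranges = list(map(lambda s: max(s) - min(s), scores_by_student.values()))
```

Both versions produce the same list of means. The map version simply drops the empty-list setup and the `append` call, which is the bookkeeping code the article refers to.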
Watch an A.I. Learn to Write by Reading Nothing but…
Aatish Bhatia
In this article, we’ll watch an A.I. — which we’re affectionately calling BabyGPT — try to learn language by reading only the complete works of Jane Austen. It sees just the nearly 800 thousand words in this text — and nothing else.
Here’s an archived static version in case you do not have a NY Times subscription.
Bark: Text-Prompted Generative Audio Model
Suno AI
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal communication such as laughing, sighing, and crying.
The usage in Python is dead simple: hardly ten lines of code. It supports English, Hindi, Chinese, French, German, Spanish, and more. The model builds on EnCodec (Meta’s high-fidelity audio compression codec), Google’s AudioLM, Andrej Karpathy’s nanoGPT, and more.
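For a feel of those ten lines, here is a minimal sketch based on the usage shown in the Bark README at the time of writing. It assumes the `bark` package and `scipy` are installed and that the model weights can be downloaded on first use. The `synthesize_to_wav` wrapper, the prompt text, and the output file name are illustrative choices, not part of Bark.

```python
# Illustrative prompt; "[laughs]" follows the bracketed-cue style shown in the Bark README.
PROMPT = "Hello, this newsletter can talk now. [laughs] Not bad for a few lines of code."


def synthesize_to_wav(text, out_path="bark_generation.wav"):
    """Generate speech for `text` with Bark and save it as a WAV file.

    Imports live inside the function so this file still loads
    on machines where Bark is not installed.
    """
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()              # fetches and caches the model weights on first call
    audio = generate_audio(text)  # NumPy array of audio samples
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path
```

Calling `synthesize_to_wav(PROMPT)` should then write `bark_generation.wav` to the current directory. Expect the first run to be slow while the models download.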
Four Packages
bark is a Python package that takes in text and generates audio from it. The text can include expressions like “laughs” and “cries”, and also musical notes for generating music. GitHub.
nanoGPT is a minimal GPT implementation that you can play with on your own computer. It is designed for beginners to play with and explore the process of creating GPTs. GitHub.
fun is a collection of R games and other funny stuff, such as the classic Minesweeper and sliding puzzles. GitHub.
waffle makes waffle plots in R. Use them instead of pie charts. (waffle >> pie) GitHub.
Three Jargons
Multithreading: A technique that allows a single program to execute multiple tasks concurrently, making better use of available CPU resources.
Multiprocessing: A parallel processing approach where multiple CPUs or cores work on separate tasks simultaneously, improving overall performance.
Garbage Collection: An automatic memory management process that frees up memory occupied by objects or data structures that are no longer needed by the program.
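A quick Python sketch of two of these terms, using only the standard library. The `slow_square` and `Node` names are made up for illustration, and the garbage collection part relies on how CPython handles reference cycles.

```python
import gc
import time
import weakref
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Pretend to wait on I/O, then return n squared."""
    time.sleep(0.05)
    return n * n


def squares_threaded(numbers):
    """Multithreading: run slow_square on several threads within one program."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(slow_square, numbers))


class Node:
    """Tiny object that can point at another Node."""

    def __init__(self):
        self.other = None


def cycle_is_collected():
    """Garbage collection: build an unreachable reference cycle and reclaim it."""
    a, b = Node(), Node()
    a.other, b.other = b, a   # a and b keep each other alive
    probe = weakref.ref(a)    # weak reference: does not keep `a` alive
    del a, b                  # no names left, but the cycle remains
    gc.collect()              # the cycle detector frees both objects
    return probe() is None


results = squares_threaded([1, 2, 3, 4])
collected = cycle_is_collected()
```

For multiprocessing, `concurrent.futures.ProcessPoolExecutor` offers the same interface but runs the work in separate processes, which is usually the better fit for CPU-heavy tasks in Python.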
Two Tweets
https://twitter.com/mdancho84/status/1653378059435749377
https://twitter.com/DrJimFan/status/1651968203701231616
One Meme
Bonus
https://twitter.com/culturaltutor/status/1652308781416497152