The Art of Linear Algebra

Next — Today I Learnt About Data Science | Issue #80

Jul 19, 2023

Hi there!

Some housekeeping: I am taking a short break from the current format of Next for the next few weeks. I’m putting my energy into some other projects (details to follow). In the meantime, I will share some long-form writing. I will still include some external articles, but as links and references rather than introductory summaries.

In today’s letter, I start with an artful piece on linear algebra, followed by a visual exploration of Norway’s EV sales, some new LLM news, and more. Read on!

Five Stories

The Art of Linear Algebra

The Art of Linear Algebra brings the beauty of mathematics to life! This repository, created by kenjihiranabe, is a collection of graphic notes based on Gilbert Strang's "Linear Algebra for Everyone". It's a fantastic resource for anyone looking to understand linear algebra in a more visual and intuitive way.

Full page PDF is available here: https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra/blob/main/The-Art-of-Linear-Algebra.pdf

Norway EV Sales

Norway is going gaga over electric vehicles. Less than 20 years ago, 100% of cars ran on petrol or diesel. Now, less than 10% are in that group. Teslas are everywhere on the streets. In this graphic piece, Robbie Andrew explores various aspects of EV sales in Norway.

Fun story: my first Tesla ride ever was with two anonymous ladies in Oslo in 2019 where I literally told them “I’ve never experienced a Tesla ride. Would you mind giving me a ride to the next bus stop?” I half-expected them to bolt, but to my surprise, they were all for it!

Danswer: OpenSource Enterprise Question-Answering

Danswer is an exciting project that's gaining traction in the tech community! It's a platform that allows you to ask questions in natural language and get answers backed by private sources.

This means you can connect it to tools like Slack, GitHub, Confluence, and more to get precise and relevant answers to your queries. The project is open-source and has already garnered 2.4k stars on GitHub, indicating its popularity and the value it provides to its users. The project is licensed under the MIT license, promoting open and unrestricted use while maintaining credit to the original creators.

It requires self-hosting and is free to use.

Llama 2 - Meta AI

Get ready for Llama 2, the next generation of Meta AI's open-source large language model! This model has a permissive license: free for both research and commercial use. Llama 2 comes with model weights and starting code for pretrained and fine-tuned language models, ranging from 7B to a whopping 70B parameters. It's trained on 2 trillion tokens and has double the context length of Llama 1.

The fine-tuned models have been trained on over 1 million human annotations, ensuring high-quality results. Llama 2 outperforms other open-source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. This model has received support from a broad range of global partners and supporters, including Microsoft, Amazon Web Services, Spotify, T-Mobile, Reliance Jio, OpenUK, Dropbox, IBM, Nvidia, and more.

Check it out here!

Claude v2

Claude, another AI chatbot created by Anthropic, is now publicly available for free. It has practically the same offering as ChatGPT, but has a better user interface. It also supports file uploads, which makes the experience a lot smoother. In my limited testing, I found it was more likely to hallucinate than GPT-4. Some benchmarks.

Although Claude might not interest you directly, Cody, an AI-powered IDE assistant, might catch your attention. Cody is free for personal use.

Anthropic is an interesting company. It was founded by a few OpenAI researchers who believed in a different approach to creating large language models: valuing safety over everything else. They began by researching AI alignment but quickly realised that to build state-of-the-art safety measures, they needed to build state-of-the-art language models. It is a delicate balance between building a useful model and a safe-but-dull one. Check out this New York Times article on them.

Four Packages

charlatan is an R package for generating fake data. Vignette. GitHub.

faker is a Python package that generates fake data for you. Vignette.

ivreg provides a comprehensive implementation of instrumental variables regression using two-stage least-squares (2SLS) estimation. Vignette. Another vignette.

RSelenium provides R bindings for the Selenium Webdriver API. Selenium is a project focused on automating web browsers. Vignette.
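
If two-stage least squares (the method behind ivreg, above) is new to you, here is a minimal sketch of the idea in plain Python. It is not ivreg's API, and the data and the helper names `ols` and `two_stage_least_squares` are made up for illustration. The hidden variable `u` affects both `x` and `y`, so an ordinary regression of `y` on `x` overstates the true effect of 3, while the instrument `z` recovers it.

```python
def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) of a least-squares fit of y on a single x."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    variance = sum((a - x_bar) ** 2 for a in x)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope


def two_stage_least_squares(z, x, y):
    """Return (intercept, slope) for y on x, using z as the instrument."""
    # Stage 1: regress x on the instrument z and keep the fitted values.
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * value for value in z]
    # Stage 2: regress y on the fitted values instead of on x itself.
    return ols(x_hat, y)


# Made-up data: u is an unobserved confounder, uncorrelated with z.
z = [-1, 1, -1, 1]
u = [1, 1, -1, -1]
x = [2 * zi + ui for zi, ui in zip(z, u)]      # x depends on z and u
y = [3 * xi + 2 * ui for xi, ui in zip(x, u)]  # true effect of x on y is 3

naive_slope = ols(x, y)[1]                     # biased by the confounder
iv_slope = two_stage_least_squares(z, x, y)[1]

if __name__ == "__main__":
    print("OLS slope:", naive_slope)
    print("2SLS slope:", iv_slope)
```

This sketch only gives point estimates. Standard errors taken from a hand-rolled second stage are not valid, which is one reason to reach for a dedicated package in practice.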

Originally in

Mojo may be the biggest programming language advance in decades

Harshvardhan

May 10, 2023

Three Jargons

Duck Typing: This is a programming concept that Python implements, named after the phrase "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck." In Python, it means that you don't care about the type of an object, you care about what it can do. If an object can perform the required operations, then it is considered suitable, regardless of its actual class or type.
Decorator: In Python, a decorator is a function that takes another function and returns a modified version of it. You can think of decorators as wrappers that add or change the behavior of the function they wrap, without editing that function's own code. A decorator is applied by writing "@" followed by its name on the line above the function definition.
Lambda Functions: These are small, anonymous functions that you can create with the lambda keyword. They're handy when you need a quick, small function for something like sorting or filtering data. They take arguments and return a value like a normal function, but their body is limited to a single expression, so they cannot contain statements such as assignments or loops. For example, lambda x: x**2 is a lambda function that squares a number.
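
To make the three terms concrete, here is a small sketch that uses all of them. The names (`Duck`, `Robot`, `make_it_quack`, `shout`, `greet`) are invented for this example and are not from any library.

```python
import functools


class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    # Unrelated to Duck by inheritance, but it offers the same method.
    def quack(self):
        return "Beep-quack!"


def make_it_quack(thing):
    # Duck typing: no type check, we only rely on .quack() being there.
    return thing.quack()


def shout(func):
    # Decorator: wraps func and upper-cases whatever it returns.
    @functools.wraps(func)  # keeps the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()

    return wrapper


@shout
def greet(name):
    return f"hello, {name}"


# Lambda: a one-expression anonymous function.
square = lambda x: x**2

# A common use: passing a lambda as the sort key.
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda word: len(word))

if __name__ == "__main__":
    print([make_it_quack(thing) for thing in (Duck(), Robot())])
    print(greet("reader"))
    print(square(4), by_length)
```

Note that `make_it_quack` never asks what class it was given. Passing an object without a `quack` method would fail with an AttributeError at call time, which is the trade-off duck typing makes.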

Originally in