Next | Issue #64
Imagine you’re celebrating New Year’s Eve in 1976, ready to welcome 1977 with a bang. You’ve got your party hat on, your confetti in hand, and your eyes on the clock. You start counting down: 10…9…8…7…6…5…4…3…2…1…and then nothing. The clock shows 23:59:60 instead of 00:00:00. What’s going on? Did someone mess with your watch? Did time stop? No, it’s just a leap second, an extra tick that’s added to the global clock every now and then to keep it in sync with the Earth’s rotation. (In 2022, the international metrology community decided to stop adding leap seconds by 2035.)
Today, I will talk about r-universe for finding anything related to R; rbind.io, a free website hosting service; some curious aspects of subsetting in R; and more. Let’s dive in!
r-universe is a new umbrella project by rOpenSci under which they experiment with ideas for improving the publication and discovery of research software in R. The project aims to help you navigate the R ecosystem effectively: discover what is out there, get a sense of the purpose and quality of individual packages and their developers, and start using packages immediately and without any hassle.
The project also shows a shuffling list of organizations that publish R packages (sorted by recent activity), which is a fun way to discover what is currently being developed in the R ecosystem. The search homepage is fun to play around with.
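One of the practical conveniences of r-universe is that each publishing organization gets its own CRAN-like repository at `<name>.r-universe.dev`, which works with plain `install.packages()`. A minimal sketch, using the rOpenSci universe and the magick package as an example:

```r
# Install a package straight from its developer's r-universe repository;
# "ropensci" is one example universe name, and "magick" one of its packages.
# Listing CRAN as a second repo lets dependencies resolve from there.
repos <- c(
  ropensci = "https://ropensci.r-universe.dev",
  CRAN     = "https://cloud.r-project.org"
)
install.packages("magick", repos = repos)
```

This gives you development builds without any extra tooling such as remotes or devtools.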
Rbind is a project that aims to provide a service like WordPress.com or Medium, but driven by the community instead of a company. It hosts websites related to R and/or statistics using the blogdown package, and allows users to transfer their source repositories to Rbind if they want.
Users can also get a free subdomain *.rbind.io for their websites, and get help from the Rbind community by posting their websites and source repositories on the GitHub organization rbind.
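If you have never set up a blogdown site before, the local workflow is short. A minimal sketch, assuming blogdown is installed and using its default Hugo theme (the *.rbind.io subdomain is then requested separately via the rbind GitHub organization):

```r
# Create a new blogdown site in the current directory using the
# default "hugo-lithium" theme, then preview it locally.
install.packages("blogdown")
blogdown::new_site(theme = "yihui/hugo-lithium")
blogdown::serve_site()  # live-preview at a local address while you edit
```

The resulting source repository is what you would transfer to Rbind for hosting.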
In this post, Nicholas explores the use of `[` when subsetting in R, and how it differs from base::subset and dplyr::filter. He shows how NA values inside `[` can cause unexpected results, such as getting NA-filled or otherwise weird rows when filtering. He also notes that this issue does not occur with base::subset or dplyr::filter.
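A small illustration of the behaviour in question (my own toy example, not one from the post):

```r
df <- data.frame(x = c(1, 2, NA, 4), y = c("a", "b", "c", "d"))

# `[` keeps positions where the condition evaluates to NA, so the
# result contains a phantom row filled entirely with NAs.
res_bracket <- df[df$x > 2, ]   # two rows: one all-NA row, plus x == 4

# base::subset (like dplyr::filter) treats an NA condition as FALSE
# and silently drops that row.
res_subset <- subset(df, x > 2) # one row: x == 4
```

The difference comes down to how each function handles the `NA` produced by `NA > 2` in the logical index.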
Prof Robin Lovelace, associate professor of transport data science at the University of Leeds, gave a talk on pop-up cycleways. Pop-up cycleways emerged in response to reduced traffic and the increased need for space for walking and cycling created by physical distancing measures during the COVID-19 pandemic.
Robin talked about how we could generate an evidence base to support investment in new cycleways based on a combination of cycling potential and places where there may be ‘spare space’ for walking and cycling.
I do not remember how I stumbled on this collection of notes on many different programming languages and associated tools. There are specific tips on:
pipeliner is an R package that aims to provide an elegant way to build machine learning and statistical models. It implements a common workflow in which you define the transformations and the model once, and prediction then automatically applies the entire pipeline of transformations and inverse transformations to the inputs and outputs. Vignette. GitHub.
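To make the idea concrete, here is a hypothetical base-R sketch of the pattern (my own illustration, not pipeliner's actual API): the response is transformed before fitting, and predictions are automatically inverse-transformed on the way out.

```r
# Build a tiny "pipeline": centre and scale the response, fit a linear
# model, and return a predict function that undoes the scaling for us.
make_pipeline <- function(formula, data) {
  y_name <- all.vars(formula)[1]
  mu <- mean(data[[y_name]])
  s  <- sd(data[[y_name]])
  data[[y_name]] <- (data[[y_name]] - mu) / s  # transform the response
  model <- lm(formula, data)
  # the returned function applies the inverse transformation automatically
  function(newdata) predict(model, newdata) * s + mu
}

predict_mpg <- make_pipeline(mpg ~ wt, mtcars)
predict_mpg(data.frame(wt = 3))  # prediction on the original mpg scale
```

Packages like pipeliner generalise this so that arbitrary feature and response transformations travel with the model.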
The survex package provides model-agnostic explanations for machine learning survival models. It can work with models from survival and other popular packages with survival analysis applications. Vignette. GitHub.
modelStudio can automate the explanatory analysis of machine learning predictive models, generating advanced interactive model explanations in the form of a serverless HTML site with only one line of code. The tool is model-agnostic and therefore compatible with most black-box predictive models and frameworks. Vignette. GitHub.
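The "one line" in question operates on a DALEX explainer. A minimal sketch, assuming the DALEX and modelStudio packages are installed, using DALEX's bundled titanic_imputed dataset:

```r
library(DALEX)
library(modelStudio)

# Fit any predictive model; a logistic regression keeps the example simple.
model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

# Wrap the model in a model-agnostic DALEX explainer...
explainer <- explain(model,
                     data = titanic_imputed,
                     y = titanic_imputed$survived)

# ...and one line generates the interactive serverless HTML dashboard.
modelStudio(explainer)
```

Because everything goes through the explainer abstraction, swapping in a random forest or gradient boosting model changes only the first line.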
Adversarial examples: Inputs to a machine learning model that have been intentionally designed to cause the model to make mistakes.
Autoencoders: A type of neural network that learns to reconstruct its input data, often used for dimensionality reduction or anomaly detection.
Transfer learning: A technique where a pre-trained model is used as a starting point for a new task, and then fine-tuned on a smaller dataset for the specific task at hand.
This old thread is a gold mine of simple fun projects.