Newsletter #4: ML, NLP, CI/CD, Tech Leadership And More
Machine Learning at Dropbox, Continuous Deployment at AWS, Breaking Up LinkedIn's Monolithic Messaging Platform, and More
Today, as I prepare for a talk titled "Understanding DevOps" this Friday, I want to quickly share something special with you.
Photo source: Unsplash
Personally, June was a long month for me. Not only did I have my longest one-day bike ride 🚲 (around Grunnedwald Forest), but I also came across some interesting articles, books, and papers on ML, NLP, Tech Leadership, and Software Engineering that I thought were worth sharing with you. Let's start with ML.
Machine Learning
Using Machine Learning to Predict What File You Need Next
Most file hosting services can intelligently predict what file you will work on next and offer suggestions. In this well-written post, Neeraj Kumar describes how Dropbox built a machine learning model that predicts what files you need next.
How Docker Can Help You Become a More Effective Data Scientist
Container technology is a must-have in most development workflows today. Containers have forever changed the way we build and ship applications. If you're a data scientist wondering how to use container technology like Docker to your advantage, Hamel Husain wrote this post to get you started from the ground up.
A Gentle Introduction to Multi GPU and Multi-Node Distributed Training
Stephen Balaban provided a high-level overview of the different types of training regimes that you'll encounter as you move from single GPU to multi-node distributed training. He described where the computation happens, how the gradients are communicated, and how the models are updated.
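To make those three questions concrete, here is a minimal sketch of synchronous data-parallel training in plain Python. It is my own toy illustration, not code from Stephen's post: the names local_gradient, all_reduce_mean, and train are hypothetical, the "workers" are just list entries in one process, and the model is a single weight w fitted to y = w * x. Each worker computes a gradient on its own shard of the data, the gradients are averaged (the job an all-reduce does on a real cluster), and every replica applies the same update.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Where the computation happens: each worker computes the gradient
    of the mean squared error for y ~ w * x on its own data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(grads: List[float]) -> float:
    """How gradients are communicated: a stand-in for an all-reduce,
    after which every worker holds the same averaged gradient."""
    return sum(grads) / len(grads)


def train(shards: List[List[Sample]], lr: float = 0.05, steps: int = 50) -> float:
    """How the models are updated: every replica applies the same
    averaged gradient, so all replicas stay identical after each step."""
    replicas = [0.0] * len(shards)  # one copy of the weight per simulated worker
    for _ in range(steps):
        grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
        avg_grad = all_reduce_mean(grads)
        replicas = [w - lr * avg_grad for w in replicas]
    assert len(set(replicas)) == 1  # replicas never diverge
    return replicas[0]


# Two simulated workers, each holding half of a dataset where y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
learned_w = train(shards)
print(f"learned weight: {learned_w:.4f}")
```

In a real setup the averaging step runs over the network between GPUs or nodes, which is exactly where the communication cost discussed in the post comes from.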
Faster and Cheaper PyTorch with RaySGD
RaySGD is a lightweight library for distributed deep learning, providing thin wrappers around PyTorch and TensorFlow native modules for parallel training. Richard Liaw wrote a post on how to make distributed PyTorch model training cheap and straightforward with RaySGD. On a side note, Ray Summit Connect is hosting a series of online events this month and beyond. I'm joining one of those.
The Future of Natural Language Processing
I recently saw this YouTube video on the future of NLP by Thomas Wolf, a science lead at Hugging Face, and I thought you might want to see it too. The video is a walkthrough on model size, computational efficiency, fine-tuning, sample efficiency, common sense, and inductive biases. You can also find the slides here.
The Illustrated Transformer
"A picture is worth a thousand words," they say. In this post, Jay Alammar uses images to explain and simplify what transformers are. The post was referenced in MIT's Deep Learning lecture. I'm sure you'll enjoy it as much as I did.
How to Deliver on Machine Learning Projects
"One common hurdle in ML teams is maintaining the same level of progress that engineers are used to with traditional software engineering." In this post, Emmanuel Ameisen wrote about how your ML team can keep the same pace as a traditional software engineering team.
Software Engineering
Speeding up a Git Monorepo at Dropbox With <200 Lines of Code
If you do a Google search, you will find tons of opinions on monorepo vs. multi-repo. I believe every organization has reasons for choosing one over the other. However, if your organization uses a monorepo and Git performance is becoming an issue, you should definitely read how Dropbox sped up their Git monorepo.
How LinkedIn Redesigned Its 17-Year Old Monolithic Messaging Platform
I love reading articles on how organizations break their monolithic apps into smaller services. In this article, Mary Branscombe wrote about how LinkedIn broke its 17-year-old messaging platform into smaller services.
Automating Safe, Hands-Off Deployments
One type of waste in the software development lifecycle is "waiting": the time during which one or more team members sit idle, waiting for input from another activity. Think of waiting for approvals, lengthy feedback loops, etc. In this post, Clare Liguori, a Principal Software Engineer at AWS, described how code changes make their way to production, and how AWS teams are able to deploy many times a day.
Engineering Leadership
Don't Lead by Example
I recently read a post from James Cowling, a Principal Engineer at Dropbox, and I have shared it more times than I can count. The wisdom in this post is priceless, and it is a must-read for every tech lead.
Creating Successful Big Data Teams and Products
Jesse Anderson got me on this one. He created a free, 72-page book for managers, VPs, and CxOs: people who manage teams, or people who are about to create, or are currently creating, a Big Data solution.
The Path to Technical Leadership: How to Go From Developer to Team Leader
Leadership is not about title or authority. Contrary to what most people think, leadership is about what you do, and it has little to do with the position you claim. This post from Alex Bachuk will encourage you to step up today, be an expert in your field, and start serving your team.
Papers
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
This is a paper on a neural network-based system for text-to-speech (TTS) synthesis that can generate speech audio in the voice of many different speakers, including those unseen during training.
Acme: A Research Framework for Distributed Reinforcement Learning
This paper introduces Acme, a framework that simplifies the development of novel reinforcement learning algorithms and is specifically designed to enable simple agent implementations that can run at various scales of execution. Acme's goal is to make the results of various RL algorithms developed in academia and industrial labs easier to reproduce and extend.
Thanks for reading
Thanks for reading! If you like this newsletter and want to support it, please share it with others or buy me a coffee. If you have feedback, feel free to send it.
Cheers 🥂
Samuel