One Month of Machine Learning

November 17, 2021

For the past month, I've been studying machine learning, mainly by working through the free online course Practical Deep Learning for Coders created by fast.ai.

To facilitate my learning process, I've attempted to summarize what I've learned so far, what I think of the course, what questions I still have, and what I'm moving on to next to deepen my understanding of ML.

Where I Started

I've always been interested in machine learning, and I've gotten close to it at various points in my career so far. I worked for a few years at a startup as a data engineer, where my job was to build ETL pipelines to collect and clean data fed into predictive models, and I rubbed shoulders with the PhDs who were building those models. I also muscled myself through Andrew Ng's Coursera course on ML a few years ago, although most of what I learned has since faded from my brain.

In other words, I started this month feeling like I had high-level familiarity with some of the concepts but was essentially still a beginner without any hands-on experience.

What I've Learned So Far

This past month was like being a kid in a money booth: I was having a blast trying to grab on to every single piece of information fluttering down all around me, but despite my best efforts, lots of it still slipped through my fingers. At present, I'm feeling way more knowledgeable but also more aware of how vast this space is and how much I have left to learn. At a few different points, I've been overwhelmed and discouraged by how much there is to cover, but every time I feel this way, I remind myself that I need to keep chipping away and learning a little more each day. Combined with the fact that I'm just absolutely fascinated by everything I'm learning, this helps me continue moving in the right direction.

One thing I loved about the course (and all the content that the fast.ai team has created) is that it's intended to make deep learning accessible to everyone. They approach this by first showing you what's possible and then slowly peeling back layers of abstraction to help you understand what's happening under the covers. As an example, in the very first lesson, you walk through the steps of building a neural network that can determine, with near-perfect accuracy, whether an image contains a grizzly bear, a black bear, or a teddy bear.
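
To give a flavor of what that first lesson feels like in code, here's a rough sketch in the style of fastai's high-level API. This is my own illustration rather than the course notebook verbatim, and the bears/ folder layout and the image file name are hypothetical:

    # Sketch of a bear classifier with fastai -- not the course notebook verbatim.
    # Assumes photos already sit in folders named bears/grizzly, bears/black,
    # and bears/teddy.
    from fastai.vision.all import *

    path = Path('bears')

    # Label each image by its parent folder name, hold out 20% for validation,
    # and resize everything to 224x224.
    dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
                                       item_tfms=Resize(224))

    # Transfer learning from a pretrained ResNet; a few epochs of fine-tuning
    # is enough to get near-perfect accuracy on a problem like this.
    learn = cnn_learner(dls, resnet18, metrics=error_rate)
    learn.fine_tune(4)

    # Classify a new image (file name is hypothetical).
    pred, _, probs = learn.predict(PILImage.create('some_bear.jpg'))
    print(pred, probs.max())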

The advantage of this approach is that as the student, you are exposed to some of the most exciting parts of machine learning right away and can see the entire process of building a deep learning model. This top-down approach to learning contrasted pretty sharply with most courses I'd taken in the past, where you are taught from the bottom up, learning the fundamental principles and then slowly building up to the "full enchilada." Having now been exposed to both teaching styles, I think I prefer the top-down approach overall, but I found that there are tradeoffs in either case. For me, the most challenging aspect of learning in this top-down way was constantly setting aside my discomfort with the fact that I didn't really understand what was going on at lower levels of abstraction. In retrospect, even though this was more uncomfortable as a learner, I think it's advantageous in the long run because I found myself writing down tons of questions that I could then answer for myself later. Thus, I was much more actively thinking and wondering about what was happening under the hood, and then when I finally learned what was going on, it was gratifying to have that missing piece of information filled in.

Over the eight lectures, you get introduced to machine learning and the major domains where it's being applied (in most cases with exciting results): computer vision, natural language processing, multimodal models that combine text and images, tabular data, and recommendation systems. You also start to build an understanding of the "practice" of machine learning: what it takes to create a successful model (and how to do this ethically), how to deploy it, and the instincts you should have as a machine learning engineer. A big part of building a robust and useful model comes down to spending a lot of time with your data: cleaning it, understanding it, augmenting it, thinking about ways it could be misinterpreted, and then iterating on your model by playing around with hyperparameters like the neural network architecture, the learning rate, the loss function, etc.
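
As a concrete example of where a few of those knobs live, here's a small sketch using the fastai API. Again, this is my own illustration rather than code from the course, and it reuses the same hypothetical bears/ folder as the sketch above:

    # Sketch of swapping hyperparameters in fastai: the architecture, the loss
    # function, and the learning rate. Illustrative only.
    from fastai.vision.all import *

    dls = ImageDataLoaders.from_folder(Path('bears'), valid_pct=0.2,
                                       item_tfms=Resize(224))

    learn = cnn_learner(dls,
                        resnet34,                          # try resnet18 / resnet34 / resnet50 ...
                        loss_func=CrossEntropyLossFlat(),  # the default loss for classification
                        metrics=error_rate)

    learn.lr_find()                   # plot loss vs. learning rate to pick a sensible value
    learn.fine_tune(3, base_lr=3e-3)  # then fine-tune with the learning rate you chose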

You also get exposed to a lot of jargon, which is extremely helpful for building your vocabulary and making other resources you find much more accessible. There are many scary words in machine learning that become easy to understand once they are defined and put in context. I've started making a vocabulary list that I'll publish soon and that I'm going to continue to update as I learn.

During this past month, I essentially did a speed run through the course content. I've watched all eight of the video lectures and followed along by reading the first 13 chapters of the book (in varying levels of detail). It was a fantastic survey of the space, but I know that I'm nowhere near "finished" with the course relative to its learning objectives, and I'm excited to continue my learning journey. Here are just a few of the gaps and questions I still have that I'm looking forward to addressing:

Questions & Areas for Investigation

  • I'd like to get crystal clear on exactly where the boundaries are between machine learning and artificial intelligence. I'd also like to formalize my definition of what intelligence is and what consciousness is. I hear this come up on podcasts with researchers a lot, and I haven't heard an answer yet that resonated with me in a clear, concise way.
  • I want to actually see the model. I want to visualize it with pictures. I want to watch how it changes as it gets trained.
  • In a similar vein, what's the smallest neural net that somebody has built that does something useful? I want to build one by hand, on paper if I have to (but I'll settle for pure numpy -- see the sketch just after this list)!
  • Tangentially, I'd love to dive deeper into how GPUs work in contrast to CPUs. I've only got a high-level understanding and would love to go deeper.
  • I want to survey all kinds of neural network architectures. Recurrent, convolutional, GANs, etc.
  • Related to the above, I'd like to build a list of the most influential papers/models over the past few decades and make sure I deeply understand what they brought to the table.
  • I need to go back and shore up my foundation -- make sure I deeply understand backpropagation, why we need non-linearities, etc. This is probably one of the most important things I need to go deeper on and grok.
  • Why do we use pickling as the export format? Is anybody making a more standard "model interchange" format? How big are models on disk? How long does inference take, and is that related to the size of the model? Can you speed up inference with caching or other techniques? Can you use existing tools like databases for storing and using models? What about existing data structures?
  • How flexible can a model be with respect to its input data? I'm assuming not very -- but could you train a model that takes in images of varying sizes and shapes?
  • How much more expensive or slow would it be to train from scratch instead of using transfer learning?
  • How significant are the tradeoffs for using deeper versions of ResNet when it comes to inference performance? E.g., resnet18 vs. resnet50: what's a rule of thumb for how much slower inference will be if I do transfer learning based on one instead of the other?
  • What is all this I am hearing about dense vs sparse networks? Is Jeff Hawkins (of Numenta) on the wrong track or on to something special?
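
For the "smallest useful network" itch (and the backpropagation and non-linearity questions above), here's the kind of pure-numpy sketch I have in mind: a tiny network with one hidden layer and hand-written backprop that learns XOR. The layer sizes, learning rate, and step count are arbitrary choices on my part, not anything prescribed by the course:

    # A toy network in pure numpy: 2 inputs -> 4 hidden sigmoid units -> 1 output,
    # trained with hand-written backpropagation to learn XOR. Two hidden units are
    # enough in principle; a few extra make training from a random start more reliable.
    import numpy as np

    rng = np.random.default_rng(0)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
    y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

    W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # hidden layer weights/biases
    W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # output layer weights/biases

    sigmoid = lambda z: 1 / (1 + np.exp(-z))             # the non-linearity

    lr = 1.0
    for step in range(10000):
        # forward pass
        h = sigmoid(X @ W1 + b1)      # hidden activations, shape (4, 4)
        p = sigmoid(h @ W2 + b2)      # predictions, shape (4, 1)

        # backward pass: chain rule by hand for a squared-error loss
        dz2 = (p - y) * p * (1 - p)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
        dz1 = dz2 @ W2.T * h * (1 - h)
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

        # gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(np.round(p, 2))  # should end up close to [[0], [1], [1], [0]]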

And much, much more! Many of my questions are still very high level, and I'm sure I'll generate even more as I wade a bit deeper into building models myself. Here's how I'm planning on spending my time over the next month to deepen my understanding:

What's Next

  • Buy the Deep Learning for Coders with Fastai and PyTorch book to support the fast.ai course and express my gratitude to the authors for making such fantastic content freely available to everyone.
  • Do a second, slower pass through the first half of the book. This time I'm going to run all the notebooks on the deep learning server I've set up in AWS and make sure I understand what's happening at each stage.
  • In parallel, I'm going to read Michael Nielsen's Neural Networks and Deep Learning free online book to get a more fundamental understanding of the same concepts. From what I can tell, this book takes a bottom-up approach and will probably be a great way to fill in some of the core gaps I expressed above and get a bit more comfortable with some of the math that I've glossed over so far.
  • Additionally, I'm going to work through the tutorials for the fastai library and the PyTorch library as an excuse to get more practice building models and get more comfortable with these tools.

Once I've done that, I think the next step is to get my hands as dirty as I can by building a ton of models. I'll see how comfortable I'm feeling at that stage, but I feel like doing a challenge (10 models in 10 days, or building 50 models or something) would be a great way to identify more gaps in my understanding and get more hands-on practice.

Bonus: Other Resources I've Found Helpful

Even though I've been focusing most of my time on the fast.ai course, I've also been using other resources along the way to fill in gaps. I wanted to at least mention them here as a way to say thank you and in case others find them useful!

Moment of Gratitude

As a beginner, I'm amazed at how much content the machine learning community creates and shares to help others learn. Jeremy Howard, Rachel Thomas, Sylvain Gugger, Michael Nielsen, Lex Fridman -- all these people and many future teachers -- thank you for all the time and energy you have spent attempting to make this fascinating topic more accessible to the rest of the world, and for setting such a positive example of how to be inclusive and thoughtful for the rest of the community. I feel so much gratitude and I hope to pay it forward as I continue to learn more and eventually (hopefully) start to do interesting things myself in this space.