Introduction

I recently interviewed for the role of Applied Research Scientist at a company which does a ton of machine learning (mostly deep learning). In order to prepare for the interviews, I scoured the internet for resources. Although there are some excellent guides out there, I felt I could contribute by making yet another list of resources. Hope it helps out a few people!

Disclaimer: Before I begin, let me mention my background since it heavily influenced the resources I focused on. I did the standard Introduction to ML course back in undergrad which covers topics such as Linear Algebra, Probability, Regression, Kernel methods, etc. However, most of my ML-related work these days is simply writing PyTorch code for various deep learning tasks. As a result, I had to brush up on all the mentioned topics (and more!).

So what should I study?!

DSA :/

For somebody who has been doing deep learning for a long time, DSA rounds are the most annoying. The perennial question, “Why-should-I-care-about-coin-change-when-I-will-never-write-any-such-algorithm?”, would always pop up in my mind. I understand it tests your analytical thinking, but ugh. Anyway, I think Blind 75 and NeetCode should be enough. I solved about 50% of the problems on NeetCode because I was also applying to other roles such as MLE and SDE.

Luckily, the company I interviewed at had no DSA rounds!

Machine Learning :)

I’d suggest the following list of resources (arranged in order of importance):

Lastly, if you do not regularly code, I’d suggest trying out a few tasks on Kaggle to brush up your skills.

Deep Learning :))

Preparation for this section will be very specific to your area of research, e.g., Computer Vision, NLP, LLMs, Speech, etc.

I skimmed through the basics for Speech and LLMs since I had some background in these areas. Given below is a random collection of helpful resources:

System Design

  • Stanford’s CS329S is the most comprehensive introduction to ML System Design. Although I did not have time to explore it in depth, skimming through it definitely helped. All slides/notes are available for free!
  • There are a bunch of GitHub repos which cover ML System Design (such as this, this and this). Note that these are very high-level. If you have time, I’d suggest going for CS329S. If not, the above links might help.

My Interview Experience

I had 4 rounds of interviews:

  • Programming: I was asked to code the fundamental backprop equations for a simple linear layer from scratch in NumPy. It also involved tricks such as the logsumexp trick for numerical stability and tasks such as reusing a Linear Layer for 1x1 Conv in order to test proficiency in PyTorch. Previous experience in PyTorch was very important for this round.
  • Culture: Typical behavioral round involving questions about my past projects, what I learned from them, what I’d do differently, etc. The STAR methodology is helpful for such rounds.
  • ML Theory: I enjoyed this interview because it was slightly open-ended. We covered ideas such as the theoretical justification for Cross-Entropy Loss, the logsumexp trick (again!), whether Softmax + Cross-Entropy Loss can be replaced with some other loss, etc. This is where courses such as CS229 and CS231n will come to your rescue!
  • ML System Design: I was asked to design a deep learning pipeline for a common task in the domain of speech. We went over ways to synthetically generate the dataset, model architecture, loss functions and post-processing. I feel a broad overview of the ML/DL landscape is crucial for such interviews because there are design principles which are applicable across domains.
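
If you want something concrete to practice on, here is a rough NumPy sketch of the kind of thing the Programming and ML Theory rounds touch on: the backward pass of a linear layer, a logsumexp-stabilized softmax cross-entropy, and a 1x1 convolution expressed through the same linear layer. This is not the actual interview problem or my solution to it. The function names, tensor shapes (a row-major batch, weights of shape `(D_in, D_out)`, images in `(N, C, H, W)` layout) and the mean-over-batch loss are all assumptions I picked for illustration.

```python
import numpy as np


def logsumexp(z, axis=-1, keepdims=False):
    """Stable log(sum(exp(z))): subtract the max before exponentiating."""
    m = np.max(z, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(z - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)


def linear_forward(x, W, b):
    """y = x @ W + b with x: (N, D_in), W: (D_in, D_out), b: (D_out,)."""
    return x @ W + b


def linear_backward(x, W, grad_y):
    """Given dL/dy of shape (N, D_out), return dL/dx, dL/dW, dL/db."""
    grad_x = grad_y @ W.T        # (N, D_in)
    grad_W = x.T @ grad_y        # (D_in, D_out)
    grad_b = grad_y.sum(axis=0)  # (D_out,)
    return grad_x, grad_W, grad_b


def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy over the batch and its gradient w.r.t. the logits."""
    n = logits.shape[0]
    # log-softmax computed via logsumexp, so large logits do not overflow
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    loss = -log_probs[np.arange(n), targets].mean()
    grad = np.exp(log_probs)             # softmax probabilities
    grad[np.arange(n), targets] -= 1.0   # softmax minus one-hot targets
    return loss, grad / n


def conv1x1_via_linear(x, W, b):
    """1x1 convolution on x: (N, C_in, H, W) as a linear layer applied per pixel."""
    n, c_in, height, width = x.shape
    flat = x.transpose(0, 2, 3, 1).reshape(-1, c_in)  # (N*H*W, C_in)
    out = linear_forward(flat, W, b)                  # (N*H*W, C_out)
    return out.reshape(n, height, width, -1).transpose(0, 3, 1, 2)
```

A good way to check your own version is a finite-difference gradient check: perturb one weight by a small epsilon, recompute the loss, and compare the slope against the matching entry of `grad_W`. For the stability part, try logits around 1000: the naive `np.log(np.sum(np.exp(z)))` overflows to `inf`, while the max-subtracted version stays finite.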