What was your path toward learning ML? What books did you enjoy most while learning ML? What were the blind alleys? originally appeared on Quora - the knowledge sharing network where compelling questions are answered by people with unique insights.
My path to ML was long and winding.
In high school, I had a lot of different interests, most of which weren't related to math or science. I made up my own language with a phonetic alphabet, I took a lot of creative writing and literature classes, etc.
When I graduated from high school, I went to college despite not really wanting to: my parents would pay for me to go to college but didn't want to pay for me to do anything else, and the modern economic system seems thoroughly rigged against a teenager trying to bootstrap any other career with only grocery store savings. I chose Stanford because it didn't require me to choose a major until junior year, and because it isn't necessary to complete your degree in four consecutive years there. For example, John Steinbeck famously attended off and on for years and never completed his degree.
During my first year at Stanford, I took some intro programming classes and hated them. It was mostly C++ and I didn't like how mysterious the behavior was when I had a bug. I was also interested in figuring out how the brain works, so I took some psychology and cognitive science classes. They didn't seem nearly as concrete as I wanted, and I felt like college was as much of a waste of my time and my parents' money as I had originally thought. I decided to do the John Steinbeck thing and hit the pause button on Stanford. I went to Scotland with two friends and tried to start a writing career while working a day job. I applied to just about every job conceivable (forestry, factory work, fast food) but did not get anything at all, partly because the G7 summit was held nearby and most employers thought that I, a foreign student, must be one of the protesters who follow such events around. Having run out of savings, I went back to relying on my parents, and returned to college.
My second year at Stanford, I figured that the most interesting academic topic I could study was how the brain works. I had found the psychology and cognitive science classes to be too fluffy in my first year, so I had the idea that neuroscience might be more technical. I spent a year doing all the biology prerequisites, like chemistry and biology, along with all the pre-med students. I also started working in an fMRI lab and even got to take a tiny bit of neuroscience coursework. I actually became enthusiastic about it and was glad I was in college at this point.
During this time, I started to think that I should learn some programming because it would be helpful for doing neuroscience experiments. I took Jerry Cain's more advanced programming class, and really loved it. I had hated the intro to programming classes my first year because it was a complete mystery when anything went wrong, but Jerry explained everything in extreme detail, down to the level of what machine instructions get executed when you run a piece of C code. This made things concrete enough that they were no longer mysterious, and since taking his class I have never felt limited by programming ability or understanding computers. I also started spending a lot of time hanging around Jerry's office hours to learn about software engineering in general.
After my second year at Stanford, I got an internship at the National Institutes of Health, doing neuroscience research with TMS (transcranial magnetic stimulation). I was very excited about it: TMS is a technology that uses magnetic fields to influence electrical activity inside a person's brain without having to cut their head open. I had read a lot about it and thought it was the coolest thing ever. After I got there, it turned out to actually be kind of boring. The first time you get to zap someone's brain, it is indeed very exciting. Then you must zap dozens of people's brains in exactly the same way in order to collect enough experimental data to reach statistical significance. I found all of this routine data collection to be very tedious. At the same time, doctors at the NIH taught me a lot of data analysis techniques. They had written machine learning algorithms to do things like ICA for separating out different signals recorded by EEG machines, or to recognize events like the motor cortex planning an arm movement. I spent a lot of time working on data analysis and ended up enjoying this a lot more than actually doing the physical experiments with the TMS machine.
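To make the ICA idea concrete: the goal is to recover independent source signals from linear mixtures of them, which is roughly what separating EEG channels amounts to. Here is a minimal NumPy sketch (the toy signals and the small from-scratch FastICA loop are my own illustration, not the NIH code or real EEG data):

```python
import numpy as np

# Toy stand-in for the EEG setting: two independent source signals are
# linearly mixed, and ICA tries to recover them from the mixtures alone.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # sine + square wave sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])             # mixing matrix ("electrodes")
X = S @ A.T                                        # observed mixed signals

def fastica_2d(X, iters=200):
    """Tiny symmetric FastICA (tanh nonlinearity) for two components."""
    X = X - X.mean(axis=0)
    # Whiten: rotate/rescale so the data has identity covariance.
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    Xw = X @ E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    W = np.random.default_rng(0).standard_normal((2, 2))
    for _ in range(iters):
        G = np.tanh(Xw @ W.T)
        # Fixed-point update: E[x g(w.x)] - E[g'(w.x)] w, for each row of W.
        W = (Xw.T @ G / len(Xw)).T - np.diag((1 - G**2).mean(axis=0)) @ W
        U, _, Vt = np.linalg.svd(W)                # symmetric decorrelation
        W = U @ Vt
    return Xw @ W.T                                # estimated sources

S_hat = fastica_2d(X)
```

Each column of `S_hat` should line up (up to sign and ordering, which ICA cannot determine) with one of the original sources.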
When I came back to Stanford for junior year, it was nearly time to declare my major. I was almost certain I was going to choose biomedical computation. Then I told Jerry Cain about my internship experiences and how I had enjoyed the data analysis. He insisted that I take Andrew Ng's intro to AI class. Back then, I didn't know anything about AI except the fake AI that is used in video games (I had taken up video game programming as a hobby after finishing Jerry Cain's class). I was very skeptical that an AI class was worthwhile, but I trusted Jerry's judgment and took the class. After Andrew's lecture on the bias-variance decomposition for linear regression, I was persuaded that there was real science going on in AI, and I declared my major as computer science. At the end of the quarter, Andrew invited me to work on the Stanford AI Robot, and I've been involved in ML research ever since.
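For what it's worth, the decomposition that persuaded me is easy to see numerically: the expected squared error of a learned predictor at a test point splits into squared bias, variance, and irreducible noise. Here is a small Monte Carlo check of that identity for ordinary least squares (a toy setup of my own, not Andrew's lecture material):

```python
import numpy as np

# Toy problem: y = 2x + noise. Fit linear regression on many resampled
# training sets and check that, at a fixed test point,
#     E[(prediction - y)^2] ~= bias^2 + variance + noise_variance.
rng = np.random.default_rng(0)
x_test, noise_sd, n_train, n_trials = 0.5, 0.3, 20, 5000
true_f = lambda x: 2.0 * x

preds = np.empty(n_trials)
for i in range(n_trials):
    x = rng.uniform(-1, 1, n_train)
    y = true_f(x) + noise_sd * rng.standard_normal(n_train)
    slope, intercept = np.polyfit(x, y, 1)     # ordinary least squares fit
    preds[i] = slope * x_test + intercept

bias_sq = (preds.mean() - true_f(x_test)) ** 2  # ~0: model is well-specified
variance = preds.var()                          # spread across training sets
y_test = true_f(x_test) + noise_sd * rng.standard_normal(n_trials)
mse = ((preds - y_test) ** 2).mean()
print(mse, bias_sq + variance + noise_sd**2)    # two numbers, nearly equal
```

The point of the identity is that the three error sources trade off: a more flexible model shrinks the bias term but inflates the variance term, while the noise term is a floor no model can beat.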
It still took a while for me to really commit to ML over neuroscience, and when it was time to go to a PhD program it was hard for me to choose between neuroscience with Bruno Olshausen and machine learning with Yoshua Bengio. I was still interested in both subjects and in working with both people, and the decision really came down to feeling that it was the right moment in history to study machine learning. That was the real point of no return, and I've been happily focused on machine learning since making that decision.
My two favorite books while I was learning ML were Chris Bishop's book and Daphne Koller's book. When I took Andrew's machine learning class, I didn't know linear algebra or vector calculus very well, so a lot of it went over my head. Slowly reading through Chris Bishop's book over the summer gave me the second chance I needed to pick up the pieces and understand the things I didn't catch the first time around. Daphne Koller's book really opened my eyes to the probabilistic worldview and was especially useful in preparing me to spend a lot of time studying Boltzmann machines in grad school.