I remember vividly the day in the early 1980s when Geoff Hinton came to Berkeley to give a talk on his Boltzmann Machine work. To me it was an amazing vision: he discarded everything that was powerful and useful about symbolic AI, and in return got a mechanism that offered three exciting new (to me) things: cognitive plausibility in terms of a model of the brain; a model that was learned from experiences rather than programmed by hand; and representations that were continuous rather than Boolean, thus sidestepping some of the brittleness issues of traditional symbolic expert systems.
It turns out that the Boltzmann machines of that era did not scale up well, but other architectures designed by Hinton, LeCun, Bengio, Olshausen, Osindero, Sutskever, Courville, Ng, and others did. What made the difference? Was it the one-layer-at-a-time training technique? The ReLU activation function? The availability of lots more data? GPU clusters that allow faster training? I can't say for sure, and I hope that continued analysis will give us a better picture. But I can say that in speech recognition, object recognition in computer vision, the game of Go, and other fields, the difference has been dramatic. Error rates go down when you use deep learning, and these fields have undergone a complete transformation in the last few years: essentially all the teams have chosen deep learning, because it just works.
Many questions remain. In computer vision, we get tantalizing glimpses of what the deep networks are actually doing: we can identify line recognizers at one level, then, say, eye and nose recognizers at a higher level, followed by face recognizers above that, and finally whole-person recognizers. But in other areas, it has been harder to understand what the networks are doing. Is that because we don't have the right analysis and visualization tools, or is it that the representations are not actually coherent?
Deep learning works well across many applications when there is a lot of data, but what about one-shot or zero-shot learning, in which it is necessary to transfer and adapt knowledge from other domains to the current domain? What kinds of abstractions are formed by deep networks, and how can we reason with these abstractions and combine them? Networks can be fooled by adversarial inputs; how do we defend against these, and do they represent a fundamental flaw, or an irrelevant trick?
How do we deal with structure in a domain? We have recurrent networks to deal with time, and recursive networks to deal with nested structure, but it is too early to tell whether these are sufficient.
So I'm excited about Deep Learning because so many long-standing fields are excited about it. And I'm interested in understanding more because there are many remaining questions, and answers to these questions will not only tell us more about Deep Learning, but may help us understand Learning, Inference, and Representation in general.