What Is Deep Learning And Where Is It Headed?

These questions originally appeared on Quora, the knowledge-sharing network where compelling questions are answered by people with unique insights.

Answers by Yoshua Bengio, Head of the Montreal Institute for Learning Algorithms, on Quora.

A: Research is by definition exploratory, which means that (a) we do not know what will work and (b) we need to explore many paths, so we need a lot of diversity of research directions in the scientific community. So I can only tell you about my current gut feelings and visions of where I see important challenges and opportunities that appeal to my personal aesthetics and instincts. Here are some elements of this:
* unsupervised learning is crucial and we do not do it right yet (there are many arguments I and others have written and talked about to justify this)
* deep learning research is likely to continue its expansion from traditional pattern recognition jobs to full-scale AI tasks involving symbolic manipulation, memory, planning and reasoning. This will be important for reaching a full understanding of natural language and dialogue with humans (i.e., passing the Turing test). Similarly, we are seeing deep learning expand into the territories of reinforcement learning, control and robotics, and that is just the beginning.
* for AI, we probably still have a lot to gain from a better understanding of the brain and trying to find machine learning explanations for what brains are doing
* maximum likelihood can be improved upon; it is not necessarily the best objective when learning in complex high-dimensional domains (as arise in unsupervised learning and structured output scenarios)
* the quest for AI based on deep learning (and not just consumer products) will greatly benefit from substantial increases in computational capabilities, which probably means specialized hardware. This is because AI requires lots of knowledge about the world (and reasoning about it), which requires large models trained over very large datasets, and all of this requires much more computing power than we currently use.
See also my answers to the "open research areas" question.

...

A: No single paper stands out, and I realize from talking to people that different researchers are impressed by different contributions, so the choice of the advances below is very subjective:
* the Batch Normalization paper is exciting because of the impact it has already had in training numerous architectures, and it has been adopted as a standard
* the Ladder Networks paper is exciting because it is bringing unsupervised learning ideas (here, a particularly interesting stack of denoising autoencoders) back into competition with straight supervised learning, especially in a semi-supervised context
* this year's papers on generative adversarial networks (GANs), LAPGAN and DCGAN, have really raised the bar on generative modelling of images in impressive ways, suddenly making this approach the leader and contributing to the spirit of rapid progress in unsupervised learning over the last year; they compete with another big advance in deep generative modelling based on variational autoencoders, including the very impressive DRAW paper from early last year.
* the papers that use content-based attention mechanisms have been numerous over the past year; I saw it start with our neural machine translation with attention, followed by the neural Turing machine (and later the end-to-end memory networks), and many exciting uses of this type of processing for things like caption generation and manipulating data structures (I liked in particular the Pointer Networks and the other papers on differentiable data structure operations with stacks and queues, Teaching Machines to Read and Comprehend, etc.). So this architectural device is here to stay...
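
For readers unfamiliar with the content-based attention mentioned in the last item, here is a minimal sketch of the general idea, not the method of any particular paper cited above. It assumes plain dot-product scoring for simplicity (the papers use various scoring functions), and the names `content_attention`, `keys` and `values` and the toy numbers are made up for illustration. A query is compared with every stored key, the similarities are turned into weights with a softmax, and the output is the weighted average of the stored values.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def content_attention(query, keys, values):
    """Return (weights, context) using dot-product content-based attention.

    Assumption: similarity is a plain dot product between query and key.
    """
    # One similarity score per stored item.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Normalize scores into weights that are positive and sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted average of the values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy memory with three items (hypothetical numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

weights, context = content_attention([2.0, 0.0], keys, values)
print(weights, context)
```

Because every step is differentiable, the weights can be learned by gradient descent along with the rest of the network, which is what makes this kind of soft lookup usable inside larger models.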

...

A: First you need to be trained with the appropriate basis in mathematics and computer science. In the case of deep learning, you can see part 1 of the MIT Press Deep Learning book (available online for now; eventually MIT Press will have a real paper book) to either brush up on these or see which areas of math and CS are most relevant. Then you need to read up on machine learning (there are several good books, such as Chris Bishop's and Kevin Murphy's, and online videos such as Andrew Ng's Coursera class and Hugo Larochelle's videos on neural networks, and you can get a summary of many of the basic issues in chapter 5 of the Deep Learning book). Then you need to start practicing, i.e., programming some learning algorithms yourself and playing with them on data; try to compete in some Kaggle competitions, for example. Try to become an expert at optimizing hyper-parameters and choosing models appropriately. In parallel, continue reading. If you are interested in deep learning, part 2 of my book will give you the basis for the most common algorithms. At that point you should have enough background to start a steady regimen of reading papers that tickle your fancy.
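
As one possible starting point for the practice step described above, here is a minimal sketch of programming a learning algorithm yourself and tuning a hyper-parameter. Everything in it is an assumption chosen for illustration: the synthetic data, the linear model, the candidate learning rates and the helper names `make_data`, `train` and `mse`. It fits a line by gradient descent and picks the learning rate that gives the lowest error on held-out validation data.

```python
import random


def make_data(n, rng):
    # Synthetic data: y = 3x + 1 plus small Gaussian noise (made up for illustration).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(w, b, data):
    # Mean squared error of the line y = w*x + b on the given data.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, lr, steps=200):
    # Full-batch gradient descent on the mean squared error.
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = random.Random(0)
train_set = make_data(80, rng)
valid_set = make_data(40, rng)

# Try several learning rates and score each on data not used for training.
results = {}
for lr in [0.001, 0.01, 0.1, 0.5]:
    w, b = train(train_set, lr)
    results[lr] = mse(w, b, valid_set)

best_lr = min(results, key=results.get)
print(best_lr, results[best_lr])
```

The point of the separate validation set is that the hyper-parameter is chosen on data the model was not fitted to; the same pattern carries over to larger models and to more hyper-parameters than just the learning rate.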
