"What product breakthroughs will recent advances in deep learning enable?" originally appeared on Quora: the knowledge sharing network where compelling questions are answered by people with unique insights.
Deep Learning refers to a class of machine learning (ML) techniques that combine the following:
- Large neural networks (millions of free parameters).
- High-performance computing (thousands of processors running in parallel).
- Big Data (e.g. millions of color images or recorded chess games).
Deep learning techniques currently achieve state-of-the-art performance in a multitude of problem domains (vision, audio, robotics, and natural language processing, to name a few). Recent advances in deep learning also incorporate ideas from statistical learning [1,2], reinforcement learning (RL), and numerical optimization. For a broad survey of the field, see [9,10].
In no particular order, here are some product categories made possible by today's deep learning techniques: customized data compression, compressive sensing, data-driven sensor calibration, offline AI, human-computer interaction, gaming, artistic assistants, unstructured data mining, and voice synthesis.
Customized data compression
Suppose you are designing a video conferencing app and want to come up with a lossy encoding scheme to reduce the number of packets you need to send over the Internet. You could use an off-the-shelf codec like H.264, but H.264 is not optimal because it is calibrated for generic video: anything from cat videos to feature films to clouds. It would be nice to have a video codec optimized specifically for FaceTime videos. We could save even more bytes than a generic algorithm by taking advantage of the fact that, most of the time, there is a face in the center of the screen. However, designing such an encoding scheme by hand is tricky. How do we specify where the face is positioned, how much eyebrow hair the subject has, what color their eyes are, the shape of their jaw, etc.? What if their hair is covering one of their eyes? What if there are zero or multiple faces in the picture?
Deep learning can be applied here. Auto-encoders are a type of neural network trained so that their output reproduces their input. Learning this "identity mapping" would be trivial were it not for the fact that the hidden layers of the auto-encoder are chosen to be smaller than the input layer. This "information bottleneck" forces the auto-encoder to learn a compressed representation of the data in the hidden layer, which the remaining layers of the network then decode back into the original form.
Through end-to-end training, auto-encoders and other deep learning techniques adapt to the specific nuances of your data. Unlike in principal component analysis (PCA), the encoding and decoding steps are not limited to affine (linear) transformations: PCA learns an "encoding linear transform", while auto-encoders learn an "encoding program".
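To make the bottleneck idea concrete, here is a minimal sketch in plain Python (no ML library): a one-hidden-unit auto-encoder, trained by gradient descent with hand-written backpropagation, that squeezes 3-D points down to a single number and back. The toy data, layer sizes, learning rate, and the name `train_autoencoder` are illustrative assumptions; a real video codec would use deep convolutional layers and a framework with automatic differentiation.

```python
import math
import random


def train_autoencoder(data, hidden=1, lr=0.05, epochs=500, seed=0):
    """Tiny auto-encoder: tanh encoder -> `hidden`-unit bottleneck -> linear decoder.

    Trained with full-batch gradient descent on mean squared reconstruction
    error. Returns (encode, decode, loss_history).
    """
    rng = random.Random(seed)
    d, n = len(data[0]), len(data)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]  # encoder weights
    b = [0.0] * hidden
    V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]  # decoder weights
    c = [0.0] * d

    def encode(x):
        return [math.tanh(sum(W[j][i] * x[i] for i in range(d)) + b[j]) for j in range(hidden)]

    def decode(h):
        return [sum(V[i][j] * h[j] for j in range(hidden)) + c[i] for i in range(d)]

    def mse():
        return sum(sum((yi - xi) ** 2 for yi, xi in zip(decode(encode(x)), x)) for x in data) / n

    history = [mse()]
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(hidden)]
        gb = [0.0] * hidden
        gV = [[0.0] * hidden for _ in range(d)]
        gc = [0.0] * d
        for x in data:
            h = encode(x)
            y = decode(h)
            dy = [2.0 * (y[i] - x[i]) / n for i in range(d)]  # d(loss)/d(output)
            for i in range(d):
                gc[i] += dy[i]
                for j in range(hidden):
                    gV[i][j] += dy[i] * h[j]
            for j in range(hidden):
                # Back-propagate through the decoder, then through tanh.
                dz = sum(dy[i] * V[i][j] for i in range(d)) * (1.0 - h[j] ** 2)
                gb[j] += dz
                for i in range(d):
                    gW[j][i] += dz * x[i]
        # Gradient-descent step on every parameter (in place, so the closures see it).
        for j in range(hidden):
            b[j] -= lr * gb[j]
            for i in range(d):
                W[j][i] -= lr * gW[j][i]
        for i in range(d):
            c[i] -= lr * gc[i]
            for j in range(hidden):
                V[i][j] -= lr * gV[i][j]
        history.append(mse())
    return encode, decode, history


if __name__ == "__main__":
    # Toy "structured" data: 21 points on a line through the origin in 3-D.
    data = [[t, 0.5 * t, -0.5 * t] for t in [k / 10 for k in range(-10, 11)]]
    encode, decode, history = train_autoencoder(data)
    print(f"MSE before: {history[0]:.4f}, after: {history[-1]:.4f}")
    print("code for", data[3], "->", encode(data[3]))
```

Here `encode(x)` is the compressed code (one float instead of three) and `decode` maps it back to an approximate reconstruction. The tanh in the encoder is what lets this model go beyond a single linear projection; on data lying on a straight line, as in this toy, PCA would do just as well, and the nonlinearity only pays off on curved or more complex structure.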
This makes neural nets far more expressive and allows for complex, domain-specific compression: anything from storing a gazillion selfies on Facebook, to faster YouTube video streaming, to scientific data compression, to shrinking your personal iTunes library. Imagine if your iTunes library learned a "country music" auto-encoder just to compress your personal music collection!
Compressive sensing
Compressive sensing is closely related to the decoding aspects of lossy compression. Many interesting signals have a particular structure: the distribution of signals is not completely arbitrary. This means that we don't actually have to sample at the Nyquist rate in order to obtain a perfect reconstruction of the signal, as long as our decoding algorithm can properly exploit the underlying structure.
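The principle can be seen in a toy case where the structure is written in by hand rather than learned: if we know a length-8 signal has exactly one nonzero sample, two linear measurements are enough to recover it, far fewer than the 8 samples we would otherwise need. The sketch below uses a single step of matching pursuit, a classical sparse-recovery method rather than a neural network; the measurement matrix and function names are illustrative assumptions.

```python
import math


def measurement_matrix(n):
    """2 x n sensing matrix: column j is the unit vector at angle j*pi/n.
    The angles are distinct in [0, pi), so no two columns are parallel."""
    angles = [j * math.pi / n for j in range(n)]
    return [[math.cos(t) for t in angles], [math.sin(t) for t in angles]]


def measure(A, x):
    """Take len(A) linear measurements y = A x of the signal x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def recover_one_sparse(A, y):
    """Recover a signal known to have a single nonzero entry from y = A x.

    One matching-pursuit step: correlate y with every column and keep the best.
    With unit-norm, pairwise non-parallel columns, Cauchy-Schwarz makes the true
    column the unique maximizer of |<a_j, y>|, and that correlation is its value.
    """
    n = len(A[0])
    corr = [sum(A[r][j] * y[r] for r in range(len(A))) for j in range(n)]
    best = max(range(n), key=lambda j: abs(corr[j]))
    x_hat = [0.0] * n
    x_hat[best] = corr[best]
    return x_hat


if __name__ == "__main__":
    A = measurement_matrix(8)
    x = [0.0, 0.0, 0.0, 0.0, 0.0, 3.5, 0.0, 0.0]  # 8 samples, one nonzero
    y = measure(A, x)                              # only 2 numbers are "sent"
    print(recover_one_sparse(A, y))
```

Practical compressive sensing handles signals with k nonzeros (or sparsity in a transform domain such as wavelets) using more measurements and iterative solvers; the toy only shows that known structure can stand in for samples.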
Deep learning is applicable here because we can use neural networks to learn the sparse structure without manual feature engineering. Some product applications:
- Super-resolution algorithms (waifu2x): literally an "enhance" button like the ones in CSI: Miami.
- Using WiFi radio wave interference to see people through walls (MIT Wi-Vi).
- Inferring the 3D structure of an object from incomplete observations (such as a single 2D image or a partially occluded view).
- More accurate reconstructions from sonar / LIDAR data.
Data-driven sensor calibration
Good sensors and measurement devices often rely on expensive, precision-manufactured components.
Take digital cameras, for example. Digital cameras assume the glass lens is of a certain "nice" geometry. When taking a picture, the onboard processor solves the light transport equations through the lens to compute the final image.
If the lens is scratched, warped, or shaped like a bunny (instead of a disc), these assumptions break and the images no longer turn out well. Another example: the decoding models currently used in MRI and EEG assume the cranium is a perfect sphere in order to keep the math manageable. This sort of works, but sometimes we miss the location of a tumor by a few mm. More accurate photographic and MRI imaging ought to compensate for geometric deviations, whether they stem from the subject being imaged or from manufacturing defects.
Fortunately, deep learning allows us to calibrate our decoding algorithms with data.
Instead of a one-size-fits-all decoding model (such as a Kalman filter), we can express more complex biases specifically tuned to each patient or each measuring device. If our camera lens is scratched, we can train the decoding software to implicitly compensate for the altered geometry. This means we no longer have to manufacture and align sensors with utmost precision, and this saves a lot of money.
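The simplest possible version of data-driven calibration is sketched below: fit a per-device correction from a handful of readings taken against known reference values. This is deliberately not the deep-learning version; a neural network would replace the straight line with a learned nonlinear mapping (for example, from a distorted image to a corrected one), but the workflow of collecting paired data per device and then fitting is the same. The linear model and the function names are illustrative assumptions.

```python
def fit_linear_calibration(measured, reference):
    """Least-squares fit of reference ~= gain * measured + offset for one device."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_r = sum(reference) / n
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_mr = sum((m - mean_m) * (r - mean_r) for m, r in zip(measured, reference))
    gain = s_mr / s_mm
    offset = mean_r - gain * mean_m
    return gain, offset


def correct(reading, gain, offset):
    """Apply the correction learned for this particular device."""
    return gain * reading + offset


if __name__ == "__main__":
    reference = [float(t) for t in range(10)]       # known ground-truth values
    measured = [0.8 * t + 2.0 for t in reference]   # a hypothetical miscalibrated sensor
    gain, offset = fit_linear_calibration(measured, reference)
    print(gain, offset, correct(measured[7], gain, offset))
```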
In some cases, we can do away with a hardware component entirely and let the decoding algorithm compensate: the Columbia Computational Photography lab has developed a kind of camera that doesn't have a lens. Software-defined imaging, so to speak.
Offline AI
Being able to run AI algorithms without an Internet connection is crucial for apps that have low-latency requirements (e.g. self-driving cars and robotics) or lack reliable connectivity (e.g. smartphone apps for traveling).
Deep learning is especially suitable for this. After the training phase, neural networks can run the feed-forward step very quickly. Furthermore, large neural nets can be shrunk into smaller ones (for example by pruning, quantization, or distillation) until they are portable enough to run on a smartphone, at the expense of some accuracy.
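One common way to shrink a trained network is post-training quantization: store each 32-bit float weight as an 8-bit integer plus one shared scale factor, cutting weight storage roughly 4x. Below is a minimal sketch of symmetric per-tensor int8 quantization; the function names are illustrative, and production toolchains also quantize activations and calibrate value ranges.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= q * scale with integer q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return [qi * scale for qi in q]


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.003, 0.9, -0.05]
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Each weight now needs 1 byte instead of 4, and rounding moves each one by at most half the scale; the accuracy cost mentioned above comes from those small errors accumulating across many layers.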
Google has already done this with the offline camera translation feature in the Google Translate app.
Some other possibilities:
- Intelligent assistants (e.g. Siri) that retain some functionality even when offline.
- Wilderness survival app that tells you if that plant is poison ivy, or whether those mushrooms are safe to eat.
- Small drones with on-board TPU chips that can perform simple obstacle avoidance and navigation.
Human-computer interaction
Deep neural networks are the first kind of model that can really see and hear our world with an acceptable level of robustness. This opens up a lot of possibilities for human-computer interaction (HCI).
Cameras can now be used to read sign language and read books aloud to people. In fact, deep neural networks can now describe what they see in full sentences. Baidu's DuLight project is helping visually impaired people perceive the world around them through a sight-to-speech earpiece.
We are not limited to vision-based HCI. Deep learning can help calibrate EEG interfaces so that paraplegic users can interact with computers more rapidly, or provide more accurate decoding for projects like Soli.
Gaming
Games are computationally challenging because they run physics simulation, AI logic, rendering, and multiplayer interaction together in real time. Many of these components have at least O(N^2) complexity, so our current algorithms have hit a ceiling that Moore's law alone will not lift.
Deep learning pushes the boundaries on what games are capable of in several ways.
Obviously, there's the "game AI" aspect. In current video games, the AI logic for non-player characters (NPCs) is not much more than a bunch of if-then-else statements tweaked to imitate intelligent behavior. This is not clever enough for advanced gamers and leads to rather unchallenging character interaction in single-player mode. Even in multiplayer, a human player is usually the smartest element in the game loop.
This changes with deep learning. Google DeepMind's AlphaGo has shown us that deep neural networks, combined with policy gradient reinforcement learning and Monte Carlo tree search, are powerful enough to beat top human players at complex games like Go. The techniques that drive AlphaGo may soon enable NPCs that exploit the player's weaknesses and provide a more engaging gaming experience. Game data from other players can be sent to the cloud so that the AI learns from its own mistakes.
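Policy gradient in its most basic form (REINFORCE) fits in a few lines. The sketch below trains a softmax policy on a hypothetical one-move game with three actions and fixed rewards, which is nothing like Go in scale; AlphaGo combined this kind of update with deep networks, supervised pre-training on human games, and tree search. Function names, rewards, and hyperparameters are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards, lr=0.1, steps=1000, seed=0):
    """REINFORCE on a one-step game.

    The policy is a softmax over one logit per action. After sampling action a
    and receiving reward r, the logits move along r * grad log pi(a), which for
    a softmax policy equals r * (onehot(a) - pi).
    """
    rng = random.Random(seed)
    n = len(rewards)
    logits = [0.0] * n
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(n), weights=probs)[0]  # sample an action from the policy
        r = rewards[a]
        for k in range(n):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            logits[k] += lr * r * grad_log_pi
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical game where only action 1 wins.
    print(reinforce([0.0, 1.0, 0.0]))
```

Because only rewarded actions move the logits in this toy, the probability of the winning action can only rise; in a real game the reward arrives at the end of a long sequence of moves, which is what makes credit assignment hard.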
Another application of deep learning in games is physics simulation. Instead of simulating fluids and particles from first principles, perhaps we can turn the nonlinear dynamics problem into a regression problem. For instance, if we train a neural net to learn the physical rules that govern fluid dynamics, we can evaluate it quickly during gameplay without having to solve the Navier-Stokes equations at large scale in real time.
In fact, Ladicky and Jeong (2015) have already done this for fluids, using regression forests rather than neural networks.
For VR applications that must run at 90 FPS minimum, this may be the only viable approach given current hardware constraints.
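A minimal sketch of "simulation as regression", under heavy simplifying assumptions: the "expensive" simulator is a damped spring stepped 200 times per frame, and because its per-frame update happens to be linear, a least-squares fit of a 2x2 matrix recovers it exactly. Fluids are strongly nonlinear, which is where a nonlinear regressor such as a neural network would take the place of the linear model. All names and constants are illustrative.

```python
import random


def simulate_frame(x, v, k=4.0, damping=0.5, dt=1 / 60, substeps=200):
    """'Expensive' reference simulator: a damped spring integrated with many
    small semi-implicit Euler substeps per rendered frame."""
    h = dt / substeps
    for _ in range(substeps):
        v += h * (-k * x - damping * v)
        x += h * v
    return x, v


def fit_frame_map(samples):
    """Least-squares fit of a 2x2 matrix M with [x', v'] ~= M [x, v],
    using the normal equations M = (Y X^T)(X X^T)^-1."""
    sxx = sxv = svv = 0.0
    yx = [[0.0, 0.0], [0.0, 0.0]]  # accumulates Y X^T
    for (x, v), out in samples:
        sxx += x * x
        sxv += x * v
        svv += v * v
        for r in range(2):
            yx[r][0] += out[r] * x
            yx[r][1] += out[r] * v
    det = sxx * svv - sxv * sxv
    inv = [[svv / det, -sxv / det], [-sxv / det, sxx / det]]  # (X X^T)^-1
    return [[yx[r][0] * inv[0][c] + yx[r][1] * inv[1][c] for c in range(2)] for r in range(2)]


def surrogate_frame(M, x, v):
    """One cheap matrix-vector product replaces all the substeps."""
    return M[0][0] * x + M[0][1] * v, M[1][0] * x + M[1][1] * v


if __name__ == "__main__":
    rng = random.Random(1)
    states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
    M = fit_frame_map([(s, simulate_frame(*s)) for s in states])  # offline "training"
    print(simulate_frame(0.3, -0.7), surrogate_frame(M, 0.3, -0.7))
```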
Third, deep generative modeling techniques can be used to create unlimited, rich procedural content: fauna, character dialogue, animation, music, perhaps even the narrative of the game itself. This area is just starting to be explored by games like No Man's Sky, and it could lead to games with endless novel content.
To add a cherry on top, deep neural nets are well suited to parallel mini-batched evaluation, which means that the AI logic for 128 NPCs or 32 water simulations might be evaluated simultaneously on a single graphics card.
Artistic assistants
Given how well neural networks perceive images, audio, and text, it's no surprise that they also work when we use them to draw paintings, compose music, and write fiction.
People have been trying to get computers to compose music and paint pictures for ages, but deep learning is the first approach that actually generates "good results". There are already several apps in the App Store that implement these algorithms for giggles, but soon we may see them as assistive generators and filters in professional content-creation software.
Data Mining from Unstructured Data
Deep learning isn't yet at the level where it can extract as much information from web pages as humans can, but the vision capabilities of deep neural nets are good enough to let machines understand more than just hypertext. Some applications:
- Parsing events from scanned flyers.
- Identifying which products on eBay are the same.
- Determining consumer sentiment from webcam footage.
- Extracting blog content from pages without RSS feeds.
- Integrating photo information into the valuation of financial instruments, insurance policies, and credit scores.
Voice synthesis
Generative modeling techniques have come far enough, and there is enough data out there, that it is only a matter of time before someone makes an app that reads aloud to you in Morgan Freeman's or Scarlett Johansson's voice. So much for "At Vanguard, my voice is my password."
Bonus: more products
- Adaptive OS / network stack scheduling: scheduling threads and processes in an OS is, in its general form, an NP-hard problem. We don't have a very satisfactory solution to this right now, and the scheduling algorithms in modern operating systems, filesystems, and TCP/IP implementations are all fairly simple. Perhaps if a small neural net could adapt to a user's particular scheduling patterns (framed as an RL problem), we could decrease the scheduling overhead incurred by the OS. This might make a lot of sense inside data centers, where the savings can really scale.
- Colony counting & cell tracking for microscopy software (for wet lab research).
- The strategy of "replacing simulation with machine learning" has been useful in drug design too, yielding enormous speedups in finding which compounds are helpful or toxic [Unterthiner 2015].
Auto-Encoding Variational Bayes. arXiv:1312.6114
One-Shot Generalization in Deep Generative Models. https://arxiv.org/pdf/1603.05106...
Project Soli