A: Unsupervised learning. In particular, what objective function should we use? Maximizing likelihood of the observed data, or even of future observed data, seems like the wrong thing to aim for. Consider, for example, predicting every pixel in the next N frames of video. Do we care about the exact intensity values? No, we care about predicting what the world is going to do next (will the car turn left or right? will the glass break if I drop it?). Somehow humans and animals seem to learn to predict at this higher level of abstraction, in terms of objects and relations, without ever receiving any such labeled data. Multi-task reinforcement learning will help, but learning from scalar reward alone seems too limited. Learning to predict the outcome of one's actions seems like it might help (and this can be used in goal-based planning).
A: If by "deep learning" you mean "nested composition of functions", then there are not too many limits, since that describes pretty much all of computing. However, the main problem (in my mind) is that current deep learning methods need too much time and data. This seems inconsistent with the ability of people to learn much more quickly from much smaller sample sizes (e.g., there are 100x more words in the NYT corpus than a child hears by the time they are 3.)
The key question is: what is the best form of representation (inductive bias) for learning? This of course depends on the task. Humans seem to use multiple forms of knowledge representation. For example, see Liz Spelke's work on "core knowledge" in children (and also work by Josh Tenenbaum and collaborators.)
This high level knowledge is of course represented by patterns of neuronal firing, but it might be more statistically (and possibly computationally) more efficient to do learning by manipulating these more structured representations (e.g., in terms of objects and agents and their attributes and relations) rather than taking tiny steps in a super high dimensional continuous parameter space (although the latter approach does seem to be killing it right now...)
A: Good old fashioned generalized linear models (e.g., linear and logistic regression) with suitable feature engineering can work surprisingly well for supervised problems where the input features are informative (e.g. words not pixels). You can also combine them with word embeddings. Random forests also work well.