The Field of Computer Vision Has Changed So Much in Just 5 Years

This post was published on the now-closed HuffPost Contributor platform.

"In your opinion, what are the most interesting topics to research in computer vision?" originally appeared on Quora, the knowledge-sharing network where compelling questions are answered by people with unique insights.

Answer by Andrej Karpathy, Research Scientist at OpenAI, on Quora:

AI is a very hard problem, so as a field we've split it into separate subfields (e.g., NLP, Computer Vision, Control, etc.), thinking that we would solve each of them in isolation and then just plug them together. However, in recent years, the trends in research have convinced me that this is somewhat of a false view that will never come to fruition. Instead, we're seeing a convergence of the fields into complete agents (I also call them "full-stack agents") that include all of the pieces. For example, Atari game-playing agents do Computer Vision, kind of, insofar as they have a ConvNet somewhere in there.

The field of Computer Vision has undergone such a drastic change during the course of my PhD that it's almost hard to believe. In 2011, when I entered, Computer Vision was its own area with its own problems. It was buzzing with activity: there were people working on object detection, scene classification, attribute classification, action classification, pose estimation, and so on. The feeling was that we were going to have all of these systems in all of these different areas, that we would solve them one by one, and that we would then plug them together somehow and produce all these intermediate outputs to pass on elsewhere. I spent a lot of time thinking about what "solved" Computer Vision would look like - we'd extract everything out of the image and pass it on to some other people who worked on planning, or something like that. This vision has completely broken down, in my opinion, due to the successes of end-to-end learning.

Therefore, I'm not actually sure what to work on in Computer Vision if you are interested in AI specifically (if you want to work on applications of CV, that's different, of course). I don't see CV as this module we solve on the side first and then plug into an agent later. Instead, I'd encourage people to pop the stack and work on building agents that happen to take pixel inputs alongside other things, and that reach interesting end goals we care about, as part of one fully integrated system.

Related points, by the way, were recently made by Jon Gauthier in the context of NLP in his blog post "On solving language".
