The Field of Computer Vision Has Changed So Much in Just 5 Years

In your opinion, what are the most interesting topics to research in computer vision? originally appeared on Quora - the knowledge sharing network where compelling questions are answered by people with unique insights.

Answer by Andrej Karpathy, Research Scientist at OpenAI, on Quora:

AI is a very hard problem, so as a field we've separated out all of its pieces into separate fields (e.g. NLP, Computer Vision, Control, etc.) and we thought that we would solve all of them in isolation and then just plug them together. However, in recent years, the trends in research have convinced me that this is somewhat of a false view that will never come to fruition. Instead, we're seeing a convergence of the fields into complete agents (I also call them "full-stack agents") that include all of the pieces. For example: ATARI game-playing agents do Computer Vision, kind of, insofar as they have a ConvNet somewhere in there.

The field of Computer Vision has undergone such a drastic change during the course of my PhD that it's almost hard to believe. In 2011 when I entered, Computer Vision was its own area with its own problems. It was buzzing with activity: there were people working on object detection, scene classification, attribute classification, action classification, pose estimation, etc. The feeling was that we were going to have all of these systems in all of these different areas that we solve one by one and then we plug them together somehow and produce all these intermediates that we pass on elsewhere. I spent a lot of time thinking about what "solved" Computer Vision would look like - we'd extract everything out of the image and pass it on to some other people who worked on planning, or something like that. This vision has completely broken down in my opinion due to the successes of end-to-end learning.

Therefore, I'm not actually sure what to work on in Computer Vision if you are interested in AI specifically (if you want to work on applications of CV, that's different, of course). I don't see CV as this module we solve on the side first, and then plug into an agent later. Instead, I'd encourage people to pop the stack and work on agent building that happens to take pixel inputs alongside other things and reach interesting end goals we care about as part of one fully integrated system.

Related points, by the way, were recently made by Jon Gauthier in the context of NLP in his blog post "On solving language".

