We haven't designed fully sentient artificial intelligence just yet, but we're steadily teaching computers how to see, read, and understand our world. Last month, Google engineers showed off "Deep Dream," software that works out what's in an image, amplifying the patterns it finds until the picture becomes a nightmare fusion of flesh and tentacles. The release follows research by scientists at Stanford University, who developed a similar program called NeuralTalk, capable of analyzing images and describing them in eerily accurate sentences.
First published last year, the program and the accompanying study are the work of Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory, and Andrej Karpathy, a graduate student. Their software is capable of looking at pictures of complex scenes and identifying exactly what's happening. A picture of a man in a black shirt playing guitar, for example, is picked out as "man in black shirt is playing guitar," while pictures of a black-and-white dog jumping over a bar, a man in a blue wetsuit surfing a wave, and a little girl eating cake are also correctly described with a single sentence. In several cases, it's unnervingly accurate.
Like Google's Deep Dream, the software uses a neural network to work out what's going on in each picture, comparing parts of the image to those it's already seen and describing them as humans would. Neural networks are loosely modeled on human brains, and they learn a little like children do. Once they've been taught the basics of our world — that's what a window usually looks like, that's what a table usually looks like, that's what a cat who's trying to eat a cheeseburger looks like — then they can apply that understanding to other pictures and video.
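To make that learn-the-basics-then-apply idea concrete, here's a minimal sketch of image-to-sentence captioning. It uses BLIP, an openly available modern model loaded through the Hugging Face transformers library, as a stand-in for NeuralTalk itself; the filename is a hypothetical example image.

```python
# A minimal captioning sketch. BLIP is a modern stand-in, not NeuralTalk;
# the model has already "learned the basics" from millions of captioned
# images, and here we only ask it to describe a new one.
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("guitar_player.jpg").convert("RGB")  # hypothetical input
inputs = processor(images=image, return_tensors="pt")

# Generate a one-sentence description, e.g. "a man playing a guitar".
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```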
![pastry-image](https://cdn3.vox-cdn.com/thumbor/uD9ezFmlWqS2S9GMN8grJSKfBJc=/800x0/filters:no_upscale%28%29/cdn0.vox-cdn.com/uploads/chorus_asset/file/3883910/pastry.0.png)
The software easily identifies a dog jumping over a bar
The incredible amount of visual information on the internet has, until recently, had to be manually labeled in order for it to be searchable. When Google first built Google Maps, it relied on a team of employees to dig through and check every single entry: humans given the task of looking at every house number captured by Street View cameras to make sure it denoted a real address. Sick of that tiresome job, Google went on to build Google Brain. Where it had previously taken a team weeks of work to complete the task, Google Brain could transcribe all of the Street View data from France in under an hour.
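For a rough sense of the underlying task, the sketch below trains a tiny digit classifier on the public MNIST dataset using PyTorch. This is not Google Brain's pipeline (real Street View house numbers are far messier than MNIST's tidy handwritten digits); it's the same principle in miniature.

```python
# A toy version of "teaching a computer to read numbers": a small
# network learns to classify handwritten digits. Assumes PyTorch and
# torchvision are installed; MNIST downloads automatically.
import torch
from torch import nn
from torchvision import datasets, transforms

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)

model = nn.Sequential(                  # deliberately tiny network
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),                 # one output per digit, 0-9
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:           # a single pass over the data
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```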
"I consider the pixel data in images and video to be the dark matter of the Internet," Li told The New York Times last year. "We are now starting to illuminate it." Leading the charge for that illumination are web giants such as Facebook and Google, who are keen to categorize the millions of pictures and search results they need to sift through. Previous research focused on single object recognition — in a 2012 Google study, a computer taught itself to recognize a cat — but computer scientists have said this misses the bigger picture. "We've focused on objects, and we've ignored verbs," Ali Farhadi, computer scientist at the University of Washington, told The New York Times.
![truck-identify](https://cdn3.vox-cdn.com/thumbor/-QMhuN6DNch25ElI9wU9zKY_wC8=/800x0/filters:no_upscale%28%29/cdn0.vox-cdn.com/uploads/chorus_asset/file/3883914/truck-google.0.png)
Neural networks have potential applications out in the real world, too. At CES this year, Nvidia's Jen-Hsun Huang announced his company's Drive PX, a "supercomputer" for your car that incorporates "deep neural network computer vision." Using the same learning techniques as other neural networks, Huang said, the technology will be able to automatically spot hazards as you drive, warning you of pedestrians, signs, ambulances, and other objects it's learned about. The neural network means the Drive PX won't need reference images for every kind of car: if it's got four wheels like a car, a grille like a car, and a windscreen like a car, it's probably a car. Larger cars could be SUVs, while cars with lights on top could be police vehicles. Nvidia has been chasing this technology for a while, too, having supplied the graphics processing units the Stanford team used in its research.
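The kind of generalization Huang described can be approximated today with an off-the-shelf detector. Below is a hedged sketch, not Nvidia's Drive PX software: a pretrained Faster R-CNN from torchvision flags objects it has learned (people, cars, stop signs, and so on) in a single hypothetical dashcam frame.

```python
# A hedged sketch of hazard spotting with a pretrained object detector.
# This is not the Drive PX stack; it only shows a network recognizing
# object categories it has learned, rather than matching reference images.
import torch
from torchvision import models
from torchvision.transforms import functional as F
from PIL import Image

weights = models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = models.detection.fasterrcnn_resnet50_fpn(weights=weights)
detector.eval()

frame = Image.open("dashcam_frame.jpg").convert("RGB")  # hypothetical frame

with torch.no_grad():
    detections = detector([F.to_tensor(frame)])[0]

categories = weights.meta["categories"]  # COCO labels: person, car, bus...
for label, score in zip(detections["labels"], detections["scores"]):
    if score > 0.8:                      # keep only confident detections
        print(f"possible hazard: {categories[label]} ({score:.2f})")
```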
![nvidia-neural](https://cdn0.vox-cdn.com/thumbor/yaixqbAAZhPL0lEH2XoCeYAuwSw=/800x0/filters:no_upscale%28%29/cdn0.vox-cdn.com/uploads/chorus_asset/file/3883936/nvidia.0.jpg)