Using artificial intelligence to teach computers to see

by Dave Michels

Creating a self-driving car should not be difficult, but it’s taking a while. Autonomous vehicles have been making headlines for years now, yet few of us have ever been in one or even seen one. We know that flying planes is more difficult than driving cars, yet pilots have enjoyed autopilot for decades. What gives?

The answer is clear, or more precisely, clear vision. Pilots have used autopilot for decades in clear, open skies. Roads are more complex.

The actual mechanics of operating a vehicle (accelerating, braking, steering, etc.) are all well understood and programmable. Most of the rules and logic of driving are programmable, too. But understanding and instructing vision is very complex. The good news is that incredible progress is being made, and the technology will have far-reaching implications.

The complex challenge of artificial vision

When Steve Jobs wanted the Mac to make a big impression, he arranged for it to say “hello” at its 1984 launch. Yet while computers have been “speaking” for more than 30 years, we still haven’t mastered artificial vision.

Vision is a powerful sense on its own, and it complements other senses, such as taste and hearing. Vision is far more complex than our other senses because it is tightly coupled with the brain that interprets the images we see. We reflexively pull away from something hot to avoid a burn, but understanding visual threats often requires cognitive processing.


Cars and computing devices frequently come equipped with temperature, speed and other sensors, but vision is a horse of a different color. Even though autonomous cars are equipped with 20-plus cameras and a sophisticated LIDAR system, they still have trouble seeing what is obvious to humans. In one unfortunate accident involving a Tesla running on Autopilot, for example, the car could not distinguish the white side of a truck trailer from a brightly lit sky.

Deep learning to teach computers to see

Digital photography has come a long way, but understanding what’s in an image is new. It is not practical to use programmatic definitions to teach computers to see. Computer vision is best accomplished by deep learning rather than conventional machine learning. Deep learning is a form of artificial intelligence (AI) that mimics the way we learn—from example. Since computers don’t have eyes, visual “experience” comes from billions of photos.
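To make the learn-from-example idea concrete, here is a deliberately tiny sketch (not a real deep network): instead of writing rules that define a cat, the program averages labeled examples into a prototype for each class and labels new images by whichever learned prototype they resemble. The 4-pixel “images” and class values are invented for illustration.

```python
import numpy as np

# Toy "images": 4-pixel brightness vectors standing in for real photos.
# The classifier learns from labeled examples instead of hand-written rules.
cats = np.array([[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1]])
dogs = np.array([[0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]])

# "Training": average the examples of each class into a prototype.
cat_proto = cats.mean(axis=0)
dog_proto = dogs.mean(axis=0)

def classify(image):
    # Label a new image by whichever learned prototype it is closer to.
    d_cat = np.linalg.norm(image - cat_proto)
    d_dog = np.linalg.norm(image - dog_proto)
    return "cat" if d_cat < d_dog else "dog"

print(classify(np.array([0.85, 0.75, 0.15, 0.25])))  # prints "cat"
```

A real system replaces the 4-pixel vectors with millions of photos and the prototype averaging with many layers of learned features, but the principle is the same: the labels in the training data, not hand-written definitions, do the teaching.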

Recent breakthroughs in computer vision leveraged the internet to create giant repositories of cataloged images. In 2015, ImageNet employed 50,000 workers in 167 countries to clean, sort and label nearly a billion images for computers to learn to see.

A billion images may seem like a lot, but it’s nothing compared to what a human sees during childhood. Children can distinguish cats from dogs thanks to visual experience, not definitions. In an attempt to replicate the accumulated knowledge built from experience, ImageNet’s photos have detailed descriptions. The result? Computers using the database can distinguish consistently between images of cats and dogs.

This development sounds trivial until you think about all of the different sizes and colors cats may present, as well as possible positions and situations cats get into—not to mention decoys, such as dolls and pillows, that could be mistaken for a cat. For the first time in history, computer vision is becoming a reality.

Developing a more complex AI vision

An autonomous car not only needs to identify items, but it must also anticipate actions. Cars, motorcycles, bicycles and pedestrians all have different anticipated actions when the signal turns green.

We are so surrounded by computers and cameras that it’s easy to forget how limited computer vision has been. We create entire workarounds to accommodate our vision-impaired machines. For example, our money has invisible codes so we can pay machines. We take a ticket when entering a parking garage because the computer can’t track our car’s entry and exit any other way. We plaster barcodes—which make zero visual sense to humans—over all of our products so computers can “see” the difference between ketchup and mustard.

Computers now can identify different car makes and models. This is leading to a treasure trove of big data analysis that is revealing relationships between car types (and values) and surrounding crime rates, property prices and even election outcomes.

Long ago when I parked at college, I hung a big (expensive) parking pass from my mirror for human parking enforcers to see. My son’s school doesn’t bother. Instead, they drive a camera-equipped vehicle around the parking lot to check license plates against the database of paid parking permits. Why bother with tags when computers can easily read license plates?
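The camera-based permit check described above reduces, in software terms, to a lookup: normalize the plate text the camera’s OCR produces and test it against the set of paid permits. This is a hypothetical sketch; the plate values and function names are invented for illustration.

```python
# Hypothetical sketch of a camera-based parking-permit check.
paid_permits = {"ABC123", "XYZ789"}  # plates with a paid permit on file

def check_plate(plate: str) -> str:
    # Normalize raw OCR output: strip spaces, ignore letter case.
    normalized = plate.strip().upper().replace(" ", "")
    return "paid" if normalized in paid_permits else "violation"

print(check_plate("abc 123"))  # prints "paid"
```

The interesting work happens upstream (reading the plate reliably at driving speed, in rain, at an angle); once the plate is text, enforcement is a set-membership test.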

The future of AI vision

Computer vision is benefiting from several technologies that are experiencing rapid innovation, including AI, big data, biometrics and digital imaging. As computers learn to see, we are going to give them a lot more to watch.

For example, in Tokyo I saw an NEC grocery store checkout system that could accurately identify different types of produce when the cashier placed them on the scale. It could even distinguish between different types of red apples—probably better than many human cashiers.

The big advantage of vision is that it’s passive. Consider the cat-and-mouse game of speed traps with radar and radar detectors. Future speed traps can simply use two camera locations to determine speed without radar. The cameras can also determine the driver and license plate—all with passive vision instead of active radio signals.
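The two-camera speed trap works on simple arithmetic: if the distance between the camera positions is known, the elapsed time between sightings of the same plate gives the average speed. A minimal sketch, with the distances and timestamps invented for illustration:

```python
def speed_kmh(distance_m: float, t_first: float, t_second: float) -> float:
    """Average speed between two camera positions, from timestamps in seconds."""
    elapsed = t_second - t_first          # seconds between the two sightings
    return (distance_m / elapsed) * 3.6   # m/s converted to km/h

# A car photographed at both cameras, 500 m apart, 15 seconds apart:
print(speed_kmh(500, 0.0, 15.0))  # prints 120.0 (km/h)
```

No radio signal is emitted, so there is nothing for a radar detector to detect; the evidence is the pair of timestamped images.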

As computers gain the ability to see, they will begin to serve entirely new roles in addition to driving. Consider:

  • Computer-assisted lifeguards that can distinguish if a child is splashing or drowning in a pool
  • Surveillance computers that identify when suspicious people are on premises
  • Video software that alerts management if it sees a disturbance or a medical situation, or when lines are too long and an additional register needs to be opened
  • Forest-fire spotting computers that monitor even remote areas for signs of wildfire

Video biometrics now extend beyond fingerprints to facial recognition. The obvious opportunity is public safety. Instead of law enforcement checking wants and warrants on a case-by-case basis, cameras around cities will soon do so automatically. Many countries already photograph people as they pass through passport control.

Computer vision is also poised to change retail research more than loyalty cards did. Cameras may replace frequent-shopper cards entirely, tracking loyalty visually. A camera can determine who (age, gender, ethnicity) is buying which products, how long they took to make a decision, and even whether they read the label or considered competing products.

New opportunities are also emerging in anti-fraud verification. MasterCard already uses selfies to fight fraud. Uber recently added selfies to its driver login process as an additional verification measure.

Computers have penetrated nearly every part of our society despite being nearly blind. As their ability to mimic vision gains momentum, you (and they) haven’t seen anything yet.

This article is published as part of the IDG Contributor Network.
