
Finding a way: Researchers explore new technologies to help the blind navigate

Show me the money—Lee Stearns (left), a graduate student in computer science at the University of Maryland, College Park, and lead investigator Dr. Cha-Min Tang of the Baltimore VA Medical Center demonstrate a computer vision system that promises to help blind people recognize currency, navigate indoors or outdoors, and locate lost objects.
See also: More wayfinding methods

While the seeing eye dog and long white cane are likely to endure as trusted wayfinding aids for the blind, a new generation of digital aids is emerging. Nowadays, it's not uncommon to see people with vision loss using "talking" handheld GPS devices—often along with a guide dog—to navigate along city streets.

GPS has its limits, though. Directions for pedestrians can be off by 50 or even 100 feet in certain instances. Clouds or tall buildings can block signals. Indoors, GPS may not work at all. And even under ideal conditions, a consumer GPS device is usually accurate to only about 10 feet. For a blind user, that can mean the difference between walking on the sidewalk and veering off into the street.

A VA-funded group of researchers is designing a computer vision system to bridge these limitations and offer added mobility and independence for blinded Veterans and others with vision loss.

"We envision combining our system with technologies such as GPS," says Cha-Min Tang, MD, PhD, of the Baltimore VA Medical Center and the University of Maryland. A neurologist with a technology bent, he is pursuing his inventive ideas with the help of VA rehabilitation engineer David Ross, MEd, MSEE, at VA's Atlanta Vision Loss Center. Also at the core of the effort is a talented, enthusiastic group of computer science students from the University of Maryland, College Park, under the mentorship of Rama Chellappa, PhD, MSEE.

How does the system work? A blind person wears stereo headphones and attaches a small webcam and microphone to his lapel. The devices are wired to a small laptop carried in a backpack. (In the future, a smartphone may be able to handle the computing.) When the user says "find the restroom," for example, the computer compares the webcam's view with preloaded still images of the area around the target. Beeps and other audio signals, in stereo, indicate how he needs to proceed. Computer-generated speech provides additional feedback, such as how far he is from the target.
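
The article doesn't spell out how the stereo cues are produced, but the idea of encoding direction in sound is straightforward to sketch. In the toy Python example below (illustrative only, not the team's code), the matched target's horizontal offset in the camera frame pans a short beep between the left and right channels:

```python
import wave
import numpy as np

def stereo_beep(offset, path="cue.wav", freq=880.0, dur=0.15, rate=44100):
    """Write a short stereo beep whose left/right balance encodes direction.

    offset: target's horizontal position in the camera frame, normalized
    to [-1.0, 1.0] (negative = target is to the user's left).
    """
    t = np.linspace(0, dur, int(rate * dur), endpoint=False)
    tone = 0.5 * np.sin(2 * np.pi * freq * t)

    # Simple constant-power pan: a full offset drives one channel to ~0.
    pan = (offset + 1.0) / 2.0                 # 0 = hard left, 1 = hard right
    left = tone * np.cos(pan * np.pi / 2)
    right = tone * np.sin(pan * np.pi / 2)

    # Interleave the channels and write 16-bit stereo PCM.
    samples = (np.column_stack([left, right]) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(2)
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(rate)
        f.writeframes(samples.tobytes())

# e.g., target detected slightly to the right of center:
stereo_beep(offset=0.3)
```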

Google could complement system

For indoor navigation, still images are needed every 15 to 20 feet along each path the user might follow. The idea is that a sighted volunteer would snap these ahead of time and upload them to the user's computer. Tang believes large public sites such as universities or medical centers could eventually offer downloadable libraries containing images of high-traffic areas on their campuses: entrances, elevators, hallways, restrooms, cafeterias, possibly nearby bus stops or train stations.

For outdoors, a different approach is needed. Graduate student Lee Stearns says one idea is to rely on GPS to let the user know roughly where he is, and then call up a small set of relevant images for that location. At that point, computer vision would take over and give more precise guidance.
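
As a rough sketch of that two-stage idea, the snippet below (hypothetical database schema and coordinates, not the team's software) uses the haversine formula to keep only the reference images tagged within a GPS-sized radius of the current fix; computer vision would then match the live frame against just those candidates:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    R = 6371000.0  # mean Earth radius, meters
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def candidate_images(gps_fix, image_db, radius_m=30.0):
    """Return only the reference images tagged near the current GPS fix.

    radius_m defaults to ~100 feet, roughly the GPS error the article cites.
    """
    lat, lon = gps_fix
    return [img for img in image_db
            if haversine_m(lat, lon, img["lat"], img["lon"]) <= radius_m]

# Hypothetical database: each entry pairs a photo with where it was taken.
image_db = [
    {"file": "entrance_north.jpg", "lat": 39.2889, "lon": -76.6235},
    {"file": "elevator_bank.jpg",  "lat": 39.2891, "lon": -76.6233},
]
nearby = candidate_images((39.2890, -76.6234), image_db)
```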

Similar libraries of location-specific still images are increasingly available in Street View, a relatively new feature of Google Maps. The images are taken with multiple-lens cameras that capture 360-degree views. The company describes it as "the last zoom layer on the map—when you've zoomed all the way and you find yourself virtually standing on the street."

Another option, says Stearns, is to add an inertial navigation unit, or INU, into the mix. About the size of a flash drive, INUs have gyroscopes, accelerometers and magnetic sensors. They are used to aid navigation on airplanes and submarines. "You might be able to get a Google map and use GPS to tell you more or less where you are—say, within 100 feet—and then the INU and the camera will tell you how you're moving and exactly where you are on the map," says Stearns.
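
A toy sketch of that division of labor follows, under stated assumptions: GPS occasionally re-anchors the position estimate, and between fixes each detected footstep advances it along the INU's heading. A real system would fuse raw gyro and accelerometer data with filtering (e.g., a Kalman filter); none of that machinery is shown here.

```python
import math

class DeadReckoner:
    """Toy position tracker: coarse GPS fixes corrected by INU step updates.

    Positions are kept in local meters east/north of the last GPS fix;
    heading comes from the INU's magnetometer/gyro (radians, 0 = north).
    """
    def __init__(self):
        self.east = 0.0
        self.north = 0.0

    def gps_fix(self):
        # A fresh GPS fix re-anchors the estimate, bounding drift.
        self.east = 0.0
        self.north = 0.0

    def step(self, heading_rad, step_len_m=0.7):
        # Each detected footstep advances the estimate along the heading.
        self.east += step_len_m * math.sin(heading_rad)
        self.north += step_len_m * math.cos(heading_rad)

dr = DeadReckoner()
dr.gps_fix()
for _ in range(10):          # ten steps heading roughly northeast
    dr.step(math.radians(45))
print(round(dr.east, 1), round(dr.north, 1))  # ~4.9 m east, ~4.9 m north
```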

Researchers at the Atlanta VA, meanwhile, are evaluating an alternative approach. It relies on a smartphone to stream video frames to a central server. The server analyzes the images and sends back navigation data. The plus is that users don't have to carry their own computer. The minus is that connection speeds can affect how fast the system works.

"Both approaches have their advantages," notes Tang. "I think the ultimate solution is to build a combined system using complementary technologies. What is most important from the perspective of the blind is that the system be reliable under a wide range of conditions."

More than just wayfinding

Wayfinding is just one of the tasks handled by Tang's proposed system. "There are a lot of groups working on isolated tasks, such as money recognition, obstacle detection or navigation," he says. "What we want is a platform that will be able to integrate a variety of tasks as they are developed in the future. With that in mind, we've spent considerable effort on building a better interface between the user and the computer."

Natural speech is one of the keys to that interface, says Stearns. "You could ask it, how far to this destination, where's the nearest place to eat, where's the bathroom, where are my keys?" Various applications could be triggered by the user's voice, and guidance would likewise be given back, in most cases, through computer-generated speech.

Money recognition is one example. For this task, the students have used an established algorithm known as SIFT. Demonstrating the technology to a visitor at the Baltimore VA, Tang hands a ten-dollar bill to Stearns, who is outfitted with the webcam. The camera detects the money in under a second and announces "ten dollars" in Stearns' headphones.

"You can hold any U.S. paper money and the system will instantly recognize it," says Tang. "Any denomination—even a two-dollar bill. You can even cover up most of the bill and the system will still recognize it. It will also be easy to train it for other currencies."

Some blind people already use handheld money scanners, but "the problem is that it's an expensive device that people have to carry around just for that one task, as opposed to a system that can recognize money as well as help in wayfinding and other tasks," explains Stearns.

The system Tang envisions can also be trained to help find lost objects. A sighted person would take pictures of the user's personal effects and upload them to the computer, along with keywords. If the user were to misplace his cane or cell phone, he could simply say those words and the webcam would scan the environment for the visual information matching the uploaded image.

Future goal: Facial recognition

Tang and the students have no shortage of ideas on how to move their system to the next phase and pull in yet more technologies. Their conversation is peppered with references to new high-tech devices and applications. Showing off their technology at a recent demonstration, Tang and Stearns had a lively debate over the potential merits of using two webcams instead of one. The cameras, each about the size of a ping-pong ball, would be worn about 12 inches apart on the user's chest or shoulders.

Tang: "A two-camera system will offer a wider field of view and give you depth and precise measurement of distance. Parallax can be calculated by the computer very quickly."
Stearns: "You can also get depth from a single camera, as long as you're moving, and it would be smaller and less expensive."
Tang: "But that's not as accurate."
Stearns: "You may have to combine them. That's part of what my master's thesis is going to be on."
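
The depth Tang describes comes from triangulation: a feature seen by both cameras shifts horizontally by a disparity d between the two images, and with focal length f and camera baseline B, its depth is Z = fB/d. A minimal sketch, with illustrative numbers rather than anything measured from the team's rig:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth for a stereo camera pair: Z = f * B / d.

    disparity_px: horizontal shift of a feature between the two images
    focal_px:     camera focal length, in pixels
    baseline_m:   distance between the two cameras (about 0.3 m here,
                  matching the 12-inch spacing discussed above)
    """
    if disparity_px <= 0:
        raise ValueError("feature must appear shifted between the cameras")
    return focal_px * baseline_m / disparity_px

# e.g., a 600-pixel focal length, 0.3 m baseline, and a 45-pixel shift:
print(round(depth_from_disparity(45, 600, 0.3), 2))  # ~4.0 meters away
```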

The team also plans to tap existing software—and add algorithms of their own—to enable facial recognition and expression analysis.

"Without sight," says Tang, "you don't know if the person talking to you is facing you or paying attention. Is he smiling, angry, expressionless? Our webcam can give continuous feedback. It may seem unimportant, but it enhances the quality of the interaction."