Award Abstract #0086075
ITR: Personalized Spatial Audio via Scientific Computing and Computer Vision
NSF Org: IIS (Division of Information & Intelligent Systems)
Initial Amendment Date: September 7, 2000
Latest Amendment Date: March 18, 2004
Award Number: 0086075
Award Instrument: Continuing grant
Program Manager: Ephraim P. Glinert, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer & Information Science & Engineering
Start Date: September 1, 2000
Expires: August 31, 2006 (Estimated)
Awarded Amount to Date: $2,999,995
Investigator(s): Larry Davis, lsd@umiacs.umd.edu (Principal Investigator); V. Ralph Algazi, Richard Duda, Ramani Duraiswami, Qing Huo Liu (Co-Principal Investigators)
Sponsor: University of Maryland College Park, 3112 Lee Bldg, College Park, MD 20742, 301/405-6269
NSF Program(s): INFORMATION TECHNOLOGY RESEARCH
Field Application(s): 0104000 Information Systems
Program Reference Code(s): HPCC, 9218, 1661
Program Element Code(s): 1640
ABSTRACT
This award provides the first four years of funding for a five-year continuing award. Humans are very good at discerning the spatial origin of sound, using a mixture of frequency-dependent interaural time difference (ITD), interaural level difference (ILD), and pinna spectral cues in environments ranging from open spaces to small crowded rooms. This ability helps us interact with others and with the environment by sorting individual sounds out of a mixture, and helps us survive by warning us of danger over a wider region of space than vision covers. These advantages of spatial sound are important for human-computer interaction.
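The ITD cue mentioned above is simple enough to state in closed form. As an illustrative sketch (not part of the award text), the classic Woodworth spherical-head approximation gives the ITD for a distant source as ITD(θ) = (a/c)(sin θ + θ), with head radius a and speed of sound c; the function name and default radius below are assumptions for the example.

```python
import math

def woodworth_itd(azimuth_rad: float, head_radius_m: float = 0.0875,
                  speed_of_sound: float = 343.0) -> float:
    """Woodworth spherical-head approximation of the interaural time
    difference (ITD), in seconds, for a distant source at the given
    azimuth (0 = straight ahead, pi/2 = directly to one side)."""
    return (head_radius_m / speed_of_sound) * (
        math.sin(azimuth_rad) + azimuth_rad)

# A source 90 degrees to the side of an average-sized head:
itd = woodworth_itd(math.pi / 2)   # roughly 0.66 ms
```

Note that this approximation ignores exactly the individual anatomy the award is concerned with; it captures only the coarse, frequency-independent delay cue.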
While the frequency-independent ITD cues (delays) associated with the two ears are relatively easy to render over headphones, the ILD (level-difference) and pinna elevation cues are not. For a given source location and frequency content, the sound is scattered by the person's torso, head, and pinnae and is received differently at the two ears, leading to differences in the intensity and spectral features of the received sound. These effects are encoded in a highly individual "Head-Related Transfer Function" (HRTF) that depends on the person's anatomical features (the structure of the torso, head, and pinnae). This individuality has made it difficult to use the HRTF in the proposed applications. Recent research, including that of members of this team, has focused on measuring HRTFs for individuals in specific environments, on constructing models of the HRTF, on understanding how the geometry of the body is related to the characteristics of the HRTF, and on how the brain processes the cues to derive spatial information. However, this research has also indicated that the brain is extraordinarily sensitive to errors in cues that result when sound is rendered with an incorrect HRTF.
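As a minimal sketch of how an HRTF is used in rendering (again not from the award text): in the time domain the HRTF becomes a pair of head-related impulse responses (HRIRs), and spatialization is just convolution of the mono source with the left- and right-ear HRIRs. The function name and the toy HRIRs below are hypothetical, chosen only to show the mechanism.

```python
import numpy as np

def render_binaural(mono: np.ndarray, hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Render a mono source at the HRIRs' measured direction by
    convolving with left- and right-ear head-related impulse
    responses; returns an (N, 2) stereo array."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Toy HRIRs (hypothetical): the far ear hears a delayed, quieter copy.
fs = 44100
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[20] = 0.5   # ~0.45 ms lag, -6 dB
source = np.random.default_rng(0).standard_normal(fs // 10)
stereo = render_binaural(source, hrir_l, hrir_r)
```

Real HRIRs are of course not a single delayed impulse; the direction-dependent spectral shaping by the pinnae is exactly what the delta-function toy above fails to capture.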
In this project the PI and his team will use numerical methods to compute individualized HRTFs from accurate 3-D surface models of the body. They will use multiview, multiframe computational vision techniques to extract the surface models from imagery, and will then use boundary element methods employing fast multipole/transform techniques and parallel processing to compute the HRTFs from the surface models. The resulting HRTFs will be evaluated both by objective comparison with acoustically measured HRTFs and by psychoacoustic testing, and will be used in demonstrations of virtual reality, augmented reality, and teleconferencing. A major advantage of this vision-based approach is that it will allow the PI and his team to investigate and model the way HRTFs change with body posture, offering the potential to track dynamic environments. The project will therefore include fundamental research to extend static HRTF measurements to dynamic situations in different environments, using a combination of visual tracking to locate the person in real space and construction of in-room HRTFs from free-field HRTFs using fast iterative techniques. This will provide a scientific foundation for HCI applications of audio rendering. The research will in addition yield algorithms and understanding with impact on varied fields, including computer-vision-based model creation, scientific computing, computational acoustics for noise control and land-mine detection, and the neurophysiological understanding of human audition.
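The paragraph above proposes evaluating computed HRTFs by objective comparison with acoustically measured ones. One common objective score, assumed here for illustration (the award text does not specify a metric), is the log-spectral distortion between the two magnitude responses:

```python
import numpy as np

def log_spectral_distortion(h_computed: np.ndarray, h_measured: np.ndarray,
                            n_fft: int = 512) -> float:
    """RMS difference, in dB, between the magnitude spectra of a
    numerically computed and an acoustically measured HRIR -- one
    simple objective score for comparing the two.
    (Illustrative metric; not specified in the award text.)"""
    H1 = np.abs(np.fft.rfft(h_computed, n_fft)) + 1e-12
    H2 = np.abs(np.fft.rfft(h_measured, n_fft)) + 1e-12
    diff_db = 20.0 * np.log10(H1 / H2)
    return float(np.sqrt(np.mean(diff_db ** 2)))

h = np.random.default_rng(1).standard_normal(128)
same = log_spectral_distortion(h, h)        # identical responses: 0 dB
gain = log_spectral_distortion(h, 0.5 * h)  # a flat -6 dB error: ~6 dB
```

Lower is better; such spectral scores are typically complemented by the psychoacoustic localization tests the award also calls for, since numerically small spectral errors can still shift perceived direction.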