Award Abstract #0086075
ITR: Personalized Spatial Audio via Scientific Computing and Computer Vision
NSF Org: IIS (Division of Information & Intelligent Systems)
Initial Amendment Date: September 7, 2000
Latest Amendment Date: March 18, 2004
Award Number: 0086075
Award Instrument: Continuing grant
Program Manager: Ephraim P. Glinert, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer & Information Science & Engineering
Start Date: September 1, 2000
Expires: August 31, 2006 (Estimated)
Awarded Amount to Date: $2,999,995
Investigator(s): Larry Davis, lsd@umiacs.umd.edu (Principal Investigator); V. Ralph Algazi, Richard Duda, Ramani Duraiswami, Qing Huo Liu (Co-Principal Investigators)
Sponsor: University of Maryland College Park, 3112 Lee Bldg, College Park, MD 20742, 301/405-6269
NSF Program(s): INFORMATION TECHNOLOGY RESEARCH
Field Application(s): 0104000 Information Systems
Program Reference Code(s): HPCC, 9218, 1661
Program Element Code(s): 1640
ABSTRACT
This award provides the first four years of funding for a five-year continuing award. Humans are very good at discerning the spatial origin of sound, using a mixture of frequency-dependent interaural time difference (ITD), interaural level difference (ILD), and pinna spectral cues in environments ranging from open spaces to small crowded rooms. This ability helps us interact with others and with the environment by sorting individual sounds out of a mixture, and helps us survive by warning us of danger over a wider region of space than vision covers. These advantages of spatial sound are important for human-computer interaction.
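The ITD cue mentioned above is simple enough to state in closed form. As an illustrative sketch (not part of the award text), the classic Woodworth spherical-head approximation gives the ITD for a distant source as ITD(θ) = (a/c)(sin θ + θ), with head radius a and speed of sound c; the function name and default radius below are assumptions for the example.

```python
import math

def woodworth_itd(azimuth_rad: float, head_radius_m: float = 0.0875,
                  speed_of_sound: float = 343.0) -> float:
    """Woodworth spherical-head approximation of the interaural time
    difference (ITD), in seconds, for a distant source at the given
    azimuth (0 = straight ahead, pi/2 = directly to one side)."""
    return (head_radius_m / speed_of_sound) * (
        math.sin(azimuth_rad) + azimuth_rad)

# A source 90 degrees to the side of an average-sized head:
itd = woodworth_itd(math.pi / 2)   # roughly 0.66 ms
```

Note that this approximation ignores exactly the individual anatomy the award is concerned with; it captures only the coarse, frequency-independent delay cue.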
While the frequency-independent ITD cues (delays) associated with the two ears are relatively easy to render over headphones, the ILD (level-difference) and pinna elevation cues are not. For a given source location and frequency content, the sound is scattered by the person's torso, head, and pinnae and is received differently at the two ears, leading to differences in the intensity and spectral features of the received sound. These effects are encoded in a highly individual "Head-Related Transfer Function" (HRTF) that depends on the person's anatomical features (the structure of the torso, head, and pinnae). This individuality has made it difficult to use the HRTF in the proposed applications. Recent research, including that of members of this team, has focused on measuring HRTFs for individuals in specific environments, on constructing models of the HRTF, on understanding how the geometry of the body is related to the characteristics of the HRTF, and on how the brain processes the cues to derive spatial information. However, this research has also indicated that the brain is extraordinarily sensitive to errors in cues that result when sound is rendered with an incorrect HRTF.
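As a minimal sketch of how an HRTF is used in rendering (again not from the award text): in the time domain the HRTF becomes a pair of head-related impulse responses (HRIRs), and spatialization is just convolution of the mono source with the left- and right-ear HRIRs. The function name and the toy HRIRs below are hypothetical, chosen only to show the mechanism.

```python
import numpy as np

def render_binaural(mono: np.ndarray, hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Render a mono source at the HRIRs' measured direction by
    convolving with left- and right-ear head-related impulse
    responses; returns an (N, 2) stereo array."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Toy HRIRs (hypothetical): the far ear hears a delayed, quieter copy.
fs = 44100
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[20] = 0.5   # ~0.45 ms lag, -6 dB
source = np.random.default_rng(0).standard_normal(fs // 10)
stereo = render_binaural(source, hrir_l, hrir_r)
```

Real HRIRs are of course not a single delayed impulse; the direction-dependent spectral shaping by the pinnae is exactly what the delta-function toy above fails to capture.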
In this project the PI and his team will use numerical methods to compute individualized HRTFs from accurate 3-D surface models of the body. They will use multiview, multiframe computational vision techniques to extract the surface models from imagery, and will then use boundary element methods employing fast multipole/transform techniques and parallel processing to compute the HRTFs from the surface models. The resulting HRTFs will be evaluated both by objective comparison with acoustically measured HRTFs and by psychoacoustic testing, and will be used in demonstrations of virtual reality, augmented reality, and teleconferencing. A major advantage of this vision-based approach is that it will allow the PI and his team to investigate and model the way HRTFs change with body posture, offering the potential to track dynamic environments. The project will therefore include fundamental research to extend static HRTF measurements to dynamic situations in different environments, using a combination of visual tracking to locate the person in real space and construction of in-room HRTFs from free-field HRTFs using fast iterative techniques. This will provide a scientific foundation for HCI applications of audio rendering. The research will in addition yield algorithms and understanding with impact on varied fields, including computer-vision-based model creation, scientific computing, computational acoustics for noise control and land-mine detection, and the neurophysiological understanding of human audition.
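The paragraph above proposes evaluating computed HRTFs by objective comparison with acoustically measured ones. One common objective score, assumed here for illustration (the award text does not specify a metric), is the log-spectral distortion between the two magnitude responses:

```python
import numpy as np

def log_spectral_distortion(h_computed: np.ndarray, h_measured: np.ndarray,
                            n_fft: int = 512) -> float:
    """RMS difference, in dB, between the magnitude spectra of a
    numerically computed and an acoustically measured HRIR -- one
    simple objective score for comparing the two.
    (Illustrative metric; not specified in the award text.)"""
    H1 = np.abs(np.fft.rfft(h_computed, n_fft)) + 1e-12
    H2 = np.abs(np.fft.rfft(h_measured, n_fft)) + 1e-12
    diff_db = 20.0 * np.log10(H1 / H2)
    return float(np.sqrt(np.mean(diff_db ** 2)))

h = np.random.default_rng(1).standard_normal(128)
same = log_spectral_distortion(h, h)        # identical responses: 0 dB
gain = log_spectral_distortion(h, 0.5 * h)  # a flat -6 dB error: ~6 dB
```

Lower is better; such spectral scores are typically complemented by the psychoacoustic localization tests the award also calls for, since numerically small spectral errors can still shift perceived direction.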