NIST Speech Group Website
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology

Lessons Learned

    The initial room setup gave us valuable experience that, along with researchers' feedback, will help us improve the quality of the recorded data.

    Several interesting points were raised, as follows:

    Hardware infrastructure
    When setting up an automatic sensor collection system such as NIST's meeting room, hardware is an important concern. Since the data is recorded via a distributed system controlled by NIST's Smart Flow system, network bandwidth was a concern in the original setup. Some tradeoffs between quality and data size had to be made to keep network utilization within acceptable limits. Even so, a private switch was required. The off-the-shelf PCs used in the data collection system proved less reliable than expected. In particular, the initial setup called for the PCs to be located under the room's raised floor, but this caused too many failures in these systems, mainly because of low air flow. This in turn required the use of longer video cables from the cameras to the capture system. Finally, with 13 different processors, hundreds of sensors, and battery-operated equipment, maintenance and quality control are significant tasks. A good quality assurance plan is of utmost importance to ensure the integrity of the collected data.

    Video quality
    Several factors influenced the video quality in the Pilot Corpus setup. First, the JPEG compression level used during video capture was chosen to be quite high (producing a JPEG image of around 50 KB per frame) in order to limit the data bandwidth on the network. Decreasing the compression level would increase the image quality. However, the limiting factor for video quality in the pilot setup turned out to be the quality and length of the video cables. Since the picture is digitized on capture equipment located in an adjoining room and not directly on the cameras, the transmitted analog signal is subject to attenuation and electrical interference. Cameras with onboard digital converters would solve this problem. However, they are still quite expensive, and most give up frame rate or resolution to keep the conversion speed at the same level.
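To make the bandwidth tradeoff concrete, here is a minimal back-of-the-envelope sketch in Python. Only the figure of roughly 50 KB per frame comes from the text above; the frame rate and camera count are hypothetical placeholders, and 1 KB is assumed to be 1000 bytes.

```python
def video_bandwidth_mbps(frame_kb: float, fps: float, cameras: int) -> float:
    """Aggregate network load, in megabits per second, for video sent as one JPEG per frame.

    frame_kb is the average JPEG size in kilobytes (1 KB taken as 1000 bytes).
    """
    bytes_per_second = frame_kb * 1000 * fps * cameras
    return bytes_per_second * 8 / 1_000_000


# Hypothetical configuration: neither value is stated in the text.
ASSUMED_FPS = 30
ASSUMED_CAMERAS = 5

if __name__ == "__main__":
    load = video_bandwidth_mbps(50, ASSUMED_FPS, ASSUMED_CAMERAS)
    print(f"Estimated video load: {load:.0f} Mbit/s")
```

Under these assumed values the estimate is 60 Mbit/s for video alone, which suggests why the frame size was kept small and why a private switch could still be needed. Substituting the real frame rate and camera count would give the actual figure.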

    Array microphones
    The NIST Smart Spaces Lab Mark-II array microphones employed in the pilot setup had analog lines to a processor. To reduce problems with signal interference and attenuation, 2-meter cables were used. This required placing the processors in the data collection room near the mic arrays. To maintain a "normal meeting" sound environment, the processors were placed under the raised floor, directly beneath the array mics, which were mounted on posts. Unfortunately, dust and lack of air circulation under the floor caused frequent array mic data collection failures, which ultimately led us to decide not to distribute array data. Fortunately, the NIST Smart Spaces Lab has developed a Mark-III array mic that employs an on-board A/D converter, thus permitting longer cabling distances. The Mark-III arrays will be employed in the next phase of data collection.

    Software architecture
    The biggest challenge when dealing with multi-modal data generated by several different sources is synchronization. The pilot setup uses NTP for system synchronization, but the level of precision required reaches the limits of what NTP can do. Using a cinema clap paired with a flash proved useful for post-processing some of the out-of-sync data.
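As an illustration of how a clap and flash can be used to realign streams after the fact, the following Python sketch estimates the offset between an audio stream and a video stream. It is not the procedure used for the pilot corpus. It assumes the clap is the loudest audio sample, the flash is the largest frame-to-frame brightness jump, and both happen at the same instant. All function names are hypothetical.

```python
from typing import Sequence


def clap_time(samples: Sequence[float], sample_rate: float) -> float:
    """Time in seconds of the loudest audio sample, taken to be the clap."""
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return peak_index / sample_rate


def flash_time(brightness: Sequence[float], frame_rate: float) -> float:
    """Time in seconds of the frame with the largest brightness increase
    over the previous frame, taken to be the flash.

    brightness holds one mean-brightness value per frame (at least two frames).
    """
    jumps = [brightness[i] - brightness[i - 1] for i in range(1, len(brightness))]
    flash_index = 1 + max(range(len(jumps)), key=lambda i: jumps[i])
    return flash_index / frame_rate


def av_offset(samples: Sequence[float], sample_rate: float,
              brightness: Sequence[float], frame_rate: float) -> float:
    """Seconds to add to audio timestamps so the clap lines up with the flash."""
    return flash_time(brightness, frame_rate) - clap_time(samples, sample_rate)


if __name__ == "__main__":
    # Synthetic data: clap at sample 25 of a 100 Hz signal, flash at frame 5 of a 10 fps video.
    audio = [0.0] * 100
    audio[25] = 1.0
    video = [10.0, 10.0, 10.0, 10.0, 10.0, 200.0, 12.0, 10.0]
    print(av_offset(audio, 100.0, video, 10.0))
```

The resolution of such an estimate is limited to one video frame period, and real recordings would need more robust detection than a single peak, for example restricting the search to a window around the expected clap.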


    Page Created: September 19, 2007
    Last Updated: December 19, 2007

    Speech Group is part of IAD and ITL
    NIST is an agency of the U.S. Department of Commerce