RT Spring 2004 Evaluation
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology

  • Rich Transcription Spring 2004 Evaluation

    This page contains information pertaining to the Rich Transcription Spring 2004 Evaluation (RT-04S) of Speech-to-Text Transcription and Speaker Segmentation in the Meeting Domain.

    Evaluation Plan

    The Rich Transcription 2004 Spring Meeting Recognition evaluation plan is now available. Please let us know if you have any questions/concerns.

    Training Data

    While any publicly available data can be used for training, we have worked with the LDC, CMU, and ICSI to put together meeting domain training and development resources for the evaluation. These data consist of:

    Corpus                     Duration                  LDC Corpus Order #   LDC Transcriptions Order #   Additional notes
    CMU Meeting Corpus         10 hours (18 meetings)    LDC2004E04           LDC2004E05                   Errata
    ICSI Meeting Corpus        ~72 hours (75 meetings)   LDC2004S02           LDC2004T04
    NIST Pilot Meeting Corpus  13 hours (17 meetings)    LDC2004E01           LDC2004E02

    These data are currently available to RT-04S evaluation participants only. The LDC will make them available to the general public as they are able.

    The NIST data has been "quick transcribed" and made available quickly so that it can be used several weeks prior to the evaluation. If possible, it will be re-released at the beginning of the evaluation with additional quality control. See this and the NIST Meeting Pilot Corpus websites for updates.

    Development Data

    The 80-minute test set used in the RT-02 Meeting Recognition Evaluation is the designated development test data for the RT-04 Meeting Recognition Evaluation. NIST has re-released this data with additional distant mics (where the data collection sites provided them). Although this data consists of 10-minute excerpts from the same data collection sites that will be represented in the RT-04 evaluation test set, it is not fully representative of the evaluation test data: it contains lapel mics in lieu of head mics for the LDC and CMU data, and some different distant mics for the LDC data. Unfortunately, because of resource constraints, we were unable to create an entirely new development test set for this evaluation.

    This data is currently available to RT-04S evaluation participants only. The LDC will make it available to the general public as it is able. We will make more information and the scoring files for the Development Test Set available in RT-04 format as soon as we are able.

    Note that some of the meetings in the development test set were included in the above training data releases. See the development test data documentation for the mapping of devtest meeting IDs to the original collection site meeting IDs so that these may be eliminated from your training sets.
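
    The exclusion step can be sketched in Python as follows. This is only a minimal illustration: it assumes the mapping has been copied by hand from the development test data documentation into a dictionary, and every meeting ID shown is a hypothetical placeholder, not a real identifier from any of the corpora.

```python
# Hypothetical mapping: devtest meeting ID -> original collection-site meeting ID.
# Placeholder values only; take the real mapping from the dev test documentation.
DEVTEST_TO_ORIGINAL = {
    "devtest_meeting_A": "site_meeting_001",
    "devtest_meeting_B": "site_meeting_007",
}


def exclude_devtest_meetings(training_ids, devtest_to_original):
    """Return the training meeting IDs that do not overlap the devtest set."""
    held_out = set(devtest_to_original.values())
    return [mid for mid in training_ids if mid not in held_out]


# Hypothetical list of meetings found in a local copy of the training data.
training = [
    "site_meeting_001",
    "site_meeting_002",
    "site_meeting_007",
    "site_meeting_010",
]
clean_training = exclude_devtest_meetings(training, DEVTEST_TO_ORIGINAL)
print(clean_training)
```

    Filtering on the original collection-site IDs, rather than on the devtest IDs, matters because the training releases use the collection sites' own meeting names.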

    Evaluation

    The evaluation data will consist of an approximately 90-minute multi-site test set containing 8 meeting excerpts of approximately 11 minutes each. The test data was collected at CMU, ICSI, LDC, and NIST. Each meeting excerpt will contain a head-mic recording for each subject and one or more distant microphone recordings (whatever the data collection sites provided to NIST).

    Reference transcripts for the evaluation excerpts will be prepared by the LDC according to its Careful Transcription Procedure for Meetings. These are similar to the procedures used to prepare test-quality reference transcripts of conversational telephone data. The reference transcripts will be processed by NIST into the STM format for SCLITE scoring.
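
    For orientation, an STM reference segment is a single line of text giving the waveform file, channel, speaker, begin time, end time, an optional label, and the transcript. The Python sketch below formats one such line. The function name, the field values, and the three-decimal time formatting are illustrative assumptions; this is not NIST's conversion tooling, and the evaluation plan remains the authority on the exact conventions.

```python
def format_stm_line(filename, channel, speaker, begin, end, transcript, label=None):
    """Format one STM segment: file channel speaker begin end [<label>] transcript."""
    if end <= begin:
        raise ValueError("segment end time must be after its begin time")
    fields = [filename, str(channel), speaker, f"{begin:.3f}", f"{end:.3f}"]
    if label is not None:
        # The optional label is written in angle brackets before the transcript.
        fields.append(f"<{label}>")
    fields.append(transcript)
    return " ".join(fields)


# Hypothetical segment; file and speaker names are placeholders.
line = format_stm_line(
    "meeting_excerpt_01", 1, "speaker_A", 12.5, 15.25, "okay let's get started"
)
print(line)
```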

    This data will be made available to RT-04S evaluation participants only. It will be released sometime in the future (not prior to March 2005) by the LDC as development material for the next such evaluation.

    ICASSP 2004 Meeting Recognition Workshop

    Evaluation participants will have an automatic slot at the ICASSP 2004 Meeting Recognition Workshop in Montreal on May 17, 2004 and will be expected to contribute a paper and presentation for the workshop. See the ICASSP 2004 Meeting Recognition Workshop page for more details.

    Important Dates

    Training Data available                                  February 2
    Evaluation Spec available                                February 17
    Abstracts for non-evaluation papers due                  March 15
    Notification of acceptance of non-evaluation papers      March 19
    Commitment to participate in evaluation                  March 1
    Evaluation begins                                        March 8
    Evaluation system output due                             March 22
    Scored results available                                 March 26
    Non-evaluation papers due                                April 19
    Evaluation papers due                                    April 27
    Workshop                                                 May 17

    Contact Information

    If you are interested in participating in the evaluation or workshop, or in obtaining additional information, please contact us.

    NIST RT-04 website comments and corrections should be emailed to our webmaster.

    Update History

    • May 12, 2004: Updated training data table
    • March 12, 2004: Updated calendar to reflect change of schedule for the ICASSP workshop.
    • March 8, 2004: Added evaluation documentation page.
    • February 26, 2004: Added ISL errata.
    • February 25, 2004: Fixed broken navigation. Added link to CMU Meeting Room project.
    • February 24, 2004: Fixed broken link. Eval plan is now in PDF format. Added mapping between original meeting names and new ones. See the dev test data documentation.
    • February 20, 2004: Added transcription files. See the dev test data documentation.
    • February 17, 2004: Added eval plan.
    • February 9, 2004: Created separate page for dev/test set documentation.
    • February 7, 2004: Fixed typos and edited for clarity.
    • February 6, 2004: Added details.
    • January 8, 2004: Added basic content.

    Page Created: December 23, 2003
    Last Updated: December 19, 2007

    Speech Group is part of IAD and ITL
    NIST is an agency of the U.S. Department of Commerce