Minutes from the RT telecon, 10/17/06. Updated 10/24/06.

Attendees:
  CMU: Burger
  NIST: Fiscus, Garofolo, Rose, Ajot, Michel
  IBM: Potamianos
  ICSI: Wooters, Janin
  UKA: McDonough, Rochet, others
  Sheffield: Hain, Wan
  LIMSI: Lamel, Adda, Galibert, Barras, Bilinski, Gauvain
  LDC: Strassel

- Task modifications

  - Speaker Attributed STT Task
    - NIST volunteers to champion the task; eval code is in place.
      Discussion: General consensus to include the task this year. The primary metric will be WER-based, but SPKR DER will also be measured.
      Action: NIST to define the task and evaluation protocol via the eval plan.

  - Speech Activity Detection (SAD) Task to be sunset
      Discussion: General consensus to sunset the task. However, since the call attendees were mainly groups fully involved in the evaluation, and most participants in this task were not present, NIST should first poll the community before acting.
      Action: NIST will send an email proposing that the SAD task be dropped. If there are no objections, it will not formally be part of the evaluation tasks.

  - Speaker Detection (SPKR) Task
    - SPKR systems will be diagnostically scored for SAD.
      Discussion: General consensus to report SPKR performance using the SAD metric.
    - Lecture data will not be used for the SPKR task.
      Discussion: The composition of the lecture data was discussed at length. The soon-to-be-released development resources include lecture, Q&A, and "coffee break" excerpts; this is also true for the lecture eval data. The question discussed was: is the coffee break data in domain?
      Action: NIST will send an email to discuss whether or not to exclude coffee breaks from the data, given problems with noise, the small number of samples, and domain drift. This may require looking at the CHIL dev data.
    - Shift to forced word alignment-mediated reference segmentation.
      Discussion: General consensus to do so.
      Action: NIST will provide forced alignments for the SPKR reference data.

- New tasks
  - Who will champion each new task?
    - Vocal classification task: nonspeech, non-lexical speech, and lexical speech.
    - N-modal classification task: nonspeech, speech, overlapping speech.
    Discussion:
      - There will not be a speaker classification task.
      - There will not be an N-modal classification task.
      - Text detection in video will be an evaluation task run by USF.
  - A side discussion occurred on deciding which "side information" to include, specifically speaker accent. Note that for some experiments, side data may intentionally be withheld.
  - We discussed the possible causes of word error rate (for different classes of data). Some contributing factors are hypothesized to be: (a) speaker dialect, (b) domain/data type, (c) experimental setup, and (d) interactivity. Gaining insight into how dialect affects system performance requires additional thought and experimental design to decide how to isolate it; for example, what happens to the language models for German vs. Indian vs. Chinese speakers in English? Other questions related to dialect remain; further discussion (and a literature search) is needed here. (A toy illustration of how WER is computed appears below.)
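  (Illustration, added for reference: the sketch below is a minimal Python
  rendering of the standard WER definition referred to above, not the NIST
  scoring tool, and the example utterances are invented. SPKR DER is the
  time-based analogue: missed, falsely alarmed, and misattributed speaker
  time divided by total scored speech time.)

    # Minimal, illustrative WER via Levenshtein alignment (not the NIST tool):
    # WER = (substitutions + deletions + insertions) / reference word count.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # i deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    # One substitution ("the" -> "a") plus one deletion ("noon") over a
    # 5-word reference gives WER = 2/5 = 0.4.
    print(wer("the meeting starts at noon", "a meeting starts at"))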
- Data issues
  - New development data: Has additional data been released?
  - Cutoff date for new data releases
      Discussion: Dec. 31 is the cutoff date for training data resources.
      Action: Janin will research and report what development data from AMI has been released, and also what data to specify as dev test.
      Action: CHIL will report on the soon-to-be-released dev data, and it will be documented on a web site.
  - Data harmonization effort: Who would like to form a working group?
      Action: Form a working group for a standardized meeting-room description and supporting metadata. A suggestion was made to organize this using a wiki web page. (A hypothetical sketch of such a metadata record is appended at the end of these minutes.)

- Evaluation data
  - At the workshop we decided to build a single test set containing both conference and lecture data, with the meeting type given as side information. Any opposing viewpoints?
      Discussion: After discussion it was clear that some groups do not have the resources to field systems for both sub-domains, so the test sets will remain separate.
  - Should we require at least one mic array?
      Discussion: It would be good, but it is not feasible.
      Action: Requirements for data collection to be supplied by NIST, including instructions for synchronization. There should be video from at least one camera.
  - Who will donate meetings to the test set?
      Discussion: The participants agreed that the eval set should consist of unexposed meetings.
        NIST: Yes
        CMU: Likely
        AMI: Looking into it; they are not sure whether unexposed data exists.

- Putative Evaluation Schedule
      Discussion: The schedule was not discussed except to ask whether the workshop will conflict with MLMI. It does not; MLMI is June 28-30, 2007.
      Action: NIST will send a draft of the evaluation schedule, either by 20 Oct 2006 or early the following week (23 Oct 2006).

Other Action Items:
  Action: The next telecon will be around the beginning of November, e.g., 6 Nov 2006 or after.
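Appendix (hypothetical sketch): one possible shape for the standardized
meeting-room description record that the data harmonization working group
would define. This is purely illustrative; every field name below is an
invented assumption, not an agreed schema.

    # Hypothetical sketch only: a possible meeting-room metadata record.
    # All field names are invented for illustration.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Microphone:
        mic_id: str                  # e.g., "array1-ch03"
        mic_type: str                # e.g., "array", "lapel", "tabletop"
        position_m: Tuple[float, float, float]  # (x, y, z), room-relative

    @dataclass
    class MeetingRoom:
        site: str                    # collecting site, e.g., "NIST"
        room_id: str
        dimensions_m: Tuple[float, float, float]  # length, width, height
        cameras: int                 # telecon asks for at least one camera
        sync_method: str             # how the channels were synchronized
        microphones: List[Microphone] = field(default_factory=list)

    room = MeetingRoom(
        site="NIST",
        room_id="room-a",
        dimensions_m=(7.0, 5.5, 3.0),
        cameras=1,
        sync_method="common sample clock",
        microphones=[Microphone("array1-ch00", "array", (3.5, 2.75, 1.0))],
    )
    print(room.site, len(room.microphones), "microphone(s) described")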