Copyright© 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The W3C Voice Browser working group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of requirements studies for voice browsers, and provides details of the requirements for reusable components for spoken dialogs.
This document describes the requirements for reusable dialog components for spoken interaction, as a precursor to starting work on specifications. Related requirement drafts are linked from the introduction. The requirements are being released as working drafts but are not intended to become proposed recommendations.
This specification is a Working Draft of the Voice Browser working group for review by W3C members and other interested parties. This is the first public version of this document. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".
Publication as a Working Draft does not imply endorsement by the W3C membership, nor by members of the Voice Browser working group.
This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review. Comments should be sent to the public mailing list <www-voice@w3.org> (archive).
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
NOTE: Text in an italicized teal font is a comment.
Reusable dialog components provide pre-packaged functionality "out-of-the-box" that enables developers to quickly build applications by providing standard default settings and behavior. They shield developers from having to worry about many of the intricacies associated with building a robust speech dialogue, e.g., confidence score interpretation, error recovery mechanisms, prompting, etc. This behavior can be customized by a developer if necessary to provide application-specific prompts, vocabulary, retry settings, etc.
The main goal of this subgroup is to develop a specification for reusable dialog components within the context of an overall specification for a markup language. The purpose of this document is to establish a prioritized list of requirements for reusable dialog components which any proposed markup language (or extension thereof) should address.
Although it is desirable to standardize the interface to all dialog components, this standardization is impractical for many dialogs. In order to standardize the interface, one would have to standardize the call flow, since the specifics of the call flow determine the parameters that can be configured. If this document attempts to standardize the call flow (and hence interface) for more complex and debatable dialog components, the resulting standard components are likely to contain only the lowest common denominator of functionality and therefore be of limited usefulness. Even the control flows for such common tasks as acquiring telephone numbers and postal codes can differ from one application and vendor to another. To preserve implementation flexibility for more complex components, this document sorts components into two categories -- those requiring only a return semantics specification and those requiring both configuration and return semantics specifications. This document also provides some general requirements applicable to all components.
Note that the document provides no suggestions as to how these components should be accessed (i.e., through a generic external call interface vs. through specific markup elements vs. through reference to an ML page in a standard library); rather, it is important merely that the functionality described is packaged in an easy-to-use fashion. There may consequently be some overlap between components described here and sections of the Dialog Requirements document. In general, the Dialog Requirements document describes dialog features, while this document describes packaged/contained dialogs. In practice this distinction may disappear for some of the components described, but there is no requirement that a component be implemented using core features of the language itself.
The document is divided into the following sections:
Requirements described in this section apply to all components.
Components must provide support for different locales where appropriate. For example, telephone number, postal code, and address all vary in structure from one country to another. "Support", in this case, can mean either configuration parameters or simply multiple versions covering the range of locales where usage is expected.
Locale information should be available from the application context for use by components. Such information must conform to existing ISO standards for countries and locales.
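As an illustration only, the "multiple versions" reading of locale support could be sketched as a table of per-country versions selected by the locale from the application context. The locale tag layout (`en-US`), the function names, and the simplified postal code patterns below are assumptions made for this sketch and are not defined by this document.

```python
import re

# Hypothetical, simplified postal code patterns keyed by ISO 3166-1 alpha-2
# country code. Real postal formats have more rules than shown here.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "DE": re.compile(r"\d{5}"),
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),
}


def split_locale(tag):
    """Split a tag such as 'en-US' into (language, country)."""
    language, _, country = tag.partition("-")
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"not a two-letter language code: {language!r}")
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError(f"not a two-letter country code: {country!r}")
    return language, country


def is_valid_postal_code(text, locale_tag):
    """Check text against the version of the component for the locale's country."""
    _, country = split_locale(locale_tag)
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        raise LookupError(f"no postal code version for country {country!r}")
    return pattern.fullmatch(text) is not None
```

A locale with no matching version raises `LookupError` here; whether a component should instead fall back to a default locale is left open by the requirement.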
For mixed initiative interactions, it would be nice to allow multiple components to be active simultaneously during some stages of the overall dialog. For example, possibly only the initial grammars for multiple components would be active simultaneously, with the grammar that matched the user's response determining which component would handle the next interaction.
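The example in the preceding paragraph can be sketched as follows. Regular expressions stand in for the components' initial grammars, and the component names and the `select_component` function are hypothetical; this is one possible reading of the requirement, not a proposed design.

```python
import re

# Hypothetical stand-ins: each component exposes an initial "grammar",
# modeled here as a regular expression over the recognized text.
INITIAL_GRAMMARS = {
    "date": re.compile(r"\b(today|tomorrow|monday|tuesday)\b"),
    "time": re.compile(r"\b\d{1,2} (am|pm)\b"),
    "yes_no": re.compile(r"\b(yes|no)\b"),
}


def select_component(utterance, active=INITIAL_GRAMMARS):
    """Return the name of the first active component whose grammar matches.

    Returns None when no active grammar matches the utterance.
    """
    text = utterance.lower()
    for name, grammar in active.items():
        if grammar.search(text):
            return name
    return None
```

The component named by the return value would then handle the next interaction. An utterance that matches more than one grammar is resolved here by dictionary order, which is an arbitrary choice made for the sketch.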
Return result(s) from a component must conform to the NL format developed by the Natural Language subgroup.
Note: the terminology in this requirement will be updated to match that used by the Natural Language Semantics subgroup. For now, the term "key" refers to a text label that a) does not vary from one invocation of the component to another and b) has an associated "value" that can vary from one invocation of the component to another.
Each component must specify
Components must also provide a means to return additional implementation-specific keys and values.
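A minimal sketch of the key/value idea described in the note above, assuming a simple in-memory representation. The class name, the `DATE_KEYS` tuple, and the `confidence` extra are hypothetical and are not taken from the Natural Language subgroup's format.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentResult:
    """Keys a component always returns, plus implementation-specific extras."""
    values: dict
    extras: dict = field(default_factory=dict)


# Hypothetical fixed key set for a date component. The keys do not vary
# between invocations; only the associated values do.
DATE_KEYS = ("year", "month", "day")


def make_date_result(year, month, day, **extras):
    """Build a result with the fixed keys and any implementation-specific ones."""
    values = dict(zip(DATE_KEYS, (year, month, day)))
    return ComponentResult(values=values, extras=dict(extras))
```

For example, `make_date_result(2000, 5, 4, confidence=0.87)` keeps the three fixed keys in `values` and places the additional key in `extras`, so callers that only know the published keys are unaffected.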
Components must be able to catch, pass on, or generate exceptions. All exceptions passed on or generated must be catchable by the calling application. Components are complete dialogs and as such, to an extent appropriate for any given component, are expected to handle many recognizer-level conditions and exceptions themselves rather than always passing them through.
For example, simple user timeouts will most commonly be captured and handled by the component, while an exception signifying the inability to access a server needed by the component would likely be partially handled within the component and then passed on to the calling application. A user hangup would likely just be passed on to the calling application.
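The three cases in the example above (handled internally, partially handled and passed on, passed on untouched) might look like the following sketch. The exception classes, the `recognize` callable, and the retry limit are assumptions introduced for illustration.

```python
class UserTimeout(Exception):
    """The user said nothing within the allowed time."""


class ServerUnavailable(Exception):
    """A server needed by the component could not be reached."""


class UserHangup(Exception):
    """The user ended the call."""


def run_component(recognize, max_retries=2, log=None):
    """Collect one value, handling what the component can handle itself.

    recognize is a hypothetical callable that returns the recognized value
    or raises one of the exceptions above. log records the component's own
    recovery actions so they can be inspected.
    """
    log = [] if log is None else log
    attempts = 0
    while True:
        try:
            return recognize()
        except UserTimeout:
            # Handled inside the component: reprompt, up to a limit.
            attempts += 1
            if attempts > max_retries:
                raise
            log.append("reprompt")
        except ServerUnavailable:
            # Partially handled (tell the user), then passed on.
            log.append("apologize")
            raise
        # UserHangup is not caught here, so it passes to the caller.
```

Every exception that leaves `run_component` is an ordinary Python exception, so the calling application can catch it, which mirrors the requirement that passed-on exceptions be catchable.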
Components must have clearly published behavior for the cases in which potential conflicts between "always active" language features and those of the component can occur.
Where reasonable, components will be built using other components to increase consistency in behavior across components.
One of the goals in providing packaged dialogs is to reduce application developers' effort through appropriate abstraction. Such abstraction can occur at many levels and can be implemented in a number of different ways. For example, parameterized dialogs can be implemented as markup elements with attributes and sub-elements, as scripts built of markup elements and variables (perhaps stored in a standard library of such dialogs), or as native, precompiled, or otherwise non-markup language objects or modules. Since this document does not in general address implementation issues, there are no suggestions in this section regarding the appropriateness of the dialogs below being implemented in one way over another.
The components in this section are presented in that context. They represent actual dialogs (or dialog templates) and not dialog features (like help prompts or the ability to construct confirmation dialogs).
The components in this section can be categorized along 3 dimensions:
This section is sorted first along Config/Return vs. Return only, then along Specification priority. Tables containing alternate sorting orders can be found in Section 5.
Components in this section are considered to have control flows that will not change significantly between applications or vendors. For these components, both component-specific configurable parameters and the return semantics (see Section 2.3) must be specified. In addition, any general configurable parameters required by Section 2 must still be specified.
This component is intended to be used as a simple confirmation and will prompt the user with a question or statement. The grammar will handle a variety of affirmative and negative responses and return a "yes" or "no", respectively.
Possible configurable parameters are:
The component is intended to be used as a simple confirmation.
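A minimal sketch of the return behavior described for this component, assuming the recognizer delivers plain text. The word lists and the function name are hypothetical, and the sketch does not handle negated phrases such as "not correct".

```python
# Hypothetical default vocabularies; a real grammar would be far richer.
AFFIRMATIVE = {"yes", "yeah", "yep", "correct", "right", "sure"}
NEGATIVE = {"no", "nope", "wrong", "incorrect"}


def interpret_yes_no(utterance, affirmative=AFFIRMATIVE, negative=NEGATIVE):
    """Map a recognized utterance to "yes" or "no", or None if neither."""
    words = set(utterance.lower().replace(",", " ").split())
    has_yes = bool(words & affirmative)
    has_no = bool(words & negative)
    if has_yes and not has_no:
        return "yes"
    if has_no and not has_yes:
        return "no"
    return None  # nothing matched, or the utterance was contradictory
```

Passing different `affirmative` and `negative` sets is one way a developer could supply application-specific vocabulary.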
This component will prompt for and recognize natural numbers. As an example, this could be useful in applications where a certain number of items are being ordered or processed. This component should be able to recognize natural numbers between 0 and some large number (e.g. 99,999,999).
The component may also have the following requirements:
This component will prompt for and recognize a fixed-length string of digits.
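As a sketch only, the fixed-length check could be expressed as below, assuming the recognizer returns one word per spoken digit and that the expected length is supplied by the developer. The word table and function name are hypothetical.

```python
# Hypothetical mapping from spoken English digit words to digit characters.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def digits_from_words(utterance, length):
    """Return a digit string of exactly `length` digits, or None."""
    digits = []
    for word in utterance.lower().split():
        if word not in DIGIT_WORDS:
            return None  # not a digit word, so not a digit string
        digits.append(DIGIT_WORDS[word])
    result = "".join(digits)
    return result if len(result) == length else None
```

A `None` result is where a full component would reprompt rather than return to the application.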
Possible configurable parameters are:
The component may also have the following requirements:
This component will collect a fully-specified date. Any ambiguity in the user's initial statement of the date will be cleared up by the component without benefit of application-maintained context. For example, the component may use additional prompting to disambiguate. The component will return a fully-specified date. If unable to return a fully-specified date, the component will generate an error.
Possible configurable parameters are:
This component will collect a date. Any ambiguity in the user's initial statement of the date will be cleared up by the component without benefit of application-maintained context. For example, the component may use additional prompting to disambiguate. The component will then return as much of the date as it has obtained. Note that this means the component may return either a fully-specified date or a partially-specified date.
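The difference between the two date components described above lies in what they may return. A hedged sketch, with hypothetical class and function names, of a result that can be partially specified and of the error a fully-specified component would generate:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DateResult:
    """A date in which any field may be missing (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    def is_fully_specified(self):
        return None not in (self.year, self.month, self.day)


class IncompleteDateError(Exception):
    """Raised when a fully-specified date was required but not obtained."""


def require_full_date(result):
    """Return result unchanged if complete; otherwise generate an error."""
    if not result.is_fully_specified():
        raise IncompleteDateError("date is only partially specified")
    return result
```

A partially-specified date component would return the `DateResult` as is, while a fully-specified one would pass it through `require_full_date` after its own disambiguation prompts.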
Possible configurable parameters are:
This component would provide only simple error detection and reprompting.
Possible configurable parameters are:
This component will prompt for and recognize a fixed-length string of letters.
Possible configurable parameters are:
The component may also have the following requirements:
This component will prompt for and recognize a fixed-length string of letters and digits.
Possible configurable parameters are:
The component may also have the following requirements:
Components in this section are considered to have dialog flows that vary considerably by application. Thus, only the component's return semantics will be specified (as per Section 2.3). Even so, any general configurable parameters required by Section 2 must still be specified.
The Time component will provide generic acquisition of clock times -- for example, "three forty-five AM" or "fourteen twenty-three". If the time is ambiguous (hour < 13 and before-/after-noon designation not provided), the component should conduct additional dialog as needed to clarify. Although time zone specifiers must be recognized if spoken by the user, they are not required of the user.
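The ambiguity rule stated above (hour < 13 with no before-/after-noon designation) can be written down directly. The function names and the "AM"/"PM" strings are assumptions for this sketch; the rule is applied literally, although an implementation might reasonably treat hour 0 as unambiguous.

```python
def needs_clarification(hour, meridiem=None):
    """Apply the rule as stated: ambiguous if hour < 13 and no AM/PM given."""
    return hour < 13 and meridiem is None


def to_24_hour(hour, meridiem=None):
    """Convert a recognized hour to 24-hour form, or raise if ambiguous."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if needs_clarification(hour, meridiem):
        raise ValueError("ambiguous time: ask the user for AM or PM")
    if meridiem is None:
        return hour  # 13..23 is already unambiguous
    if not 1 <= hour <= 12:
        raise ValueError(f"hour {hour} cannot take an AM/PM designation")
    if meridiem == "AM":
        return 0 if hour == 12 else hour
    if meridiem == "PM":
        return 12 if hour == 12 else hour + 12
    raise ValueError(f"unknown designation: {meridiem!r}")
```

With this sketch, "three forty-five AM" yields hour 3 and "fourteen twenty-three" yields hour 14, while a bare "three forty-five" raises so that the component can conduct the additional dialog.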
This component plays a prompt offering the caller a menu of items from which she may select a single item and then returns an identifier corresponding to the chosen item.
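A minimal sketch of the menu behavior described above, assuming the menu is supplied as (identifier, spoken label) pairs. The sample menu, the prompt wording, and the function names are hypothetical.

```python
# Hypothetical menu: (identifier, spoken label) pairs.
MENU = [
    ("bal", "account balance"),
    ("xfer", "transfer funds"),
    ("op", "operator"),
]


def menu_prompt(items):
    """Build the prompt that offers the menu items to the caller."""
    labels = [label for _, label in items]
    return "Please choose one of: " + ", ".join(labels) + "."


def choose(items, utterance):
    """Return the identifier of the single item named in the utterance."""
    text = utterance.lower()
    matches = [ident for ident, label in items if label in text]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` when zero or several items match leaves the recovery strategy (reprompt, disambiguate) to the component's dialog flow, which this document does not standardize for the Menu component.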
The purpose of this component is to obtain a money amount. The grammar will accept any common means of specifying amounts in the currency. For example, US Currency would allow "three dollars and twenty five cents" while German currency might allow "zwei Mark fuenfzig".
This component will collect a date. Any ambiguity in the user's initial statement of the date will be cleared up with the benefit of application-maintained context. This component will then return as much of the date as it has obtained. Note that this means the component may return either a fully-specified or partially-specified date.
This component will encapsulate the task of acquiring a telephone number. If a single confirmed number is desired, the component will confirm the number and perform error-recovery, if necessary. Otherwise, it should provide an n-best list of recognized telephone numbers that the application can filter.
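The n-best alternative could be consumed by an application roughly as follows. The (number, score) pair layout, the scores, and the ten-digit filter are assumptions for illustration; the return format would in practice follow the Natural Language subgroup's work.

```python
def filter_n_best(n_best, is_acceptable):
    """Keep hypotheses the application accepts, highest score first.

    n_best is a list of (number, score) pairs from the component.
    """
    kept = [(number, score) for number, score in n_best if is_acceptable(number)]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def has_ten_digits(number):
    """Hypothetical application filter: exactly ten digits."""
    return number.isdigit() and len(number) == 10
```

An application might call `filter_n_best(hypotheses, has_ten_digits)` and then check the survivors against its own customer database before accepting one.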
This component will prompt for and recognize digit-strings consisting of sections, such as might be found in a credit card number, social services identification number, and the like.
The component may also have the following requirements:
This component will prompt for and recognize alphanumeric strings consisting of sections, such as might be found in a product code, user identifier, automobile license plate, etc.
The component may also have the following requirements:
This component would be responsible for confirming one or more items of information given by the user. If the user indicates that one or more items of information are wrong, this component could walk the user through the process of correcting them by calling other pages and/or components.
This component allows the caller to hear items in a list in sequence and optionally navigate through the list. By default, the list would play prompts associated with items, one after another. Note that any item may be selected at any time by speaking the appropriate grammar entry. In addition, the component allows the user to select an item from the list. The component then returns the index of the selected item.
This component allows the caller to hear items in a list in sequence and optionally navigate through the list. By default, the list would play prompts associated with items, one after another. Note that any item may be selected at any time by speaking the appropriate grammar entry. In addition, the component would allow the user to say application-specific commands that cause corresponding event handlers to be executed.
This component will ask for and recognize a valid postal code.
This component will obtain the correct spelling of one or more names (for example, a given name, middle name or initial, and family name). The names may be obtained individually and confirmed together, possibly with some error correction dialog.
This component will prompt for and obtain one or more spoken names, optionally followed by the spelling of the names. As with the Spelled name component, this component may obtain the names separately and confirm them together, possibly with some error correction dialog.
This component will encapsulate the task of acquiring credit card information from a caller. It will collect the card type, card number, and expiration date (month and year). It must support a wide variety of standard cards (Visa, MC, AMEX, Diners Club, Discover, etc.).
The component may also have the following requirements:
This component will obtain the user's email address.
This component will provide generic acquisition of clock time ranges -- for example, "between one PM and three PM". If either or both of the times are ambiguous (hour < 13 and before-/after-noon designation not provided), the component should conduct additional dialog as needed to clarify.
This component will acquire a duration. The component will be able to accept any common duration unit (seconds, minutes, etc.) and will conduct additional dialog as necessary to resolve the units if ambiguous.
This component will acquire Uniform Resource Locators as described in IETF RFC XXXX.
This component will obtain the physical or mailing address of the caller.
This component will allow a user to select one or more items from a set of valid options.
This component acquires an arbitrary-length sequence of letters and digits.
Items in this section are ones that will be considered for future study. Although not a part of the requirements document per se, they are included here to inform the reader of other topics that have been discussed and consciously postponed for future study.
Components may optionally allow the initial grammar to be embedded within an extended initial grammar provided by the developer.
An example of this might occur in an implementation of the Date component. In this hypothetical implementation, there is a single initial "date" grammar that will be used to recognize the user's first utterance. This particular implementation is written under the assumption that the user will be asked to give the complete date (as opposed to asking for the day, the month, and the year separately). The developer may wish, then, to change the initial grammar to be something like the following:
I would like to book an appointment for <date>
This future study item is listed as optional because it presumes that the dialog for the component will prompt for and expect all of its data items in the first interaction with the user. Removing the optionality of this requirement would unnecessarily restrict the implementation of the dialog.
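The embedding described in this item can be sketched with regular expressions standing in for grammars. The tiny date grammar, the `embed` function, and the `<date>` placeholder handling are assumptions for illustration and do not reflect any grammar format under consideration.

```python
import re

# Hypothetical, deliberately tiny "date" grammar standing in for the
# component's own initial grammar.
DATE_GRAMMAR = r"(?P<date>(?:january|february|march) \d{1,2})"


def embed(carrier, slot, grammar):
    """Replace the <slot> placeholder in a carrier phrase with a grammar."""
    placeholder = f"<{slot}>"
    if placeholder not in carrier:
        raise ValueError(f"carrier phrase has no {placeholder} placeholder")
    before, after = carrier.split(placeholder, 1)
    pattern = re.escape(before) + grammar + re.escape(after)
    return re.compile(pattern, re.IGNORECASE)


extended = embed(
    "I would like to book an appointment for <date>", "date", DATE_GRAMMAR
)
```

Matching an utterance against `extended` and reading the `date` group hands the component the same input it would have received from its own initial grammar.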
Components may optionally perform a final confirmation of any data items acquired. The developer can specify how the confirmation prompt is to be built from the result.
This requirement is listed as optional because it presumes there will be a single confirmation prompt. Making this requirement non-optional would unnecessarily restrict the implementation of the dialog.
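One way a developer might specify how the confirmation prompt is built from the result is a template whose fields are the component's return keys. The template syntax (Python format fields) and the function name are assumptions made for this sketch.

```python
def build_confirmation(template, result):
    """Fill a developer-supplied template from the component's result keys.

    Raises KeyError if the template names a key the result does not contain.
    """
    return template.format(**result)
```

For instance, a date result with keys `month`, `day`, and `year` could be confirmed with the template `"You said {month} {day}, {year}. Is that correct?"`.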
This component will prompt for an online help service.
Possible configurable parameters are:
This component will pass the phone call to an operator.
Possible configurable parameters are:
This component will acquire a physical measurement (e.g., "five pounds, four ounces" or "three light years").
This component will collect an ordinal number.
Note: this may be challenging in Japanese where the ordinal used depends on the object being ordered.
For convenience, these tables list all the components in different sorting orders. For explanations of the three dimensions along which components are organized, see Section 3.2.
Component | Specification priority | Specification level | Task vs. Template |
Address | Nice to | Return only | Task |
Address, email | Should | Return only | Task |
Alpha string, simple | Should | Configure & Return | Task |
Alphanumeric string, non-fixed | Nice to | Return only | Task |
Alphanumeric string, sectioned | Should | Return only | Task |
Alphanumeric string, simple | Should | Configure & Return | Task |
Browsable action list | Should | Return only | Template |
Browsable selection list | Should | Return only | Template |
Credit card information | Should | Return only | Task |
Confirmation & correction dialog | Should | Return only | Template |
Currency | Must | Return only | Task |
Date, context-compensating | Should | Return only | Task |
Date, fully-specified | Must | Configure & Return | Task |
Date, partially-specified | Should | Configure & Return | Task |
Digit string, sectioned | Should | Return only | Task |
Digit string, simple | Must | Configure & Return | Task |
Duration | Should | Return only | Task |
Error-recovery dialog, simple | Should | Configure & Return | Template |
Menu | Must | Return only | Template |
Multiple choice selections | Nice to | Return only | Template |
Name, spelled | Should | Return only | Task |
Name, spoken & spelled | Should | Return only | Task |
Natural numbers | Must | Configure & Return | Task |
Postal code | Should | Return only | Task |
Telephone number | Should | Return only | Task |
Time | Must | Return only | Task |
Time range | Should | Return only | Task |
URL | Should | Return only | Task |
Yes/No | Must | Configure & Return | Task |
Component | Specification priority | Specification level | Task vs. Template |
Yes/No | Must | Configure & Return | Task |
Natural numbers | Must | Configure & Return | Task |
Simple digit string | Must | Configure & Return | Task |
Fully-specified date | Must | Configure & Return | Task |
Time | Must | Return only | Task |
Currency | Must | Return only | Task |
Menu | Must | Return only | Template |
Partially-specified date | Should | Configure & Return | Task |
Simple alpha string | Should | Configure & Return | Task |
Simple alphanumeric string | Should | Configure & Return | Task |
Simple error-recovery dialog | Should | Configure & Return | Template |
Context-compensating date | Should | Return only | Task |
Telephone number | Should | Return only | Task |
Sectioned digit string | Should | Return only | Task |
Sectioned alphanumeric string | Should | Return only | Task |
Postal code | Should | Return only | Task |
Spelled name | Should | Return only | Task |
Spoken & spelled name | Should | Return only | Task |
Credit card information | Should | Return only | Task |
Email address | Should | Return only | Task |
Time range | Should | Return only | Task |
Duration | Should | Return only | Task |
URL | Should | Return only | Task |
Confirmation & correction dialog | Should | Return only | Template |
Browsable selection list | Should | Return only | Template |
Browsable action list | Should | Return only | Template |
Address | Nice to | Return only | Task |
Non-fixed alphanumeric string | Nice to | Return only | Task |
Multiple choice selections | Nice to | Return only | Template |
The editor wishes to thank the members of the reusable dialog components subgroup for their help in preparing this draft:
Michael Brown (Lucent/Bell Labs)
Daniel C. Burnett (Nuance)
Deborah Dahl (Unisys)
Carolina di Cristo (CSELT)
Linda Dorrian (Productivity Works)
Andrew Hunt (SpeechWorks)
Robert Keiller (Canon)
Andreas Kellner (Philips)
David Ladd (Motorola)
Jens Marschner (Philips)
Stephen Potter (Entropic)
Dave Raggett (W3C/HP)
Ramesh Sarukkai (Yahoo Inc.)
Frank Scahill (BT)
Kuansan Wang (Microsoft)