Federal CIO Council

XML Working Group

 

Wednesday, April 17, 2002 Meeting Minutes

 

GSA Headquarters

18th & F Streets, N.W., Room 5141

Washington DC 20405

 

Please send all comments or corrections to these minutes to Glenn Little at glittle@lmi.org.

 

Mr. Ambur opened the meeting by introducing himself, briefly explaining the day’s focus, and asking all participants to introduce themselves. He then turned the meeting over to Mr. Brand Niemann, Jr. of Tax Analysts, for a presentation entitled “Web Publishing on DVD: Repurposing Federal Data in XML.”

 

Mr. Niemann:

 

“Web Publishing on DVD: Repurposing Federal Data in XML”

 

Hello—I want to mention by the way that we now have about two gigabytes of “XMLized” data on DVD—an indication of how this project has come to the forefront of my company’s work.

 

Slide 1:  My presentation is on Web publishing on DVD, and repurposing federal data in XML.

 

Slide 2:  [Process & Product.] I’m going to talk about a process and a product. It’s neat to be able to have both. The process involves XML/XSLT, and the product is our own.

 

Slide 3:  [Background on Tax Analysts.] This is a little information about Tax Analysts. We’ve been around for over 30 years, and we work in simplifying and facilitating the tax process.

 

Slide 4:  [Background, continued.] Here’s a little more about my company. I don’t know a lot about the content of tax legislation. As I’ve gotten more involved, and as I learn more about the tax laws, I’ve come to understand how complex they are. We hope to make it easier for tax professionals.

 

Slide 5:  [Goldfarb and SGML.] Charles Goldfarb spoke previously. We think of him as the originator of SGML. His earlier profession was as a tax attorney. I like that tie in. I like to bring it out.

 

Slide 6:  [Product overview.] My project started as a pilot. My company said they’d let us see how far we could go with it. That was about a year ago. There was a period when my company was slow to adopt anything at all, and only a month ago did they finally give sufficient approval to develop it at the largest scale. Only a handful of people in my company have even seen what’s been developed. You’ll see some new things here. Our goal today is to be ready for a May 2002 delivery to show our customers, with final rollout in September.

 

Slide 7:  [Document sources.] All our document sources originate from the IRS. People may be asking, “What is it you do if nothing is original?” We add value to content, and tax attorneys go through every document to summarize and simplify it. We have the aggregation and organization of data. This is neat because just this week we had a call from IRS saying they wanted to buy back the data that’s been repurposed.

 

Slide 8:  [Database size.] Our databases are very large. A single database, the Letter Rulings, contains 110,000 documents. I’m talking about industrial strength XML processing to handle that many documents. I’m hoping to get a final count this week. I’ll mention the Court Opinions database also. We hope for half a million or more documents in the end.

 

Slide 9:  [Research library.] Our current product consists of 4 CD ROMs. I’ll illustrate it. We currently use a platform that’s 10 or more years old. This product has continued to be successful even using older software; however, because there’s so much data, our customers have to use multiple CD ROMs. We provide a disc with summaries. They have to go to another disc to get the full text. It amounts to swapping the disc. We have a current HTML Lotus Domino Web version.

 

Slide 10:  [DVD solution.] Going forward, since many of our users are CD-ROM based, our next logical solution would be DVD. The DVD solution was proposed 2 years ago. Publishers as a whole are slow to adopt new technology. A greater emphasis has been placed on the actual content. Acceptance of new technology has been slow; however, we think we can now go forward with the DVD solution.

 

Slide 11:  [XML’s role.] I mentioned that we’ve been using a proprietary platform for 10 years. It’s called Folio Views from Nextpage. The latest products move from proprietary to open source such as HTML or XML. Here’s the benefit of XML. We had to go through the development of looking at requirements that dictated migrating to an XML platform.

 

Slide 12:  [Summary of tools.] This move to XML has benefited us tremendously, and been a big part of our success. It’s provided us tremendous flexibility, and the industrial ability we needed to take all these discs and put them on DVD. In terms of software and getting data into XML, we use Omnimark v5.0 to convert our Folio Flat file into XML. We used Michael Kay’s Saxon, his tool kit that involves the Java compiler. It involves taking the XSLT style sheet and processing it to an HTML form. The Java compiler is required for this.

 

Slide 13:   [Data migration.] Here’s a quick overview of the migration from Folio Flat file to XML version 1, and also the process to HTML version 4. Our XML is aided by the DTD specification. XSLT and cascading style sheets aid the HTML. Here I want to show you a quick conversion from XML to XSLT. It is a command line process. The beauty of it is that, for example, we have a directory of…(Let’s go to the Internal Revenue Code)…what happens is, I have some sample XML documents. In cases where we have 45-50,000 documents, in this directory we’d have as many documents in the database broken down at that level. At the command line, we can run Java Script with Michael Kay’s Saxon tool kit. It’ll go through the entire directory and convert every XML file to an HTML component. It gives an idea of the magnitude of the processing. It’s as simple as developing a style sheet and pointing to an XML file and processing.  For large databases I would let it run and then come back, and it would be done.
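The batch step described above (walk a directory, transform every XML file, write out an HTML file) can be sketched roughly as follows. This is only an illustration in Python, not the Saxon command line used in the project. Python's standard library has no XSLT processor, so a small hand-written function, `xml_to_html`, stands in for the style sheet. The element names `title` and `para` and both function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from pathlib import Path


def xml_to_html(xml_path: Path) -> str:
    """Stand-in for the XSLT step: render each <para> element as an HTML <p>."""
    root = ET.parse(xml_path).getroot()
    # Fall back to the file name when the document has no <title> child.
    title = html.escape(root.findtext("title", default=xml_path.stem))
    paragraphs = "\n".join(
        f"<p>{html.escape(''.join(p.itertext()))}</p>" for p in root.iter("para")
    )
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n{paragraphs}\n</body></html>\n"
    )


def convert_directory(src: Path, dst: Path) -> int:
    """Convert every *.xml file in src to an .html file in dst; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for xml_path in sorted(src.glob("*.xml")):
        out_path = dst / (xml_path.stem + ".html")
        out_path.write_text(xml_to_html(xml_path), encoding="utf-8")
        count += 1
    return count
```

Calling `convert_directory(Path("irc_xml"), Path("irc_html"))` (hypothetical directory names) would process the whole directory unattended, which matches the "let it run and then come back" pattern described for large databases.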

 

Ms. Glenda Hayes:  Brand, could we see one of the documents? Are you using formatting or content tags?

 

Mr. Niemann:  I’ll bring up all the style sheets. [Mr. Niemann browsed to several style sheets and displayed them on the screen] Here’s a cascading style sheet. Here’s an XSLT example. We’ve componentized the entire development, so if you look at the first four style sheets, every database shares all the style sheets in common—paragraph formatting, paragraph style, character style, etc. In this database, we create unique properties, so customization can take place here. We have the flexibility (extensibility) of calling other style sheets that can be used together. I’m not sure if I answered your question. Here’s XML here [another example on the screen]. Above we have a definition of a style sheet. We have most of this markup…the “record type” section refers to a Folio. We took the Folio and converted to our own definitions using XML. That includes the fielding. I want to show this to you in the mechanics of product. I’d like to show you afterward. Using this style sheet and an XML file, we can point to these and process the XML file. At the top we have the cascading style sheet, then we have our own. There are other things I’ll show you in the product. Let me move forward.

 

Slide 14:  [Use of Folio Flat file.] This is the neat thing. The XML we generate preserves the relationships in the flat file, and the document formatting. Let me show you the relationships. [Mr. Niemann displayed another example slide indicating the relationships.]

 

Slide 15:  [Overview of DVD product.] OK, here’s what our product looks like. This isn’t our product, but it shows our database for demonstration purposes. Notice we’ve created this formatting. We had all this in our CD ROM product. We captured it all through XML—the formatting, indenting…every database has this hierarchy—the table of contents structure. Literally, the structure you see is captured by the XML process. This link from one database to the next is all XML-captured fielding, linking, and leveling. I’ll show you. Let’s go to the next slide. The neat thing is we’re only a couple of slides away. I want to spend more time on the end product. I’ll show you our wonderful end product. [Mr. Niemann brought the product onto the screen.]

 

This picture is one with the IRS building in the background. We have these pictures to show as part of a research library. We have a formal name and a common short name. We’ll have all these capabilities where you have a usual search, but within the platform we have, we can have very refined searches, which get to the document and field level and the XML process we use.

 

I showed you the new interface. I want to show searches, save searches, and multiple database searches. I want to show you the DVD drive I’m working from on my laptop. I have 1.98 gigabytes of data sitting in this directory. [Mr. Niemann displayed the directory on the screen.] These folders are referring to navigation points to individual databases.

 

The first one—let’s view a table of contents view of a database. That’s pulling the database off the DVD drive. Here’s a table of contents representation of the database.

 

Now let’s do a more specific search. Our tax attorneys refer to every document by some name. Let’s go, for example, to Section 21, which was…I’m now in Section 21. I was able to do this search because of the XML creation I did for these documents. Let’s do a word search on “children.” We’ll search the entire Internal Revenue Code and see what we get. We get a hit list. Let’s go to the first one, go down to the hit… “Dear children” appears in the paragraph, then we keep on going to the next document.

 

This is talking about the index really as opposed to anything else. Now I want to search a large database to give you some ideas on performance. A large one is IRS Documents and Revenue Rulings. Let’s retrieve one…here’s a description of what it is. Let’s retrieve a document…there are 16,000 revenue rulings, so let’s get this one from the DVD drive. We provide examples to make it easy. It brings it up…there’s the document.

 

Mr. David Eng:  How big is the file?

 

Mr. Niemann:  This isn’t the largest—it’s 71 megabytes. Our largest is 522 megabytes and that’s compressed. This database in the original migration was over a gigabyte, and we did some additional compression. These are going to be impressive searches. Tomorrow I’m going to show my company this product. I’m going to write down rule 379…let’s get another example that’s more random [retrieving a search]…there it is. I think that’s pretty good.

 

Mr. Marion Royal:  I missed how you go from proprietary documentation to XML—how did that work?

 

Mr. Niemann:  There were two steps involved. [Mr. Niemann displayed a “summary of tools” slide] We used a text conversion product run from the command line. You write rules. I’ll give you an example.

 

Mr. Royal:  This tool recognizes paragraphs, etc.?

 

Mr. Niemann:  Yes—for example the ampersand—we had to convert every one to “amp” because in XML it’s an escape character. There’s a rule that as text comes in it evaluates it. We have hundreds of rules written to get it to XML.
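A conversion rule of the kind mentioned here can be pictured as an ordered find-and-replace applied to incoming text. The sketch below is in Python rather than the conversion tool the project used, and the rule table, function names, and `para` element are assumptions made for illustration. The ampersand rule has to run first, otherwise the ampersands introduced by the later rules would be escaped a second time.

```python
from xml.sax.saxutils import escape  # standard-library equivalent of the rules below

# Ordered (find, replace) rules. "&" comes first so that the "&" characters
# introduced by the "<" and ">" rules are not escaped again.
RULES = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;")]


def apply_rules(text: str) -> str:
    """Apply each rule in order to a run of plain text."""
    for old, new in RULES:
        text = text.replace(old, new)
    return text


def to_paragraph(text: str) -> str:
    """Wrap escaped text in a hypothetical <para> element."""
    return f"<para>{apply_rules(text)}</para>"
```

For plain character data, `apply_rules` gives the same result as `xml.sax.saxutils.escape`. A real conversion with hundreds of rules would also have to handle markup, links, and fields, which this sketch does not attempt.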

 

Mr. Royal:  You had to manually define the rules? It didn’t understand XML enough to do it?

 

Mr. Niemann:  We work at this level because we have a lot more control over the product. There is a tool, and it’s gone through several revisions. It may be doing better now, but it wouldn’t do the linking we wanted. I want to show you a couple more searches. Let’s do a multiple database search. Let’s show…I’m going to put these large databases in here and search for...let’s search on the word “default.” Let’s search over all these databases now. It gives us an order of databases we showed in our selection, and others. I want to show you these other documents. [Mr. Niemann had been displaying the result of the search.] There’s a multiple database. The final thing I want to show you is a multiple database search, and you’ll notice at the Court Opinions that we have a heading that enables the reader to see additional fields and decide if it’s the correct document. We can requery these documents and show it by word in terms of hits instead of by title.

 

The final thing I want to do is save this search. I’m showing more of the product now. I can go back to my saved search, bring it back up and requery if I want to. I’ve shown a lot of the features and capabilities.

 

Mr. Ambur:  Are there any quick questions?

 

Mr. Tim Marr:  I’m from the Social Security Administration. We have a similar problem. I wonder how you’ve dealt with the constant changes of documentation—how often do they send you changes, and how do you track and feed the Court Rulings into the system?

 

Mr. Niemann:  This is where this type of product shines. Our current process involves rebuilding the entire database every time a document changes. The new LivePublish build process can recognize a new or changed document and update the document to the database. That’ll be a big deal for publishers because new documents can be published and updated quicker, especially for users that may want to access the recent documents on the web. An XML file that establishes the relationship in our documents manages the build process. We’re counting on XML to have the glue to do that.

 

Mr. Marr:  I’m not sure about the human factor. If these five documents become these 12, but four are no longer relevant, and two are, how do you capture that in XML?

 

Mr. Niemann:  [Mr. Niemann displayed a file involved in the process.] Currently the way it works is the following: Where a new file exists, the name and location of the file are added to the XML build file. Where an old file no longer exists (rarely the case for us), the name and location are removed from the XML build file. Where a file is changed, the build process recognizes a new date stamp on the file and automatically updates the file. It is as simple as that. A document management system is not currently in place with our process; however, that may be a future direction.
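The three cases just described (file added, file removed, file changed by date stamp) can be sketched as a small synchronization routine. This is a minimal Python illustration under stated assumptions, not the LivePublish build process itself: the `<build>` and `<file>` element names, the `name` and `mtime` attributes, and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def sync_build_file(build: ET.Element, doc_dir: Path) -> dict:
    """Bring the <file> entries under <build> in line with the *.xml files on disk."""
    entries = {f.get("name"): f for f in build.findall("file")}
    # Record each file's modification time (in nanoseconds) as its date stamp.
    on_disk = {p.name: str(p.stat().st_mtime_ns) for p in sorted(doc_dir.glob("*.xml"))}
    report = {"added": [], "removed": [], "changed": []}

    # An old file no longer exists: remove its entry from the build file.
    for name, entry in entries.items():
        if name not in on_disk:
            build.remove(entry)
            report["removed"].append(name)

    for name, stamp in on_disk.items():
        entry = entries.get(name)
        if entry is None:
            # A new file exists: add its name and date stamp.
            ET.SubElement(build, "file", name=name, mtime=stamp)
            report["added"].append(name)
        elif entry.get("mtime") != stamp:
            # The date stamp differs: mark the file as changed.
            entry.set("mtime", stamp)
            report["changed"].append(name)
    return report
```

The returned report lists which documents a build would need to add, drop, or reprocess, so that the whole database does not have to be rebuilt for a single change.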

 

Mr. Roy Morgan:  I’m from NIST. Where do the DTDs come from?

 

Mr. Niemann: We built them ourselves. I created them. I've found XML Spy to be useful when I occasionally need to create a DTD from scratch.

 

Mr. Morgan:  This group is interested in how you can share those, but in general. It seems to me you’d like it if IRS would deliver documents to you in XML already. How do you plan to do that?

 

Mr. Niemann:  You’re right. We need to figure out how to do that.

 

Mr. Morgan:  You’d like GPO or someone to transform that?

 

Mr. Niemann:  That’d be great.

 

Mr. Kevin Williams:  Have you thought about repurposing your product for other reasons?

 

Mr. Niemann:  I think that’s a step our company is very slow to adopt. This is the furthest we’ve gone, and it’s taken years. I think maybe voice and hand-held devices.

 

Mr. Williams:  I just picture tax lawyers in court using their hand-helds.

 

End presentation.

 

 

Mr. Ambur:  Are we set for the next round?

 

Ms. Susan Turnbull:  We’re set for the next two.

 

 

Ms. Turnbull:

 

For the remote participants I’m passing around a 1-page handout that we’ll be posting at the XML.gov site. 

 

I wanted to tell you about an ongoing monthly workshop series, the Universal Access Collaboration Expedition Workshop series that George Brett and I began a year ago. It’s a staging area for some of the successes you’ll be seeing this morning. The theme of our workshop on April 16 was Multi-channel Service Delivery of Health Information. We had participants from many agencies including NIH, GPO, IRS, EPA, FEMA, etc.

 

The monthly workshops serve to open up dialogue across seven affinity working groups.  This includes four affinity groups of the CIO Council, and two from the Interagency IT Research and Development Working Group of the National Science and Technology Council. Owen Ambur, Co-chair, XML WG, frequently attends the workshops. We value his contributions, including his vision of how the Blue Pages could be rendered in XML and VoiceXML, one of the presentations today. You’re invited to join us at the workshops. Further information is available at our collaborative learning exchange and knowledge repository at: http://ioa-qpnet-co.gsa.gov/UA-Exp. We’re also piloting a mirror site, http://people.internet2.edu/~ghb/coexp, that is open source and XML-based. George Brett, my workshop co-chair from Internet2 will demonstrate this second site later this morning.

 

In the presentations this morning, the VoiceXML and Blue Pages team will demonstrate how the flexibility of XML enables informational services to be delivered in multiple ways. In the future, we might ask citizens “How would you like your information?” and then be able to respond with ease to seven or more channel preferences (print, Web, phone, hard copy Braille, PDA, Audio Ebook, CD, and DVD). This is the compelling demonstration of the Quad Council-recognized VoiceXML pilot, led by Brand Niemann of EPA. This pilot received a top Innovation Award from the Federal Leadership Councils on March 20, 2002 at FOSE.

 

In the past two weeks, this team has also prototyped what might be possible with the Blue Pages if rendered in XML and VoiceXML. This pilot will be demonstrated today. 

 

There is a new publication available from the Federal Architecture and Infrastructure Committee reflecting what the affinity groups have learned in the workshops about Extending Digital Dividends: Public Goods and Services that Work for All. Janina Sajka, of the American Foundation for the Blind, will demonstrate structured navigation of this guide in XML-based audio ebook format. It will become the first audio Ebook available through GPO. Using the same voice application network, employed for the award-winning EPA pilot, Janina will also demonstrate how this guide can be accessed by phone and navigated through the keypad. Now let’s turn to Janina.

 

 

Ms. Janina Sajka:

 

Thank you all for having me here. This is a favorite topic for me. Those of us who live with print visibility limitations have reaped great benefit from HTML, and we hope for more from XML. I want to give you a tour of how things work today, how we hope they’ll work tomorrow, and hopefully show you where they’re going. We hope this technology will deliver much for print disabilities and the 508 community. The two gains that we expect are from enhanced functionality and usability, and we expect that anything the government publishes should be accessible by people with disabilities. There are some in there. We care about semantics, for example, DAISY-style structural navigation is part of that, and I will show you some of that in products that already exist and are on drawing boards. I want to show you how that happens today, because now there’s not much XML production going on. First, let me define disability the way it affects us.

 

[Ms. Sajka defined the functional limitations of print disability.]

 

There are other reasons for enhanced functionality besides losing one’s vision, such as physical condition and learning disabilities. The differences are often part of the problem. Some of what I’ll show today has been demonstrated to help. I’m a little ahead of myself—I want to talk about how things are today. The end result is to begin to learn to read.

 

How do we do things today? If you’re blind, you might get a publication in Braille. That’s good. As it happens, most people who acquire a print disability do so by virtue of living long enough. Most of us are at risk of disability just from living long lives. You might have to pick up a new literacy skill. Braille is an example of a new literacy skill. It’s simply more difficult as we age.

 

If you can no longer see the print at all, you’re likely to get it on audiocassette. The Library of Congress maintains an extensive collection for leisure and productive reading, and agencies will give kids textbooks, in addition to making material Web accessible. What would you get? Something like what I have in my hand. [Ms. Sajka showed the group a small plastic case.]

 

This is Medicare’s publication on how you find out about your benefits. I have essentially a standard audio player here, about the size of a hardback book, with standard controls. It sounds like this when you start it. I’d like your indulgence to understand how this works. This publication has benefited from focus groups that study how you can better design an audiocassette. Let’s just listen to it…

 

[Ms. Sajka played a portion of the Medicare cassette’s instructions, highlighting the difficulty of comprehending the instruction set solely with audio. Ms. Sajka then demonstrated a knob that speeds the playback for faster listening.]

 

Now this next part is extremely important. If we were simply putting in a cassette of a novel we wouldn’t have a problem because it’s supposed to go end-to-end—but a novel isn’t a directory, or a tax guide, etc. You need to be able to go to a section and figure out how to navigate through it. Here’s what happens.

 

[At this point, the tape provided audio instructions of a series of different beeps one might hear, with each beep designed to convey a different message.]

 

Did anyone hear a different sound?

 

[No one was able to distinguish the beeps from one another.]

 

I didn’t initially either, but they’re different lengths.

 

[Ms. Sajka replayed the beeps.]

 

How about we go looking for the first section, which ought to be the table of contents? We’re instructed to put our player in fast forward and wait for one of those beeps.

 

[At this point, Ms. Sajka experienced some difficulty in navigating to the introductory portion of the tape, further highlighting her point.]

 

When you get good at this, you realize there must be some iterative list going on. Imagine if this was a textbook. There has to be a better way. I hope what you’ll see today will be better for print disability and have wider applications.

 

Indeed, we do have something shortly to become available to students, because what I’ve just shown you is a terrible way to have to study. You have maybe six, seven, eight cassettes. You have to go find them, rewind the tape, then count tones. Good luck. Let me show you a demonstration book here. This is a little bigger than my tape machine. It’s a special CD player with a client parser in here. The controls consist of a column on the left with three buttons, and on the right there are four buttons.

 

[Ms. Sajka described the functionality of each of the buttons.]

 

The best part is in the middle, which looks much like a telephone keypad. It has the same numbering, with a couple of arrow buttons at the bottom. Let me power this up and increase the volume. Let me find one…this is a tour book produced as a demo. I love using this because it helps us explain the way we navigate the book. The “4” and “6” keys have arrows pointing left and right, “2” and “8” change the increment we move forward and backward to map to HTML. “H1” is level one; “H2” is level two. We’re mapping the structure of the book. I’m going to take this to level one and see what the level one headings are.
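The keypad scheme described here, where two keys move through the book and two keys change the heading level being browsed, can be modelled in a few lines. The Python sketch below is an illustration only, not the player's actual logic. Which of "2" and "8" moves to a coarser or finer level, and the rule that browsing at level N also stops on shallower headings, are assumptions; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    level: int   # 1 corresponds to H1, 2 to H2, and so on
    title: str


class BookNavigator:
    """Keypad navigation over a flat, ordered list of headings."""

    def __init__(self, headings):
        self.headings = list(headings)
        self.position = 0            # start at the first heading
        self.level = 1               # start by browsing level-one headings
        self.max_level = max(h.level for h in self.headings)

    def press(self, key: str) -> str:
        """Handle one key press and return the title that would be spoken."""
        if key == "2":               # assumed: browse at a coarser level
            self.level = max(1, self.level - 1)
        elif key == "8":             # assumed: browse at a finer level
            self.level = min(self.max_level, self.level + 1)
        elif key in ("4", "6"):      # "4" moves back, "6" moves forward
            step = -1 if key == "4" else 1
            i = self.position + step
            while 0 <= i < len(self.headings):
                # Stop on the next heading at or above the current level.
                if self.headings[i].level <= self.level:
                    self.position = i
                    break
                i += step
        return self.headings[self.position].title
```

With a book whose level-one headings are city names, pressing "6" repeatedly steps from city to city, and pressing "8" followed by "6" drops into the sections under the current city.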

 

[Ms. Sajka navigated through the book. The group heard entries such as “Beginning of book.”]

 

We can literally pick the section we’re interested in. For example, if we’re in Turin and this is our tour book, and our task is to get to the train station, we move forward, and drop down a level. I’m listening to enough to understand where I am.

 

[Ms. Sajka navigated through levels, and the tool verbalized specific travel information.]

 

This is already in use in Japan and Sweden. It’s the United Kingdom’s analog to our Library of Congress guide to the handicapped. We missed launching a lot this month. In the U.S., the private nonprofit that does most of the textbooks in audio and ASCII will be opening up a program with about 3000 titles this summer.

 

[Ms. Sajka dialed out on the meeting room telephone to demonstrate another application. The phone had an external speaker attached so that the group could hear. She demonstrated a way to scroll through the same travel guide the group had heard on her machine.]

 

That’s a better way to do IVR. IVR has a bad name, justly earned, because it’s confusing. The mappings are different as you go to a different menu. It’s reasonable to revive it where browsing makes sense. So far, we’re talking about only four buttons on the keypad. They’re reusable, and you use them all the time. The kids who have used this love it. We’re talking to folks at CMS (the new name for HCFA) as a viable technology initially for folks with print disabilities, but potentially for general use because you don’t need to be blind to benefit. What we’ve done is there’s a CD ROM here. Sometimes that’s better, but there are very good synthetic speech tools, so it’s not always necessary to do audio recordings for this kind of information. You can do synthetic speech. We expect with digital talking books we’ll have a fair amount of synthesis used for technical content and bibliographies. The data is there in a form where you can pause, and you can conceivably spell it out. You can’t do that with a narrator.

 

What I’m getting around to is that we have recorded audio, but we’re throwing something away that’s already available on the CD. I’m going to launch a computer-based browser.

 

Mr. Marion Royal:  Did the CD you played have anything to do with XML?

 

Ms. Sajka:  The current requirements were defined a while ago, so this is XML. Currently we’re going to the next standard. Our Library of Congress started a separate standard. Now there’s a single standard—ANSI’s standard. There’s a DTD you can get online.

 

[Ms. Sajka demonstrated a Web page.]

 

It’s in the digital dividend book I brought that you have. I’d like to take you to the guide.

 

[Ms. Sajka displayed a Web page of text, with a table of contents on the left. She changed the size.]

 

This is Z39.86 from an earlier standard. This is all implemented using SMIL.

 

Ms. Turnbull:  We don’t see it, but it can be such that as it’s speaking, it’s yellow-highlighted so we can track it.

 

Ms. Sajka:  Yes—it’s making speech recognition stand on its head. Why does it work? We know the succession of words, so we can do that.

 

I’ll take us to the other title. We have our navigation here, using the “Control, Alt, Arrow” because that’s how we do it on this player. Let me close this player.

 

By the way, we’ll have this book on CD ROM when the guide is published by GPO. Another title you can play with today if it cooperates with me here…

 

[Ms. Sajka pulled up the Martin Luther King “I Have a Dream” Web page. The group heard an introduction to the page, followed by a portion of the speech.]

 

Ms. Sajka:  We intend it to work for people with print disabilities. Learning disabilities are significantly aided by hearing and seeing. If you can get that redundancy, it’ll help. It’ll drive literacy. Any time you can reinforce learning by using an additional sense it helps. The greatest illiteracy is in people who don’t hear.

 

Other than questions, I think I’m done.

 

Mr. Brand Niemann, Sr.:  The mechanics of the phone process were done in a week or less. Janina hired a developer who used a 300-megabyte file, and in less than a week, it showed the power of the online protocols that “Show Me” provides. The programmer was familiar with Web programming but not XML. “Tellme” sped the process up considerably.

 

Mr. Royal:  You mentioned adding human senses to aid learning. That’s true of all people, not just disabled, is that true?

 

Ms. Sajka:  Yes.

 

Ms. Turnbull:  A third of New Zealand is in school learning, not just the Web, but also the ability to implement multiple channels for learning. It’s very important.

 

Unidentified member:  What format is the “I Have a Dream” book?

 

Answer:  It’s all open, not proprietary. Video, graphics, it flows through HTML as well.

 

Ms. Turnbull:  We made one for FOSE. It took a week to make and cost under a thousand dollars.

 

 

End presentation.

 

 

Mr. Ambur:  Any other questions?  We’re about 10 minutes behind. Let’s take a ten-minute break.

 

 

Mr. Brand Niemann, Sr.

 

“XML Web Services: VoiceXML and Phone Directories”

 

Mr. Niemann displayed the URL for the presentations and training available through his organization, including today’s presentation.

 

Mr. Niemann:  It’s titled XML Web Services. You’ll hear more next month. The subtitle is VoiceXML and Phone Directories. I got involved in this about six months or so ago. It’s received some recognition recently and the challenge is to go beyond that with government Blue Pages—make the government phone pages XML as well.

 

Slide 2 (overview):

 

 1. XML and XML Web Services

 2. EPA’s “Where You Live”

 3. Multi-channel Dissemination of the LEPC Database

 4. XML Content Network and VoiceXML Presentations and Recognitions

 5. Phone Directories as XML Web Services

 6. Contact Information

 

 

I won’t have time to talk about content, but it’s there with the other presentations [at the xml.gov website]. I’ll talk mainly about turning the Blue Pages into a pilot.

 

Slide 3:  [Definition of XML and XML Web Services] The idea is not only to get individual vendors to incorporate XML Web Service standards into products, but to work with one another to make sure the products interoperate. A good example is in VoiceXML—the Tellme product.

 

At EPA, we’ve been told to provide more information specific to the areas in which people live. The context of this application is that every time someone has any emergency that person will call and say, “What are you going to do about this?” We often turn it right around to their local community. Congress required state governors to set up committees that provide information to the EPA. EPA’s charge is to support those groups.

 

Slide 4:  [Local Emergency Planning Committee—LEPC] There’s a database they’ve made for that purpose, with about 3000 listings. It could be updated over the Web. Currently it’s not.

 

Slide 5:  The LEPC database uses multi-channel dissemination. By multi-channel, we mean Field Names, Web, Print, CD/DVD, XML Web Service, Telephone, and Digital Talking Books.

 

Slide 6:  We didn’t have perfect foresight in terms of it being a Web Services application, but it turns out to work as one. The geo-referencing comes from LandView. Since 9/11, we have been asked to remove geo-referencing. You won’t see it but it’s there and can be exploited for other information.

 

Slide 7:  [Image of an LEPC web page with a graphic of the United States, URL http://www.epa.gov/ceppo/lepclist.htm] This is the application. Someone give me a ZIP code. This is one of several options you can use to query. We’ve found that not everyone knows their county, but most everyone knows their ZIP. This is the information you see from the Web interface. It could be updated remotely if we get permission.

 

Slide 8:  [Image of a Filemaker Pro form.] The last one was the Web access. Let’s look at the print access. It’s set up to do print-on-demand using the Filemaker tool, about 150 dollars.

 

Slide 9:  [Image of a list of Filemaker locality names and associated addresses, URL http://landview.census.gov/] FileMaker follows the Hypercard paradigm, having its roots in the original Hypercard as a subsidiary of Apple Computer. We work with other agencies to make this and other spatial databases available through the LandView program.

 

Slide 10:  [Graphic of the Filemaker information flow, URL http://www.filemaker.com/xml/overview.html.] This is the way Filemaker works. We have a style sheet if you want to see it in a style table.

 

Slide 11:  [Image of XQuery output, URL http://130.11.53.73/lepc/FMPro?-db=LEPC.FP5&-format=-fmp_xml&zip_lepc::zip_code=22181&-find=] Lurking behind that database I just showed you is an XML Web Service. If you knew this was an XML Web Service and had a description so you could query, then you’d have an XML service. I’ve done that on the fly here. Now let’s change the ZIP. I just change the syntax on the URL and I get the XML file version of that. We need one more element to make the EPA service work with the Tellme service. That’s the VoiceXML file itself. We input it on my server and this is the VoiceXML file. You can retrieve this and look at it. It’s both conventional markup and some scripting that orchestrated the call. We haven’t made this for public consumption yet, but we’re talking about deployment now, which would involve making it fully accessible for all fields in the database.
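
[Editor's note:  The "change the ZIP on the URL" step can be sketched in a few lines of Python. The base address and parameter names below are copied from the URL shown on Slide 11; the helper name lepc_query_url is hypothetical, and the sketch only builds the URL without contacting the server, which may no longer be available.]

```python
from urllib.parse import urlencode

# Base address and parameter names are copied from the URL on Slide 11.
BASE_URL = "http://130.11.53.73/lepc/FMPro"


def lepc_query_url(zip_code: str) -> str:
    """Build the XML query URL for one ZIP code (hypothetical helper)."""
    params = [
        ("-db", "LEPC.FP5"),
        ("-format", "-fmp_xml"),
        ("zip_lepc::zip_code", zip_code),
        ("-find", ""),
    ]
    # safe=":" leaves the "::" separator unescaped, as in the slide's URL.
    return BASE_URL + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    # Only the ZIP value changes between queries.
    print(lepc_query_url("22181"))
    print(lepc_query_url("20405"))
```

Keeping the parameters in an ordered list, rather than editing the string by hand, preserves the parameter order shown on the slide while changing only the ZIP value.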

 

Slide 12:  [Voice XML architecture, URL http://www.voicexml.org/, http://www.w3.org/Voice/]

 

 

Slide 13:  [Tellme VoiceXML, URL http://www.tellme.com] Here’s more about how you can access some of the Tellme services.

 

Slide 14:  [Telephone: Tellme Studio VoiceXML, URL http://studio.tellme.com/]

There’s the address for the studio tool. You can go to an 800 number all for free.

 

Slide 15:   [Displays the XML code]

 

Slide 16:  [Scenario and results for a query] Now if we go to the phone… this is something you can do at home.

 

[Mr. Niemann dialed EPA local emergency information and entered the ZIP at a prompt. The system responded with the emergency contact.] You get the same material on voice as what shows on the Web.
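
[Editor's note:  The VoiceXML file used in the demonstration is not reproduced in these minutes. As a rough illustration only, the Python sketch below generates a minimal VoiceXML form that prompts for a ZIP code and submits it to a lookup service. The element names (vxml, form, field, prompt, filled, submit) come from the VoiceXML specification; the form id, the prompt wording, and the service URL are assumptions made for this example and do not describe the EPA prototype.]

```python
import xml.etree.ElementTree as ET


def build_zip_lookup_vxml(service_url: str) -> str:
    """Return a minimal VoiceXML document as a string (illustrative only)."""
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form", id="lepc_lookup")

    # One field that collects spoken or keyed digits from the caller.
    field = ET.SubElement(form, "field", name="zip", type="digits")
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or enter your five digit ZIP code."

    # Once the field is filled, send its value to the lookup service.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", next=service_url, namelist="zip")

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical service address, not the one used at the meeting.
    print(build_zip_lookup_vxml("http://example.gov/lepc/lookup"))
```

The service at the submit address would be expected to answer with further VoiceXML that speaks the contact information back to the caller.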

 

[At this point, an unidentified member mentioned that the voice is synthesized, and therefore easy to tap into a database using XML.]

 

Slide 17:  [Background and URL for Digital Talking Books, http://www.loc.gov/nis/niso, http://www.daisy.org] [Mr. Niemann mentioned DAISY, the Digital Audio-based Information System.]

 

Slides 18 & 19:  There are three types of players. [Mr. Niemann displayed two of the players on slides 18 and 19, with their associated URLs: http://www.afb.org/aw/AW0203toc.asp for the first, and http://www.visuaide.com/victorpro.html for the “Victor Pro.”]

 

Slide 20:  [A list of XML Content Network and VoiceXML Presentations and Recognitions] We haven’t put the local emergency planning database on CD and run it in a book, but we don’t see any problem with it, because once you have XML content, it translates easily.

 

Unidentified member:  Has senior GSA management recommended you talk to recreation.gov and Geospatial One Stop?

 

Mr. Niemann:  We were asked to provide a tutorial on this on May 22. Nextpage and Tellme will assist in production of that tutorial. We’re putting emphasis on phone directory content now.

 

[As a time consideration, Mr. Niemann bypassed slides 21-29.]

 

Slide 30:  [Outline of Blue Pages activities]

 

Slides 31 & 32:  [Screen-captures of Blue Pages Web pages.]

 

Slide 33:  [Visionary Goals for the Blue Pages. This slide referenced Owen Ambur and Susan Turnbull and some of the relevant work with which they have been involved.] Interestingly, the co-chair of our group [the CIO Council XML Workgroup] has worked on this much longer than we have. His vision goes way back to when XML was starting. Owen was there already with an XML vision for the Blue Pages. Susan has been working on it also. Owen deserves a lot of credit for his vision. I felt the challenge to see how we could use XML as a framework.

 

Slide 34:  [URL for free access to Health and Human Services (HHS) Information, http://www.211.org/interactive%20map.asp]

 

Ms. Turnbull:  Two states don’t have plans to implement statewide “211.” 

 

Mr. Niemann:  There will be $1-2 billion in grants to build the infrastructure. HHS came to us. We’d like to integrate environmental information with HHS’s health information network.

 

Slide 35:  [Coordination with the E-Gov, http://www.fgdc.gov/geo-one-stop/index.html]

This is what’s up now. This is more than just a one shot program. We recently let a contract worth up to $300 million over seven years to build EPA’s node on that network. We talked to Charles Nethaway regarding Recreation One Stop. In phase two of recreation.gov, there are discussions about whether it should be done partially or completely. We had a meeting with John Moeller a week ago. He encouraged us to move forward with this, which would become part of the Geospatial One Stop and have a link to FirstGov.

 

Slides 36 & 37:  Here are the fields we have on the Blue Page listing. It’s not ideal for XML or Voice XML.

 

Slide 38:  We decided to use a three-pronged approach:

 

1–VoiceXML for “ideal” Blue Pages database.

2–VoiceXML for “current best possible” Blue Pages database.

3–XML from current sample Blue Pages database.

 

Slide 39:  [The strategy.] First we have to understand the technology, then look at the products that are available. We have to be careful. Our mention of any product doesn’t mean we’re endorsing it. We have to be very careful.

 

Slide 40: [Strategy continued] We may need a new standards group. In the U.S. INCITS is working on a special metadata standard. We may need a new one. The basis is that we need to standardize XML content. That’s ideal if agencies provide their critical content in the right form. XML tools will allow us to syndicate that. There are three approaches to the strategy:

 

1–RDBMS to XML direct.

2–Modify/repurpose RDBMS for more optimal XML uses.

3–Define new Markup Language and collect new database.

 

Most of these are relational databases.
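
[Editor's note:  The first approach, "RDBMS to XML direct," can be sketched as follows. This is an illustration under stated assumptions, not the method used for the Blue Pages: the table name, column names, and sample rows are invented placeholders, and an in-memory SQLite database stands in for whatever relational system an agency actually uses.]

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_sample_db() -> sqlite3.Connection:
    """Create a throwaway in-memory table with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (agency TEXT, service TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [
            ("Example Agency A", "Sample hotline", "555-0100"),
            ("Example Agency B", "Sample permits desk", "555-0101"),
        ],
    )
    return conn


def table_to_xml(conn: sqlite3.Connection, table: str, row_tag: str) -> ET.Element:
    """Map each row to an element and each column to a child element."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cur.description]
    root = ET.Element(table)
    for row in cur:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            child = ET.SubElement(record, column)
            child.text = "" if value is None else str(value)
    return root


if __name__ == "__main__":
    xml_root = table_to_xml(build_sample_db(), "listings", "listing")
    print(ET.tostring(xml_root, encoding="unicode"))
```

A direct mapping like this simply mirrors the existing columns, which is why the second and third approaches consider reshaping the database or defining a new markup language.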

 

Slide 41:  If we look at Owen’s vision versus what we have at EPA, we did all but one of these. What we haven’t demonstrated is that you can go from XML to PDF in printing XML output. The architecture to do this is an integrator node that integrates data from individual agencies.

 

Now I’m going to turn it over to Craig Brown from Nextpage, who’ll demonstrate the Tellme part of the product. Craig?

 

 

Mr. Craig Brown: 

 

As Brand said, I’m Craig Brown with Nextpage, based in Salt Lake. Our history is with Folio. We’re focused now on Web-based products. I’m a lawyer by training and background. I practiced for four years in San Diego. I’m working on what I want to practice now. I don’t even know what kind of life I want to practice. It’s nice to be part of a good group with Susan and Brand who want to focus on this kind of thing.

 

My job is to help you understand the back end of the technology, and help you understand the content. Many times the content is distributed in many formats. Yesterday my wife was on the phone with her mother, and there was a woman in the neighborhood having a conversation with her husband…[Mr. Brown recounted a joke about a relationship wherein one person thought all was fine and the other did not.]

 

The point is that sometimes we don’t know we’re not getting along. With distributed content, we can have political and technical barriers. Currently with Blue Pages, this information is gathered by hand, repurposed, and sent out to those who need it. Our approach is for it to sit with the agencies that author it, and let you sit there and bring it into one place virtually.

 

Slides 44 & 45 (skipped Slide 43):  [Screen-capture of the “NXT3” browser.] [At this point Mr. Niemann opened an Internet browser, clicked through the NXT3 site, and found a phone number.]

 

Mr. Brown:  It’s very visual. He can navigate because he can see it. Brand is opening up different folders. Brand, let’s try a search. Put in a functional listing, then put in AIDS. You get the AIDS hotline. You don’t need to know that it’s under HHS.

 

Slide 46:  Go to the back end and it’ll show you the XML. It’s important because it’s in a format that VoiceXML can parse and you can hear it. That’s where the Tellme comes in. The most important thing to remember is, imagine each of these folders is sitting on different servers in different locations, in different organizations. You run a search and it pulls it up on the fly. There’s no updating. There are two points here:

 

1-     It’s a format that can be used by VoiceXML, and

2-     It can be pulled up from many locations.
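
[Editor's note:  The two points above can be illustrated with a short Python sketch of a search that reads XML listings from several sources at query time. This is not Nextpage's implementation. The host names, element names, and listings are invented placeholders, and the XML is held in local strings so the example runs without a network; in a real deployment each string would be fetched from the agency that authors it.]

```python
import xml.etree.ElementTree as ET

# Each entry stands in for XML served by a different organization.
SOURCES = {
    "agency-a.example": (
        "<listings>"
        "<listing><service>Sample health hotline</service><phone>555-0101</phone></listing>"
        "<listing><service>Sample records office</service><phone>555-0102</phone></listing>"
        "</listings>"
    ),
    "agency-b.example": (
        "<listings>"
        "<listing><service>Sample wildlife hotline</service><phone>555-0103</phone></listing>"
        "</listings>"
    ),
}


def federated_search(sources: dict, term: str) -> list:
    """Search every source on the fly and return (origin, service, phone) hits."""
    needle = term.lower()
    hits = []
    for origin, xml_text in sources.items():
        for listing in ET.fromstring(xml_text).iter("listing"):
            service = listing.findtext("service", default="")
            if needle in service.lower():
                phone = listing.findtext("phone", default="")
                hits.append((origin, service, phone))
    return hits


if __name__ == "__main__":
    for origin, service, phone in federated_search(SOURCES, "hotline"):
        print(f"{service}: {phone} (from {origin})")
```

Because nothing is copied into a central store, the caller does not need to know which organization holds a listing, and there is no separate update step.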

 

I’m going to let Greg O’Connell from Tellme take it over from here.

 

Mr. Greg O’Connell:

 

Thanks Craig. I’m responsible for the Government group in Tellme. I used to work with Netscape. Tellme was founded around 1998 on the premise of “What way can we make Web information more accessible?” At that time Web penetration was about 29%, and many people didn’t have access to the Internet. VoiceXML was just coming of age. Microsoft, AOL, and Netscape were familiar with an XML-like programming language as a great way for telephone access to Web information.

 

Slide 47:  [Familiar Web architecture and Tellme architecture.] The concept isn’t to call a number and have a website read to me, but rather…

 

Slide 48:  You have an HTML interface to existing Web platforms and services. What Tellme does through VoiceXML is let you turn your phone into something like a browser. It lets you leverage the same platform you have, to make it accessible over the phone. We inserted the VoiceXML interface layer. From the access, scalability, and performance standpoint, it’s all there and all usable.

 

We’ve been working with Susan and Brand and organizations driving access issues through studio.tellme.com. Brand’s and Janina’s were done through our Studio platform. Tellme has been a major force behind the VoiceXML standard. This studio.tellme XML platform allows developers to borrow time on our network and take advantage of this, build applications, and test and run them on this XML platform. That’s essentially what we did with this EPA example. It’s a prototype, not taking advantage of sexy attributes.

 

Today I’d like to give a real world example of VoiceXML and its power. We’re under contract with Utah for 511 travel advisory services. 511 was recently allocated to states for travel advisories. How do you do that? The traditional approach is frustrating. It requires a separate telephony architecture. Many states have robust travel advisory Web capabilities. For example, in Utah you can get information on speed of traffic on interstates. They had a capable Web platform, and we drew right into it for VoiceXML applications.

 

We’ve put together a canned voice demo. I’m going to show you the Utah demo. What I’ll do is dial up the 511 service in Utah and you can get an idea of how well it works, then I’ll dial up the canned demo we’ve put together for the Blue Pages. It’s not fully developed, but with the Utah demo you can get a sense of how the application walks you through the service. Then I’ll go back in as a more experienced user and barge in with requests, so you can get a sense of how an experienced user might be able to take advantage of it.

 

[Mr. O’Connell dialed in and responded by voice to automated prompts. The system was interactive. He received traffic information for Interstate 80. He then interrupted successfully. The system responded appropriately to his voice interruptions and gave place-specific information about different roads. Mr. O’Connell then switched to information about light rail.]

 

Mr. Brown:  That application is entirely open standard, all VoiceXML, all in XML. It brings lots of information together in open environments. It’s allowed people to quickly bring up voice recognition systems in an open way. It couldn’t have happened four or five years ago.

 

Mr. Eng:  Were the Olympics multi-lingual?

 

Mr. Brown:  We didn’t do multi-lingual for the Olympics. We’re not reinventing voice recognition. We use “Nuance.” It does support multi-lingual. We haven’t done that, but there’s no reason why we couldn’t. We do make enhancements to voice applications depending upon the application—for example, in Utah there are a number of synonyms we add in, based on what people say. There are also a number of dialects for which we modify the base Nuance product to make it more accurate. The Spanish or French version might not have the same level of accuracy. We buy those products out of the box. We’ve reached a point where people are willing to accept it.

 

Mr. Eng:  It’s an English form of text, but can it translate?

 

Answer:  VoiceXML is multi-lingual, but there’s no auto-translation. You have to specify.

 

Slides 49 & 50:  [Script of the Tellme demonstration] Let’s dial up the demo to see what we can do with the Blue Pages. It’s very much like the Utah demo. [Mr. O’Connell proceeded through a canned demonstration of the Blue Pages directory.] I want to show you one more quick demo.

 

We have a consumer portal by the way. We get millions of calls through it. It allows us to improve the recognition capability. It’s a lot of fun. It’s just like any other portal on the Web, but it’s by phone. It just kind of gives you an example where you can further automate an application where the user might not have to speak ANY language. It’s kind of a fun application.

 

[Mr. O’Connell dialed in briefly.]

 

Are there any questions?

 

Unidentified member:  What was the phone number for the site you mentioned that you can personalize?

 

Mr. O’Connell:  1-800-555-TELL, or go to tellme.com. You can sign on, get a password, and get a menu you can choose from. That’s all the services that are available on our consumer voice portal. It’s just like a Yahoo interface. You’ll have to log in the first time. It’ll build your favorites. You can dial in thereafter and get your favorites. By the way, a great feature of it? Wake-up calls.

 

Unidentified member:  If I set up my account by telephone, can I go to the website and it’ll recognize me?

 

Mr. Brown:  Yes.

 

[Mr. Niemann called attention to a VoiceXML book written by Mr. O’Connell.]

 

Mr. Royal:  The mixing of XML script within the XML documents—is that part of VoiceXML?

 

Answer:  Yes. It uses ECMAScript.

 

Mr. Royal:  It seems as if it makes sense to do it externally.

 

Answer:  Yes, and many customers do.

 

Mr. Royal:  Are you using SOAP?

 

Mr. Niemann:  In the early configuration—and then people can choose what they want.

 

Mr. Royal:  I was hoping you were using SOAP so you could call it a Web Service.

 

Mr. Niemann:  They’ve implemented conceptually the standard stack of Web Services. It not only provides Web Services, but high level collaboration of material across all the Web content. It’s considered a unique product in the distributed Web content space. Not all of the standards in the standard stack are finished. Most vendors implement their own standard until the full standard is finished, and say they’ll then implement the full standard. We probably won’t go too far in implementing an interim standard.

 

 

End presentation

 

 

Mr. Ambur:  OK, the last presenter is George Brett. George?

 

While we’re waiting for George, with respect to Craig Brown’s presentation, my involvement with Blue Pages came about earlier under the Clinton administration. Al Gore initiated the Blue Pages about eight years ago. The idea was to implement by commonly understood terms rather than office title. I became the coordinator for my agency and became very frustrated with the process. The Fish and Wildlife Service has about 40 major functions we perform. Maintaining that data set was a very cumbersome process. I was driven not by technology, but by a business requirement. That’s how I came to write that paper. I’m very excited with the VoiceXML to see the potential, and see the technology enabling a much more efficient business process.

 

 

Mr. George Brett:

 

“Some Thoughts on XML and Collaboration”

 

Mr. Brett:  The net isn’t live. I’ve changed my title several times.

 

Slide 2:  [Graphic of the transition from the paperbound to the paperless environment] I wanted to hop into the “way back” machine. I’ve been involved with microcomputers since about 1981. “Hyper” was available about 1991. The notion of information space was limited to Digital. We did have “Newtons” back then, the early versions of PDAs.

 

Slide 3:  [Second “Ancient history” slide.] The graphic in the middle is the Japanese logo for kudzu. Back then the Internet was growing like a weed, and we had mixed feelings about that. I set up a group that looked at it. We were looking at wide area information servers. The underlying standard was Z39.50 search and retrieve. Everything went into the magic black box. There were many services involved—www, gopher, Z39.50, email, etc.

 

My magic black box, which we haven’t yet achieved…at the left end is the source. Just to the other side are the search engines—the interface to the service. In the middle is the black box, on the right are the servers, and on the far right are triangles, which are the clients.

 

In my work in a library, some of those items had to be output on paper, or voice. I’ve been after this kind of resource since then. I’m still waiting. I’m getting more excited though. The problem, though, is that even with Internet2 is when I first started it was the knowledge base —the performance improvement environment (PIE).

 

Slide 4:  [Performance Improvement Environment] We want to collect like stories. I’d like to start with anecdotes and grow them into either business cases or technical studies. What are stories? Oral histories. We have many partners in Internet2 and we want to point to what they have. The point is collaboration.

 

Slide 5:  [E2Epie slide]

 

Over the last couple of years, I’ve been working on this three-pronged notion. We do get our oral case study. How many times do we have to keep repurposing information?

 

Slide 6:  [Communication Channels] [The channels are: Email, Web, forum, groupwork, realtime, and hard copy.] Why couldn’t we do it once in a way that would fit all six of these communication channels?

 

Slide 7:  [Web/blog/notebook] Susan and I have been looking at this recently. Some folks say, “I need a bridge between where we are and where we’re going.” I call it a notebook.

If you look for BLOG, they talk about it as the new distributed publishing channel. This is kind of where the Web is going. You have reporters and editors, and participants can comment on what was published. They tend to be easy to maintain. There’s a small footprint. I’m going to show you an example I downloaded onto the server. It was about 200 K. With that one core, I can have multiple uses. It’s XML friendly. We can syndicate or do varied output.

 

Ms. Turnbull:  From the Web log it’s available for some of the screen readers.

 

Mr. Brett:  I’ve been working since 1991 on searching across multiple domains. The idea with libraries was to do a catalogue search and hit multiple libraries. We did a project with the Patent and Trademark Office. They only had eight lines in. We showed them how using the Internet they could do multiple deliveries and have varied kinds of output. I also think this will increase the lifespan of content. We can reach back to the legacy information as well.

 

Slide 8: [My Interest in XML] I’d like to show you quickly the management page and then the rest of it. I will make the presentation available, and there are links available in the presentation as well. Are there questions while I’m pulling this up?

 

In this time frame, my group hasn’t always been the IRS. One group was the Global Schoolhouse. From 93-96 we were doing video conferencing.

 

[Mr. Brett displayed a “coexpedition log” web page.] Here’s the coexpedition log. I’m shifting gears because I don’t have my own machine. Pardon the pauses while I figure out where I’m going now. Here’s the management page—and remembering that this was all generated from less than 200K of text.

 

[Mr. Brett displayed multiple links]

 

I have all this right out there, but most of the world doesn’t know about it because it’s not associated with search engines.

 

[Mr. Brett displayed a page of authors, clicked on an author link, and showed a list of activities associated with an author (Susan Turnbull).] This shows that Susan is the Super User for that blog.

 

[Mr. Brett displayed another web page.] One of the things we’re coming to understand is that you can cut and paste and it amounts to a two-part entry.

 

[Mr. Brett went back to the coexpedition log, on which he had created a new entry.]

Typically with Web logs you have a calendar showing you when items were posted. There are entries, associated links, and related projects. You can syndicate the site, and this is where XML comes in. What I’m thinking is that if we syndicate other sites, would it be possible to use a portal or main page that’ll give you current activities in subgroups, using XML? I’m thinking about that with Internet2. There are some things I’d like to draw from other areas, but I don’t want to update daily or manually, so this is a good use.
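
[Editor's note:  The portal idea raised here can be sketched in Python, assuming each subgroup's Web log publishes an RSS-style feed with channel, item, title, and link elements. The feed contents and addresses below are invented placeholders held in local strings; this is an illustration of the idea, not a description of the coexpedition log software.]

```python
import xml.etree.ElementTree as ET

# Placeholder feeds standing in for two subgroup Web logs.
FEED_A = """<rss version="0.91"><channel><title>Subgroup A</title>
<item><title>First entry</title><link>http://example.org/a/1</link></item>
<item><title>Second entry</title><link>http://example.org/a/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="0.91"><channel><title>Subgroup B</title>
<item><title>Meeting notes</title><link>http://example.org/b/1</link></item>
</channel></rss>"""


def feed_items(rss_text: str) -> list:
    """Return (source, title, link) for every item in one feed."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title", default="")
    return [
        (source, item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.iter("item")
    ]


def portal_listing(feeds: list) -> list:
    """Merge several feeds into one list of lines for a main page."""
    lines = []
    for rss_text in feeds:
        for source, title, link in feed_items(rss_text):
            lines.append(f"{source}: {title} <{link}>")
    return lines


if __name__ == "__main__":
    for line in portal_listing([FEED_A, FEED_B]):
        print(line)
```

Rebuilding the listing from the feeds each time means the main page reflects new entries without anyone updating it by hand.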

 

Ms. Turnbull:  I was getting ready for a meeting, and in just a few minutes I did the hyperlinks to these resources and I was familiar with the material for the day. I was able to email relevant information to the people so they would be abreast of what was happening.

 

Mr. Brett:  This is all open-source, by the way. This search engine is provided by Atomz. It’s about five lines of code, freely available. All told, my installation time has been less than five hours. Thank you very much. I appreciate your time.

 

Mr. Ambur:  Are there any other questions or comments? If not, we'll draw the meeting to a close. If you didn't sign the attendance list and you'd like your presence recorded, please do so. Next month's meeting will focus on Web Services interoperability standards. With respect to the syndication of content, we have tentatively scheduled a special forum concerning PRISM, ICE, RSS, and NewsML for July 18. [Editor's note:  The forum has been confirmed. It will be cosponsored by the IdeaAlliance and will be held in the GSA auditorium.] With reference to George's comment about search engines, FIRM has tentatively scheduled a forum concerning the automated classification of electronic records and the use of XML metatags to improve search quality and precision. The details haven't been firmed up yet, but we'll let you all know when they have been. [Editor's note: FIRM's forum is planned for May 29 at USDA's Jefferson Auditorium.]

 

Last Name       First Name    Organization
Adams           Susie         Microsoft
Ambur           Owen          Interior-FWS
Billups         Prince        DISA
Campbell        Richard       FDIC
Clarke          Art           Tellme Networks
Dalecky         Selene        GPO
Eng             David         EPA
Johnson         Denise        QRC
Marr            Timothy       Lockheed Martin
McKeever        David         i4i
Morgan          Jane          GSA
Niemann, Sr.    Brand         EPA
Niemann, Jr.    Brand         Tax Analysts
O’Connell       Greg          Tellme Networks
Roache          Eddie         LMC
Royal           Marion        GSA
Saiya           Jim           FormatData
Shaw            Georgia       Am. Cty Bank
Smith           Jane          Fenestra
Thunga          Ronjeeth      Humanmarkup.org
Turnbull        Susan         GSA
Tyler           Sean          Microsoft
Weber           Lisa          NARA
Weitland        John          NMIMC
Williams        Kevin         Blue Oxide
Yee             Theresa       LMI