Minutes, xmlWG, March 14, 2001

Federal CIO Council XML Working Group
Draft Meeting Minutes, March 14, 2001
American Institute of Architects Board Room

Please e-mail corrections to these minutes to Laura Green of LMI at lgreen@lmi.org.

Mr. Owen Ambur, co-chair, convened the meeting at 9:00 a.m. at the American Institute of Architects. Mr. Ambur introduced teleconferencing to the meeting and said it was available on a first-come, first-served basis with a maximum of 15 callers. Attendees using the teleconference were asked to access the presentations at http://xml.gov/presentations.htm.

Attendees then introduced themselves.

Announcements

Daniel Bennett announced a conference scheduled for April 25, Congressional Applications of XML (COAX). It will cover government documents from the Legislative Branch, the Government Printing Office (GPO), and other agencies. The sponsors invite representatives from the Executive and Judicial Branches to attend. Lunch is included, and vendors such as Lexis-Nexis and KPMG will be present. The full conference announcement is available at http://www.citizencontact.com/legalxml/conferenceletter.htm. Contact Mr. Bennett directly at daniel@citizencontact.com for additional information.

Presentation: "What is XML Schema?"

This presentation is available on-line at http://xml.gov/presentations.htm in PowerPoint or HTML.

Mr. John Evdemon of XML Solutions, McLean, VA, conducted the first portion of the presentation. His presentation concentrated on the World Wide Web Consortium (W3C) Extensible Markup Language (XML) Schema and Schema's relationship to Document Type Definitions (DTDs), and concluded with some information on Namespaces. Mr. Evdemon opened by asking, "How many people are currently using XML?" About half the audience raised their hands. He then asked whether anyone was using XML Schema, and two people raised their hands.

W3C XML Schema

XML Schema is an information modeling language for XML developed by the W3C. XML Schema is one of many schema languages (e.g. TREX, RELAX) that use XML as their syntax. This presentation concentrates on the W3C XML Schema model. Mr. Evdemon explained the various uses of the word "schema".

Schema (uppercase) - refers to the W3C XML Schema
Schemas (uppercase and plural) - refers to one or more XML vocabularies (e.g. xCBL, cXML, BizTalk, IFX)
schema (lowercase) - refers to an information model; can be a DTD or Schema; may also be a relational database schema

The W3C XML Schema has all the functionality of DTDs plus some additional features, one of which is data typing. Schema for Object-Oriented XML (SOX), developed by Commerce One, and XML-Data Reduced (XDR), developed by Microsoft, were created to bring data typing into schemas. XML Schema incorporates ideas from both of these languages and allows the user to create custom data types (e.g. US Postal Code or International Postal Code). An XML Schema is itself a well-formed XML document and supports more complex data types, whereas DTDs do not support data typing.

XML Schema is not yet a full W3C Recommendation. It is still risky to implement, but it is suitable for prototypes, and now is a good time to start learning it. The Candidate Recommendation consists of the following parts:

XML Schema Part 0 - Primer
XML Schema Part 1 - Structures
XML Schema Part 2 - Datatypes

[Editor’s note: two days after this presentation, the W3C released XML Schema as a Proposed Recommendation.]

XML Schema compared to DTDs

DTDs enforce a structure with element and attribute names. XML Schema enforces both structure (content model validation) and data types (data type validation). Content model validation checks the order and nesting of elements (similar to today's DTDs). Data type validation checks that an element's value has a valid type and range (e.g. a month element must be a number from 1 through 12). DTDs, by contrast, treat everything as a string, with the exception of enumerated attribute lists. Schema can extend existing data types and can help offload some of the validation coding that is currently done in other places.
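[Editor's note: The following sketch is not from the presentation. It illustrates the two kinds of checks described above using hand-written Python rather than an actual XML Schema processor. The date element, its child names, and the function names are hypothetical and chosen only for illustration.]

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: a <date> must contain these children in this order.
EXPECTED_ORDER = ["year", "month", "day"]


def check_content_model(date_elem):
    """Structure check: child elements must appear in the expected order."""
    return [child.tag for child in date_elem] == EXPECTED_ORDER


def check_data_types(date_elem):
    """Data type check: month must be an integer from 1 through 12."""
    text = date_elem.findtext("month", default="")
    try:
        month = int(text)
    except ValueError:
        return False
    return 1 <= month <= 12


good = ET.fromstring("<date><year>2001</year><month>3</month><day>14</day></date>")
bad_type = ET.fromstring("<date><year>2001</year><month>13</month><day>14</day></date>")
bad_order = ET.fromstring("<date><month>3</month><year>2001</year><day>14</day></date>")

for name, doc in [("good", good), ("bad_type", bad_type), ("bad_order", bad_order)]:
    print(name, check_content_model(doc), check_data_types(doc))
```

A DTD could express only the first check. With XML Schema, both constraints would be declared in the schema itself, and a compliant parser would enforce them without application code like the above.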

XSD is namespace-aware. Schema allows individual nodes in an XML document to be associated with type declarations in a schema. This allows namespace integration and ensures there will be no namespace collisions. DTDs have no namespace functionality, and a document can be associated with only one DTD.

XML Schema format is much more verbose than that of DTDs. DTD parsing can be difficult because of the complexity of DTD syntax, whereas for Schema the syntactic rules and full support of data types are embedded in the parser. Multiple levels of validation occur when using Schema. The Oracle XML Parser and Xerces by Apache are currently the two parsers most compliant with W3C XML Schema. Schema saves money on development costs because the developer can offload simple logic onto Schema rather than other areas of coding, leaving simple validations up to Schema and allowing developers to concentrate on business processes. DTDs won't go away overnight. They are still the only W3C-approved standard.

The primary Schema components are: XML namespaces, element declarations, attribute declarations, simple type definitions, and complex type definitions. Type definitions are reusable structures and are used to create aggregate structures.

Secondary Schema components are: model group definitions, attribute group definitions, identity-constraint definitions, notation declarations, wildcards, and annotations.

Group definitions are for reuse. Include or Ignore sections are not available in Schema as they were in DTDs.

Namespaces

Namespaces were built to make sure elements and attributes are unique across different XML vocabularies. Schema has a two-part naming convention: first the local name and then the Uniform Resource Identifier (URI) of the namespace. Because of the second part, people tend to want to type the URI into their browser and resolve it. The only function of the URI is to ensure uniqueness of element and attribute names; it may not be capable of being resolved. Namespaces are needed for semantic-level integration across languages and organizations. They help to avoid potential mismatches (e.g. different uses of the element name part number).

The following tools support the XML Schema Candidate Recommendation of October 24, 2000 (with some limitations):

Parsers - Xerces by Apache and Oracle’s XML Parser
Schema Editors/Validators - XML Spy, XML Authority, XSV (XML Schema Validator)

[Editor's note: Brief synopses and direct links to product Websites for these tools and other products are available at http://www.xmlsoftware.com/.]
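[Editor's note: The following sketch is not from the presentation. It illustrates the two-part naming described under Namespaces above, using Python's standard ElementTree module, which reports each element name as the namespace URI plus the local name. The example.org URIs and the partNumber elements are hypothetical; as noted above, the URIs only need to be unique and are not expected to resolve.]

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define "partNumber"; the namespace URI keeps them apart.
DOC = """
<order xmlns:inv="http://example.org/inventory"
       xmlns:eng="http://example.org/engineering">
  <inv:partNumber>A-100</inv:partNumber>
  <eng:partNumber>77-B</eng:partNumber>
</order>
"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC)
names = [child.tag for child in root]

for tag in names:
    print(split_name(tag))
```

Both children share the local name partNumber, but their full names differ because the namespace URIs differ, so software can tell the two uses apart.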

Presentation: "Applied XML Schema Design"

This presentation is available on-line at http://xml.gov/presentations.htm in PowerPoint or HTML.

Mr. Alexander Falk, CEO and President of Altova - the XML Spy Company, made the presentation and gave the live demo. XML Spy is an XML document and schema editor with XML Schema capabilities. It provides a graphical representation of XML Schema components, their elements, and their content models. Unlike DTDs, XML Schema can define a specific number of repetitions. XML Spy graphics are based on the Unified Modeling Language (UML). Attributes and attribute groups have no predefined order; they are in a list rather than in a tree. Complex types and substitution groups reflect the influence of object-oriented programming paradigms on Schema.

Schema can restrict data types by pattern. A pattern can be a regular expression; Schema uses the same regular expression syntax as Perl. XML Schema compliant parsers will enforce the pattern, so developers won't have to worry about pattern matching. The Schema developer can use a substitution group to specify a group of elements that can stand in for another element (e.g. substitute VIP for person and your XML document will still be valid).
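[Editor's note: The following sketch is not from the presentation. It shows the kind of check a pattern restriction performs, written with Python's re module rather than a Schema processor. The ZIP code pattern and the function name are hypothetical. Note that the whole value must match the pattern, which is why fullmatch is used.]

```python
import re

# Hypothetical pattern for a US ZIP code: five digits, optionally "-" and four more.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def matches_pattern(value):
    """Return True only if the entire value conforms to the pattern."""
    return ZIP_PATTERN.fullmatch(value) is not None


for candidate in ["20006", "20006-5292", "2000", "20006-52"]:
    print(candidate, matches_pattern(candidate))
```

In a Schema, the same expression would be declared once as a restriction on a string type, and every element of that type would be checked by the parser.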

Developers wanted the ability to accommodate both a UK and a US address by specifying a complex type called address and defining a second complex type derived from the first. They also wanted the ability to build very complex and reusable models as well as to use existing data types. These functions are available in Schema.

There are several options for Schema development. A Schema can be developed:

from scratch.
by example from Use Cases, existing XML documents, or databases.
by conversion from an existing Document Type Definition (DTD), Document Content Description for XML (DCD), BizTalk (XDR) schema, or other schema. Conversion methods include:

by hand,
using an available Perl script (which doesn't support the latest XML Schema), or
utilizing a Schema editor (e.g. XML Spy, XML Authority, etc.)

Live demo of XML Spy

This tool supports previewing and editing Extensible Stylesheet Language (XSL) or XSL Transformations (XSLT), connecting to repositories, and converting XML, and it is a good prototyping tool. Free evaluations are available at xmlspy.com. Data can be viewed in a nested format. XML Schema is a collection of components and a sequence of elements. Spy shows the data type and allows users to manipulate the data in a Graphical User Interface (GUI) format or plain text view. Complex data types can be used in multiple places and can all be manipulated in a GUI format and then viewed in plain text. Annotations, which give extra information to software or people, are also supported in GUI and plain text. Spy creates repository-ready documentation.

Ms. Yee asked if elements could be easily changed (e.g. could US address easily be changed to international). Mr. Falk said yes and illustrated an example from the W3C Primer that is included in XML Spy.

Mr. Richard Campbell asked, "Can you combine multiple Schemas into one?" Mr. Falk responded yes, use import and include to do this. Mr. Falk added that this is also shown in the Primer.

Mr. Walt Houser asked if XML Spy plans to implement DeltaV (Internet Engineering Task Force (IETF) specifications) or the W3C digital signature recommendation. Mr. Falk responded that WebDAV has already been implemented and versioning will be available in the next version of Spy. Spy can associate and add files based on signatures, and the W3C digital signature will be supported when it is recommended.

Mr. Ambur was concerned that Mr. Evdemon had said XML Schema should not be used in production systems and asked what the work group should be advising as far as Schema is concerned. Mr. Falk agreed with Mr. Evdemon but added that many people are already using it. The benefits are available and organizations need to move ahead. The W3C Proposed Recommendation should be pretty stable, big companies are supporting it, and many tools already support it. One goal is to develop vocabularies, and he suggested starting to do that with XML Schema now. Mr. Evdemon said XML Solutions recommends that its internal consultants use Schema for projects that are just getting started.

Mr. Mark Crawford commented that standards bodies are currently doing work based on the W3C XML Schema Proposed Recommendation.

Mr. Houser asked, "How do you handle error trapping? How about array (x) data types in the Schema?" Mr. Falk responded that with DTD validation the document is either valid or not. The Schema process is different. The XML document instance is first checked to see if it is well-formed, and then the Schema validation occurs. In Schema validation, a compliant processor produces a post-schema-validation information set, an augmented view of the document's information set that carries more information; it says whether each component is okay.
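[Editor's note: The following sketch is not from the presentation. It mimics the two-stage process Mr. Falk described: a well-formedness check first, then a validation pass that reports on each component rather than giving a single valid/invalid answer. It uses hand-written Python rules in place of a real Schema processor; the element names, the rules, and the validate function are hypothetical, and the report keeps only one result per element name for brevity.]

```python
import xml.etree.ElementTree as ET

# Hypothetical per-element rules standing in for Schema type declarations.
RULES = {
    "month": lambda text: text.isdecimal() and 1 <= int(text) <= 12,
    "day": lambda text: text.isdecimal() and 1 <= int(text) <= 31,
}


def validate(xml_text, rules):
    """Return (well_formed, report); report maps element names to True/False."""
    try:
        root = ET.fromstring(xml_text)  # stage 1: well-formedness
    except ET.ParseError:
        return False, {}
    report = {}
    for elem in root.iter():  # stage 2: check each element that has a rule
        rule = rules.get(elem.tag)
        if rule is not None:
            report[elem.tag] = rule(elem.text or "")
    return True, report


print(validate("<date><month>3</month><day>40</day></date>", RULES))
print(validate("<date><month>3</date>", RULES))
```

The first document is well-formed and yields a per-element report in which month passes and day fails; the second is not well-formed, so the second stage never runs.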

Mr. Houser asked how Spy traps errors when using XSL. Mr. Brian Cole responded that Schema can detect if an error tag is valid if it is PCDATA. Mr. Evdemon added that different parsers show different errors (e.g. Xerces versus Oracle). Post-schema-validation output can vary quite a bit, whereas the core syntax group validates information into a set that reflects the low-level primitive types that you see in XML. The main thing is how to include support for parameter entities. Post-Schema validation would return errors as if they were part of the Schema. Many people are looking to the W3C for a standard; in the meantime, error support and handling vary between vendors.

Mr. Houser asked, "Is there a specification?" Mr. Evdemon replied that there are the SAX (not supported by W3C) and DOM APIs, but again there are differences between vendors.

[Editor's note: SAX is the Simple API for XML, an event-based interface, and DOM is the Document Object Model, a tree-based interface.]
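[Editor's note: The following sketch is not from the discussion. It contrasts the two interfaces using the SAX and DOM implementations in Python's standard library, reading the same small document both ways. The minutes/item document and the ItemCollector class are hypothetical.]

```python
import xml.sax
from xml.dom import minidom

DOC = "<minutes><item>Announcements</item><item>Break</item></minutes>"


class ItemCollector(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it encounters each event."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


# Event-based: nothing is kept in memory except what the handler stores.
handler = ItemCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_items = handler.items

# Tree-based: the whole document is loaded, then navigated.
dom = minidom.parseString(DOC)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]

print(sax_items)
print(dom_items)
```

Both approaches recover the same item text. SAX suits large documents read once from start to finish, while DOM suits documents that need to be navigated or modified after parsing.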

Break

Mr. Ambur indicated that unless additional corrections are needed the February 14, 2001, minutes will be adopted. If anyone has updates, please communicate them to Laura Green at lgreen@lmi.org.

Presentation: "Formatting XML Records for Capture and Management in DoD Certified Electronic Records Management Systems"

This presentation is available on-line through xml.gov in PowerPoint or HTML. Mr. Jim Capparelli of BroadVision gave the presentation. Mr. Capparelli introduced the following companies responsible for the demonstration and their components of the presentation:

XML Authoring Tool (i4i)
XML component version control (BroadVision)
Classical document management (FileNET)
Records management (Provenance)
Personalized Web delivery (BroadVision)

This product is a content and content management tool rather than a business-to-business (B2B) data exchange tool. Where is the source of truth: paper or electronic documents? There are many business challenges facing eGovernment, such as:

a sea of information
inadequate content quality, accuracy, consistency, timeliness (the information morgue)
high turnover of information
current Web site is static and difficult to navigate
people just won’t use it.

BroadVision's biggest federal sites are USPS and GSA. In 1997, the DoD 5015.2 standard was introduced and has since been endorsed by NARA for civilian agencies. This is a standards-based process providing an end-to-end content creation, content management, records management, and delivery solution. The Navy was the first customer to buy into SGML, making it easy for the Navy to move from SGML to XML. The Web is the number one way to deliver information today. XSL is used to transform XML to whatever format you want. An XML repository uses a DTD in a database and creates the model for how the information will be used.

Mr. Capparelli showed the workflow for a request to update OMB Circular A-130 using MS Word as the XML authoring tool (with i4i products). Using MS Word to create the XML is easy for users because of its common look and feel. Mr. Capparelli opened the A-130 XML document and demonstrated how the user can turn tags on and off in Word. He made changes to the XML instance and then saved it. FileNET's Panagon Content Services and Panagon IDM Desktop provide the foundation for this demo. The Panagon repository:

shows the document version (production, official)
can lock in version control at the element level, as opposed to today, where document management is done in files, which can get messy
lets the user highlight a document and use the "compare" feature to analyze the changes that have been made to it
lets the user check documents into the repository to start version control.

The BroadVision tool is connected to the repository. Nasra Sakran from FileNET conducted the next segment of the presentation. How do you manage content once the document is official? The user adds it to the repository. This is done by going to Windows Explorer and then to the FileNET Neighborhood. There are five levels of security. The user adds the document to the FileNET Library, to the filing record, and finally extends the appropriate security measures.

Mr. Capparelli resumed the presentation and said once the document is part of the official record it can be distributed to the masses. He said it doesn't use cookies, but there wasn't time to go into the details about that. You log in as a particular user (e.g. xml.gov user) to see the document in Hyper Text Markup Language (HTML). If you log in as a different user, a different view is available. This demonstrated the versatility that comes from XML's separation of format from content. Mr. Capparelli showed another example of this by demonstrating how a user could see the OMB Circular on a cell phone, Palm, etc.

Mr. Ambur said everything that exists is a record and that agencies need to get over the notion that some records are not "official" records. Any information that exists is subject to release under a FOIA request or subpoena. Publishing and providing access to information are value-additive extensions of the basic requirements for the management of records throughout their full life cycles. Accessibility, search, and retrieval can be tremendously improved -- not only on the Internet but also on users' own hard drives -- by the application of a logical taxonomy represented in XML metatags.

Mr. Royal had a technical question on how elements change dynamically. Are they created in a database? Mr. Capparelli answered they are not part of the original document; it creates a new instance with the changes.

Mr. Royal asked whether it takes advantage of metadata. Mr. Houser asked what the protocol is on objects for access control. Mr. Capparelli said Oracle 8i is the underlying database. If LDAP is required, that's a customization; users are added to the database with a login and password.

Mr. Houser asked about public key support. Mr. Capparelli said it's not available out of the box; that would require customization.

Mr. Houser asked if the level of security was only passwords and IDs. Mr. Capparelli said the access levels are read, write, manage, and no access; if a user is writing another application they can use an API (DOM), which usually happens behind the firewall. Mr. Houser responded that most attackers are behind the firewall; behind a firewall is a little less secure. Mr. Marty Heinrich from Impact Innovations said third-party signature supports PKI (Public Key Infrastructure) integration. Mr. Houser asked, "Have you made it work?" Mr. Heinrich suggested exchanging cards. Mr. Houser said, "I don't see an improvement over MS Security control." Mr. Capparelli said the XML repository side is records management rather than an official document. Ms. Hilda Johnson said the US Patent and Trademark Office has bundled i4i's product with a PKI solution from Entrust. Digital signatures are required for patent applications. Without them, the applications are not accepted.

Mr. Bill Morgan asked whether it has an audit trail. Mr. Capparelli answered that audit logs are available through the Oracle database. Fleet, Walmart, and other customers have third-party security that the outside world is hitting.

Mr. Rollin Ridgeway from NASA asked if the presentation and contact information were available on xml.gov. Mr. Capparelli responded that all contact information was available on xml.gov except BroadVision's.

Mr. Houser asked if this had been presented to NARA. Mr. Dan Jansen responded he represented NARA.

Mr. Royal stated that PKI may not be right and there’s a need to look more at digital signatures, especially if you want to sign at a section level.

Mr. John Milligan invited everyone to a presentation on National Archives April 5 by BroadVision on developing electronic archives. All are invited. It’s only $10 for a full morning and early afternoon that expounds on today’s presentation.

Mr. Ambur announced that the Defense Logistics Information Service (DLIS) has one slot in next month’s meeting and the other slot is open. He asked for suggestions on filling that spot and suggested possibly an update on ebXML. Mr. Crawford suggested an update on the EWG /X12 Core components initiative that will continue 19-23 March 2001 in Tyson’s Corner at the EDIFACT Work Group (EWG) meeting. Mr. Houser suggested an update on the 18-23 March 2001 IETF meeting in Minneapolis. Mr. Crawford also suggested Mark Nobles of LMI who is a member of the W3C Security working group could present an update on W3C Security efforts.

[Editor's note: In addition to DLIS, at the April 18 meeting the Defense Information Systems Agency (DISA) has been scheduled to make a presentation on lessons learned in their registry/repository initiative.]

Ms. Amie Milan of KPMG noted that the XBRL consortium has responded to the issues raised in DFAS proof of concept. [Editor's note: The consortium's response will be posted at http://xml.gov/documents/completed/xbrl_consortium_responses.htm.]

Mr. Michael Palmer announced a meeting at KPMG Consulting in Tyson's Corner, 12th floor, on April 4 from 1:00 to 4:00 p.m. It is open to anyone with an interest in XBRL.

XML Working Group Attendance List
March 14, 2001

Last name	First name	Organization
Baldwin	Nathan	DISA
Benedict	Robert	NASA
Bennett	Daniel	CitizenContact.com
Billups	Prince	DISA
Burnstein	Michael	Sybase
Campbell	Richard	FDIC
Capparelli	Jim	BroadVision
Cole	Brian	NSF
Crawford	Mark	LMI
Dingman	Gail	USPS
Drest	Dan	BroadVision
Fitzpatrick	Bob	Filenet
Glace	Jessica	LMI
Gluck	Daniel	OPM HR Data Network
Heinrich	Marty	Impact Innovations
Hoglund	Bruce	DLA (Consultant) JECPO
Horton	Sherry	DLA
Houser	Walt	VA
Hunt	Jim	GSA
Jansen	Dan	NARA
Johnson	Hilda	i4i
Knight	Conrad	USPS
Laing	Grant	Intellor
Leverson	Steve	US Courts
Milan	Amie	KPMG
Milligan	John	CSC
Morgan	Jane	GSA
Niemann	Brand	EPA
Palmer	Michael	KPMG/XBRL
Poot	Lex	DTS
Ridgeway	Roland	NASA Headquarters
Royal	Marion	GSA
Sakran	Nasra	Filenet
Saldo	Paul	Novient
Sinisgalli	Mike	XML Solutions
Todd	Mike	OSD
Tulley	John	i4i
Veith	Lisa	Impact Innovations G
Vineski	Steve	EPA
Walker	John	KPMG Consulting
Walker	Patrick	Impact Innovations G
Warrington	Earl	GSA
Yee	Theresa	LMI

Cathy Fliter of the Department of the Navy and Bernise ? teleconferenced into the meeting. Others joined via teleconference for portions of the meeting but did not have the opportunity to identify themselves. [Editor's note: If you participated by teleconference, please advise Laura Green <lgreen@lmi.org> so that she can note your participation here.]