XML Working Group

Meeting Notes

June 16, 2004


The meeting was hosted by SiloSmashers at their office in Vienna, Virginia.


Owen Ambur announced that the next meeting, scheduled for July 21, will be hosted by the Merit Systems Protection Board. (The draft agenda is available at http://xml.gov/agenda/20040721.htm) He also expressed hope that the first stage of the emerging technology life-cycle management process may soon be implemented. (The draft schema for stage 1 is available at http://xml.gov/working_group.asp#et) Finally, he noted that OMB has initiated a number of new line of business (LOB) initiatives and that XML schemas are implicit in all of them. (OMB’s announcement is available at http://www.whitehouse.gov/omb/pubpress/fy2004/2004-07.pdf)


Ken Sall briefed the group on his involvement in applying XML guidelines to the eTravel project and the Integrated Acquisition Environment (IAE). Steve Hamby noted that version 2.0 of the Navy’s XML guide was issued on Monday, June 14. Ken pointed out that the proposed XML namespace policy is controversial and indicated he did not know whether anyone is using it. He noted that UN/CEFACT has leveraged ISO 11179 in the ebXML core components. He also referenced the OASIS UBL initiative and recommended Eve Maler’s tutorial, which is available at http://www.ibiblio.org/bosak/ubl/tut/csw-xml-for-ebusiness.ppt or http://www.oasis-open.org/committees/ubl/info/ubl-tut.ppt


Ken noted that UBL provides a lingua franca for business information that is common across different vertical industries. With reference to the graphic on Ken’s slide at http://xml.gov/presentations/silosmashers2/IAEeTravel_files/slide0442.htm, Owen Ambur pointed out that UBL can play a key role in realizing the President’s vision of making all eGov applications “citizen centric” – specifically by implementing common data elements across all of the eGov LOBs, rendering those elements in XML, and making those elements and the schemas in which they are used readily available for discovery, retrieval, and use in an XML registry.


Ken reported that the eTravel project entails 385 elements. ISO 11179 was not followed in identifying those elements. Three vendors will be delivering XML schemas in support of the project. The hope is to align with the Open Travel Alliance. Ken has developed a set of XML schema evaluation criteria based upon the Navy’s checklist. The IAE project is performing UML modeling, adhering to ISO 11179, and may use UBL. Ken noted the distinction between transactional and validation schemas, with the former being looser than the latter. Version 1 of the IAE included 450 data elements and encompassed 5 shared systems, whereas version 2 includes 1300 elements and 20 systems.


Joe Chiusano asked about the format in which the data elements are being documented and someone else asked more specifically about lessons learned with respect to the use of databases rather than spreadsheets. Ken indicated the IAE switched from a spreadsheet to a database and that the database containing the 1300 elements of the IAE encompasses 44 columns, highlighting the complexity of the data.


Ken’s presentation is available at http://xml.gov/presentations/silosmashers2/IAEeTravel.htm and http://xml.gov/presentations/silosmashers2/IAEeTravel.ppt


John Richards introduced the presentation on the specification of an XML schema for tabular data. He has challenged vendors to produce tools implementing the schema and three of those tools will be previewed at an event planned for August 18. Agencies can acquire and pay for those tools through a shared-savings program leveraging the discount offered by GPO for the delivery of publications in valid XML format. The result of John’s initiative is a government-owned DLL for the table model that can be freely shared and used.


Ed Schulke indicated the initial requirement was for a tabular solution for legislative documents. Joe Carmel added that they had started with the CALS model but needed to extend it to meet the requirements for those documents. Ed noted that their job was made easier by the fact that the CALS model is well known to XMetaL. Selection of a template automatically populates the necessary codes. John Richards interjected to request that anyone who has requirements beyond those for legislative and Federal Register documents convey them to him for inclusion in the project. Joe noted that the tool is in production in the U.S. House of Representatives. Brand Niemann, Jr., asked if there have been problems processing thousands of rows of tabular data in DOM. Ed responded that the application currently works with 100 rows at a time. However, Joe indicated the next release of XMetaL will address this issue, bringing the time required to renumber 3000 rows down from 30 seconds to about 3 seconds. John concluded by reiterating that the software component will be finished in September and will be government-owned, so it will only be necessary to pay for it once in order to reuse it in as many places and as often as needed.


Ed’s presentation is available at http://xml.gov/presentations/dscs/tablexml.htm and http://xml.gov/presentations/dscs/tablexml.ppt


Following the break, Angela Drummond, CEO of SiloSmashers, welcomed the group to their facility and invited us back whenever we may wish to use their facilities again.


Mike Champion provided an update on the XQuery standard. He noted that only 20 percent of an organization’s data is typically stored in highly structured form and, thus, suited to SQL query and retrieval. Since multi-trillions of dollars have been invested in SQL databases, they are not going away, but because they don’t have the notion of hierarchy built in, they are not well suited to XML. XQuery allows for strong data typing but also for treating XML as text. It is actually an XML programming language, not just a query language, thereby enabling not just standardization of data but also access to data. It is appropriate to apply SQL to the 20 percent of the data that is highly structured while using XQuery for the other 80 percent.


A SQL-XML effort has been initiated to extract and render SQL data as XML but its author has called it “barely a good start.” XSLT and XQuery are both based on XPath 2.0. XSLT has similar capability to XQuery but is too difficult for most people, whereas XQuery is more usable. Data can be transformed from one type to another using XQuery. Microsoft has indicated it will use XQuery rather than XSLT. Technically speaking, XQuery is at the W3C’s last call working draft stage, but that status understates the fact that it is backed by 5 years of solid experience. About 1000 comments have been received and it will take time to work through them, but in the meantime, there are already 31 implementations and all of the major databases will support XQuery. Test suites have been developed by NIST, W3C, and Bumblebee, which is an open-source initiative.
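
To illustrate the point about hierarchy and path-based access, the following is a minimal sketch in Python using the standard library's ElementTree module. It is not XQuery and was not shown at the meeting: ElementTree supports only a limited subset of XPath 1.0, not XPath 2.0. The sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are invented for illustration.
DOC = """
<manual>
  <chapter title="Propulsion">
    <section title="Overview"><para>Steam plant basics.</para></section>
    <section title="Maintenance"><para>Inspect valves weekly.</para></section>
  </chapter>
  <chapter title="Navigation">
    <section title="Overview"><para>Charts and radar.</para></section>
  </chapter>
</manual>
"""

root = ET.fromstring(DOC)

# Path expression: walk the hierarchy chapter -> section and filter on an attribute.
overviews = root.findall("./chapter/section[@title='Overview']")

# Treat the XML as text: collect all character data beneath each matched section.
texts = ["".join(section.itertext()).strip() for section in overviews]
print(texts)
```

The path expression addresses nested content directly, which is the kind of hierarchical access that a flat relational table does not offer without joins.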


A new category of tools is emerging, using XQuery for data integration. XQuery 1.0 is read-only and has no capability for insert, update, or delete (data manipulation) but vendors have built those capabilities into their products. The W3C recommendation may be finalized by mid-2005 and the XQuery API for Java (XQJ) will follow shortly thereafter. XQuery 1.1 is currently in the requirements draft stage and will incorporate update and full-text query capabilities. However, the latter is particularly challenging to standardize due to internationalization complexities. Additional requirements include GroupBy, exceptions, duplicate elimination, Web Services invocation, and thesaurus-, taxonomy-, and ontology-guided queries.


Mike concluded with a reality check while reiterating the potential of XQuery as an XML integration platform. His presentation is available at http://xml.gov/presentations/softwareag1/XQueryUpdate.htm and http://xml.gov/presentations/softwareag1/XQueryUpdate.ppt


Steve Hamby briefed the group on the interactive electronic technical manuals (IETM) project for the Navy’s aircraft carrier manuals. The manuals have been deployed on CDs, which is good for classified information. A client-level database is included on the CD so the application can be run without the need for a database client on the PC. Both text and structured searching are provided. The images are SVG so they can be searched as well. Complex links are supported via XLink, making it possible to pull in multiple chunks at once. Structure is also provided for multiple versions of the same content. Among the problems encountered were discrepancies between the Navy’s IETM guidance and its XML guidance. Land-based access is provided via the Navy’s intranet. Due to time limitations, Steve was unable to demonstrate the application. His presentation is available at http://xml.gov/presentations/softwareag2/navymanuals.htm and http://xml.gov/presentations/softwareag2/navymanuals.ppt


Barry Schaefer noted his company’s involvement with a NavAir and NavSea project using Arbortext’s Epic authoring tool running on Software AG’s Tamino database. One of the unanticipated benefits was that, by choosing to stay in a native XML environment, different installations that were unaware of each other and had not planned to share data inadvertently put themselves in a position to do so efficiently and effectively.


Ken Sall concluded the meeting by offering those who wished to do so the opportunity to use SiloSmashers’ capability to vote on the importance of the top 20 opportunities that had previously been identified for potential pursuit by the xmlWG. The results of that tally are provided below, in descending order of mean score.


Ballot Item                                                   5  4  3  2  1  Total  Mean   STD   n
-----------------------------------------------------------------------------------------------------
Create strategy for government wide adoption of XML           7  5  1  0  0     58  4.46  0.66  13
Identify best practices                                       6  4  3  0  0     55  4.23  0.83  13
Standard XML implementation guidance                          5  4  3  0  0     50  4.17  0.83  12
Develop project plan for XML registry                         6  2  2  1  1     47  3.92  1.38  12
Review & incorporate commercial & gov standards               2  5  5  0  0     45  3.75  0.75  12
Define procedures for XML registry                            3  5  2  2  0     45  3.75  1.06  12
XML for long-term preservation of records                     4  2  5  1  0     45  3.75  1.06  12
Agency pilots (include B2G & C2G)                             4  4  2  3  0     48  3.69  1.18  13
Input to government wide strategy for interoperability        4  2  5  0  1     44  3.67  1.23  12
Determine how XML WG can add value                            4  3  2  2  1     43  3.58  1.38  12
Identify process for tracking/sharing standards efforts       0  6  5  1  0     41  3.42  0.67  12
Leverage existing Federal/agency data standards               0  5  6  1  0     40  3.33  0.65  12
Participate with voluntary consensus standards bodies         3  3  2  2  2     39  3.25  1.48  12
Training opportunities, including free, Web-based             1  5  3  4  0     42  3.23  1.01  13
Procedures for distributed content management                 1  3  5  3  0     38  3.17  0.94  12
Renew charter & include GAO recommendations as roles          1  3  4  4  0     37  3.08  1.00  12
Verbiage for inclusion in contracts                           3  1  4  2  2     37  3.08  1.44  12
Demonstrate registry interoperability w/State/local/private   1  2  6  2  1     36  3.00  1.04  12
Map standards to FEAF & ensure government involvement         0  3  7  0  2     35  2.92  1.00  12
Subgroup to draft policy                                      0  1  7  4  0     33  2.75  0.62  12
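
The Total, Mean, and STD figures in the tally can be reproduced from the vote counts. The sketch below is illustrative only and is not the tool used at the meeting. It assumes that the columns headed 5 through 1 hold the number of ballots giving each score and that STD is the sample standard deviation (dividing by n - 1); both assumptions agree with the printed figures for the first ballot item. The function name `tally` is hypothetical.

```python
from statistics import mean, stdev


def tally(counts):
    """Summarize one ballot item; counts maps a score (5..1) to its number of ballots."""
    # Expand the histogram into one entry per ballot, e.g. {5: 2, 4: 1} -> [5, 5, 4].
    ballots = [score for score, votes in counts.items() for _ in range(votes)]
    return {
        "total": sum(ballots),
        "mean": round(mean(ballots), 2),
        "std": round(stdev(ballots), 2),  # sample standard deviation (n - 1), an assumption
        "n": len(ballots),
    }


# Vote counts for "Create strategy for government wide adoption of XML".
top_item = tally({5: 7, 4: 5, 3: 1, 2: 0, 1: 0})
print(top_item)
```

For the first ballot item this yields a total of 58 from 13 ballots, a mean of 4.46, and a standard deviation of 0.66, matching the tally.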


Approximately 30 people participated in the meeting. Among those in physical attendance who registered their presence on the xmlWG meeting roster were:


Owen Ambur, Co-Chair

Jared Andrews, LMI

Mark Anstey, DSCS

George Albor, Software AG

Donna Blades, FTC

Joe Carmel, U.S. House of Representatives

Ed Chase, Adobe

Ron Cote, SiloSmashers

Mike Champion, Software AG

Steve Hamby, Software AG

Chris Kupczyk, LMI

David McKloskey, GPO

Roy Morgan, NIST

Connie Morris, EPA

Vicky Niblett, SAIC/NASA

Brand Niemann, Jr.

Thomas Peters, SiloSmashers

Donna Rickett, EPA

Sol Safran, IRS

Edward Schulke, DSCS

Kenneth Sall, SiloSmashers

Barry Schaefer, X.Systems

Valeria Voci, DSCS

Mike White, NARA/Federal Register


Among those participating by teleconference were:


Rex Brooks, OASIS, HumanML & WSRP TCs

Dave Meyers, Microsoft

Russ Ruggiero, HumanML


Please convey any additions or corrections to Owen_Ambur@ios.doi.gov