Federal CIO Council

XML Working Group

 

Wednesday, May 21, 2003 Meeting Minutes

 

GSA Headquarters

18th & F Streets, N.W., Room 5141

Washington DC 20405

 

Please send all comments or corrections to these minutes to Glenn Little at glittle@lmi.org.

 

Mr. Owen Ambur:  Well, we might as well get started. I think there may be some people who can’t make it because several other groups are also meeting today. I think most of you know each other, but let’s go around and introduce ourselves to each other anyway.

 

[Introductions]

 

Mr. Ambur:  Marion [Royal], you said you could talk for hours on the subjects of UBL [OASIS Universal Business Language Technical Committee (http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl)] and ebXML [Electronic Business using Extensible Markup Language (http://www.ebxml.org/)], so we’ll be interested in hearing what you have to say in the hour allocated to your presentation this morning.

 

Mr. Marion Royal

General Services Administration

ebXML & UBL Updates

 

Mr. Royal: Well I’m not going to talk for an hour. I’m going to be very brief. I know we have Marc [Le Maitre, of OneName (http://www.onename.com/)] for the Registry group [Federal CIO Council XML Registry Project Team (http://xml.gov/agenda/rrt20030521.htm)], but I’m going to broach the DRM [Data Reference Model (http://www.feapmo.gov/feaDrm.htm)] arena in a little bit.

 

Starting with UN/CEFACT [United Nations Centre for Trade Facilitation and Electronic Business (http://www.unece.org/cefact/index.htm)], they’re going through an identity crisis right now. The problem, and again this may be premature, is that CEFACT has had difficulty with lawyers at the U.N. for various reasons. They can’t get the support they need to function: regular meetings, collaboration tools, and resources. They’re looking at a couple of different options; some concern a lot of us, and some encourage us. One discouraging option is that they’d just become another consortium. Another is to become part of an established international standards group: one candidate is ISO [International Organization for Standardization (http://www.iso.ch/iso/en/ISOOnline.openerpage)]; another is ITU [International Telecommunication Union (http://www.itu.int/home/)]. Those options are attractive to those of us who are trying to create international standards using the expertise at CEFACT, because a lot of the top people in the arena participate in that standards organization. The group I participate in, the E-Business Architecture Team, resolved all the comments on the previous release of the architecture document at our last meeting. We put out another version for public review, and we’ve received no comments. That means we’ve resolved all concerns, and it means we can go forward to full release. The architecture document originated in ebXML. UN/CEFACT went through and updated it, and removed syntax-specific language like XML, because CEFACT will be dealing with XML, EDI, and other syntaxes down the road, so they’re trying to work at a higher level.

 

The UBL group at OASIS [Organization for the Advancement of Structured Information Standards (http://www.oasis-open.org/home/index.php)] had a face-to-face meeting three weeks ago. The team I’m on—the Library Content Subcommittee—went through the comments on the 0p70 release and resolved comments from certain members of the team. For example, the modeling people had comments, and we resolved those. We learned a lot of lessons going through the process. That’s the most rewarding thing coming out of UBL. I’m quoted in one of Brand’s [Niemann, of EPA] presentations as saying I didn’t think UBL would ever be used by the federal government. I may have said that, but I should have said that UBL is a process we should all adopt. We should use the methodology and the reusable types, and organize the types.

 

I’m going to go into the UBL spreadsheet to give you a look at what the 0p70 [Library Content] piece looks like, to give you a feel of where we are.

 

Ms. Theresa Yee:  Marion, is there any way we can pull that down? I went to XML.gov and I wasn’t able to find it there. 

 

Mr. Royal:  I don’t have a presentation. If you do an online search for OASIS and UBL, it should take you to the 0p70 release. There’s a link there for the spreadsheets [http://oasis-open.org/committees/ubl/lcsc/0p70/xls/UBL_Library_0p70_Reusable.xls]. I’m going to open the reusable type spreadsheet. For those of you on the phone, the example on the screen is a busy spreadsheet. It’s a deep and wide spreadsheet. You’ll see that some of the rows have specific colors. I’ll go to the top here. Across the top, you’ll see the UBLUID. We decided at the last face-to-face that we don’t need a UBL ID; it’s just adding yet another universal identifier to this data. We believe we can identify the reusable types, and as they’re registered in a registry they’ll obtain an ID at that point, or we can refer back to UN/CEFACT IDs for component types.

 

Let’s go to the ISO 11179 columns, where we have Object Class, Property Term, and Representation Term. We also have qualifiers between those—Representation Term Qualifier and Property Qualifier. This column says ABIE, BBIE, or ASBIE, which stand for “Aggregate Business Information Entity,” “Basic Business Information Entity,” and “Association Business Information Entity.”
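
 

[Editor’s Note:  The sketch below illustrates how an ISO 11179-style fully qualified name is assembled from the parts Mr. Royal describes. The example values and the exact qualifier ordering are illustrative, not taken from the UBL 0p70 spreadsheet.]

```python
# Sketch: assemble an ISO 11179-style dictionary entry name from
# Object Class, Property Term, and Representation Term, with optional
# qualifiers. Values are hypothetical.

def qualified_name(object_class, property_term, representation_term,
                   property_qualifier="", representation_qualifier=""):
    """Join the name parts in order, dropping empty qualifiers."""
    parts = [object_class, property_qualifier, property_term,
             representation_qualifier, representation_term]
    return ". ".join(p for p in parts if p)

print(qualified_name("Party", "Name", "Text"))
# Party. Name. Text
print(qualified_name("Party", "Name", "Text", property_qualifier="Trading"))
# Party. Trading. Name. Text
```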

 

I’m not going to go into a lot of detail—this is just an example of how these things are constructed. The first subclass is “Contact.” This Contact is to be used in these Accounts’ qualifiers. Then you have various elements within the Contact details. You just go down to Contact, because the Contact in the first one is defined further down. I’m drilling down to Contact and the “Party” Object Class, then the information within Party, such as “Name” and “Address.” These are Association BIEs that construct the Aggregate BIE.

 

So as we build these Aggregate BIEs, whenever we identify the requirement for adding a Basic BIE, we’ll do so, and if it’s used in multiple places, then we have a reusable type. So this spreadsheet contains all the reusable types, either basic or aggregate, that we’ve discovered as we’ve built the documents in UBL (Purchase Order, Dispatch Advice, and others). It’s all based on the UN/CEFACT Core Components.
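
 

[Editor’s Note:  The following sketch illustrates the aggregation Mr. Royal describes: an Aggregate BIE composed of a Basic BIE and an Association BIE that points to another aggregate. The element names are invented for illustration, not drawn from the UBL library.]

```python
# Sketch: a hypothetical Party Aggregate BIE composed of a Basic BIE (Name)
# and an Association BIE (Address) that is itself an aggregate.
import xml.etree.ElementTree as ET

party = ET.Element("Party")                        # Aggregate BIE
ET.SubElement(party, "Name").text = "Example Co."  # Basic BIE
address = ET.SubElement(party, "Address")          # Association BIE
ET.SubElement(address, "CityName").text = "Washington"

print(ET.tostring(party, encoding="unicode"))
# <Party><Name>Example Co.</Name><Address><CityName>Washington</CityName></Address></Party>
```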

 

[Mr. Royal displayed another example on the screen.] Core Components are built in a similar fashion. Here, we’ve expressed them in XML. Using XSD, you find out what information is in the Core Component. They spent the majority of the time developing the Core Components Technical Specification; it tells you how to manage the Core Components. Once it was complete, they began to define the Core Components in UN/CEFACT—not just in XML. This uses W3C XSD. There are other schema tools you can use to represent the same information about these Core Component types. At UBL, we took the Core Component types we needed to do UBL work. Those components that are not at UN/CEFACT but that we discuss at UBL, we intend to contribute as candidate components.

 

That’s about it. Are there any questions?

 

Mr. Ambur:  How many components have you defined?

 

Mr. Royal:  I don’t know. My guess is less than 100.

 

Mr. Ambur:  I know you’re involved in the FEA-PMO [Federal Enterprise Architecture Program Management Office (http://www.feapmo.gov/fea.htm)] and the Solutions Architects Working Group (SAWG). Are any of the UBL core components being used in the federal government yet?

 

Mr. Royal:  The Core Components specification is available at the UN/CEFACT website, and they have a primer to help you work your way through the specification. This will give a good flavor of what we intend to do with Core Components.

 

Ms. Yee:  It sounds like the ebXML approach is similar to the CICA X12 [American National Standards Institute Context Inspired Component Architecture (http://www.x12.org/x12org/index.cfm)] approach that I was talking about last month, in that we are reusing different parts of the standard structure. For example, the BIEs are reusable. Regarding the ebXML purchase orders—if they don’t match a federal purchase order, are we able to input the data requirements we need and create an ebXML federal purchase order, or do we have to use the purchase order established in UBL, with no flexibility?

 

Mr. Royal:  UBL tried to take a specific business example and work through basic documents. The objective was to keep it as clean and simple as possible, because UBL intended it to be an example of how it can be done. We fully expect other examples. It’s not intended to be normative.

 

Ms. Yee:  So they might have yet to be developed?

 

Mr. Royal:  Right. For example, you have an aerospace order; you could take a regular purchase order and extend it to the aerospace industry.

 

That’s it on OASIS and UN/CEFACT. You asked about the SAWG [FEA Solutions Architects Working Group (http://www.feapmo.gov/sawg_listserv.htm)]. That takes me to the Data Reference Model (DRM). We met Monday two weeks ago and had a two-day off-site. The proposal was that the FEA DRM should consist of a basic structure within which data can be defined and used. The people who’ve been working on the DRM had been looking at UBL, UN/CEFACT, and X12. There seems to be wide consensus on the use of ISO 11179 as the basic structure for managing the data items. The DRM proposes using that to define the structure for data used in the federal government. All along, we started with a mandate not to create a “Gov.ml.” We were supposed to find vocabularies the government could use. There are so many, and no one could define a canonical form to be used by the federal government, so the focus is on a data model for what is exchanged. That’s an important distinction.

 

To the best of my knowledge, OMB [Office of Management and Budget] does not require agencies to change legacy systems to match the FEA. The intent is to exchange, agree on a structure, and identify common data types being exchanged. So I believe that what will happen is, we will agree on the ISO 11179 structure, then begin building a catalog of common data types used in exchange of this information between agencies. How it’ll be managed and who will be responsible is still up in the air. There’s a notion that we need a component registry, but there’s not agreement on what one is. Some people believe there are different levels of a component registry. I tend to agree with them. So the SAWG or DRM looks like it’s going down the same path as UBL, UN/CEFACT, and other organizations. I’m encouraged by that, and think we’re on the right track.

 

Mr. Marc Le Maitre:  Can you explain what you mean by levels?

 

Mr. Royal:  They’re only loosely identified. In my mind, we have business components. For example, Pay.gov may be a reusable component; it’s very high level. As you move down, you’ll get things like Web Services, which begin to get to technical components. I have a clear distinction of business and technical components. Others have more, like .NET and Java. So far that wouldn’t go down to reusable objects. We’re still trying to work this out.

 

Mr. Le Maitre:  Where would registration of elements and schemas fall in—low level?

 

Mr. Royal:  Those I consider to be technical components.

 

Mr. Bruce Bargmeyer:  That sounded very good. In terms of the data reference architecture, the EPA might have good, useful metadata, like “address,” and so on.

 

Mr. Royal:  Yes—the challenge is to identify the best practices. We’re still trying to determine how to do that. I believe we’ll stand up a catalog of common data types, then let people shoot cannons at them, and say, “That’s not a name to me, but this is a name to me.” I don’t have to change mine, but let’s agree on what the common usage will be. If I were the leader, I would start with a very small library of reusable data types and expand on it reluctantly. I see the DOJ [Department of Justice] library has I don’t know how many types.

 

Mr. Ken Gill:  About 300.

 

Mr. Royal:  I use it as an example of the process. “Justice” has a lot of attributes that may not be needed for other government systems.

 

Mr. Le Maitre:  I agree. One of the reasons I wanted to present this afternoon is because the XRI TC [OASIS Extensible Resource Identifier Technical Committee (http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xri)] is coming to agreement with itself about how entities can cross reference using identifiers, using some things you’re proposing—like “I don’t use the same tag names for data, but we need to establish some equivalencies, because in the field they’re the same thing.”

 

Mr. Royal:  Yes, but the danger is that UDEF uses that to have hierarchical assignment.  We don’t care what the name is, but the attribute uses some taxonomy to identify. The challenge is not to fall back into the EDI world, where we were just using attributes. You begin to lose human readability, etc.

 

Mr. Le Maitre:  My topic this afternoon is human and machine readability, specifically with identifiers.

 

Mr. Ambur:  Marion, you said, “Start with a small library and expand reluctantly.” I agree and disagree. As far as so-called “enterprise-wide components” are concerned, I agree. On the other hand, I believe we should have a huge library that lets communities of interest register their own data elements and schemas so that others can discover and use them in their applications as well.

 

Mr. Royal:  I agree and disagree with you. They should be used with the verticals. I’m talking about the smaller ones, the horizontals. I don’t think we’ll create one giant vertical government library; I think we’ll have a small one. Then, as you have business lines—I think the NIST [National Institute of Standards and Technology (http://www.nist.gov/)]  presentation hints toward delegation of naming responsibility. It’s similar to DoD (they call them Namespace Managers). I don’t think it’s a bad thing, because we do have emerging technology to help resolve the differences in different names.

 

Mr. Ambur:  I would characterize our mission as enabling people to do what they need to do more efficiently and effectively, rather than trying to impose order from the top down.  I hope the ISO 11179 standard provides the classification taxonomy needed for that kind of library.

 

Mr. Royal:  [Mr. Royal displayed a mockup he had devised.] This drawing is one day old, so it’s not really in the works. I just put it together to show my perspective of the different registries being discussed.

 

We’re talking about the themes for responsibility for discussing reusable components in a federated environment. We’re talking about an XML registry/repository for publishing and storing XML artifacts. Also a higher level of XML.gov to identify what’s in the government. The themes are based on the [OMB Exhibit] 300s submitted to the government, so it’s good to find out what’s there, but not to put things in there rapidly. So it’s tightly coupled with the DRM. I think we use the same structure as we put things in here. For those on the phone, I’ll give it to Owen to put on the website.

 

So what I’m showing here is a circle with an upside down Y, separating to three different segments. This contains FEAMS [FEA Management System], business components, and federated XML on the bottom. They can be combined for a consolidated registry of reusable components.

 

We have a single face. We can go to one user interface, providing access to all these registries. The federated XML registry can be broken down into these others.

 

The AIC [Federal CIO Council Architecture and Infrastructure Committee] Components Subcommittee feels they have the responsibility for approving those components that are listed, so we need to define the process for adding components. The same needs to be done with technical components. I believe the process to approve a business component should be the same as the process for approving technical components in the repository. These are just steps in the process.

 

Mr. Ambur:  So do I understand correctly that what’s wrong with the depiction as you have roughly drawn it is that the process would be serial, starting with the Emerging Technology Subcommittee?

 

[Editor’s Note:  The graphic depicted separate, parallel workflow routes from the Emerging Technology Subcommittee and the Components Subcommittee of the CIO Council’s Architecture and Infrastructure Committee.]

 

Mr. Royal:  No. It’s defined by emerging technology. Then, when it’s graduated as a component, it would be registered.

 

Mr. Ambur:  But Emerging Technology could propose components that would end up in the registry at some point.

 

Mr. Royal:  Yes. Now I understand your question. So that’s just my thinking of how we can rationalize these various registries and repositories. Whether that will ever happen, I don’t know.

 

Mr. Ambur:  One way or another, it will happen.  It’s just not happening efficiently now. Whether the IT innovation management lifecycle is official or virtual doesn’t matter.

 

Mr. Royal:  Yes. I don’t see it as one system. I see it as a consolidation and interface. We don’t need stovepiped registries/repositories.

 

Mr. Gill:  Who’s the customer?

 

Mr. Royal:  I’d like to say it’s the citizen, but the real customer is the project manager of EGov initiatives and other emerging projects and budget items that come up for those managers and their developers.

 

Unknown participant:  One of the issues with a stovepiped registry and repository that we’re seeing at the State level is that some of our primary customers are State and local entities, and I’m wondering whether they’d fit in.

 

Mr. Royal:  I believe they’re at the same place for State and local. I believe there’s still a need for independent registries, because you have a specific community of interest in [Department of] Justice. There may be others that don’t need to go to government-wide scale.

 

Unknown participant:  That’s the importance of whatever registries are built—have some common standard people can agree to, to enable the interoperability.

 

Mr. Bruce Cox:  Even within one agency you see the same process—“What’s been approved in the budget and asked for by the customers; what resources are available, etc.” It seems that this might become a model an agency could use. Anyone building registries could use this model.

 

Mr. Royal:  I’d be happy to see it emerge as a model. I’m not even addressing who does this. I just wanted to share that with you. I appreciate the input you’ve provided.

 

That concludes my talk.

 

Mr. Ambur:  So what’s the status and timeline on the Data and Information Reference Model (DRM)?

 

Mr. Royal:  The best milestone I’ve heard is “soon.” Norm [Norm Lorentz, OMB Chief Technology Officer] attended the offsite, and he emphasized that this needs to go out very quickly, so “soon” is the best answer I can give you.

 

Mr. Ambur:  I think Brand indicated that each of the reference models is going to be rendered in XML. Is that the case?

 

Mr. Royal:  I believe that was it from the start. They are on the PMO website already.

 

Mr. Ambur:  The Emerging Technology Subcommittee, to which this group reports, has been charged with developing a process whereby the IT lifecycle can be managed on a government basis. It’s a pretty significant challenge, but as far as I can see, the first step is to design a relatively simple XML schema that will structure the data by which people get their foot in the door, so to speak.

 

I’m suggesting that the step should not be a submission at all, just a rendering of a valid XML instance document on the proponent’s own website containing the data required by the schema. At that point, no commitments are being made about anything further being done with any proposed component. It would just be an opportunity for folks to gain visibility for their proposals in a structured way. The Emerging Technologies Subcommittee would glean a subset of the data for indexing and redisplay on its website. What, if anything, happens beyond that would depend upon sponsorship, interest, and level of commitment by .gov folks. I believe part of the initial schema should allow vendors and others to classify their proposed components according to the FEA. Having the FEA models rendered in XML would facilitate their reuse as controlled vocabularies in the form folks would complete in order to propose a component.
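
 

[Editor’s Note:  The sketch below illustrates the kind of minimal, self-published instance document Mr. Ambur describes, with an FEA reference-model value used as a controlled vocabulary. All element names are hypothetical; no such schema had been defined at the time of the meeting.]

```python
# Hypothetical proposal instance a proponent might publish, and the subset
# an indexer could glean from it. Element names are invented.
import xml.etree.ElementTree as ET

doc = """<ComponentProposal>
  <Name>Example Payment Component</Name>
  <Sponsor>vendor.example.com</Sponsor>
  <FEAClassification model="BRM">Financial Management</FEAClassification>
</ComponentProposal>"""

root = ET.fromstring(doc)
summary = {root.find("Name").text: root.find("FEAClassification").text}
print(summary)
# {'Example Payment Component': 'Financial Management'}
```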

 

Mr. Royal:  That means we have to apply the same data analysis to the reference models as we define the reference models. I’ve never particularly liked it, but people use the term, “Eat your own dog food…”

 

Mr. Ambur:  Absolutely. The way I would extend the argument is, I believe we should reuse our own data. The buck needs to stop somewhere in the sense of overcoming the NIH syndrome. [Editor’s Note:  In this context, NIH stands for “not invented here” – not the National Institutes of Health.] It may not be the way we would have designed the process in our minds. For example, even though I may have quarrels with the way the Business Reference Model has been designed, nonetheless I believe we should reference it when we can.

 

With respect to the process the Emerging Technology Subcommittee is developing, I have encouraged engagement of the Industry Advisory Council (IAC) in designing the process. Betty Harvey, who manages the DC area XML User Group, has expressed interest in leading an IAC task group to work in parallel with the Emerging Technologies Subcommittee to design the process so that it works for vendors and integrators as well as for .gov folks. If anyone is interested in participating in such a task team under the auspices of the XML Working Group, please let me know. I’m sure Betty would like to hear from you too.

 

I don’t think much of massive tomes that are dutifully written and placed upon shelves, documenting in great detail models of how somebody thinks reality should be or how it should behave. Likewise, I don’t think we should spend much time developing XML schemas that won’t be used. As soon as we have one ready to document the first stage of the process, I hope the form vendors will implement it immediately in their applications. At this point, I’m talking about the initial stage of the process, which implies no commitments at all.

 

One of the driving forces is that people at the top don’t have time to deal with everyone who’d like to contact them. They need a more structured process to sort out opportunities and priorities.  Even Davis Roberts, who co-chairs the IAC, indicated she is being inundated with vendors directed to her by the CIO Council folks.  She suggested she could see a benefit to integrators, who also need a more effective way to communicate with vendors.  At this point, this is just a heads-up, because the Emerging Technology Subcommittee co-chairs are going to have to decide how they want to proceed.  However, I’d appreciate any input you may have concerning how the process should be designed.

 

Mr. Royal:  I encourage you, when you formulate that, to put it on the mail list so it gets to a wider audience.

 

Mr. Ambur:  Absolutely, but before pushing these ideas to any great extent, I want to have a chance to do a reality check with the Emerging Technology Subcommittee co-chairs at the next meeting.

 

Mr. Royal:  On a different subject, I want to say that I’m encouraged by the fact that people are starting to do data analysis before defining their schemas, using a rational approach. It takes more than saying, “This element on a form equals an element in an XML schema.” The schema subteam on the forms pilot has attracted some of the experts in the arena. The discussion has been along the lines of naming and design rules, data analysis, and reusable types. This is all good. We now have a good example, like Justice’s schema. The closer we get to that, the closer we’ll be to interoperability.

 

Mr. Ambur:  Are there any other questions or comments?  If not, I have another one.  Have you had any contact with the OPM folks lately?  The last time I saw their schema, it was 76 pages of data elements that comprise the official personnel file. Have you heard anything further about it?

 

Mr. Royal:  Not in a while, but I met with the Integrated Acquisition representative.   They’ve been examining data on seven legacy systems they expect to be communicating amongst.  Now they’re at the stage where they need to be doing data analysis to define how it should look in an XML environment.  It’s a good example of thinking about it before blindly developing a schema.

 

Mr. Ambur:  My sense was, they were planning to implement a DoD system and not do a whole lot of analysis, other than looking at that system.

 

Mr. Royal:  They’d done a lot of analysis on the information in the personnel records and that retirement system, comparing that data at an abstract layer without concerning themselves about what the outcome would be. That’s when they got that long list of data elements. I don’t know what they’ve done since then.

 

End presentation.

 

 

Mr. Ambur:  Well, we’re slightly ahead of schedule. Tim [Bray], are you on the line?

 

[No response]

 

Mr. Ambur:  Where do we stand with Justice?

 

Mr. Gill:  For those not aware, the Justice Data Reference Model is about 40 days into its public review for comment, on the OJP.gov [http://www.it.ojp.gov/topic.jsp?topic_id=43] website. It’s a big schema, and it takes a long time to load in XMLSpy. Anyone is free to comment. We’re collecting feedback; it varies from typos to requests for additional types and attributes. After the 60-day public comment period is finished, there’s going to be a short period in which the Georgia Tech Research Institute [GTRI] and the practitioners will review the decisions about the additions and modifications they want to make.

 

We’re also in the process of identifying some evaluation projects: we decided not to use the term “pilot projects.” We have about 15 sites around the country—from regional projects to specific jurisdictions—that are funded and want to implement an XML instance of the [Justice XML Data Dictionary (http://www.it.ojp.gov/topic.jsp?topic_id=80)] 3.0 reference model. We’re going to identify five or six of these sites, and set forth some criteria to get some evaluations of performance. It’s a big schema. It’s a little difficult to build XML documents with that tool. There are some unique aspects that not all XML editors can deal with, like “IDREF,” so we’re going to put out some technical release notes.
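
 

[Editor’s Note:  A minimal illustration of the ID/IDREF pattern Mr. Gill mentions. Generic XML tools treat the reference as an ordinary attribute, so software must index the IDs and resolve the reference itself; the document content here is invented.]

```python
# The ID/IDREF pattern: index elements by ID, then resolve the reference.
# A validating parser with the DTD or schema would be needed to enforce
# IDREF integrity automatically; plain ElementTree does not.
import xml.etree.ElementTree as ET

doc = """<Report>
  <Person id="p1"><Name>Jane Doe</Name></Person>
  <Arrest subjectRef="p1"/>
</Report>"""

root = ET.fromstring(doc)
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
subject = by_id[root.find("Arrest").get("subjectRef")]
print(subject.find("Name").text)
# Jane Doe
```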

 

We have an industry working group of 105 companies, primarily in the law enforcement domain. I met with them weeks ago, and they’re going to set up some evaluation and performance test files. They’ll set up some real-world messages—driver history, motor vehicle lookup, etc.—have their developers build them and run them through the lab, and get some performance results and best practices. Ours is ISO 11179 compliant, so tag names can be really long. We want to find some good ways to get around that, and see what it does to the payload.

 

We’re also in pursuit of the “Holy Grail” of registry/repository. It includes a lot of domains: Intel and Defense, not just Justice. Practitioners in the field have told us that if we can create reference documents and post them in a registry of some sort, that’ll help the field in terms of standardizing on those systems across the country for reuse, gaining the advantages of that while still allowing people to extend it. We’re quite excited, and the feedback has been positive. Now we need to put it into action. Even the vendor community, even Microsoft and Oracle, is saying, “We’ve never used object-oriented schema before,” so DOJ is going to provide some assistance to GTRI, and provide some on-site training, to help start putting this into practice.

 

Mr. Ambur:  Did I understand you to say something about conformance testing? If so, will NIST be involved? I understand they have been working with vendors and industry groups to develop conformance testing labs for various standards.

 

Mr. Gill:  Performance testing—for example, there’s a system nationwide where, when an officer does a lookup on you, he types it into a local data terminal. The message is sent to a State switch, then it goes to a national switch in Phoenix. Then it routes to the State. The response is all proprietary, all codes. Unless it’s from a State where he’s seen it before, he has to go look it up, so we want to standardize on those code responses in XML. DOJ funded an XML switch in Phoenix to translate those proprietary codes. We want to do XML messages. We want to base it on 3.0; we want to have the smallest message possible.

 

So what are some things we can do with emerging technologies? Concatenate, etc. Will those be enough? So we want to look at it in the lab. There are projects in the pipeline, funded at the State and local level. Developers are seeing in RFPs [Requests for Proposal] that “this’ll be the standard,” etc.

 

Mr. Ambur:  Next month we have our first presentation by hardware vendors. Their technology is termed by some as “XML acceleration” devices.

 

Mr. Royal:  You mentioned overlaps of domains within schema. It’s not possible to create a modular approach in the Justice schema?

 

Mr. Gill:  There are two answers. Where a FIPS [Federal Information Processing Standards] table already existed in XML, we reference it via namespace, so we’re not recreating it. There are ANSI codes, for example, expressed in XML schema, but others were not. NCIC 2000 [National Crime Information Center (http://www.fas.org/irp/agency/doj/fbi/is/ncic.htm)] is a standard set of codes used by FBI and other law enforcement. It’s not available in XML schema format. We made a deal with them: “You give us the codes, and we’ll put them in XML, if you maintain them (because they change).” The question of modularity is interesting, because some of the feedback from the vendor community is still unknown. We’d like to develop partial schema possibilities. I think it needs to come down the pike.
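
 

[Editor’s Note:  A sketch of the namespace-referencing approach Mr. Gill describes: an externally maintained code list is borrowed by qualifying its elements with their own namespace rather than copying them into the Justice schema. The namespace URI and code value are invented.]

```python
# Borrowing an externally maintained code list via a namespace (URI invented).
import xml.etree.ElementTree as ET

doc = """<Incident xmlns:ncic="http://example.org/ncic-codes">
  <ncic:OffenseCode>1399</ncic:OffenseCode>
</Incident>"""

root = ET.fromstring(doc)
# The parser expands the prefix to the full namespace URI:
code = root.find("{http://example.org/ncic-codes}OffenseCode")
print(code.text)
# 1399
```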

 

Mr. Royal:  The UBL approach is to create the reusable type schema, then out of those are the documents, so as you create the document, you’re dealing with more concise schemas for that application.

 

Mr. Gill:  Yes, for applications absolutely. We heard that loud and clear. We asked GTRI to evaluate the best way to get tools to do that. We might create something homegrown in the short term, with an eye to other tools out there, then of course have those registered as components as well.

 

Mr. Royal:  The other thing is that you begin to have long names. It depends on global versus local elements. A lot of times we have it expressed in the library name, which is the full name; then in UBL we have some rules that can do some truncation—like we might have some “Details” that we can truncate. We started by identifying Representation Terms based on the UN/CEFACT Core Components specification, then we began using the Representation Term Qualifier as a means of getting that term into the name. Then we discovered that the BIE should not have a qualifier, so you just add to the name and have a two-word Representation Term instead of one word. Those are the lessons we’ve learned.
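
 

[Editor’s Note:  A rough sketch of the kind of name-truncation rules Mr. Royal mentions, starting from a fully qualified library name and shortening it into a tag name. The specific rules shown (dropping “Details,” abbreviating “Identifier”) are assumed examples, not the actual UBL rule set.]

```python
# Assumed truncation rules: drop "Details", abbreviate "Identifier" to "ID",
# then concatenate the remaining words into a tag name.
def tag_name(library_name):
    words = library_name.replace(".", "").split()
    words = [w for w in words if w != "Details"]
    words = ["ID" if w == "Identifier" else w for w in words]
    return "".join(words)

print(tag_name("Party. Name. Text"))   # PartyNameText
print(tag_name("Order. Identifier"))   # OrderID
```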

 

Mr. Gill:  In the last version, we didn’t use fully qualified ISO names, but in this release we wanted to be as consistent as possible. One other message surprised us: they wanted stability. They didn’t want version 3.0, then 3.1, 3.2, and so on, because the ultimate goal was to do 4.0 with RDF [Resource Description Framework] and other technologies. The vendors say they need it to be stable for at least a year. It’s an interesting message.

 

Mr. Ambur:  Marion, with respect to lessons learned on UBL and ebXML, are some appropriate to include in the [Federal] XML Developer’s Guide?

 

[Editor’s Note:  The draft guide is available at http://xml.gov/documents/in_progress/developersguide.pdf.]

 

Mr. Royal: Absolutely, and seek input from Justice as well.

 

Mr. Ambur:  Do we know where we are on updating and consolidating it with work done by EPA and the Department of the Navy?

 

Mr. Royal:  I believe that’s tabled now, because of a number of things. I think it was your quote which said we’ve achieved a lot just by having the draft out there.

 

Mr. Ambur:  Yes, I don’t want to get tied up with the process of producing a “final” or “official” document, so long as people are getting the value they need from the draft.   However, I do wonder whether there are any volunteers who want to integrate new and additional lessons learned into the draft.

 

Mr. Royal:  I’d welcome volunteers.

 

Mr. Ambur:  Tim, are you there?

 

Mr. Tim Bray:  [On the telephone] Yes.

 

Mr. Ambur:  We’ve just finished updates from Marion and Ken Gill of Justice, so we’re going to take about a 10-minute break, and then we’ll be ready for you.

 

Mr. Bray:  I’ll be here.

 

 

Mr. Tim Bray

Antarctica

RDDL Makes Namespaces More Useful

 

Mr. Ambur:  OK Tim, the floor is yours.

 

Mr. Bray:  I’m happy to be there. I assume there are one or two people in the room I know, so howdy to them. It looks on the agenda as if I have a whole hour to talk about this. I think that’ll be more than we need. Since RDDL [Resource Directory Description Language] itself is fairly small, I think that means that, should someone want to go interactive, it would be just fine. Let me dive into this. The next slide is the Roadmap.

 

Slide 2  [Roadmap]:  This is why we cooked up RDDL, what it is, what its real purpose is, and then a forward look.

 

Slide 3  [The Problem]:  So when we wrote the XML recommendation in 1999, the idea was fairly clear that we were trying to make names unique, so I could have a title tag, and you could, and they wouldn't collide. The title tag from Tim Bray's company would be distinct from others, and it would make markup unambiguous. A quick way to do it was to associate tags with URLs [Uniform Resource Locators]. Since you can't put a URL directly in a tag name, we came up with the prefix-mapping trick. I assume people are familiar with URLs and prefixes, right?

 

Mr. Ambur:  Yes.
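[Editor's Note:  The prefix-mapping trick can be illustrated with a short sketch. The vocabularies and URIs below are invented for illustration; the point is that a parser expands each prefixed tag into its full namespace name, which is what keeps the two "title" elements distinct.]

```python
import xml.etree.ElementTree as ET

# Two "title" elements from different vocabularies. The prefixes bk: and ht:
# are just shorthand for the full namespace URIs declared on the root.
doc = """<doc xmlns:bk="http://example.org/books"
             xmlns:ht="http://www.w3.org/1999/xhtml">
  <bk:title>Moby-Dick</bk:title>
  <ht:title>Page title</ht:title>
</doc>"""

root = ET.fromstring(doc)
# ElementTree expands each prefixed name into {namespace-URI}localname,
# so the two <title> elements do not collide.
tags = [child.tag for child in root]
print(tags)
```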

 

Slide 4  [Q: Should a Namespace URI Point at Anything?]:  There was a glaring lack of consensus on what kind of schema language we should use, thus the namespace recommendation has wiggle words in there that “it is not a goal that the namespace URI be used to retrieve a schema.” Then we put out the recommendation in 1999, and it was rapidly adopted. I’m not talking out of school when I say that Microsoft was pushing us on getting this done, because they were bringing XML products to market. The point was that this was a name disambiguation facility.

 

About 15 seconds after the namespace recommendation hit the market, everyone said, "What do these point at?"  We said, "Nothing." They said, "No, really, what do they point at?" This went on for about a year. The point is that when people see an XML namespace name, they see a URI. This forced the community to address the question, "Should a namespace name point to something?" Overwhelmingly, what people had put at the end of namespace names were HTML documents explaining what the namespace was about. People liked this: it works in the browser, it points to a DTD [Document Type Definition], and so on. So the conclusion that arose in the community is that, while a namespace name need not point to anything, it seems to be pretty useful when it does. There's a clear notion of when something is on the Web: it's there if it has a URI, and it's not if it doesn't. The cost of having a namespace document is small, and the benefits are substantial, so there's no good reason not to have something where a namespace URI points. From now on I'll refer to these as namespace documents.

 

If we acknowledge the possibility that a namespace URI should point at something, then what should it point at?

 

Slide 5  [Q: What Should a Namespace URI Point At?]:  We get a lot of different answers. The first thing would be a schema. Historically, <!DOCTYPE> declarations pointed to DTDs, and that created an expectation. But if we buy the notion that it's good to point at a schema, then which kind of schema? The W3C alone has three supported schema languages. There's RelaxNG (http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=relax-ng), and other guerrillas in the weeds. There's also the fact that many popular XML dialects have more than one schema; the best known is HTML, which has three. So pointing at a schema is problematic, but schemas aren't the only things you might want to point to. You might use a stylesheet, but I might ask, "Which kind of stylesheet? CSS1, CSS2, CSS3, XSL, etc." So the notion of a stylesheet is questionable too.

 

In the object-oriented world, we have the notion of packaging and sending things off together. So we might want to point to some computer code that might do something with it at some level. That might be terribly important. It’s obviously the same with schema and stylesheet.

 

Last, but not least, is human-readable documentation. It’s a common occurrence these days that developers encounter a new namespace they haven’t dealt with. How do they deal with it?

 

Maybe the most important thing to deliver about a namespace is a human-readable document. So pretty much all these are going to be on the Web, so you can think of them as resources. Given an XML namespace, it’s clear that there are going to be related resources. So the answer to what it should point at is, “All of the above.”

 

Slide 6  [Resource Directory Description Language]:  RDDL is meant to provide a way to use a namespace name to point at human readable documents and other stuff.

 

Slide 7  [RDDL Credits]:  This has been floating around for a long time. It all blew up in late 2000, on the XML-dev mailing list. When people tired of saying, “They’re just names,” and “I don’t believe you,” that’s when RDDL got cooked up.

 

Unknown Telephone Participant:  What about URLs vs. URNs?

 

Mr. Bray:  URLs don’t exist anymore. If you ask a programmer to go retrieve the documentation on URLs, they'll find the specification has expired. There are only URIs now. Often people mean a URI in the HTTP space, but they could mean FTP [File Transfer Protocol] space, etc. The notion of a URL is not well defined anymore. I’m going to talk about the pros and cons of URNs later.

 

Mr. Royal:  That’s when I’ll have questions.

 

Mr. Bray:  RDDL development was led by a fellow named Jonathan Borden, a leading neurosurgeon who does XML protocols by night, writing code that allows someone in Boston to operate on a brain in New York, etc.

 

Slide 8  [RDDL History]:  The first version was based on XLink. There were two complaints about it: the syntax was ugly, and people said, "We should use RDF." So we looked at a way to use RDF. We didn't sweep the primaries and ride to victory. The W3C Technical Architecture Group got into this. I'm a member of that group, and one of the first items we took up when we formed in 2002, "What should a namespace document be?", is still there on our to-do list. Recently there's been a proposal for a minimal form of RDDL, which seems to have a lot of traction. We're having discussions now on how it should be blessed: "Should it be a note? An advisory?" They're meeting right now in Budapest on what form to take it to the W3C in. I would guess there'll be some form of RDDL specification coming out of the W3C.

 

Mr. Royal:  What’s the time frame?

 

Mr. Bray:  It’s hard to say. With their release of Office 11, Microsoft introduced XML format for all files except PowerPoint. Specifically for Word, they released about six different vocabularies. They have expressed a desire to use RDDL, so the market pressure is pretty high.

 

Slide 9  [RDDL Goals]:  There are three goals for RDDL. First, the major goal is to provide a place to learn about a namespace. Second, it should be human-readable; there's a lot of feeling that the most important function a namespace can provide is to give us a place to learn about it. The third goal is that software should be able to use it.

 

Slide 10  [Human Readability]:  The second goal is easy. We put them in HTML to make them human-readable. Here’s an example. It says: “An XML Schema fragment is available which constrains the syntax of xml:lang and xml:space.”

 

Slide 11  [Machine-readability]:  So the way RDDL works is, an RDDL document is XHTML. The only thing RDDL adds is two attributes on the HTML <a> element. They’re in a different namespace, called http://www.rddl.org. When you want to point to an RDDL source, use an HTML element, and put those two attributes on it.

 

Slide 12  [Example With Source]:  So here's the source code behind what I showed you: ordinary HTML, except for the two attributes I mentioned before. If you look at the URI in the "nature" attribute, that is the namespace name for XML Schema. So you have a pointer to something; you know it's an XML schema, and you know it's designed to be used for schema validation.
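[Editor's Note:  A minimal sketch of what such a namespace document and its consumer might look like. The attribute names rddl:nature and rddl:purpose follow the minimal RDDL proposal discussed in the presentation; the href and purpose URIs below are illustrative, not normative.]

```python
import xml.etree.ElementTree as ET

RDDL = "http://www.rddl.org/"
XHTML = "http://www.w3.org/1999/xhtml"

# A minimal RDDL-style namespace document: ordinary XHTML, plus the two RDDL
# attributes on an <a> element. URIs are illustrative.
nsdoc = f"""<html xmlns="{XHTML}" xmlns:rddl="{RDDL}">
  <body>
    <p>An XML Schema is available which constrains this vocabulary:
      <a href="example.xsd"
         rddl:nature="http://www.w3.org/2001/XMLSchema"
         rddl:purpose="http://www.rddl.org/purposes#schema-validation">
        schema</a>
    </p>
  </body>
</html>"""

root = ET.fromstring(nsdoc)
# Software scans the anchors and reads nature/purpose to find, say,
# a resource intended for schema validation.
resources = [
    (a.get("href"), a.get(f"{{{RDDL}}}nature"), a.get(f"{{{RDDL}}}purpose"))
    for a in root.iter(f"{{{XHTML}}}a")
]
print(resources)
```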

 

Mr. Ambur:  It strikes me that the notion of adding attributes of “nature” and “purpose” is significant, but it’s going to take me a while to think through the ramifications.

 

Mr. Bray:  I have a few more slides on nature and purpose. Let me go to them. Two slides forward [Slide 13 was a title slide, titled “Natures and Purposes”] we do the nature of a related resource.

 

Slide 14  [The “Nature” of a Related Resource]:  It describes what an object is: if it’s XML, use a namespace name; if it’s not XML, we use a URI built out of the media type, because anything built on the Web has its own media type. So if you want to point to something, it’s not hard to pick the right nature. There’s a predefined list of well-known natures available at the RDDL URI address, so the nature is easy.
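[Editor's Note:  The rule Mr. Bray describes can be sketched as a small helper: for an XML resource, the nature is the namespace name itself; for anything else, a URI is derived from its media type. The media-type base URI below is an assumption for illustration, not a normative value.]

```python
# Hedged sketch of choosing a RDDL "nature" URI. The base URI used to build
# natures from media types is illustrative, not normative.
MEDIA_TYPE_BASE = "http://www.isi.edu/in-notes/iana/assignments/media-types/"

def nature(namespace_uri=None, media_type=None):
    if namespace_uri:   # XML resource: use the namespace name itself
        return namespace_uri
    if media_type:      # non-XML resource: build a URI from the media type
        return MEDIA_TYPE_BASE + media_type
    raise ValueError("need a namespace URI or a media type")

# An XSLT stylesheet is XML, so its nature is its namespace name;
# an HTML page is identified via its media type.
print(nature(namespace_uri="http://www.w3.org/1999/XSL/Transform"))
print(nature(media_type="text/html"))
```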

 

Slide 15  [The “Purpose” of a Related Resource]:  Obviously, you can have more than one related resource of the same nature, so we also have the purpose. If the only related resource is of type XML schema, it’s obvious it’s there for validation, so you don’t need a purpose. There’s a file called “Purposes” that has a ton of precooked purposes.

 

Mr. Royal:  Who operates RDDL.org? What’s the sustainability?

 

Mr. Bray:  That’s a good question. RDDL.org is owned by Jonathan Borden; if the W3C takes on RDDL, it might be good for the W3C to take that on as well. But it’s not necessary for RDDL.org to be there for this to work.

 

Mr. Royal:  But if it’s established in your documents, there may be a need for longevity.

 

Mr. Bray:  For nature and purpose, there’s no requirement to use one from RDDL.org. The way the software is used, you fetch an RDDL document using the namespace name, and ask, “Is there a W3C Schema resource available that does the validation?”

 

That’s RDDL. Just these three slides. There’s not much to it. Microsoft plans to use it for Office, so it’s guaranteed to have widespread market acceptance.

 

Slide 16  [Where To From Here?]:  So where do we go from here? I think the feeling of the community (and I’m not saying it’s a consensus) is that people are moving in this direction.

 

Slide 17  [If You’re Building an XML Language]:  We recommend that anyone building an XML language give it a namespace. Where it gets controversial is in using a URL rather than a URN. The first thing to do is make a human-readable description of what the namespace is about and put it there. When you create an RDDL file, describe it as a resource directory: if you’re creating a namespace name, give us the human-readable document, and as resources for the namespace become available, add them to the directory. I tend to feel it’s important to be able to go to a namespace and get some handles.

 

The whole issue of URL vs. URN draws lots of varying opinions. Let’s look at the pros and cons. The advantage of a URL is that it can be dereferenced; most people have dereferencing code on their computer. The disadvantage is its limited longevity. Notice that when the W3C creates a namespace name, it always uses an HTTP URI, and it always has a year as part of it. The reason is that even should the W3C go to something else, the chance of a collision is very low. The feeling is that, if the W3C is going to take the dramatic step of publishing a namespace, there should be a commitment involved. So longevity is not a function of technology, but rather of attitude.

 

URNs have the disadvantage that few people have software that can dereference them.

 

Mr. Le Maitre:  Tim, URL or URI versus URN is becoming an increasing battle. Are you familiar with the work I’m involved in at OASIS to create XRI [Extensible Resource Identifier], to make both possible--dereferenceable, as well as a persistent identifier?

 

Mr. Bray:  Only a little.

 

Mr. Le Maitre:  Given your obvious understanding of the issues, maybe I could ask you if we could take this past a cursory visit and get your specific thoughts on the work underway there. We’re past the requirements and starting the syntax now. Now might be the right time to give us your thoughts on it.

 

Mr. Bray:  My position is apt to be relatively conservative, given the fact that existing software that can dereference URIs in the HTTP scheme is deployed on desktops numbering in the hundreds of millions. My attitude is that before you decide not to use it, you better prove you can’t.

 

Mr. Le Maitre:  The sheer number says it can’t be wrong. Our problem is to resolve issues of persistence and human readability, given semantic drift of language. The target is not to replace URI, but to extend it out to solve problems preventing adoption of URN namespaces.

 

Mr. Bray:  I’m convinced that persistence is a social function: the combination of an assertion and a management policy on the part of the person owning the DNS [Domain Name System] entry behind it. Once a name becomes well known, the W3C could go away and it wouldn’t matter.

 

Mr. Le Maitre:  It’s the notion of relative persistence, like in database portals. Persistence is assigned, but not global--in that you use HTTP URI in the first place.

 

Mr. Bray:  A determination was made that relative URIs should never be used to name a namespace name. The whole point of a namespace is that it should be globally unique.

 

Mr. Le Maitre:  I disagree. If you’re talking about namespace, I agree that global uniqueness is important, but for a lot of existing websites, there’s use of relative persistence, where you have local uniqueness, but it’s used elsewhere in a different context, so we’d like your thoughts on what we call an XRI--so I can reference something you think is unique in your namespace.

 

Mr. Royal:  Tim, your comment that there are millions of resolvers out there for URIs or URLs doesn’t hold water when you consider the number of applications that need to resolve a namespace. In fact, it’s a limited set of tools and applications attempting to resolve these namespaces. I’m sure you looked at the draft namespace recommendation from this group, which uses URNs, but provides a method to resolve that URN. I don’t think the idea that browsers use URLs is an important factor.

 

Mr. Bray:  There’s certainly no consensus on the issue. Many, including me, believe the most important mechanism to resolve namespace names are humans--namely coders that have to figure out how to use the namespaces. The ubiquity of devices that can be used to retrieve something that’s in the HTTP space seems important.

 

Mr. Royal:  But the RDDL specification provides a mechanism to resolve these namespaces.

 

Mr. Bray:  The fact you can doesn’t mean you must. If you’re going to publish a namespace, you want to be careful about creating an expectation that anybody at runtime can dereference your schema. People who build systems typically don’t dereference at runtime; there’s usually a huge effort to make things available locally, because it’s too much of a hit at runtime. I think your point is that RDDL creates a heavy load on the namespace server. I agree, but I think it’s something that can be worked with.

 

Mr. Royal:  I agree, but there are other mechanisms to identify the location of a schema.

 

Mr. Bray:  What are those?

 

Mr. Royal:  Target schema.

 

Mr. Bray:  Embedded in the schema?

 

Mr. Royal:  Yes.

 

Mr. Bray:  There’s lots of disagreement on whether it’s safe to embed it in there. I agree with you. I think it’s OK for a document to use a target, etc., but there are others who do not want to do that.

 

Mr. Royal:  I’m sure there’s a methodology for caching the schema as well.

 

Mr. Bray:  I agree. I have a personal web log that gets 30,000 to 40,000 hits a day. The stylesheet is fetched only a couple hundred times, because caching takes care of it.

 

Mr. Royal:  You mentioned Microsoft earlier; they’re using URNs.

 

Mr. Bray:  They changed their mind. If you look at the betas for 2003 coming out now, the new namespaces for Word and Excel use HTTP URIs. 

 

Mr. Royal:  Do you see anything wrong with our namespace recommendation providing the pros and cons of both methodologies, for example, “If you’re using URNs for a unique assignment, this is the way, and if you’re using URIs for a dereferenceable namespace, this is how you’d do it”?

 

Mr. Bray:  I can’t understand why you’d ever make a namespace that’s not dereferenceable, so if you can use HTTP, you should.

 

Mr. Royal:  I expect our recommendation to provide a way for using URNs.

 

Mr. Le Maitre:  XRI adds a way of delegating each segment of a URN to the next authority, rather than predefining every segment at a single point.

 

Mr. Bray:  In answer to your question, I do have a bit of discomfort, because in my advice to people, it’s always incumbent upon me to point out that you’re a better citizen of the Web if you use HTTP URI space rather than URN. There’s also the issue that getting URNs registered can be slow and bureaucratic. For example, you said Microsoft used URNs, but in fact they didn’t, because the URNs were never registered properly.

 

Mr. Royal:  Our intent was to register them properly.

 

Mr. Bray:  If you’ve ever done that, it’s a bureaucratic nightmare.

 

Mr. Royal:  I’ve talked to authors of the RFP, and the Department of Commerce would be the owners of the  DNS.

 

Mr. Bray:  Right. There’s no consensus. I think I’m in the majority, but there’s no consensus.

 

Mr. Royal:  In looking at your slides, RDDL can be maintained separately from namespaces anyway.

 

Mr. Bray:  The whole notion of looking up things by nature and purpose has many potential uses.

 

There are a couple more slides.

 

By the end of this month, we’ll have revised the RDDL draft to clean up any shortcomings. At the meeting in Budapest, they’re taking up the issue of how to move forward. I don’t want to be too cynical, but when Microsoft gets behind something, it makes a difference.

 

Mr. Ambur:  I have to do some assimilation and thinking before I can ask intelligent questions, but we do have folks like Ken Gill and his Justice colleagues who have urgent needs to define namespaces.

 

Mr. Bray:  I recommend URIs in your HTTP space.

 

Mr. Gill:  Just looking up our data reference model and the namespaces we imported, they’re all HTTP. Do you have any use case scenarios on your website or elsewhere, describing your thoughts or others’ about applications of this technology?

 

Mr. Bray:  We’ve presented at various conferences. If we skip ahead two slides to References, the first is the most useful, because it points to an “issues” list that summarizes a lot of the issues swirling around the subject.

 

Slide 19  [References]:

     W3C TAG Issue on Namespace Documents:

http://www.w3.org/2001/tag/ilist#namespaceDocument-8

     Theses on Namespaces:

http://www.textuality.com/tag/Issue8.html

     Latest RDDL proposal:

http://www.textuality.com/xml/rddl3.html

 

Mr. Royal:  The folder names behind the “.com”--is there any standardization on that?

Mr. Bray:  Just the general practice of making sure you have a year fairly high up in the path.

 

Mr. Royal:  If there were standardization, one could browse and look for these resources.

 

Mr. Bray:  The idea is, you shouldn’t have to, because the namespace name points to them. I think there’s wide consensus in the community. Just make sure you get a date in the string.

 

Mr. Royal:  Date or version.

 

Mr. Bray:  I think “date” is safer. If your company is in the private sector, it might not be there next year, but the language might be. There’s general consensus that it’s probably a bad idea to encode version information in a namespace name; it’s a whole can of worms. When I’m prefixing an HTML image element, do I really care which version it is? In many cases, I do not.

 

Mr. Royal:  I agree on major versions, but by putting the date in there, you’re versioning already.

 

Mr. Bray:  I’ll buy that.

 

Mr. Gill:  This time we explicitly made sure we put versions in our release, so I’d be very interested if you could provide some additional feedback on the downside of that.

 

Mr. Bray:  If someone will send me an email, I’ll bounce back the discussion on it. There are two levels of granularity:

·          Is it HTML or XBRL?

·          The other is, in most languages, the semantics don’t change much from version to version, so putting version levels in namespace names may be asking for trouble.

 

Mr. Ambur:  Tim, you said you didn’t think we could talk for an hour about RDDL, but we’ve come pretty close. I could talk about it longer if I were more educated about the implications.

 

Mr. Bray:  I care about this, and you folks are an important gathering when it comes to this. I’m going to report back to the TAG [W3C Technical Architecture Group].

 

Mr. Ambur:  We’ll have to decide what to do with our namespace policy draft.

 

Mr. Royal:  I’d like to keep the dialogue open on that.

 

Mr. Bray:  I’ll certainly want to continue talking about it.

 

Mr. Ambur:  Are there any other questions or comments?

 

Ms. Yee:  Thank you very much, Tim.

 

Mr. Ambur:  I’m going to try to better understand the nature and purpose attributes.  If there are no other questions for Tim…

 

Mr. Royal:  I have one more question. I’d like to know whether you think it’s a good idea or not: we have an XML.gov area that belongs to us, with the purpose of advancing XML in the government.  I can see Owen’s wheels turning in his head.  He’s probably thinking of other things we might use for “purpose” and resources that we could locate in the XML namespace.  Do you think it’s a good idea to begin building on intrinsic government purposes in this area?

 

[The teleconference connection became disconnected.]

 

Mr. Ambur:  I will be thinking along those lines …

 

Mr. Royal:  Let’s dial back in…

 

[The teleconference connection was restored.]

 

Mr. Royal:  Tim, are you still there?

 

Mr. Bray:  Still here.

 

Mr. Royal:  Did you get the question?

 

Mr. Bray:  Yes. The question is more social and political than technical. I suspect XML.gov may not need to exist that long.  For example, we no longer have ASCII.gov. I suspect it might be the same with XML.gov.  On the other hand, if you put the year in your namespace name, it might not matter.

 

End presentation.

 

 

Mr. Ambur:  Bruce [Bargmeyer], are you still there?

 

Mr. Bargmeyer:  Still here.

 

Mr. Ambur:  Do you have anything you want to report?

 

Mr. Bargmeyer:  I think it’s complete. We haven’t gotten to the website part of it, but other than that, it’s pretty complete.

 

Mr. Ambur:  Does anybody else have anything?

 

If not, then we’ll call it adjourned. I just want to mention that we have the Registry Team meeting at 1:15 this afternoon, also in this room. With that, thank you everyone.

 

End meeting.

 

 

 

Attendees:

 

Last Name     First Name    Organization
Ambur         Owen          FWS
Bargmeyer     Bruce         LBNL
Bellack       Dena          LMI
Billups       Prince        DISA
Bray          Tim           Antarctica
Cox           Bruce         USPTO
Gill          Ken           DOJ
Jacek         Steve         PureEdge
Le            Dyung         NARA
Le Maitre     Marc          OneName
Miller        Michael       Antenna House
Napoli        Frank         LMI
Pittman       Ken           BAE
Royal         Marion        GSA
Weber         Lisa          NARA
Yee           Theresa       LMI