Federal CIO Council

XML Working Group

 

Wednesday, March 19, 2003 Meeting Minutes

 

GSA Headquarters

18th & F Streets, N.W., Room 5141

Washington DC 20405

 

Please send all comments or corrections to these minutes to Glenn Little at glittle@lmi.org.

 

[Editor’s note: In informal discussions before the meeting, Mr. Owen Ambur mentioned Mr. Marion Royal’s unavailability due to a scheduling conflict, and several phone participants introduced themselves.]

 

Mr. Ambur:  We might as well get started. We generally start with introductions. I’m Owen Ambur, co-chair of this Working Group. As I mentioned, Marion Royal, my co-chair from GSA, won’t be able to be with us because he’s at a Solutions Architects Work Group meeting.

 

[Introductions all around]

 

Mr. Ambur:  We have a good turnout—better than I would have expected since there are at least four other meetings taking place that some of our participants may need to attend. For the people on the phone, some of you have already introduced yourselves, but why don’t you go ahead and introduce yourselves again, and it’ll also provide a chance for any who have just joined us to introduce themselves.

 

[Additional phone-in introductions]

 

Mr. Ambur:  OK, then let’s get started. Matthew, the floor is yours.

 

 

Mr. Matthew McKennirey

Conclusive Technology

“Trustworthy” Electronically Signed XML &

Data Confidentiality in Electronic Processes

 

Mr. McKennirey:  First I’d like to thank you for the opportunity to talk to you. As Owen said, there were a lot of other possibilities today. For those on the phone, every once in awhile I’ll give you a slide number for what we’re covering here in the room.

 

Slide 2  [Agenda]:  I’m going to talk to two aspects of XML security—digital signature and encryption. For digital signature, “How does one sign an XML document so that it’s consistent with guidelines from NARA [National Archives and Records Administration] and DOJ [Department of Justice] for trustworthy records?” Then, in XML encryption, “How can one have secure workflow when XML documents are moving between users, organizations, and domains, and maintain security and confidentiality?”

 

Slide 3  [Presentation Resources]:  For those of you on the phone, the presentation is available at Conclusive’s website [http://www.conclusive.com/home.jsp] or at XML.gov.  [http://xml.gov/presentations/conclusive/trustyXML.htm]

 

Slide 4  [Part 1 Objectives: GPEA Requirements]:  With the elimination of paper, we know we need electronic records that have to be signed. The guidance is there from DOJ and OMB [Office of Management and Budget]. This looks at the guidance from NARA to build documents as closely as possible to their guidance for trustworthy records in XML, so… [skip to Slide 5]

 

Slide 5  [Approach to Today’s Presentation]:  During the presentation, we’ll create a document in a workflow scenario where someone is filling out a form and going through a process where people sign it and check signatures. We’ll sign it using standards from W3C [World Wide Web Consortium] and PKI [Public Key Infrastructure]. It’s important to remember that the parties in the demo have certificates from different sources.

 

Slide 6  [Characteristics of a “Trustworthy” Record]:  From NARA, what we need for a trustworthy document is this: the content must be an accurate representation of what happened, it must be authentic, and it must be usable. Even if an XML document is readable, it might not make much sense, particularly for signature and encryption, where it is impossible to look at it and know if it’s valid.

 

Mr. Ambur:  There’s a raging debate as to what is and is not considered to be a record under the Federal Records Act.  That debate is not particularly productive.  I would ask people, when designing IT systems, if they’re not designing them to produce information having the attributes outlined in ISO 15489, which of those attributes do they intend to ignore?  And how do they justify spending the taxpayers’ money on systems producing information that lacks those attributes, that is, the attributes of a record?  My other point is, in the context of the Federal Enterprise Architecture and the Service Component Reference Model, the smallest component that makes sense in any business process is a record.  It should be the aim of all IT systems to create records having sufficient degrees of the four attributes outlined in ISO 15489.

 

Slide 7  [Preserving Trustworthy Records]:  There are three concepts that are key for the rest of the presentation, so I’ll spend a few seconds on them. With respect to NARA, there are three parts of a record that you have to preserve. One is content. On our graphic, we have a piece that’s content, which should be signed. Then there’s the context—what was going on when the transaction occurred, with respect to whether the record is a trustworthy record of what happened: “What happened, who were the parties, what role did the user have, what authority is cited?” etc. There may be any number of attributes that have to do with context. It depends on the application as to what’s captured. Finally, there’s the structure. I’ll refer to it as “presentation.” It’s the rendition of the document—the user’s experience. “What did they hear or see?” They may have an interactive form, paper, or voice response. The content is XML, but the rendition is different. It’s as important as the content.

 

Slide 8  [Elements of a Trustworthy XML Document]:  In the XML we create today we will give it a structure that corresponds to the NARA guidelines. We have the three parts—content, context, and presentation. Inside the content, we’ll have a signature, and because we’re doing encryption today, we’ll have encrypted data.
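[Editor’s note: The three-part structure described above can be sketched in Python using only the standard library. The element names and sample values here are illustrative, not taken from the actual demo application.]

```python
# Build a minimal "trustworthy record" skeleton with the three parts
# from Slides 7-8: Context, Content, and Presentation.
import xml.etree.ElementTree as ET

record = ET.Element("TrustworthyRecord")

# Context: who, when, and in what role the transaction occurred.
context = ET.SubElement(record, "Context")
ET.SubElement(context, "UserID").text = "jdoe"
ET.SubElement(context, "Timestamp").text = "2003-03-19T10:00:00Z"
ET.SubElement(context, "Role").text = "Requestor"

# Content: the business data itself (a stand-in payment form here).
content = ET.SubElement(record, "Content")
ET.SubElement(content, "SecureForm").text = "payment request data"

# Presentation: a reference to the rendition the user experienced.
presentation = ET.SubElement(record, "Presentation")
presentation.set("URI", "forms/payment.xsl")

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```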

 

Slide 9  [Context]:  Drilling down into context, you might capture the user’s ID and a time stamp, session ID, who issued the certificate, what application was running, who provided authentication, etc.

 

Slide 10  [Sign Context]:  Because the context is critical, we’ll sign it as well. It’s not required, but it’s useful. It helps with session integrity. To structure the document, we’ll put in a header and put the context in the header.

 

Slide 11  [Content]:  The next element is content. We’re creating a wrapper for the actual document. For today’s presentation, the document is called “SecureForm.” Any code could be in there. The content will be encrypted. In terms of preserving content, as we thought about what has to happen, a couple things seemed clear. They don’t always happen, but they’re worth talking about briefly.

 

Slide 12  [Sign Content]:  The first one is to ensure that the content is signed. We’re aware of projects where digital signature is used, but a token—or part—is signed. The document isn’t really signed. The content should be signed. Make sure the means of signature conforms to a recognized standard. There’s lots of stuff going on—proprietary file formats—it’s not recognized that there’s a standard approach. A signature that does not conform to a recognized standard may not have any validity.

 

If PKI is used, make sure the signature is valid. Again, there’s a certain amount of signing going on, but no one’s been checking whether the certificate has been revoked or was valid at the time. Make sure the signatory knows what they’re signing. I’ve been asked to sign things in applications where I have no idea what I’ve signed. Sometimes I sign and can’t find out. We need to capture why the person is signing. In the European directive, it’s required that, for the digital signature to be valid, the reason the person signed is captured. Is it to say, “I saw this, I tested it, and it’s true?” Or, “I saw another’s signature?” etc. Often when you sign legal documents, all the above are true. That has to be captured. Otherwise the person can say, “I didn’t agree to it.” The signatory should only sign what the signatory intends to sign. It’s not happening in all cases. The signatory may only wish to sign the data they’re providing. If you have an event where they’re signing for someone else, it’s very awkward. Make sure that the signatory knows who else has signed it. It may affect whether I want to sign. Those are self-evident. They happen in the everyday environment.

 

Mr. Ambur:  Are you going to talk any more about the separation of content from presentation?

 

Mr. McKennirey:  Yes, we’re going to get to a lot of it.

 

Slide 13  [Presentation]:  It’s critical that the presentation or rendition of the document is as the user experienced it when captured. If you can’t create that, then you have a serious problem with validity.

 

Slide 14  [Separation of Presentation and Content]:  Today we’re going to do an interactive form that we can generate a printed document from—but applications now are using multiple renditions of technology in the same document. One example is a healthcare presentation where the content is a prescription. The physician’s experience is on an interactive PDA [Personal Digital Assistant] or workstation. The pharmacist’s experience is a printed document. The payer’s experience is an [ANSI] X12 document generated from the pharmacy chain. All are valid renditions of XML, but one has to be sure that the correct rendition appears for the users. The trust record of the pharmacist’s actions is the same content, but different rendition and context, so the three we have to keep in mind have to be consistent. That’s what makes it a trustworthy record.

 

Does it make sense with respect to forms, to separate the presentation from the content? We’ll show you one way today. I think it’s an efficient way. I’ll come back to how it would happen. In an XML form, you have a presentation element.

 

Slide 15  [Presentation Integrity]:  For those familiar with signature or encryption, you’ll recognize some elements—URI [Uniform Resource Identifier], DigestMethod Algorithm, DigestValue. It’s the same approach as with signature and encryption. Just as the standards allow signed or encrypted data to be external to the XML document, they allow the XML at the presentation layer to be external to the document. We’re hashing that external document. At run time, we verify the hash so we know it’s the presentation that belongs to this piece of content. The whole thing is signed so that when you reference the URI in the application, it’s also been signed by the user and can’t be changed. You know you’re dealing with the presentation that was used at the time.
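[Editor’s note: The digest check described above can be sketched as follows. SHA-256 stands in for whatever DigestMethod algorithm an application actually declares; the stylesheet bytes are a placeholder.]

```python
# Hash the external presentation resource when the record is created,
# then re-verify the digest at run time before rendering with it.
import base64
import hashlib

def digest_value(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, analogous to a <DigestValue>."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

# At record-creation time: hash the stylesheet and store the digest.
stylesheet = b"<xsl:stylesheet>...</xsl:stylesheet>"
stored_digest = digest_value(stylesheet)

# At run time: recompute and compare before using the presentation.
def presentation_is_intact(resource: bytes, expected: str) -> bool:
    return digest_value(resource) == expected

assert presentation_is_intact(stylesheet, stored_digest)
assert not presentation_is_intact(b"<xsl:stylesheet>tampered</xsl:stylesheet>", stored_digest)
```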

 

Slide 16  [Preservation: NARA Guidelines]:  Now we have all the elements; the next step is preserving it over time. There are two approaches:

1.      Either document the entire process, or

2.      Have the means to revalidate.

They’re not mutually exclusive.

 

Slide 17  [Preservation Requirements]:  We’re suggesting doing both—revalidate the signature and a good deal of the context. We’re trying to ensure document integrity over time—that it hasn’t been modified. We also worry about the integrity of the sequence of events that happened beforehand. That’s as important as internal integrity. You have to capture the integrity of the presentation, and you have to be able to reproduce it the way the user experienced it—whether voice response, printed paper, or interactive form. Last bullet—obviously it makes sense to avoid things that aren’t standard.

 

Ms. Glenda Hayes:  Are these your objectives, or regulations, per DOJ?

 

Mr. McKennirey:  These are our objectives.

 

Slide 18  [Integrity of Preserved XML]:  You have metadata that describes the document, you have the document itself, and you do some encryption to ensure it’s not modified. You hash it. It’s tamper-evident, so if anything’s changed, you’re aware of it. At any point in time, if it says the document’s not changed, you can be certain it’s not changed.
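[Editor’s note: The tamper-evident seal described above can be sketched with a keyed hash (HMAC) over the metadata plus the document. The key name and record fields are illustrative, not the product’s actual mechanism.]

```python
# A keyed hash over metadata + document makes any later modification
# detectable: recompute the seal and compare with the stored value.
import hashlib
import hmac

SERVER_KEY = b"preservation-server-secret"  # hypothetical key

def seal(metadata: bytes, document: bytes) -> str:
    return hmac.new(SERVER_KEY, metadata + b"\x00" + document,
                    hashlib.sha256).hexdigest()

def is_unmodified(metadata: bytes, document: bytes, stored_seal: str) -> bool:
    return hmac.compare_digest(seal(metadata, document), stored_seal)

meta = b"record-id=42;created=2003-03-19"
doc = b"<Record>...</Record>"
stamp = seal(meta, doc)

assert is_unmodified(meta, doc, stamp)
assert not is_unmodified(meta, b"<Record>changed</Record>", stamp)
```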

 

Slide 19  [Usability of “Preserved” XML]:  The interface is the usability argument. “How can we experience this document?” It allows us to experience the XML, or it allows us to verify signatures on the XML, or you can ask it to be reproduced in the forms it was created in. In this case, it’s an interactive form, so we’ll recreate it. We can also view it as an audited record, where it throws in the metadata.

 

Slide 20  [Demo: Signature and Preservation]:  I’ll do a quick demo, then come back to encryption.

 

Slide 21  [Demo]:  We create a document. The scenario is a simple payment request and issue process. We request a payment, and get it signed and issued, then someone approves it. It’s very simple.

 

Slide 22  [XML Form in HTML]:  Today, as far as the presentation is concerned, we’re using XSL and JavaScript, and a standard commercial browser. The JavaScript is to present the XML and provide for interaction.

 

Slide 23  [Application of Policy Rules]:  The Policy Manager is applying rules for what users are allowed to do in the application. Some related to PKI are the issuance of certificates, recognition, what assurance level is assigned, and what’s going on in the background.

 

Mr. Clay Robinson:  If you’re looking to see what role a person plays, are you looking at the certificate, or are you going to the Certificate Authority [CA] and looking it up?

 

Mr. McKennirey:  The latter. The application defines the role. A person can have different roles. The application says, “We’re going to make certain of the identity. Once we’re certain, we’ll determine what roles they can have.”

 

Slide 24  [Signature Confirmation]:  I’ll do some screen shots live for the “studio audience.” [Mr. McKennirey referred to a number of screen shots from here that were included in the presentation document.] We have this application. It’s on my laptop. I’m logged on as someone. You can see the name and current role in the upper right side. This person starts the process. He wants to do a payment request. What happens is, the application will create an XML document, then send it to the user to capture their input. In the logon process, there’s lots of non-XML stuff related to PKI. I won’t spend too much time on that. For this user it looks at the certificate, what level of assurance the issuer has, and whether it’s sufficient to work in this application. This person now requests a payment. The application creates a piece of XML that’s blank, asks the application to provide encryption instructions, and the user is asked to fill out a form.

 

[Mr. McKennirey keyed sample data into a Web form.]

 

This is an expense form. We’ll attach a document to it (a spreadsheet) and submit it.

 

Slide 25  [Off Line Operations]:  What this person has done is edited an XML document, presented in a browser, and embedded a spreadsheet, embedded in XML as the value of an XML element. When the person signs, he gets a signature confirmation dialogue box. This isn’t a Web application. This is software on the person’s machine. It’s always present, regardless of application and format, so the user knows what to expect. He’ll sign, comment, record it, and make an attachment. They know what they’re signing. They can save it locally when they sign. So they’ll sign the form, and there’s a lot of encryption going on.

 

Mr. Roy Morgan:  I have a question on the attachment. You’ve told us that the user has attached a spreadsheet—I assume it’s a file of a list of things to be paid for. What ensures that 10 years from now the application that lets them see it is still available?

 

Mr. McKennirey:  That’s an excellent question. We don’t keep a copy of every application.

 

Mr. Ambur:  NARA is working to address the problem from a research standpoint but, obviously, avoiding the use of proprietary formats is important.  I’ll also offer a comment on the signature process.  Some folks suggest that email is the ideal application for digital signature.  From my perspective, digital signature is overkill there. Email is for quick, informal messages, which, by definition, are lower-quality records.  I believe digital signature should be used for higher-value, higher-quality electronic records managed in document/records management applications rather than email systems.

 

Slide 26  [Signature Verification]:  Now I’ve moved on to the second step. I’m logged on as someone else with a different role. I will approve the payment process. I’m presented with a list of what to approve. I see a signature presentation. It tells what signatures exist in the background. I can see it was signed by the previous person, what was signed, and why. I can retrieve that attachment. I say, “Fine.” Then I’m presented with XML in the browser and I do what I do, which in this case is approve, so I’ll put in some data here and go ahead and approve it. It’s the same story—I’m signing it, so I see what I’m signing, I see the data I entered. The reason I signed is that I’m the “approver,” and so on. The final person could say “Issue it,” and that’s all we need. Let’s log off for a minute and look at some things going on in the background. I’ll bring up a server-side tool that will allow us to see what’s happening with respect to record preservation. First we’ll look at the record as XML, then in the way it was initially created. This is a view of the database. The record is stored in two locations—by the application in a database that the application has designed, and copies are put here for preservation. Here we have all the events of this morning—logons, etc. If we look at this record, we can ask to see the signatures on it. This is the one that just occurred. We have two signatures.

 

Slide 27  [Usability: Retrieve the XML with Presentation]:  We can see what they’ve signed, and what attachments are included. That persists indefinitely. We can also look at the XML itself—the content and context of what’s happened.  It’s a large document because of embedded spreadsheets. We’re looking at it in Internet Explorer. What we have is, in the header we have context, and in the body of the context, which is signed, we have a series of things we’ve captured—the User ID, time stamp, session ID, role, issuer of the certificate, assurance level of the issuer, token approval, identity, etc. Then we have the body of the document. There we have data (payment form) and some elements have been encrypted. We can refer later in the same XML to the encrypted data section for the content of those. We have the cryptographic security with data for encryption, and finally we have the presentation down here saying “That was the presentation that was used.” If the user doesn’t have a copy locally, that’s where he would go on the Web server. You have the content, context, and presentation, all signed as a totality.

 

That’s as an XML form, or you can look at it as the form that the user experienced. So it has a presentation layer. It will drag it up and present it in the browser the way the user saw it. Here, it’s the form. The other stuff can be preserved if we want to.

 

Mr. Ambur:  What is required?  What are the relevant standards?

 

Mr. McKennirey:  XSL and JavaScript provide the presentation layer in this demonstration. We create an XML file in which we embed the JavaScript and XSL, then pull them out and use them. You may notice some data appears to be missing from the form [referring to the demonstration on the screen], that data has been encrypted, and I am not currently logged on to a role authorized to see it. We’ll come back to that.

 

Mr. Bruce Cox:  If someone is filing a patent application, it would include payment information. They would sign a declaration that this document is a true representation of the patent application—it would include the application and external entities, including image files, style sheets, and character entity files, so the patent application might have 325 files associated with it. Does that Audit Vault track it appropriately?

 

Mr. McKennirey:  We’re suggesting that it’s all part of one XML document created by the application.

 

Mr. Cox:  That’s not the way we’ve chosen to do it for purchases and for managing resources.

 

Mr. McKennirey:  In which case we would treat the external documents the same as we do for the presentation layer—we do a hash of the item, and we store the hash and the location of the external item within the document, so when it’s retrieved, we know what we’re supposed to get.

 

Mr. Cox:  And it knows how to put it together again?

 

Mr. McKennirey:  Each time we add support for a document type, it adds overhead—but yes, we can do that. We focus on the standards—VoiceXML, SVG, etc. If it’s some other type of document, there’s a question as to whether we need to render the document in a proprietary format, in which case some work is required. If it’s an XML-based data format, as may be defined by a style sheet, schema, or DTD [Document Type Definition], there’s no problem.

 

Ms. Hayes:  The IRS is using XML for its next generation of “E File.” The approach has been SOAP with attachments in order to capture the main body of corporate income tax, then other different MIME [Multipurpose Internet Mail Extensions] parts reflect the attachment—perhaps PDF [Portable Document Format] or spreadsheets…requiring digital signature in other sections. Would your approach work in that situation?

 

Mr. McKennirey:  Yes. Today we’re showing an application-to-user type scenario. The reason I mention it is, we’ve mimicked the ebXML and SOAP approach to the header and the body. Even though it’s going against one document, there’s no difference supporting that. Conceptually that’s what we do.

 

Slide 28  [Summary Part 1: NARA Compliance]:  [Skipped]

 

Slide 29  [Part 2 Objectives: Data Confidentiality]:  In Part Two, we talk about data confidentiality—“How do we comply with legislation and regulations with respect to privacy and confidentiality of data?” Whether it’s financial or healthcare information, or the consumer’s private information, how do we do it in the workflow process, where people put information into documents that others would have access to?

 

Slide 30  [W3C XML Encryption]:  Here we’re talking about XML encryption. This is what one looks like in XML: you have an algorithm, Key [PKI] information, and data. An important note is at the bottom of the page—the W3C standard doesn’t address how signature and encryption happen together in the document. There are lots of things involved. They’re not addressing it for the moment, so if you sign and encrypt, nothing tells you how it happens. The encryption standard is dealing with a static event (you take part of the XML, whether an element or something else, and it’s going in to encryption, then you’re going to deliver it to someone else.) It doesn’t deal with the Key infrastructure behind it to make it work. “How can the other party do anything with it?” You’re supposed to send them something, which they then encrypt and send to you. How does it happen?
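[Editor’s note: The structure described above follows the W3C XML Encryption recommendation; it can be built as follows with the standard library. The key name and cipher value shown are placeholders, not real key material or ciphertext.]

```python
# Build the <EncryptedData> shape from the W3C XML Encryption
# recommendation: EncryptionMethod (the algorithm), KeyInfo, CipherData.
import xml.etree.ElementTree as ET

XENC = "http://www.w3.org/2001/04/xmlenc#"
DS = "http://www.w3.org/2000/09/xmldsig#"

enc = ET.Element(f"{{{XENC}}}EncryptedData")

method = ET.SubElement(enc, f"{{{XENC}}}EncryptionMethod")
method.set("Algorithm", "http://www.w3.org/2001/04/xmlenc#aes128-cbc")

key_info = ET.SubElement(enc, f"{{{DS}}}KeyInfo")
ET.SubElement(key_info, f"{{{DS}}}KeyName").text = "ManagerRoleKey"  # illustrative

cipher_data = ET.SubElement(enc, f"{{{XENC}}}CipherData")
ET.SubElement(cipher_data, f"{{{XENC}}}CipherValue").text = "BASE64CIPHERTEXT=="

print(ET.tostring(enc, encoding="unicode"))
```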

 

Slide 31  [Encryption Sequences in an Interactive Application]:  There are three sequences: #1 is internal—inside the document, “What is the sequence of events—sign, encrypt?” “Sign, encrypt, sign?” What is the sequence? This describes the sequence. If you sign, then encrypt and send off, they can’t verify it because the signature is encrypted.

 

The second sequence is the interactive sequence—one party interacts with one other party, and they work out the “How do I do it so that you can understand?” question. It’s how we describe the chronology of events such that both can unwrap. The third one is the workflow sequence. It’s the most interesting, and it gets lots of conversation. If multiple parties are involved, how do we encrypt so that the appropriate parties see what they’re supposed to see? If we fail in any one of them, the whole thing falls down.
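[Editor’s note: The ordering problem described above can be sketched as follows. HMAC stands in for the signature and Base-64 stands in for encryption, purely to show that the receiver must undo the recorded sequence in reverse; none of the names or keys are from the actual product.]

```python
# Apply a recorded internal sequence (sign, then encrypt) and show that
# the receiver must unwrap in reverse order to verify successfully.
import base64
import hashlib
import hmac

KEY = b"shared-demo-key"  # illustrative shared secret

def sign(data: bytes) -> bytes:
    """Prefix the data with a keyed MAC, analogous to a signature."""
    mac = hmac.new(KEY, data, hashlib.sha256).hexdigest().encode()
    return mac + b"|" + data

def verify(data: bytes) -> bytes:
    """Check the MAC prefix and return the payload."""
    mac, _, payload = data.partition(b"|")
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest().encode()
    assert hmac.compare_digest(mac, expected)
    return payload

# Base-64 stands in for encryption purely to show the ordering problem.
APPLY = {"sign": sign, "encrypt": base64.b64encode}
UNDO = {"sign": verify, "encrypt": base64.b64decode}

sequence = ["sign", "encrypt"]          # the recorded internal sequence
msg = b"<SecureForm>...</SecureForm>"

wrapped = msg
for op in sequence:
    wrapped = APPLY[op](wrapped)

# The receiver undoes the operations in reverse; trying to verify
# before decrypting would fail, because the signature is "encrypted."
unwrapped = wrapped
for op in reversed(sequence):
    unwrapped = UNDO[op](unwrapped)

assert unwrapped == msg
```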

 

Slide 32  [Internal Encryption Sequence]:  We have various elements in an XML document. How do we know what to sign or encrypt? We need a way to describe which is first, next, etc. That’s the internal sequence.

 

Slide 33  [Interactive Encryption Sequence]:  Then we have the interactive sequence. Start anywhere—assume we start with the application. It sends data, which is going to be encrypted, then signed. The user does some stuff, then sends it back to the application. The application has to decrypt it and verify it. That’s the classic scenario, but it could also be different. What’s important is not the sequence, but that you keep it coherent.

 

Slide 34  [Workflow Encryption Sequence]:  To allow the users to change, we use the concept of roles. When we send data to “User A,” like we did a while ago, we sent an instruction that said “Encrypt for the following roles,” and we sent the Keys. So when the user sends it back, various data elements are encrypted under different roles—that’s why, when we couldn’t see certain data, it was because we had no authorization to. When “User B” comes along, the application looks at what role he has, and looks at the encrypted elements to see what roles they’re encrypted for. If they’re for his role, it sends them to him.

 

Mr. Ambur:  I view this slide in the context of the FEA Service Component Reference Model.  To me, each one of the services you’ve identified is a component that should be reflected in the Service Component Reference Model.  I’m not sure whether people are grasping that.  Maybe John [Dodd] knows.

 

Mr. John Dodd:  No, I don’t think they are. This thinking is not included in the process, and it should be. The business process, the roles, even the location needs to be looked at. If it’s in a public place, if they have a PDA, if they’re in the office they can look, but if they’re in a mall somewhere maybe they can’t look. There are levels of sophistication. Even the thought process has to be brought in, and we’re not there yet. Even the mindset has to be brought in.

 

Mr. McKennirey:  So that’s why we capture the kind of token being used—so you can apply the role, the rule, such that even if you know the identity, if they use the wrong medium, like a smart card which is disallowed, you can deny it.

 

Slide 35  [Encryption Sequencing Requirements]:  [Skipped]

 

Slide 36  [XML Structure of <Actions>]:  We’ve come up with a piece of XML that says, “OK, first of all we’re working from the inside-out.” We have an internal sequence of events wrapped into an interactive sequence. We know we have the server—one party to another, in four steps back and forth, that’s wrapped into part of the workflow process. We have the Workflow Sequence, the Interactive Sequence, and the Internal Sequence. 
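[Editor’s note: The inside-out nesting described above can be sketched as follows; the element names and the particular steps are illustrative, not TrustLogic’s actual schema.]

```python
# Nest the three sequences from Slide 36: internal steps inside an
# interactive exchange, inside the overall workflow.
import xml.etree.ElementTree as ET

workflow = ET.Element("WorkflowSequence")
for party in ("Requestor", "Approver", "Issuer"):
    interactive = ET.SubElement(workflow, "InteractiveSequence", Party=party)
    internal = ET.SubElement(interactive, "InternalSequence")
    for step in ("Encrypt", "Sign"):
        ET.SubElement(internal, "Step").text = step

# Each party's internal order is preserved in document order, so any
# later processor can replay (or reverse) exactly the recorded steps.
steps = [s.text for s in workflow.find("InteractiveSequence/InternalSequence")]
print(steps)
```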

 

Slide 37  [Protection of Actions]:  This is that piece of XML that defines the data security on the XML document—encryption and signature, what’s encrypted, what’s signed, what Keys are used, and who it was signed by—so in the database you have the name of the form, and a piece of XML describing the encryption operations of the form. Everything we touch, we sign and check the signature before we run, so if someone hacks it, we’ll refuse to run it.

 

Mr. Ambur:  Is this what you mean when you talk about an “encryption-enhanced database”?

 

Mr. McKennirey:  We have two. This is the TrustLogic metadata database where the TrustLogic server-side application stores all the information it needs: profiles, definitions of cryptographic events, etc. It always signs every one of these aspects of the security configuration of the application and checks the signature before it runs.

 

Slide 38  [Object Manager Interface to <Actions>]:  The Object Manager is the interface for this.

 

Mr. Morgan:  You mentioned that the system checks its signatures of a record before it takes stuff out. Does that require communication with an outside agency?

 

Mr. McKennirey:  It’s its own signature, so it’s checking itself.

 

Mr. Morgan:  That’s secure, so that someone you don’t want can’t modify it?

 

Mr. McKennirey:  They’d have to have the TrustLogic server’s private Key.

 

Mr. Ambur:  People in this group like to see the XML.  Do you want to show what it looks like?

 

Mr. McKennirey:  This is another server-side utility [Mr. McKennirey displayed an example.] This is the set of objects that we collectively refer to as the metadata that supports this application.

 

Mr. Joe Carmel:  Are you interoperable with other applications, particularly in terms of cryptography?

 

Mr. McKennirey: The cryptography is standard.

 

Mr. Carmel:  So we could use files from other systems?

 

Mr. McKennirey:  Yes... [Mr. McKennirey commented on a server-side utility that holds the cryptographic instructions throughout the workflow process.]  For this application, called EPay, we have our payment form, where we capture all the things that are going to happen. Here’s the workflow process. We have various activities—sign and encrypt, we have approval, when it was issued, we have more of the same, and here we have one where we have two “sign and encrypt” operations, because someone will be signing it later. It’s actually a piece of XML. This is what it looks like. This is a Request for Payment. We have an operation of things to be signed, and another of things to encrypt. That XML that we put into it from the security application database is embedded in the XML document. It stays with it forever as part of the document, so any time in the future that it’s processed, they all occur. So online, the same thing happens as offline. We also talk about roles—so a manager, for instance, can encrypt here using AES-128, but one can define for each role the algorithm to use. A role could use AES-256 or something else.

 

Slide 39  [Rationale for <Actionset>]:  This is also where we keep track of which XML elements are encrypted in which sequence. For instance, a manager says, “Those are the users that have that role,” so you intentionally collect elements that have to be processed together. Rather than go through the document and say, “This is signed, this is encrypted”—you can’t do that. Sequencing is difficult to manage. Some elements at the bottom have to be signed first, so that’s why we have encryption instructions, and process it that way.

 

Slide 40  [<Actions> and <Keys> in XML Structure]:  Now we include our Actions, the encryption instructions, and the Keys that will be needed. That completes our document.

 

Slide 41  [Putting it Together]:  This puts it all together for you. For our demonstration, someone logs on. That’s when we create context—their role, what application they’re using—and we check their identity. They then make a data request. We put the context in the document, pull the content out of the database, put it all together, create the document, sign it, and send it to the user. That becomes the document that’s preserved.

 

Slide 42  [Attachments]:  Attachments go in as Base-64 encoded, and become signed and encrypted as a value of an XML element.
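[Editor’s note: The attachment handling described above can be sketched as follows; the element name and file name are illustrative. Once the bytes are element content, they can be signed and encrypted like any other element value.]

```python
# Base-64 encode a binary attachment and carry it as the value of an
# XML element, then round-trip it back to the original bytes.
import base64
import xml.etree.ElementTree as ET

spreadsheet = b"\x00\x01binary spreadsheet bytes"

attachment = ET.Element("Attachment", Filename="expenses.xls")
attachment.text = base64.b64encode(spreadsheet).decode("ascii")

# Round-trip: the original bytes come back out of the element value.
recovered = base64.b64decode(attachment.text)
assert recovered == spreadsheet
```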

 

Slide 43  [Redaction]:  In terms of redacting documents, we’re putting them in PDF. They output in XML or PDF. The output is only going to provide for the data that the user has access to. If the document has data encrypted for various roles, when the person pulls the data and prints or saves it as XML, he only has the elements he’s supposed to have. So you can have a collection specific to roles, and the people retrieving it only see what their roles allow them to.
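[Editor’s note: The role-based output described above can be sketched as follows. The role names and the ReadRole attribute are illustrative stand-ins for the per-role encryption the speaker describes; the point is that each role’s output contains only the elements that role may read.]

```python
# Filter a document down to the elements a given role is allowed to
# see, producing the redacted view described on Slide 43.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<PaymentForm>"
    "<Amount ReadRole='Approver'>1200.00</Amount>"
    "<Payee ReadRole='Requestor'>Acme Corp</Payee>"
    "</PaymentForm>"
)

def redact(document, role):
    """Return a copy containing only elements the role may read."""
    out = ET.Element(document.tag)
    for child in document:
        if child.get("ReadRole") == role:
            out.append(child)
    return out

approver_view = redact(doc, "Approver")
print([child.tag for child in approver_view])
```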

 

Am I over my time?

 

Mr. Ambur:  I don’t know whether we’ll have any Task Team Leader reports at the end of the meeting, so I think we have time for you to finish your presentation.

 

Slide 44  [Demo: Data Confidentiality]:  So we’ll just log on as different people for the demonstration and look at the document. We’ll log on as the last person in the process—the person who’s authorizing payment to be issued. They’re authorized to the role of issuer; they go ahead and issue payment. In this application, we’re using a SQL [Structured Query Language] database to hold the data, so we created a relational database, where the XML document is one file in the record. The other files are information we extract from the XML document to pick the right one. We have the right one, because we have a file that lets us know the status of the document, etc. It’s found a few signatures. We’ve seen this before in the audit log. We’ve seen the other signatures. Let’s just finish this off with the approval—this particular one at the bottom of the page, which the comment…

 

Ms. Theresa Yee:  What does “HR” stand for?

 

Mr. McKennirey:  Human Resources, I think. This demo was put together based on a requirement suggested by a systems integrator for a project they were working on at the time.

 

Unidentified Participant:  If the final approving official is a micromanager, and wants to look at the MS Excel Spreadsheet, can he open it up? What if he finds an error and makes a change? Is that a second attachment, or is it not allowed in your workflow process?

 

Mr. McKennirey:  I’m going to frustrate you here. We make no decisions for the application. The application decides all of it. This application said, “We’re going to sign that attachment by the requestor.” If someone now modifies it, the signature will fail in the future.

 

Same Participant:  That’s why I asked if it was a second attachment to it.

 

Mr. McKennirey:  The application could provide for signing another attachment. TrustLogic is just the policeman. It enforces what the application designer has chosen to do. How the application does it is up to the application designer. I’m now using a view process. It allows us to step outside the workflow and let anyone view the document. We’ve now collected four signatures; we have two signatures from one participant. We have two because the first signature was encrypted. Since we want people to verify some of his other data, they’re signed separately. Now I’ll log off and log on as another user.

 

Mr. Marc Le Maitre:  When I create a document, is it implicit that it’s a single copy of that document? What’s the thinking regarding adequate record keeping and dispersal (let’s assume two parties need copies of the document)? The second question is about updating the document dynamically—for both parties to agree to a change, either can make it, and it’s reflected at either end?

 

Mr. McKennirey:  There are as many copies as appropriate. Today we have three: one by the user—local—one in the application, and a third one goes into the audit record. If you want access to encrypted data, you must go through the application and be authorized to a role beforehand.

 

For the second question, it’s an application design issue. If you want to allow people to modify signed or encrypted data, it’s no problem. You make it available, and when they’re finished, they sign or encrypt again. [Mr. McKennirey referred to the demo appearing on the screen] So here is the same document. This user sees three signatures. She can’t see the fourth, because it’s encrypted. She doesn’t know it’s there. She also doesn’t see the comment at the bottom, because it’s encrypted. She doesn’t know it exists, and won’t see the signature on it. If it’s PDF, it comes out just like that. Let’s do that, then I’ll finish.

 

The last bit of this…the gentleman asked about copies. I’m going to start an interface that’s available to users. They don’t need to use it at run time, but it gives them access to the document. Since we just have the signed XML, the XML is not very user-friendly. This client application allows us to look at it again. It’s going to go into a personal archive (the ones she saved). She can check signatures on this—this is what she saved when submitted—it only has her data. She can review it offline, and modify it if she wants to. She can look at any records any time; they’ll come up in the form she created. She can modify, save, and upload to get back to the application later if she wishes. She can also create a PDF of the file, called “Barb” because that’s the name of the person who created it. That’s using XSL-FO [XSL Formatting Objects]. We can go into the PDF document that she just created, and browse to it on my machine. This is the PDF document she printed from her version of the XML.

 

Unidentified Participant:  How do you manage the aging of signatures, since any may expire at any time?

 

Mr. McKennirey:  Signatures will expire. Our view is that the important thing is that the signature was valid at the time it was used. By definition, it won’t be valid in PKI terms in the future. The issue is, was it valid at the time of signing? That’s what we keep track of—that you checked the certification status at the time. Twenty years from now, it’ll clearly be expired.
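The point-in-time rule Mr. McKennirey describes can be sketched as follows. The function, its parameters, and the dates are all hypothetical illustrations of the idea, not the product's actual verification logic.

```python
# Illustrative sketch only: a signature is accepted if the certificate
# covered the moment of signing (and was not yet revoked then), even if
# the certificate has since expired.
from datetime import datetime
from typing import Optional

def valid_at_signing(signed_at: datetime,
                     not_before: datetime,
                     not_after: datetime,
                     revoked_at: Optional[datetime] = None) -> bool:
    """True if the certificate was valid, and not yet revoked, when
    the document was signed."""
    if not (not_before <= signed_at <= not_after):
        return False
    if revoked_at is not None and revoked_at <= signed_at:
        return False
    return True

signed = datetime(2003, 3, 19)
# Valid: the signing date falls inside the certificate's validity window.
print(valid_at_signing(signed, datetime(2002, 1, 1), datetime(2004, 1, 1)))
# Verifying the same signature decades later gives the same answer,
# because the check is anchored to the signing time, not "now".
```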

 

Same Participant:  But can you read the document, if it’s encrypted?

 

Mr. McKennirey:  Yes. It’s a W3C standard, so anything using the standard can read it in the future. You don’t need our software to do it. The same goes for the signature. You do have key management if you have role-encrypted data, and that’s an additional burden on the archivist to keep keys for documents.

 

Unidentified Participant:  In an earlier comment I thought you said those keys are tied into the application itself.

 

Mr. McKennirey:  No—someone could write an application in the future…

 

Mr. Ambur:  You need a service component that takes care of it. You can preserve the style sheet with a presentation component.

 

Unidentified Participant:  As long as the other guy is backward compatible, you’re OK…

 

Mr. McKennirey:  We’re not attempting to resolve that.

 

Unidentified Participant:  There’s an implied commitment that when you store, you do it forever, as long as you use that document.

 

Mr. McKennirey:  Transfer it into XML, then save it as XML.

 

Unidentified Participant:  Your example was the URL [Universal Resource Locator]—as long as you embed the URL, if it’s to be accessible, you have to maintain the document at that URL.

 

Mr. McKennirey:  That’s assistance for the user. It doesn’t have to be at that location. We take that URL and there’s a copy of the presentation at that location. The user pulls a copy down locally. You can have versions locally.

 

Unidentified Participant:  So it’s pointing to a style sheet, and if the style sheet is not at the URL, it’ll work?

 

Mr. McKennirey:  Yes. It says, “I know the name of it. If I don’t find it there, then look for it at this URL,” so the URL doesn’t have to persist over time.
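That lookup order (try the stylesheet by name locally first, fall back to the embedded URL only if the local copy is missing) might be sketched like this. The paths, names, and function are assumptions for illustration.

```python
# Illustrative sketch only: resolve a stylesheet reference by checking a
# local directory first, falling back to the URL embedded in the document.
import os

def resolve_stylesheet(name: str, fallback_url: str,
                       local_dir: str = ".") -> str:
    """Return a local path if the named stylesheet exists there,
    otherwise the URL to fetch it from."""
    local_path = os.path.join(local_dir, name)
    if os.path.exists(local_path):
        return local_path
    return fallback_url

# With no local copy present, the resolver falls back to the URL:
print(resolve_stylesheet("record.xsl",
                         "http://example.gov/styles/record.xsl",
                         local_dir="/nonexistent"))
```

This is why, as stated above, the URL does not have to persist over time: a preserved local copy of the stylesheet satisfies the lookup without ever touching the URL.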

 

Mr. Cox:  How does this scale? Suppose you receive 500,000 documents per year that are signed and encrypted?

 

Mr. McKennirey:  We haven’t tried 500,000 documents. The server technology has been designed to support load balancing. You can have as many instances as you want.

 

Unidentified Participant:  It’s not just the service, but those are a matter of precedence.

 

Mr. McKennirey:  The databases are not our databases. We’re using SQL databases. The capability of the database is defined by which vendor’s product you choose to use. We’re just doing the encryption. In terms of the capability of data storage, it’s Oracle, MySQL, Microsoft, etc.

 

Mr. Ken Sall:  Roles versus visibility—by default, it’s not visible if it’s signed such that it’s not part of your role. Is there a way to override that? I imagine a situation where the data is dynamic—where someone should have access, but didn’t at the time it was configured. Can you set it up so that if they contact some administrator, they can get access?

 

Mr. McKennirey:  Yes. The roles are arranged hierarchically, so anyone with a superior role can access that role or the roles below it. If you need to, you can change the role hierarchy, put that role in there, and have access to data for any subsidiary role.
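A hierarchical role check of the sort described could look like the following sketch. The role names and the hierarchy itself are invented for illustration; the product's actual role model is not detailed in the discussion.

```python
# Illustrative sketch only: a user may read data protected for their own
# role or for any role beneath it in the hierarchy.
parent_of = {          # child role -> superior role (invented example)
    "requestor": "approver",
    "approver": "issuer",
}

def can_access(user_role: str, data_role: str) -> bool:
    """True if user_role equals data_role or sits above it."""
    role = data_role
    while role is not None:
        if role == user_role:
            return True
        role = parent_of.get(role)
    return False

print(can_access("issuer", "requestor"))   # True: issuer is superior
print(can_access("requestor", "issuer"))   # False: cannot read upward
```

Granting an administrator access after the fact, as Mr. Sall asked, then amounts to editing the hierarchy so the new role sits above the data's role.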

 

Mr. Ambur:  That’s very interesting.  I’ll close this by saying I’m glad we don’t deal with information that needs this degree of security and protection.  Various government agencies do have a lot of information, though, that does require such treatment.  Part of the role for this group and for the group to which we report is to come up with a process to manage the need for these kinds of innovative technologies.

 

Mr. Larry Henry:  Has the speaker given thought to how this technology is used in a federated world?

 

Mr. McKennirey:  It’s designed to operate in a federated world. I focused on XML aspects today rather than PKI and identity. There are hours more of presentations on that. The technology manages identities across domains and authorizations across domains. It can apply a coherent role structure across domains. You can have multiple uses across agencies in the same application.

 

Mr. Terry Alford:  Your definition of context is a richer notion than typical. As I understand it, context is used to shape the meaning of information associated with a document. We have it normally expressed in terms of community or namespace. This is richer. When we talk about federation of namespaces, that richer notion of context plays an important part. You didn’t go into the meaning relative to those contexts. We need to document the meaning relative to the context. It raises the issue of the taxonomy of context.

 

Unidentified Participant:  That’s where the EGov is going to. We talk in terms of space, but it’s context, because we go across the board.

 

Mr. Ambur:  Let’s take a break and start up again at 10:45.

 

Mr. Ambur:  I think we better get started to keep on track.  I first learned about HiSoftware through an EGov conference at which Dana Simberkoff was presenting.  The title of her presentation attracted me because, with reference to OMB Information Dissemination Quality Guidelines, it would be useful for agencies like OMB, NARA, and GPO to do an analysis of the information agencies are presenting on their websites.  So I contacted Dana to see if she would be interested in making an XML-focused presentation to the XML Working Group.  Subsequently, I was pleased to learn that one of her clients is the Biological Resources Division of USGS, one of the sister agencies of my own agency in the Department of the Interior.  So accompanying Dana is Mike Frame, representing USGS/BRD, with which my agency has a close relationship.

 

 

 

Ms. Dana Louise Simberkoff,  HiSoftware Company

Mr. Mike Frame,  USGS/NBII

Automated Analysis and Reporting of Web Records Quality
 

Slide 2  [Agenda]:  I’m Dana Simberkoff from HiSoftware. I’m very pleased to have Mike Frame with me. We’re a dog-and-pony show. We’ll go back and forth a little in our presentation. These are the topics we’ll talk about. Mike will talk about NBII, and I’ll talk about HiSoftware; then we’ll get into metadata, then some of the work that Mike and his agency are doing in validating the quality of XML records, and the Dublin Core metadata they’ve incorporated with them. I want to start by saying that we’ve been at this for several years. It’s evolved over time. Mike will fill you in on what they’re doing. They’re one of the largest metadata clearinghouses in the world, so Mike?

 

 Slide 3  [NBII:  What is it?]:  I have just two or three slides to set the context. Owen mentioned this NBII. I am in USGS, but it’s [NBII is] multi-agency. The bottom line is trying to get access to biological information, regardless of who owns it, wherever it happens to be. You can imagine some of the complexities it presents us.

 

Slide 4  [Contributors and Users from Diverse Sources]:  We’re dealing with multiple institutions and academic institutes; we’re dealing with multiple types of data and data quality—dealing with those issues in this distributed environment.

 

Slide 5  [NBII Node Types]:  Two to three years ago, we got a budget increase in USGS. For eight years, we had been working primarily on metadata. A couple years ago we said, “It needs to be regional,” so we got funding to establish regional nodes around the country. There are six right now. We also got money for thematic nodes—things like birds—that’s huge, not one that’s limited to regions. I work on infrastructure issues, “How can this be incorporated regionally?”

 

Slide 6  [NBII Regions]:  You can see on this next slide how the regions around the country are constructed. State boundaries are artificial. Appropriations are state-based, though, so it’s heavily state-influenced. There are about eight regions. They vary greatly in funding—in the range of about $3-$6 million. We provide infrastructure, whether it’s archiving, hosting, etc., in the regions, so you can see why this is important to us.

 

Ms. Yee:  Is Alaska funded?

 

Mr. Frame:  No. It might be in next year’s budget.

 

Ms. Simberkoff:  I wanted to ask, Mike, if you could mention the other agencies who work on NBII?

 

Mr. Frame:  A lot of environmental agencies. Owen mentioned [Department of the] Interior, and EPA of course.

 

Ms. Hayes:  Is this being done with the concept of the Open GIS Consortium?

 

Mr. Frame:  We do a lot of them. A lot of applications around the country are doing mapping. We’re a big proponent of the GIS registries.

 

Ms. Hayes:  Are you also working with the catalogue specification in Open GIS?

 

Mr. Frame:  [Department of] Agriculture is a big player with us. NASA is heavily involved for things like weather programs.

 

Mr. Dodd:  Do you keep a lot of weather data?

 

Mr. Frame:  Coastal issues—NOAA [National Oceanic and Atmospheric Administration] is involved. They’re not a huge partner. They’re in several regions, but they’re not significant in providing funding to the program.

 

Slide 7  [About HiSoftware]  [Ms. Simberkoff]:  HiSoftware began our work with NBII four years ago. One of the challenges for NBII was to manage the quality of records across organizations with many different parts—to manage records in a central data repository, but from all different contexts. Our solutions assist organizations in validating and implementing policies across Web records, whether HTML, XML, or any element or text-based content. Today, we’re talking about XML records, but I like to put it in the context of issues that face Web record managers, because the more you can address multiple issues that affect many people and areas, the more effective you’ll be.

 

Slide 8  [To Whom is This Important? Corporate and Federal]:  These are corporations and federal entities we work with. HiSoftware has done a large amount of work with each of these. One issue that many of our customers face is Section 508 [of the Rehabilitation Act]. It deals with accessibility of records. It’s a requirement, and it’s not at the top of everyone’s list. It’s one of the things our agencies are addressing.

 

Slide 9  [States and Universities]:  We also work with states and institutions around the world dealing with issues about metadata, usability, and accessibility.

 

Slide 10  [Defining the Problem]:  Whether an organization is large and centralized or small and decentralized, one of the wonderful things about the Web is that everyone can be a content creator. With desktop Web editors like FrontPage, anyone can create content for the Web, and a number of our customers have such large enterprise Web presences that they don’t even know how many or what kind of pages they have. We’re constantly hearing from customers about their discovery of their own pages.

 

Mr. Dodd:  Even whole websites. It’s not limited to pages.

 

Ms. Simberkoff:  One of our customers thought they had about 600 websites to monitor. When I last talked with them, they had discovered 1,600. The point is how important policies are across websites. When you don’t know what Web content you have out there, it’s impossible to maintain those sites adequately. The same applies to Web records or any information you need to manage, because when you can apply policies across the information, it’s far more valuable. When you can’t, it can be lost. I think of accessibility not just in terms of [Section] 508, but in terms of usability of information. It should be used in the broadest possible context. That’s in contrast to encrypting data, but even in that case, a policy is necessary.

 

Mr. Ambur:  It seems to me that the two would not be in conflict.

 

Ms. Simberkoff:  They work side by side.

 

Mr. Ambur:  For example, if we have data that’s covered by the Privacy Act, perhaps it should be encrypted.

 

Ms. Simberkoff:  Absolutely. Also, maybe you have information that you don’t want out there. Even now, agencies are looking at pulling data from their public websites that they don’t want to have out there.

 

Slide 11  [Designing the Solution]:  So we’re looking at solutions that automate the process of looking at records to see whether or not they validate. When you say “QA” [Quality Assurance] people cringe, but this is really QA in motion. It can be implemented by administrators and policy managers, not necessarily technical people, because the people making policies are not necessarily developers or technicians. Maybe, but not necessarily. It’s important for the solution to serve both technical and non-technical audiences.

 

Mr. Ambur:  Whether people like QA or not, it’s the law.  Each agency is required to have its own information quality guidelines, effective October of last year. So this can be a test of the effectiveness of those guidelines.

 

Ms. Simberkoff:  It’s like setting a curfew on teenagers and going away for the weekend. If we have a law we can’t enforce, we have a problem. The goal of our products is to provide entities with a mechanism to monitor the implementation of policies, because when a policy is not fully implemented, it’s not necessarily flawed, but maybe needs more outreach to the implementers.

 

Ms. Hayes:  Do your products deal with static content, or…

 

Ms. Simberkoff:  Static, dynamic, database-driven. You can use our solutions through content management portals that serve entirely dynamic content. We also, of course, test static content. In fact, Owen learned of us through a presentation we co-presented with Microsoft at “Web Enabled Government” about the content management process—entirely dynamic.

 

Mr. Frame:  I have examples of what we’re doing, both dynamically and statically.

 

Ms. Simberkoff:  So the process is on the developer’s side and on the back end, so the more you can check off on your list of standards that you comply with through the same set of tools, the more effective it is. One of the challenges for our customers is, the law says to comply with [Section] 508, but there’s no formal funding. When you can also address privacy, searchability, and other critical issues, it’s better for the business case.

 

Mr. Ambur:  I’ll mention the Federal Enterprise Architecture again, particularly the Service Component Reference Model; the intent is to identify reusable components.  As we said in the earlier presentation today, we need to identify service components that should be addressed in the Federal Enterprise Architecture.

 

Mr. Mike Todd:  Owen, do you have any metrics on what components are being used?

 

Mr. Ambur:  No, not at this point.

 

Mr. Dodd:  It’s eventually linked to the Performance [Reference] Model—that you’ve incentivized the line shop to go across agencies.

 

Slide 12  [Keep up With Ever-Changing Web Technologies and Standards]:  One way to address this is with a technology that deals with static content but also with emerging technologies, so we designed a solution with an open architecture and interface, and customers are coming up with their own ways to use our product that we didn’t envision. The products are designed to test CSS [Cascading Style Sheets], XSL [Extensible Stylesheet Language], SVG [Scalable Vector Graphics] Web implementations, etc., but we have customers using our solutions to test email, corporate users who test JSP code, a variety of things. The more open your technology is, the more longevity it may have.

 

Slide 13  [Site Quality Factors]:  So in my list, this is a starting point of site quality factors. Accessibility and usability are at the top of the list, because putting a record out there that’s not usable is meaningless…Searchability—as records grow exponentially, the ability to retrieve them is mission critical. Privacy and exposure—as agencies collect confidential information, with the upcoming HIPAA [Health Insurance Portability and Accountability Act of 1996]—the healthcare legislation that’s coming into effect with collecting this information from individuals—a variety of standards affect this.

 

Site quality user experience—Consider information access for the public. For the thousands of pages you didn’t know existed, your constituents might find missing images, slow-loading pages, etc. Maybe we should consider it as part of the QA practices. Metadata—which is critical information about a document to enable indexing for search and retrieval—then custom checks may be equated with internal organizational policies or perhaps the common look-and-feel guidelines for your agency. For example, where the agency seal appears, you can have ‘alt text’ associated with it, or when a name appears on a website, no information that relates to confidential or private information should appear with it. This type of validation may be used so that on some websites—for example DoD—you could not find out that a person would be at a certain place at a certain time.
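One such custom check (flagging an image whose alt text is missing, or is a meaningless editor default) might be sketched as below. The placeholder list and the function are assumptions for illustration, not HiSoftware's actual rules.

```python
# Illustrative sketch only: count <img> tags whose alt text is absent
# or is a placeholder default left in by an editing tool.
import re

BAD_DEFAULTS = {"", "image", "picture", "untitled"}

def missing_alt_text(html: str) -> int:
    """Count img tags with absent or placeholder alt text."""
    failures = 0
    for tag in re.findall(r"<img\b[^>]*>", html, re.IGNORECASE):
        m = re.search(r'alt\s*=\s*"([^"]*)"', tag, re.IGNORECASE)
        if m is None or m.group(1).strip().lower() in BAD_DEFAULTS:
            failures += 1
    return failures

page = '<img src="seal.gif" alt="Agency seal"><img src="spacer.gif">'
print(missing_alt_text(page))  # 1: the spacer image has no alt text
```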

 

Mr. Alford:  Are you going to talk about quality factors in more detail?

 

Ms. Simberkoff:  Absolutely. We’ll focus on XML.

 

Mr. Alford:  Just a comment—recently DoD came up with a new set of prioritized data management goals. I’m doing a mental correlation with your site quality list. It’s a rough match, but there are differences. Their #1 and 2 goals are visibility and access, so it matches up. They also have a goal of trustworthiness, which is privacy and exposure. They also have a couple of others; one is understandability; one is user feedback. Maybe it’s quality of experience, but it’s a quality improvement notion…

 

Ms. Simberkoff:  I think in terms of the first one, understandability, I’d equate it to usability. You can build a site that’s 100% [Section] 508 compliant that’s still completely unusable, so 508 is a starting point. It doesn’t talk about whether it makes logical sense.

 

Mr. Alford:  I would comment on the difference between understandability and usability.

 

Ms. Simberkoff:  Right. Those are good points. I can add them to the list. Those are the types of things that can be addressed out-of-the-box by the ability to incorporate custom checks.

 

Mr. Alford:  In the interim, you would want quality management metrics associated with that, like “How well are you doing?” and “Where do you want to be?”

 

Ms. Simberkoff:  Yes. I’ll be talking about the audit trail of how sites progress over time. If you’re dealing with one site, it’s not a big issue, but if you’re in the CIO offices, managing hundreds of sites or thousands of Web teams, how can you have a central system for managing the process over time?

 

Slide 14  [Metadata Policy]:  This slide talks about how NBII has implemented solutions for metadata across their catalogue of information. We can equate metadata to cards in the library card catalogue, where it provides critical search-and-retrieval information.

 

Slide 15  [What is Meant by the Quality of a Web Record?]  [Mr. Frame]:  This goes to what Dana said, though from our perspective—the quality of the record—how do we ensure that there are standard policies that everyone’s following, whether tags, shared vocabulary, access, whatever? How do we monitor it? We mentioned conformance; then there’s stuff about the integrity of the document.

 

Slide 16  [Types of Metadata - NBII]:  When we started with HiSoftware four years ago, the first thing we did was ask, within NBII, “How can we work with our partners to make their information more accessible to us and the general Web?” So we said we’ll start using metatags. We developed standards in NBII, and said, “If you’re creating a page, here are four or five things you need to do.” We worked on descriptions, keywords, and titles; then a few years ago we went to dynamic content. All of our nodes are cataloguing records. We went to the team and said, “How should we do this?” They said to use Dublin Core. The point is getting to a point where we can catalogue something that’s resource relevant and then get to a Web representation.

 

If researchers are in the field studying a particular ecosystem, they have to document the conditions relative to that. We came up with a system specific to this biological data. There’s a system out there now providing access to this metadata. Now we’re going back to, in some cases, 300 - 400 elements. We’re able to pull those layers and conditions back.

 

Unidentified Participant:  For your 20 groups…

 

Mr. Frame:  I’d break into different sections. We’re using Dublin Core to catalogue into different sites. These are SGML records that say, “I’m in Virginia studying this land. What are the conditions? What did I find?” All these attributes. Those records go through the QA/QC process, get parsed, and then dumped to whatever clearinghouse makes sense.

 

Unidentified Participant:  Your search tool will search on the metatags?

 

Mr. Frame:  There’s not a slide of it, but a couple years ago, in NBII at one of our public sites there’s a search tool. It goes into the metadata database, dumps them out of that, and indexes with an overall search umbrella …

 

Unidentified Participant:  So I can take your tag and say, “Author equals…?”

 

Mr. Frame:  Yes. You can do it on a public site—wherever the data exist.

 

Ms. Hayes:  How do you handle the flexibility of Dublin Core?

 

Mr. Frame:  It is a big issue for us. We worry a lot about taxonomy. It’s controversial in biology because of species names.

 

Mr. Alford:  It’s a good argument for the need for ontologies—multiple thesauruses or taxonomies that can map different viewpoints and interrelate.

 

Mr. Frame:  A couple years ago we worked a lot on, “How do we relate all these vocabularies and still provide the user something manageable?” We can’t give them a list of amphibians. Also—the Integrated Taxonomic Information System is the U.S. authority for species names, so you can go to that system and pull it in.

 

Mr. John Kane:  Topic Maps, or just RDF [Resource Description Framework]?

 

Mr. Frame:  No Topic Maps. We’ve used them for some of the geospatial stuff. We’re starting to get into it, using some of the Alexandria stuff that was done.

 

Mr. Frame:  You mentioned custom checks for look and feel—a couple of years ago, there was the issue of branding—“How do you tell it’s related to whose network?” At the time, we were setting these state-based entities up. We knew it would be Web-enabled through the whole region. It had to be by region, and then tie it together.

 

Slide 18  [NBII XML Dublin Core File]:  This is an example of a Dublin Core record. We said, “Let’s check for title tag and character limit, and ensure they’re using it.”

 

Slide 19  [Dublin Core (example)]:  If we walk through it, we can look at information about the checks—check on title, referencing a schema, etc.

 

Slide 20  [Dublin Core (example, continued)]:  Here we’re checking to make sure it’s true, and less than 256 characters. We can also look for particular elements.
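The title check described on these slides (a title is present, and it stays under 256 characters) could be sketched like this. The sample record and the function are invented for illustration, though the Dublin Core namespace URI is the standard one.

```python
# Illustrative sketch only: verify a Dublin Core record has a non-empty
# dc:title element that is under a configured character limit.
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def check_title(record_xml: str, max_len: int = 256) -> bool:
    """True if the record has a non-empty dc:title under max_len chars."""
    root = ET.fromstring(record_xml)
    title = root.find(f".//{DC}title")
    return (title is not None and title.text is not None
            and 0 < len(title.text) < max_len)

record = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Amphibian Survey, Shenandoah Region</dc:title>
</metadata>"""
print(check_title(record))  # True
```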

 

Slide 21  [Dublin Core (example, continued)]:  In this instance, we specify what it will say in its reporting if we pass this test, and what it will say if we fail.

 

Slide 22  [Dublin Core (example, continued)]:  Here I picked one file. You can pick multiples. This is just a client piece. I just pick this one file and run a report.

 

Slide 23  [Dublin Core (example, continued)]:  You can see that this is the report format I’m using. I can customize it, do some filtering, dump it onto the Web, or put it into XML.

 

Slide 25  [Validation Can Be Automated]:  I can say, “Yes, I found this at 96.” I can say, “Here’s a link to define the file and how to implement the policy…”

 

Unidentified Participant:  Can you just go through there, if you don’t want to see the names of the files, how many failed, etc.?

 

Ms. Simberkoff:  Yes, absolutely.

 

Mr. Frame:  You can choose to just see the failures, and/or a specific section.

 

Ms. Simberkoff:  What Mike’s shown so far is user-driven, interactive tests created by any policy person in the organization, distributed across the organization to users. The same testing can be automated on the back end and run as a service. In this case, it was a custom test created ahead of time, automated to a schedule, and the results were auto-emailed to a predetermined recipient list. So if I were running an automated test in Arizona for a Web team-based customer management group based in NJ, results go to a group in NJ, Arizona gets a copy, the CIO Office gets a copy, and so on, so it automates the process of alerting owners of Web records that records are out of compliance.

 

Mr. Dodd:  So this can be a component of a thought process for people, if they knew what data they had to manage—the enterprise quality reports.

 

Mr. Frame:  We had a new person start, and we said, “Look at these node sites. Tell us if they’re following this process. We’re now running those in an automatic fashion instead of looking for time to do it manually. Now we can spit them out automatically. If we’re not doing that, let’s distribute them back out.”

 

Unidentified Participant:  Are they?

 

Mr. Frame:  They are. We have one in California that consistently won’t do it. We’re going to broadcast the mail, and hold our proposal process until they come around.

 

Mr. Ambur:  If you make results transparent, stakeholders will deal with it if it is important enough to warrant action.  However, if performance is not made transparent, no one can manage it.  In response to John’s comment, it’s a perfect opportunity for GSA. Through FirstGov, they’ve advertised for content management features to be provided on a government basis.  The first step would be to assess the quality of records made available today.  To the extent that standards for metadata are established by the agencies themselves, actual performance against those standards can automatically be assessed. Based upon the standards of the Web, the information is available to be harvested.  Unlike many good ideas, it is readily doable.

 

Mr. Frame:  This is easy.

 

Ms. Simberkoff:  Easy.

 

Mr. Ambur:  It constitutes potentially low-hanging fruit.

 

Ms. Simberkoff:  A lot of agencies have it in their processes. One of our customers has a big employee education system component, where they host training and education resources across their agency. It’s entirely Web based. They employ a large population of disabled federal employees, so access is a big priority in their work. They have a process to evaluate access of records and content, but also a lot of content is generated by contractors. Their challenge is, “How do you manage contractors?”

 

Mr. Dodd:  Make it performance-based and take away the fee. It’s easy.

 

Ms. Simberkoff:  But if there’s no tool to test content, then you can’t address it. This provides that ability.

 

Ms. Hayes:  Has the Intel community addressed this?

 

Ms. Simberkoff:  There’s a metadata policy, but it’s not implemented as part of the acquisition process, so linking to acquisition is necessary to make it work.

 

Slide 26  [Monitoring the Web]:  Another thing compelling to the customers who are decentralized is, it can be administered through a Web-based interface, so Mike in Reston [Virginia] can monitor a system in Denver through his browser. It can be behind any firewall.

 

Slide 27  [Developing a Strategy for Content Quality]:  In addition to being a service with automated reports, it can be a “test on demand” for users. That gets to the discussion of developing a strategy for content quality. A critical piece of the application should be the ability to think of content quality from a project management perspective, so you can allocate resources appropriately, integrate quality into the review process, keep audit trails, and identify problems. The audit trail is important for tracking a record or for compliance with policies. It’s critical for funding issues to show the work you’ve done. It’s also an important piece of designing the education process, because when you’re managing content creators, all with different backgrounds, you need appropriate education resources; passing a law is one thing, but showing what it means to people in their daily work is another.

 

Slide 28  [The Web Project Life Cycle]:  Many agencies develop internal work groups to provide feedback to developers. It’s equally important to empower managers and administrators with tools to see what’s going on, because sometimes it’s that separation between policy makers and people in the field, and sometimes the flaw is with the policy. It’s a good way to measure that.

 

Slide 29  [Overview of HiSoftware Solutions]:  Here’s an overview. Our products allow you to test compliance, and the back-end automated service solution does the testing automatically. It does integrate out-of-the-box with several solutions. You can also integrate it into proprietary existing IT infrastructure.

 

Slide 30  [Verifying Compliance]:  Files can be selected a number of ways. 

 

Slide 31  [Easy Access to Reports]:  Reports are accessible through the product’s interface.

 

Slide 32  [Summary Reports]:  This is the summary: so if I have a file with a million pages, I want to look at one report to see how many passed and how many failed.

 

Slide 33  [Detail Reports]:  Then I have the ability to drill down, get detailed information about files that failed, and make corrections.

 

Slide 34  [Additional Reports include Alt Text Quality]:  We include things like alt text quality, so tags are not just default tags assigned by your application.
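[Editor’s note: The alt text quality check Ms. Simberkoff describes—flagging default tags assigned by an authoring application rather than merely requiring that an alt attribute exist—might be sketched as follows. The placeholder patterns and classification labels are illustrative assumptions, not HiSoftware’s actual rules.]

```python
import re

# Illustrative placeholder values that authoring tools often insert by default.
DEFAULT_ALT_PATTERNS = [
    r"^image$", r"^img\d*$", r"^picture$", r"^untitled$",
    r"^dsc[_\d]*$",  # camera file-name style
    r"^\s*$",        # empty or whitespace-only
]

def alt_text_quality(alt):
    """Classify an img alt attribute as 'missing', 'default', or 'ok'."""
    if alt is None:
        return "missing"
    text = alt.strip().lower()
    for pattern in DEFAULT_ALT_PATTERNS:
        if re.match(pattern, text):
            return "default"
    return "ok"
```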

 

Slide 35  [Metadata Policy]:  Pre-configured metadata policy looks at keywords and standard tags. It can also be configured by your organization.

 

Slide 36  [Searchability]:  Searchability—it looks to make sure tags relate to the document you’re searching.

 

Mr. Ambur:  The overriding focus of the eGov Strategy is to make applications citizen-centric.  Can you tell us anything about that?  I have a sense that most of the eGov project managers don’t have a sense of what it means and that they’re focusing on their internal databases and bureaucratic structures, rather than focusing on citizens and what is meaningful to them.

 

Ms. Simberkoff:  My experience is that most customers are focusing on getting their own houses in order. There are volumes and volumes of information. I saw an advertisement in a magazine a few years ago—a little woman with her feet on a file cabinet saying, “What do you do when your document record manager is no longer there?”  People are in a panic over what to do if the key person leaves.

 

Mr. Ambur:  Thinking out loud—if agencies do figure out what citizen-centricity means, then your product clearly can help them assess the quality of the information they’re making available.  Do you have any capability to help agencies identify whether records are searchable?

 

Ms. Simberkoff:  It’s a question of interoperability of metadata schemas…suppose I look in England, and someone else looks in Italy…we’ve created the ability to map metadata schemas. It doesn’t matter what you call it if there’s a way to map the files back through a controlled vocabulary. It makes information more easily retrievable in a search, and allows files to be found in a search.

 

Ms. Hayes:  What specification do you use for the mapping?

 

Ms. Simberkoff:  One way is through a numeric token—a number that can be mapped across disparate metadata…and the ability to create a vocabulary as well—it provides the ability to search across lists.
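[Editor’s note: The numeric-token idea Ms. Simberkoff describes—each local field name maps to a shared token, so records from different schemas can be compared through the token rather than through the names—might be sketched as follows. The field names and token values are invented for illustration.]

```python
# Hypothetical controlled vocabulary: each concept gets one numeric token,
# however it is named in a local schema.
TOKENS = {
    "creator": 101,   # e.g. Dublin Core "creator"
    "author": 101,    # a local schema's name for the same concept
    "subject": 102,
    "keywords": 102,
}

def to_tokens(record):
    """Re-key a metadata record by token so records from disparate
    schemas become directly comparable and searchable together."""
    return {TOKENS[k]: v for k, v in record.items() if k in TOKENS}

# Two records using different schemas describe the same resource:
uk_record = {"creator": "J. Smith", "subject": "fisheries"}
it_record = {"author": "J. Smith", "keywords": "fisheries"}
assert to_tokens(uk_record) == to_tokens(it_record)
```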

 

Ms. Hayes:  You’re not following the RDF or Topic Map approach?

 

Mr. Dodd:  The only place in industry doing the quality checks is financial services on the post-Enron impact and laws that are in place, and in the health care area. Another place that may be doing it is the university researchers, driving it as community data quality—science, weather…Another place is intelligence data, and sharing between multiple communities—metadata and quality checks on names…

 

Mr. Alford:  Most of what you’ve talked about is objective quality standards, but if you think about what people are interested in about quality from a pragmatic view, it’s customer satisfaction quality. At Amazon.com, for example, users can write feedback about how valuable a review is. I would think one integral part of metrics is to provide user feedback, so users fill the role of a quality assurance team. Then you need to provide searches that pull the user feedback and report on it.

 

Ms. Simberkoff:  I know several sites give the ability to rate the site.

 

Slide 37  [Recorder Capabilities]:  The input is like a questionnaire. That information is recorded and stored as part of the test in a database that can optionally create a permanent audit trail to have a track record of the status of files at a point in time.

 

Mr. Dodd:  CDC [Centers for Disease Control] had a national environmental data information system keeping health data monitoring around things like these virus outbreaks. They have that feedback, and clinical trials do the same thing. That process is very much data-driven. The doctors aren’t involved as much; they have the nurses do it.

 

Mr. Ambur:  Thinking in terms of the Data Reference Model, we could have a standard schema for citizen feedback.

 

Mr. Alford:  About the university environment—someone might want to report on how well a university’s websites serve the students. One question could be, “Does it provide quality instruction?” A second could be what that quality was.

 

Ms. Simberkoff:  Things can be tested objectively, but also subjectively. It requires user input.

 

Slide 38  [AccMonitor]:  What’s user-driven on the desktop side is automated on the server side. No user intervention is required.

 

Mr. Todd:  Did you have any clients in the World Trade Center? Are you going to use any of these tools to reconstitute their activity?

 

Ms. Simberkoff:  We didn’t have any clients there, but information stored in non-proprietary databases can be regenerated. With open architecture, a customer can take it into SQL databases, etc.

 

Slide 39  [Monitoring the Web]:  Customers can do whatever they choose to with it. It can be processed and scheduled to run automatically on the back end. To me, the first thing is, “Does the site provide a user feedback mechanism?” Then if it doesn’t, we need an application that can interpret and report on it.

 

Slides 40 & 41  [Skipped] 

 

Slide 42  [Conclusions]:  You should be able to regenerate reports across your system for compliance reasons, etc. You should be able to go back in time to see what the status of the site was at the time. It’s challenging, because “How much of this do I want? What should be archived?” It’s really a decision of the agency as to what level of detail they want. We believe the tool should at least provide that option to keep track of the process and let users collaborate on the process. An interesting study by Forrester Research in December 2001 on retrofitting for accessibility found that the cost was as much as 10 times the cost of building it accessible from the ground up. It’s critical for people to know this when budgeting, because when developers can develop compliantly out of the box, it saves a lot of money in the long run.

 

Mr. Sall:  With Section 508 compliance for Web access, can you turn off priority 1, 2, and 3 control at that level, and tell it to show what exceptions there are to certain things?

 

Ms. Simberkoff:  Absolutely. The user has the option of checking and unchecking rules to validate against. Some of our customers customized the report language. The report can say, “This is flagged, but here is our policy,” and provide more information on the website for folks who want it. Our goal is flexibility in using the tool to work within your infrastructure.
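[Editor’s note: The check-and-uncheck model Ms. Simberkoff describes—an agency enables only the rules it validates against and can substitute its own report language—might look like this in outline. The rule names and messages are invented for illustration.]

```python
# Hypothetical rule set; the agency checks or unchecks rules to validate against.
RULES = {
    "508-a-alt-text": True,
    "508-o-skip-nav": True,
    "wcag-p3-contrast": False,  # unchecked: not part of this agency's policy
}

# Agency-customized report language for particular rules.
CUSTOM_MESSAGES = {
    "508-o-skip-nav": "Flagged, but here is our policy; see the website for more information.",
}

def active_findings(findings):
    """Keep only findings for enabled rules, substituting any
    agency-customized report language."""
    return [CUSTOM_MESSAGES.get(r, r) for r in findings if RULES.get(r, False)]
```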

 

Mr. Sall:  Do some of the screens for validation get populated based on some XML schema? Do you get a bootstrap on the extra validation you provide?

 

Ms. Simberkoff:  I’d be happy to show you the applications themselves. When you test, you can see the results, but also the files you’re testing. Mike had one of the tagging tools on his slide. The tags are based on metadata that already exist. Tagging creates metatags for Web documents, so in their particular case, if a page already has metadata, you have the option of reading in the existing metadata and then going with it, augmenting it, or overwriting it, so the software keys off metadata that already exist. In the NBII case, you have a metadata schema; it’s accessible through a different screen of our application.
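[Editor’s note: The keep/augment/overwrite choice for existing page metadata can be sketched as a simple merge policy. The mode names and field names here are assumptions for illustration, not the product’s actual options.]

```python
def merge_metadata(existing, generated, mode):
    """Combine metadata already on a page with newly generated tags.

    mode: 'keep' leaves existing values untouched, 'augment' fills only
    the gaps, 'overwrite' replaces existing values with generated ones.
    """
    if mode == "keep":
        return dict(existing)
    if mode == "augment":
        merged = dict(generated)
        merged.update(existing)  # existing values win over generated ones
        return merged
    if mode == "overwrite":
        merged = dict(existing)
        merged.update(generated)  # generated values win over existing ones
        return merged
    raise ValueError(f"unknown mode: {mode}")
```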

 

Mr. Frame:  We used eight of these, and unique species information. We were generating common metadata across a lot of our sites. We’d load up about 1,000 files and populate all that. We didn’t have to touch every page.

 

Mr. Morgan:  This implies that the person with the tool has the right to overwrite.

 

Ms. Simberkoff:  They’d need that permission.

 

Mr. Ambur:  I’d like to conclude this discussion by saying I’m glad to know someone is doing something with such a good idea.  There are so many good ideas that lack for action. 

 

Bruce [Bargmeyer], do you want to give us a quick report?

 

Mr. Bargmeyer:  On the letter to try to get OASIS and the UDDI folks to talk to the Registry/Repository group and resolve their differences—we delivered that message before the Open Forum to Patrick Gannon and Karl Best at OASIS. I think they’ve heard the message and are trying to resolve it. I don’t think the letter is necessary anymore.

 

Mr. Ambur:  If there are no more questions or comments, I thank you all.

 

 

End meeting.

 

Attendees:

 

Last Name     First Name     Organization
Alford        Terry          Mitre
Ambur         Owen           FWS
Bargmeyer     Bruce          LBNL
Barr          Annie          GSA
Bellack       Dena           LMI
Billups       Prince         DISA
Cox           Bruce          USPTO
Dodd          John           CSC
Ellis         Lee            GSA
Frame         Mike           USGS
Hayes         Glenda         Mitre
Henry         Larry          CSC
Kane          John           NARA
Le Maitre     Marc           One Name
Lewis         Diane          DOJ
McKennirey    Matthew        Conclusive
Morgan        Roy            NIST
Poot          Lex            DTS
Robinson      Clay           DoD
Sall          Ken            Silosmashers
Simberkoff    Dana Louise    HiSoftware
Todd          Mike           OSD
Troutman      Bruce          8020 Data
Weber         Lisa           NARA
Yee           Theresa        LMI