EAD (Encoded Archival Description ; Version 2002 Official Site)

Development of the Encoded Archival Description DTD


This paper presents general background information and a status report on the development of the Encoded Archival Description Document Type Definition (EAD DTD).

Choosing an Encoding Standard

Development of the EAD DTD began with a project initiated by the University of California, Berkeley, Library in 1993. The goal of the Berkeley project was to investigate the desirability and feasibility of developing a nonproprietary encoding standard for machine-readable finding aids such as inventories, registers, indexes, and other documents created by archives, libraries, museums, and manuscript repositories to support the use of their holdings. The project directors recognized the growing role of networks in accessing information about holdings, and they were keen to include information beyond that which was provided by traditional machine-readable cataloging (MARC) records. The development of the EAD DTD was a cooperative venture from early on, with specialists at Berkeley working in consultation with experts at other institutions. Daniel Pitti, the principal investigator for the Berkeley Project, developed requirements for the encoding standard which included the following criteria: 1) ability to present extensive and interrelated descriptive information found in archival finding aids, 2) ability to preserve the hierarchical relationships existing between levels of description, 3) ability to represent descriptive information that is inherited by one hierarchical level from another, 4) ability to move within a hierarchical informational structure, and 5) support for element-specific indexing and retrieval.

At the start of the project, candidates for meeting the requirements for a standard encoding technique included Gopher presentation of flat (i.e., unmarked) ASCII text, ASCII text marked up using HTML (HyperText Markup Language) tags, MARC tagging using either existing or new implementations of the MARC (Z39.2/ISO 2709) records structure, and markup conformant to SGML (Standard Generalized Markup Language, ISO 8879). SGML emerged from the analysis as being a technique able to meet all of the functional requirements as well as one supported by a large and growing number of software products available for a variety of operating systems. Pitti and his colleagues at Berkeley chose to experiment with the use of SGML in encoding a variety of archival finding aids from Berkeley and other institutions.

Application of SGML

Standard Generalized Markup Language was chosen over other possible solutions because of certain characteristics it possesses. SGML is a set of rules for defining and expressing the logical structure of documents thereby enabling software products to control the searching, retrieval, and structured display of those documents. The rules are applied in the form of markup (tags) that can be embedded in an electronic document to identify and establish relationships among structural parts. Because consistent markup of similarly structured documents is key to successful electronic processing of them, SGML encourages consistency by introducing the concept of a document type definition (or DTD). A DTD prescribes the ordered set of SGML markup tags available for encoding the parts of documents in a similar class. Archival finding aids, which share similar parts and structure, form a class of documents for which a DTD could be and was developed.

Pitti began development of the finding aids DTD by analyzing numerous examples forwarded to Berkeley by archivists who had responded to requests for cooperation. He found the greatest similarities in structure among those finding aids commonly referred to as inventories and registers. These structural similarities shaped the model on which he based his draft finding aid DTD.

The March 1995 version of the Berkeley Finding Aid Project (BFAP) DTD, also known as the FINDAID DTD, defined a class of documents that, in general, consisted of an optional title page, the description of a unit of archival material, and optional back matter. A title page conforming to the draft DTD could include a number of elements, such as the identification of the repository or type of finding aid. A unit description conforming to the DTD could include a brief description of the unit (incorporating taggable elements analogous to those of a MARC catalog record), a longer narrative description of the unit and any segregable parts (including such taggable elements as title, dates, and scope and content), and formatted container lists.
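
The document class just described (an optional title page, a required unit description, and optional back matter) amounts to a simple ordered content model. The following Python sketch illustrates the idea of checking a document against such a model. It is not an SGML parser and does not read a real DTD; it uses well-formed XML as a stand-in for SGML, and the element names "findaid", "titlepage", "unitdesc", and "backmatter" are assumptions made for illustration, since the actual tag names of the BFAP DTD are not given here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for the top-level element:
# optional title page, exactly one unit description, optional back matter.
CONTENT_MODEL = re.compile(r"(titlepage )?unitdesc( backmatter)?")


def conforms(xml_text):
    """Return True if the root's direct children follow the content model."""
    root = ET.fromstring(xml_text)
    # Reduce the children to a space-separated sequence of tag names
    # so that ordering can be checked with a regular expression.
    children = " ".join(child.tag for child in root)
    return CONTENT_MODEL.fullmatch(children) is not None


if __name__ == "__main__":
    print(conforms("<findaid><titlepage/><unitdesc/><backmatter/></findaid>"))
    print(conforms("<findaid><backmatter/><unitdesc/></findaid>"))
```

The first call describes a document with all three parts in the prescribed order; the second places the back matter before the unit description, which the model rejects. A real DTD expresses the same kind of constraint declaratively, and a validating parser enforces it for every element, not only the root.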

As the original Berkeley DTD took shape, it was tested by encoding finding aids and then manipulating them electronically. By March 1995, a critical mass of encoding had been achieved, and the results (involving nearly two hundred finding aids from fifteen repositories) were shared with a group of fifty archivists and manuscript librarians. The participants in this initial test were invited to attend a three-day Finding Aids Conference (April 4-6, 1995) jointly sponsored by the Library of the University of California, Berkeley, and the Commission on Preservation and Access. Conference attendees observed that the SGML encoding of finding aids accessed locally or online via networks could simplify, improve, and expand access to archival collections by making it possible to link catalog records to finding aids. Such encoding could also enable searching across pools of networked finding aids and provide keyword access to folders or items previously buried in container lists. Attendees encouraged Pitti to pursue adoption of the approach as a standard by the archival profession.

Bentley Fellowship, July 1995

Hoping to strengthen the case for adoption by the archival community of an SGML-based encoding standard, Pitti sought the assistance of a team of experts in archival descriptive standards augmented by an expert in SGML encoding techniques, who agreed to critique and refine the BFAP approach. Members of the team whom Pitti assembled to analyze and evaluate the Berkeley work were Steven J. DeRose (Inso, Inc.; formerly Electronic Book Technologies), Jackie M. Dooley (University of California, Irvine), Michael J. Fox (Minnesota Historical Society), Steven L. Hensen (Duke University), Kris Kiesling (Harry Ransom Humanities Research Center), Janice E. Ruth (Library of Congress), Sharon Gibbs Thibodeau (National Archives and Records Administration), and Helena Zinkham (Library of Congress). Under the auspices of the Bentley Library Research Fellowship Program for the Study of Modern Archives, funded by the Andrew W. Mellon Foundation, the Division of Preservation and Access of the National Endowment for the Humanities, and the Bentley Historical Library, the Pitti-led team gathered for a week-long meeting (July 22-29, 1995) in Ann Arbor, Michigan. At this meeting, the group (later known as the Bentley team) agreed to collaborate in the production of 1) finding aid encoding standard design principles; 2) a revised finding aid data model; 3) a revised finding aid document type definition; 4) finding aid encoding guidelines and examples; and 5) an article describing the team's understanding of the structure and content of finding aids.

The Bentley team reached early agreement on the principles that would underlie their design of an encoding standard. With these principles (originally designated the Ann Arbor Accords) in mind, the group proceeded to review the structure of the document to be encoded. They agreed that at the most basic level, a finding aid document consists of two segments: a segment that provides information about the finding aid itself (its title, compiler, compilation date) and a segment that provides information about a body of archival material (a collection, a record group, or a series). Following the example of the Text Encoding Initiative (TEI), the group designated the segment about the finding aid itself as the "header." Within the segment providing information about the described material (the actual finding aid), two types of information could be presented: 1) hierarchically organized information that describes a unit of records or papers along with its component parts or divisions and 2) adjunct information that may not directly describe records or papers but that facilitates their use by researchers (e.g., a bibliography). The hierarchy of descriptive information, reflecting archival principles of arrangement, generally begins with a summary of the whole and proceeds to delineation of the parts as a set of contextual views. Descriptions of the parts inherit information from descriptions of the whole.
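
The two-segment structure and the inheritance of information from whole to part can be pictured with a small Python sketch. It assumes a simplified, hypothetical markup: the element names "findingaid", "header", "description", "unit", "title", and "repository" and the "level" attribute are chosen for illustration and are not the tag names of the EAD DTD. The sketch walks the nested units recursively and passes the repository down to any unit that does not state its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding aid: a header about the finding aid itself,
# followed by a hierarchical description of the archival material.
FINDING_AID = """
<findingaid>
  <header><title>Register of the Example Family Papers</title></header>
  <description>
    <unit level="collection">
      <title>Example Family Papers</title>
      <repository>Example Library</repository>
      <unit level="series">
        <title>Correspondence</title>
        <unit level="folder"><title>Letters, 1901</title></unit>
      </unit>
    </unit>
  </description>
</findingaid>
"""


def walk(unit, inherited=None, depth=0):
    """Yield (depth, title, repository) for a unit and all nested units.

    A unit without its own repository inherits the nearest ancestor's value.
    """
    repository = unit.findtext("repository") or inherited
    yield depth, unit.findtext("title"), repository
    for child in unit.findall("unit"):
        yield from walk(child, repository, depth + 1)


root = ET.fromstring(FINDING_AID.strip())
rows = list(walk(root.find("description/unit")))

if __name__ == "__main__":
    for depth, title, repository in rows:
        print("  " * depth + f"{title} [{repository}]")
```

Only the collection-level unit names a repository, yet every series and folder beneath it reports the same value, which is the sense in which descriptions of the parts inherit from descriptions of the whole.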

Agreement on this overall structure enabled the developers to evaluate the encoded elements that had been incorporated in the BFAP model. Those elements that survived the evaluation process formed two categories: elements that would be tagged at specific, predictable points in the description of units or component parts (descriptive elements) and those elements that could be tagged anywhere within the document (generic elements). Generic elements usually are embedded within a descriptive element. The group agreed that when elements had a close analog in the TEI guidelines, the element name and, when appropriate, the element content model should be taken from the TEI guidelines.

A characteristic of SGML is the possibility of defining attributes and associating them with particular elements. The developers of the EAD concluded that the finding aid DTD should take full advantage of this possibility. Attributes could provide options to make an element more specific. A small set of basic elements could be expanded through attributes in lieu of creating a large set of specific elements. For example, an attribute associated with the personal name element can specify the role of the person as creator or collector, sender or recipient.
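
As a rough illustration of the personal name example, the Python sketch below shows how a single generic element qualified by an attribute can stand in for several more specific elements, and how software can then retrieve names by role. The fragment is hypothetical: the element names "scopecontent" and "persname", the attribute name "role", and the people named are assumptions made for this example, and well-formed XML is used as a stand-in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one personal name element, specialized by a role
# attribute instead of separate sender, recipient, and collector elements.
FRAGMENT = """
<scopecontent>
  <p>Letters from <persname role="sender">Jane Doe</persname> to
  <persname role="recipient">John Roe</persname>, collected by
  <persname role="collector">Mary Major</persname>.</p>
</scopecontent>
"""


def names_by_role(xml_text, role):
    """Return the text of every personal name element with the given role."""
    root = ET.fromstring(xml_text.strip())
    return [
        " ".join((element.text or "").split())  # normalize whitespace
        for element in root.iter("persname")
        if element.get("role") == role
    ]


if __name__ == "__main__":
    print(names_by_role(FRAGMENT, "sender"))
    print(names_by_role(FRAGMENT, "collector"))
```

Adding a new role in this scheme requires only a new attribute value, not a new element, which is the economy the developers were seeking.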

By combining descriptive and generic elements with attributes in a simplified document structure, the Bentley team was able to distill from the BFAP model the essential finding aid tag set. Within a few days of the July 1995 meeting in Michigan, Pitti began to recast the Ann Arbor Accords into a revised data model and finding aid DTD. It was at that time that the name "Encoded Archival Description", or EAD, was coined. The key changes introduced in Ann Arbor were: 1) the separation of information about the finding aid into a header; 2) the distinction between the hierarchically presented unit description information and adjunct information; and 3) the replacement of the BFAP model's collection divisions and materials lists with the more open-ended concepts of recursive "component description" and a "display group" element to bind pieces of text for display in tabular form.

The group working on the EAD DTD emphasized the importance of documentation, such as a tag library and application guidelines, to make the implementation of SGML viable. Such documentation needed to be "friendly" enough to enable users barely acquainted with SGML to apply the DTD both routinely and intermittently in their work. While the group focused on elements to ease conversion of traditional finding aids, it also reached for SGML techniques that could begin to improve the delivery of register and inventory information, particularly in an online environment. The team speculated about future possibilities, involving attachment of online "help" scripts to explain descriptive practice as reflected in finding aids, links to central glossaries and shared administrative histories, and presentation of new views that might transform hierarchical data into archival family trees.

Involvement of the Society of American Archivists

Among the topics discussed by the Bentley group were several associated with prospects for profession-wide adoption and maintenance of an encoding standard for finding aids. Recognizing that successful development of the DTD would require the participation of a broad community of archivists and archives users, the group planned to circulate widely both the Ann Arbor Accords and the revised data model based upon them. The annual meeting of the Society of American Archivists, held in late August 1995, provided an excellent forum for the presentation of concepts and ideas relating to EAD. The Society's Committee on Archival Information Exchange (CAIE) agreed to assume some responsibility for involving interested archivists. The CAIE established an EAD Working Group chaired by Bentley team member Kris Kiesling and consisting of all members of the original Bentley team (except Steven DeRose) as well as the following additional individuals: Randall Barry (Library of Congress Network Development and MARC Standards Office), Wendy Duff (University of Toronto), Ricky Erway (Research Libraries Group), Anne Gilliland-Swetland (University of California, Los Angeles), William E. Landis (University of California, Los Angeles), Eric Miller (OCLC Online Computer Library Center), Meg Sweet (Public Record Office, United Kingdom), Robert Spindler (Arizona State University), and Richard Szary (Yale University). The EAD Working Group accepted responsibility for monitoring and supporting the ongoing development of the EAD DTD, tag library, and application guidelines. Immediately following its annual meeting, the SAA Council agreed to submit a formal request to the Library of Congress Network Development and MARC Standards Office to serve as the maintenance agency for the EAD DTD.

Prelude to the EAD Alpha Release

In mid-October 1995, Daniel Pitti released for review an "early implementors' version" of an EAD data model and prototype DTD based on the work accomplished in Ann Arbor by the Bentley team. It was distributed to a small group for review and testing. Two weeks later, on November 1-3, 1995, a three-day meeting to refine the data model and DTD was held in Washington, D.C., under the sponsorship of the Library of Congress National Digital Library Program. Participants at this meeting included most of the original Bentley team, representatives from several Library of Congress divisions, Anne Gilliland-Swetland, and Debbie Lapeyre, an SGML expert with ATLIS Consulting Group. Based on decisions made at this meeting, ATLIS Consulting Group, under contract to the Library of Congress, began making revisions to the DTD. ATLIS also began work on the creation of an EAD tag library, a key piece of support documentation for implementors of the new DTD.

In early December, the Society of American Archivists received funding from the Council on Library Resources (later Council on Library and Information Resources) to create application guidelines for EAD. To that end, a subset of the Bentley team met on January 4-6, 1996, in Los Angeles, California, with Anne Gilliland-Swetland and Tom La Porte of Dreamworks SKG, who had been hired to write application guidelines for the DTD. The purpose of this meeting was to evaluate the ATLIS-revised EAD DTD, review the draft tag library, and outline the content of the application guidelines. Additional changes needed to EAD were identified and subsequently incorporated into what was to become the "alpha" version of the EAD DTD.

While the alpha version of the EAD DTD and support documentation was being finalized, the Library of Congress Network Development and MARC Standards Office formally agreed to serve as the maintenance agency for the EAD in a letter to Susan Fox, executive director of SAA. As the maintenance agency, the Library would make the DTD and support documentation available and act as a clearinghouse for communications on EAD, chiefly through the establishment of a listserv and World Wide Web site. SAA would be responsible for ongoing oversight of the standard.

Alpha Release Made Available

On February 26, 1996, the prototype EAD DTD was declared ready for release to early implementors as an "alpha" version. As with the alpha releases of computer software, this version of the EAD DTD was not advertised as perfect but was considered good enough to yield valuable results when applied to a variety of finding aids in diverse institutions. The alpha version DTD and a revised alpha tag library were made available at two sites, one at the University of California, Berkeley, and the other at the Library of Congress. The ability to obtain copies of the DTD and related documentation electronically helped to speed testing and sharing of test results.

Within the first few months of alpha testing, numerous archives and libraries marked up selected finding aids. These encoded documents generated valuable feedback to the EAD Working Group and strengthened the overall belief that the DTD could emerge as a standard for encoded archival descriptions. Interest internationally in the development of an SGML implementation for finding aids also appeared, particularly in Europe where use of SGML has been on the rise. A number of foreign archives and libraries obtained copies of the EAD DTD and subscribed to the EAD listserv. A translation of an earlier version of this report even appeared in an Italian archival journal.

Development and Release of the Beta DTD

The EAD Working Group originally planned for alpha testing to last approximately six months before analyzing feedback and preparing a beta version release during the second half of 1996. Work on the beta release started sooner than expected, however, when EAD developers convened in Berkeley, California, on April 27-29, 1996, for a three-day meeting sponsored by the Council on Library Resources and hosted by the University of California, Berkeley. The primary purpose for the April 1996 meeting was to provide an opportunity for the original Bentley team to meet with Anne Gilliland-Swetland and Tom La Porte to review their draft application guidelines and resolve problems with the DTD that had surfaced thus far during the alpha testing. Joining Gilliland-Swetland, La Porte, and the Bentley team, were Randall Barry of the Library of Congress Network Development and MARC Standards Office and Tim Hoyer and Jack Von Euw of the University of California, Berkeley. Reaffirmation of basic design principles, identification of additional modifications needed to the DTD, and a discussion of the content and style of the support documentation dominated the busy three-day meeting. The meeting participants also explored for the first time potential designs for an EAD logo to be used on printed and Web versions of the planned documentation.

Revisions to the alpha DTD, tag library, and application guidelines began immediately after the April 1996 meeting in California, with the revised goal of making a beta test version available later that summer. In-depth electronic mail discussions among the EAD developers continued throughout the spring and early summer, and by mid-June 1996, a draft of the beta DTD was completed. This draft version was modified slightly after SAA's annual meeting in late August, and a "final" beta version DTD became available in mid-September 1996. Several minor typographical modifications occurred in late November 1996, resulting in a date change to the September EAD files. A beta version tag library appeared in October 1996, and draft beta version application guidelines followed two months later. The EAD Working Group decided that no further changes would be made to the beta DTD for at least a year. The files remained stable to permit implementation and full testing by EAD Working Group members and participating institutions.

Beta Workshops, Orientation Sessions, and Outreach Efforts

Release of the beta DTD and its documentation greatly increased interest in EAD among institutions that had previously worked with the alpha version as well as from newcomers with little prior knowledge of the encoding standard. The EAD Working Group successfully fostered this interest. Kris Kiesling took a six-month leave of absence from her position at the Harry Ransom Humanities Research Center to advise the Research Libraries Group (RLG) on EAD-related issues and to assist RLG in creating a two-day workshop to help its member institutions implement the beta DTD. Kiesling and co-instructor Michael Fox developed the initial course with contributions from Daniel Pitti, Alvin Pollock, and Tim Hoyer at the University of California, Berkeley, and input from an advisory panel of early implementors, including Lisa Browar (New York Public Library), Steven L. Hensen (Duke University), Steven Mandeville-Gamble (Stanford University), Richard Szary (Yale University), Richard Masters (British Library) and Susan von Salis (Schlesinger Library), who joined Kiesling, Fox, Ricky Erway (RLG), and the Berkeley trio for a three-day meeting in Mountain View, California, held on May 23-25, 1996. Since the first workshop in July 1996, Kiesling and Fox have taught more than twenty workshops throughout the United States and in Canada, the United Kingdom, and Australia. These hands-on workshops have introduced more than four hundred archivists, librarians, and systems administrators to EAD, first under the aegis of RLG and since September 1997 as part of SAA's continuing education program.

Complementing the RLG/SAA workshops were a number of other training efforts undertaken by other working group members and early implementors: Daniel Pitti has taught three one-week EAD classes at the Rare Books School operated by the University of Virginia and three two-day workshops at various other institutions in the United States, United Kingdom, and Canada; Helena Zinkham, Janice Ruth, Mary Lacy, and Stephen Miller have offered half-day and full-day EAD workshops at various regional archives conferences; Wendy Duff, Anne Gilliland-Swetland, William Landis, Janice Ruth, and others have delivered lectures about EAD as part of graduate school courses in archives and library science; and numerous Working Group members and implementors have presented papers and talks about EAD at regional, national, and international conferences. In addition, as more repositories have begun to work with EAD, staff from those institutions have developed in-house orientation and training sessions to disseminate information about the DTD and to develop internal standards for local implementation. Within months of the beta release, Web sites quickly emerged to provide access to EAD-encoded finding aids and to share information about specific EAD applications and tools with the larger archival community. Implementors have also exchanged tips, advice, questions, and suggestions via the official EAD listserv maintained by the Library of Congress.

Solicitation and Review of Formal Comments

On June 23, 1997, nine months after releasing the beta version DTD, the EAD Working Group invited the archival community to submit to the EAD listserv formal comments and suggestions about changes to the beta version DTD. Originally intended to span only three months, the comment period was extended until mid-October to allow for fuller experimentation and testing of the DTD. Through the generous financial support of the Delmas Foundation, the SAA sponsored a three-day meeting, October 31-November 2, 1997, of the EAD Working Group to review and discuss the changes recommended on the listserv by EAD implementors. At this meeting, held in Washington, D.C., the EAD Working Group reviewed nearly fifty email messages sent from beta testers around the world, including the National Archives of Sweden, the British Public Record Office, the Bodleian Library, and the Canadian National Archives. The majority of the messages proposed potential changes to the DTD based upon experience with the beta version or analysis of its relationship to other archival data structures or content standards such as MARC, the General International Standard Archival Description (ISAD(G)), and the Canadian Rules for Archival Description (RAD). The merits of each proposed change were considered individually, and group consensus was reached by considering among other factors the global applicability of a proposed change, the amount of retrospective conversion a desired change would require, and whether other changes or existing DTD structures would achieve the same result more effectively.

The initial decisions made in Washington were compiled and in a few instances revisited and revised in the two months following the meeting. The Working Group prepared and released on January 30, 1998, two detailed email messages to the EAD listserv outlining both the changes that it had agreed to incorporate in the next release of the DTD (Version 1.0) and the proposals that it had declined to enact. The rationale for each decision was provided, and reaction from listserv readers was invited.

Preparation and Release of Version 1.0 of the EAD DTD

After notifying the archival community of its decisions, the EAD Working Group set about the task of modifying the beta version DTD and totally revising the existing beta tag library to reflect more accurately the proposed Version 1.0 structure. Delays occurred when competing responsibilities laid claim to team members' time and when the group decided that postponing the release of Version 1.0 might permit greater compatibility with the emerging Extensible Markup Language (XML) standard, which was just entering the final stages of development. At the Washington meeting in early November, the team had decided that if possible, it should make the EAD DTD compliant with XML in order to facilitate easier Internet access to SGML-encoded finding aids. As a more content-aware language than HTML, XML offers the potential for forthcoming versions of Web browsers like Netscape and Internet Explorer to display EAD-encoded finding aids in their native SGML without requiring helper applications like Panorama. Although parts of XML and its related standards XSL (Extensible Stylesheet Language) and XLL (Extensible Linking Language) still remain unclear, the EAD Working Group decided that XML development had reached sufficient stability to proceed with releasing Version 1.0 of the EAD DTD at the end of August 1998 to coincide with the SAA annual meeting, held August 31-September 6, 1998, in Orlando, Florida.

Since the Version 1.0 DTD represents significant changes from the previous beta version, implementors are encouraged to update their encoded documents as quickly as their resources permit. Archivists new to EAD should be mindful that institutions will be converting to Version 1.0 at different rates and that Web sites, other than the official EAD site at the Library of Congress, may contain documentation and finding aid examples that reflect superseded beta and alpha versions of the EAD DTD.

Availability of Tag Library and Other Documentation

Accompanying Version 1.0 of the DTD is a completely revised and updated EAD tag library, which was compiled by Working Group members during Spring and early Summer 1998. A printed copy of this soft-cover, 262-page publication is available for sale from SAA. An online version of the tag library, with links to corrections and updated comments, will be mounted on the official EAD Web site at the Library of Congress in Fall 1998.

Coinciding with the publication of the tag library and the Version 1.0 DTD are two special issues of SAA's quarterly journal, The American Archivist, which are devoted entirely to EAD. The first issue, which carries a publication date of Summer 1997, was mailed to SAA members in late August 1998 and contains "six papers that explore the context within which EAD was developed, the essentials of its structured approach to encoding finding aid data, and the role that EAD is meant to play in individual repositories and for the profession as a whole." The second issue, mailed a week later and identified as Fall 1997, contains six case studies on implementing and testing the beta version of the EAD DTD. Plans are underway to republish these two special American Archivist issues as a single hard-cover monograph, which will be marketed to a wider audience than SAA members and will be sold at a cost lower than the purchase price of two issues of the journal.

EAD 2002

With the arrival of the new millennium, the need to reconsider some existing SGML/XML elements and to examine certain design aspects of the EAD DTD had grown to the point where formal suggestions were solicited from users of the DTD. A series of 67 suggestions for changes and additions were received from users via a web-based suggestions form made public on the EAD Web site. The suggestions were consolidated into a list that was circulated internally and discussed during a special meeting of the EAD Working Group, held in Washington, DC, April 27-29, 2001. The meeting included representatives from Australia, Canada, France, the United Kingdom, and the United States--bearing witness to the international importance of this emerging standard.

The discussions resulted in the deprecation of only eight (8) EAD elements that had been part of the Version 1.0 (1998) EAD DTD. Much of the need to deprecate elements at all was due to a desire to keep the EAD DTD compatible with provisions of the General International Standard Archival Description (ISAD(G)). Changes were also the result of experimentation with the EAD DTD that had occurred since the release of Version 1.0 in 1998. A few new elements were also added to the EAD DTD. The revision of the DTD introduced some structural changes that unbundled certain pieces of information in a finding aid, thus facilitating a more logical arrangement of information. Restructuring also opened up certain existing elements for use inside elements where they had not been allowed before.

The availability of the 2002 version of the EAD DTD comes at a time when more and more users are moving from SGML to XML markup. The entire suite of DTD and entity reference files was reengineered to meet the needs of XML and related technologies that are currently in use. It is hoped that the new 2002 version of the EAD DTD is even more stable and useful than the popular 1998 version 1.0. As with related standards, the EAD DTD will certainly continue to evolve to meet the needs of a growing user base.

This document was revised in December 2002 by Randall K. Barry, U.S. Library of Congress, to incorporate information about the EAD 2002 DTD development and release. The revised document incorporates text about the August 1996 beta version originally written in 1995 by Sharon Gibbs Thibodeau, National Archives and Records Administration, in consultation with Daniel V. Pitti, University of California, Berkeley. Sections on Version 1.0 (1998) of the EAD DTD were written by Kris Kiesling, Harry Ransom Humanities Research Center, and Robert Spindler, Arizona State University.