Development of the Encoded Archival Description DTD
Introduction
This paper presents general background information and a status report
on the development of the Encoded Archival Description Document Type
Definition (EAD DTD).
Choosing an Encoding Standard
Development of the EAD DTD began with a project initiated by the University
of California, Berkeley, Library in 1993. The goal of the Berkeley project
was to investigate the desirability and feasibility of developing a nonproprietary
encoding standard for machine-readable finding aids such as inventories,
registers, indexes, and other documents created by archives, libraries,
museums, and manuscript repositories to support the use of their holdings.
The project directors recognized the growing role of networks in accessing
information about holdings, and they were keen to include information
beyond that which was provided by traditional machine-readable cataloging
(MARC) records. The development of the EAD DTD was a cooperative venture
from early on, with specialists at Berkeley working in consultation with
experts at other institutions. Daniel Pitti, the principal investigator
for the Berkeley Project, developed requirements for the encoding standard
which included the following criteria: 1) ability to present extensive
and interrelated descriptive information found in archival finding aids,
2) ability to preserve the hierarchical relationships existing between
levels of description, 3) ability to represent descriptive information
that is inherited by one hierarchical level from another, 4) ability
to move within a hierarchical informational structure, and 5) support
for element-specific indexing and retrieval.
At the start of the project, candidates for meeting the requirements
for a standard encoding technique included Gopher presentation of flat
(i.e., unmarked) ASCII text, ASCII text marked up using HTML (HyperText
Markup Language) tags, MARC tagging using either existing or new implementations
of the MARC (Z39.2/ISO 2709) records structure, and markup conformant
to SGML (Standard Generalized Markup Language, ISO 8879). SGML emerged
from the analysis as being a technique able to meet all of the functional
requirements as well as one supported by a large and growing number of
software products available for a variety of operating systems. Pitti
and his colleagues at Berkeley chose to experiment with the use of SGML
in encoding a variety of archival finding aids from Berkeley and other
institutions.
Application of SGML
Standard Generalized Markup Language was chosen over other possible
solutions because of certain characteristics it possesses. SGML is a
set of rules for defining and expressing the logical structure of documents
thereby enabling software products to control the searching, retrieval,
and structured display of those documents. The rules are applied in the
form of markup (tags) that can be embedded in an electronic document
to identify and establish relationships among structural parts. Because
consistent markup of similarly structured documents is key to successful
electronic processing of them, SGML encourages consistency by introducing
the concept of a document type definition (or DTD). A DTD prescribes
the ordered set of SGML markup tags available for encoding the parts
of documents in a similar class. Archival finding aids, which share similar
parts and structure, form a class of documents for which a DTD could
be and was developed.
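As a rough illustration of how embedded markup lets software retrieve
specific structural parts of a document, the following sketch parses a
small marked-up finding aid and pulls out one kind of element. It is
not taken from the Berkeley project: the tag names (findingaid, unit,
unittitle, unitdate) and the function titles_of_units are hypothetical,
the sample is written as well-formed XML rather than SGML so that
Python's standard library can parse it, and no DTD validation is performed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified finding aid. The tag names are illustrative
# only and are not validated against any version of a finding aid DTD.
SAMPLE = """
<findingaid>
  <title>Papers of Jane Example</title>
  <unit>
    <unittitle>Correspondence</unittitle>
    <unitdate>1901-1910</unitdate>
  </unit>
  <unit>
    <unittitle>Diaries</unittitle>
    <unitdate>1895-1900</unitdate>
  </unit>
</findingaid>
""".strip()


def titles_of_units(xml_text):
    """Return the text of every <unittitle> found inside a <unit>."""
    root = ET.fromstring(xml_text)
    return [unit.findtext("unittitle") for unit in root.iter("unit")]


if __name__ == "__main__":
    for title in titles_of_units(SAMPLE):
        print(title)
```

Because every unit is tagged the same way, the same few lines would work
on any document that follows the same structure, which is the kind of
consistency a DTD is meant to encourage.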
Pitti began development of the finding aids DTD by analyzing numerous
examples forwarded to Berkeley by archivists who had responded to requests
for cooperation. He found the greatest similarities in structure among
those finding aids commonly referred to as inventories and registers.
These structural similarities helped him construct
the model that became the basis of his draft finding aid DTD.
The March 1995 version of the Berkeley Finding Aid Project (BFAP) DTD,
also known as the FINDAID DTD, defined a class of documents that, in
general, consisted of an optional title page, the description of a unit
of archival material, and optional back matter. A title page conforming
to the draft DTD could include a number of elements, such as the identification
of the repository or type of finding aid. A unit description conforming
to the DTD could include a brief description of the unit (incorporating
taggable elements analogous to those of a MARC catalog record), a longer
narrative description of the unit and any segregable parts (including
such taggable elements as title, dates, and scope and content), and formatted
container lists.
As the original Berkeley DTD took shape, it was tested by encoding finding
aids and then manipulating them electronically. By March 1995, a critical
mass of encoding had been achieved, and the results (involving nearly
two hundred finding aids from fifteen repositories) were shared with
a group of fifty archivists and manuscript librarians. The participants
in this initial test were invited to attend a three-day Finding Aids
Conference (April 4-6, 1995) jointly sponsored by the Library of the
University of California, Berkeley, and the Commission on Preservation
and Access. Conference attendees observed that the SGML encoding of finding
aids accessed locally or online via networks could simplify, improve,
and expand access to archival collections by making it possible to link
catalog records to finding aids. It also enabled searching among pools
of networked finding aids and keyword access to locate folders or items
previously buried in container lists. Attendees encouraged Pitti to pursue
adoption of the approach as a standard by the archival profession.
Bentley Fellowship, July 1995
Hoping to strengthen the case for adoption by the archival community
of an SGML-based encoding standard, Pitti sought the assistance of a
team of experts in archival descriptive standards augmented by an expert
in SGML encoding techniques, who agreed to critique and refine the BFAP
approach. Members of the team whom Pitti assembled to analyze and evaluate
the Berkeley work were Steven J. DeRose (Inso, Inc.; formerly Electronic
Book Technologies), Jackie M. Dooley (University of California, Irvine),
Michael J. Fox (Minnesota Historical Society), Steven L. Hensen (Duke
University), Kris Kiesling (Harry Ransom Humanities Research Center),
Janice E. Ruth (Library of Congress), Sharon Gibbs Thibodeau (National
Archives and Records Administration), and Helena Zinkham (Library of
Congress). Under the auspices of the Bentley Library Research Fellowship
Program for the Study of Modern Archives, funded by the Andrew W. Mellon
Foundation, the Division of Preservation and Access of the National Endowment
for the Humanities, and the Bentley Historical Library, the Pitti-led
team gathered for a week-long meeting (July 22-29, 1995) in Ann Arbor,
Michigan. At this meeting, the group (later known as the Bentley team)
agreed to collaborate in the production of 1) finding aid encoding standard
design principles; 2) a revised finding aid data model; 3) a revised
finding aid document type definition; 4) finding aid encoding guidelines
and examples; and 5) an article describing the team's understanding of
the structure and content of finding aids.
The Bentley team reached early agreement on the principles that would
underlie their design of an encoding standard. With these principles
in mind (originally designated the Ann Arbor Accords), the group proceeded
to review the structure of the document to be encoded. They agreed that
at the most basic level, a finding aid document consists of two segments:
a segment that provides information about the finding aid itself (its
title, compiler, compilation date) and a segment that provides information
about a body of archival material (a collection, a record group, or a
series). Following the example of the Text Encoding Initiative (TEI),
the group designated the segment about the finding aid itself as the "header." Within
the segment providing information about the described material (the actual
finding aid), two types of information could be presented: 1) hierarchically
organized information that describes a unit of records or papers along
with its component parts or divisions and 2) adjunct information that
may not directly describe records or papers but that facilitates their
use by researchers (e.g., a bibliography). The hierarchy of descriptive
information, reflecting archival principles of arrangement, generally
begins with a summary of the whole and proceeds to delineation of the
parts as a set of contextual views. Descriptions of the parts inherit
information from descriptions of the whole.
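The idea that descriptions of the parts inherit information from
descriptions of the whole can be sketched in a few lines of code. The
example below is an assumption-laden illustration, not the Bentley team's
data model: the element and attribute names (description, component,
title, repository, creator) and the function flatten are hypothetical,
and the sample is written as XML so that Python's standard library can
parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested description. Only the top level states the
# repository; one folder overrides the creator it would otherwise inherit.
SAMPLE = """
<description repository="Example Library" creator="Jane Example">
  <component title="Series 1: Correspondence">
    <component title="Folder 1: Letters, 1901" />
    <component title="Folder 2: Letters, 1902" creator="John Example" />
  </component>
</description>
""".strip()

# Attributes that pass from a level to the levels beneath it (assumed).
INHERITED = ("repository", "creator")


def flatten(node, inherited=None, depth=0):
    """Walk the hierarchy, giving each component its inherited context."""
    context = dict(inherited or {})
    for key in INHERITED:
        if key in node.attrib:
            context[key] = node.attrib[key]  # a local value overrides
    rows = []
    if node.tag == "component":
        rows.append({"depth": depth, "title": node.get("title"), **context})
    for child in node.findall("component"):
        rows.extend(flatten(child, context, depth + 1))
    return rows


if __name__ == "__main__":
    for row in flatten(ET.fromstring(SAMPLE)):
        print("  " * row["depth"] + row["title"],
              "|", row["repository"], "|", row["creator"])
```

Each folder carries the repository named only once at the top, and the
second folder shows a lower level supplying its own value in place of the
inherited one.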
Agreement on this overall structure enabled the developers to evaluate
the encoded elements that had been incorporated in the BFAP model. Those
elements that survived the evaluation process formed two categories:
elements that would be tagged at specific, predictable points in the
description of units or component parts (descriptive elements) and those
elements that could be tagged anywhere within the document (generic elements).
Generic elements usually are embedded within a descriptive element. The
group agreed that when elements had a close analog in the TEI guidelines,
the element name and, when appropriate, the element content model should
be taken from the TEI guidelines.
A characteristic of SGML is the possibility of defining attributes and
associating them with particular elements. The developers of the EAD
concluded that the finding aid DTD should take full advantage of this
possibility. Attributes could provide options to make an element more
specific. A small set of basic elements could be expanded through attributes
in lieu of creating a large set of specific elements. For example, an
attribute associated with the personal name element can specify the role
of the person as creator or collector, sender or recipient.
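A small sketch may make the role of attributes concrete. In the example
below a single personal name element is made more specific by a role
attribute, so no separate element is needed for each kind of person. The
sample markup, the surrounding unit element, and the function
names_with_role are hypothetical, and the sample is written as XML so
that Python's standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one generic element, specialized by an attribute.
SAMPLE = """
<unit>
  <persname role="creator">Jane Example</persname>
  <persname role="recipient">John Example</persname>
  <persname role="sender">Jane Example</persname>
</unit>
""".strip()


def names_with_role(xml_text, role):
    """Return the names tagged with the given role attribute value."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("persname") if p.get("role") == role]


if __name__ == "__main__":
    print(names_with_role(SAMPLE, "recipient"))
```

The alternative design, one element per role, would enlarge the tag set
each time a new role was needed; with an attribute, the element set stays
small and only the list of permitted values grows.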
By combining descriptive and generic elements with attributes in a simplified
document structure, the Bentley team was able to distill from the BFAP
model the essential finding aid tag set. Within a few days of the July
1995 meeting in Michigan, Pitti began to recast the Ann Arbor Accords
into a revised data model and finding aid DTD. It was at that time that
the name "Encoded Archival Description", or EAD, was coined. The key
changes introduced in Ann Arbor were: 1) the separation of information
about the finding aid into a header; 2) the distinction between the hierarchically
presented unit description information and adjunct information; and 3)
the replacement of the BFAP model's collection divisions and materials
lists with the more open-ended concepts of recursive "component description" and
a "display group" element to bind pieces of text for display in tabular
form.
The group working on the EAD DTD emphasized the importance of documentation,
such as a tag library and application guidelines, to make the implementation
of SGML viable. Such documentation needed to be "friendly" enough to
enable users barely acquainted with SGML to apply the DTD both routinely
and intermittently in their work. While the group focused on elements
to ease conversion of traditional finding aids, it also reached for SGML
techniques that could begin to improve the delivery of register and inventory
information, particularly in an online environment. The team speculated
about future possibilities, involving attachment of online "help" scripts
to explain descriptive practice as reflected in finding aids, links to
central glossaries and shared administrative histories, and presentation
of new views that might transform hierarchical data into archival family
trees.
Involvement of the Society of American Archivists
Among the topics discussed by the Bentley group were several associated
with prospects for profession-wide adoption and maintenance of an encoding
standard for finding aids. Recognizing that successful development of
the DTD would require the participation of a broad community of archivists
and archives users, the group planned to circulate widely both the Ann
Arbor Accords and the revised data model based upon them. The annual
meeting of the Society of American Archivists, held in late August 1995,
provided an excellent forum for the presentation of concepts and ideas
relating to EAD. The Society's Committee on Archival Information Exchange
(CAIE) agreed to assume some responsibility for involving interested
archivists. The CAIE established an EAD Working Group chaired by Bentley
team member Kris Kiesling and consisting of all members of the original
Bentley team (except Steven DeRose) as well as the following additional
individuals: Randall Barry (Library of Congress Network Development and
MARC Standards Office), Wendy Duff (University of Toronto), Ricky Erway
(Research Libraries Group), Anne Gilliland-Swetland (University of California,
Los Angeles), William E. Landis (University of California, Los Angeles),
Eric Miller (OCLC Online Computer Library Center), Meg Sweet (Public
Record Office, United Kingdom), Robert Spindler (Arizona State University),
and Richard Szary (Yale University). The EAD Working Group accepted responsibility
for monitoring and supporting the ongoing development of the EAD DTD,
tag library, and application guidelines. Immediately following its annual
meeting, the SAA Council agreed to submit a formal request to the Library
of Congress Network Development and MARC Standards Office to serve as
the maintenance agency for the EAD DTD.
Prelude to the EAD Alpha Release
In mid-October 1995, Daniel Pitti released for review an "early implementors'
version" of an EAD data model and prototype DTD based on the work accomplished
in Ann Arbor by the Bentley team. It was distributed to a small group
for review and testing. Two weeks later, on November 1-3, 1995, a three-day
meeting to refine the data model and DTD was held in Washington, D.C.,
under the sponsorship of the Library of Congress National Digital Library
Program. Participants at this meeting included most of the original Bentley
team, representatives from several Library of Congress divisions, Anne
Gilliland-Swetland, and Debbie Lapeyre, an SGML expert with ATLIS Consulting
Group. Based on decisions made at this meeting, ATLIS Consulting Group,
under contract to the Library of Congress, began making revisions to
the DTD. ATLIS also began work on the creation of an EAD tag library,
a key piece of support documentation for implementors of the new DTD.
In early December, the Society of American Archivists received funding
from the Council on Library Resources (later Council on Library and Information
Resources) to create application guidelines for EAD. To that end, a subset
of the Bentley team met on January 4-6, 1996, in Los Angeles, California,
with Anne Gilliland-Swetland and Tom La Porte of Dreamworks SKG, who
had been hired to write application guidelines for the DTD. The purpose
of this meeting was to evaluate the ATLIS-revised EAD DTD, review the
draft tag library, and outline the content of the application guidelines.
Additional changes needed to EAD were identified and subsequently incorporated
into what was to become the "alpha" version of the EAD DTD.
While the alpha version of the EAD DTD and support documentation was
being finalized, the Library of Congress Network Development and MARC
Standards Office formally agreed to serve as the maintenance agency for
the EAD in a letter to Susan Fox, executive director of SAA. As the maintenance
agency, the Library would make the DTD and support documentation available
and act as a clearinghouse for communications on EAD, chiefly through
the establishment of a listserv and World Wide Web site. SAA would be
responsible for ongoing oversight of the standard.
Alpha Release Made Available
On February 26, 1996, the prototype EAD DTD was declared ready for release
to early implementors as an "alpha" version. As with the alpha releases
of computer software, this version of the EAD DTD was not advertised
as perfect but was considered good enough to yield valuable results when
applied to a variety of finding aids in diverse institutions. The alpha
version DTD and a revised alpha tag library were made available at two
sites, one at the University of California, Berkeley, and the other at
the Library of Congress. The ability to obtain copies of the DTD and
related documentation electronically helped to speed testing and sharing
of test results.
Within the first few months of alpha testing, numerous archives and
libraries marked up selected finding aids. These encoded documents generated
valuable feedback to the EAD Working Group and strengthened the overall
belief that the DTD could emerge as a standard for encoded archival descriptions.
International interest in the development of an SGML implementation
for finding aids also appeared, particularly in Europe where use of SGML
has been on the rise. A number of foreign archives and libraries obtained
copies of the EAD DTD and subscribed to the EAD listserv. A translation
of an earlier version of this report even appeared in an Italian archival
journal.
Development and Release of the Beta DTD
The EAD Working Group originally planned for alpha testing to last approximately
six months before analyzing feedback and preparing a beta version release
during the second half of 1996. Work on the beta release started sooner
than expected, however, when EAD developers convened in Berkeley, California,
on April 27-29, 1996, for a three-day meeting sponsored by the Council
on Library Resources and hosted by the University of California, Berkeley.
The primary purpose for the April 1996 meeting was to provide an opportunity
for the original Bentley team to meet with Anne Gilliland-Swetland and
Tom La Porte to review their draft application guidelines and resolve
problems with the DTD that had surfaced thus far during the alpha testing.
Joining Gilliland-Swetland, La Porte, and the Bentley team were Randall
Barry of the Library of Congress Network Development and MARC Standards
Office and Tim Hoyer and Jack Von Euw of the University of California,
Berkeley. Reaffirmation of basic design principles, identification of
additional modifications needed to the DTD, and a discussion of the content
and style of the support documentation dominated the busy three-day meeting.
The meeting participants also explored for the first time potential designs
for an EAD logo to be used on printed and Web versions of the planned
documentation.
Revisions to the alpha DTD, tag library, and application guidelines
began immediately after the April 1996 meeting in California, with the
revised goal of making a beta test version available later that summer.
In-depth electronic mail discussions among the EAD developers continued
throughout the spring and early summer, and by mid-June 1996, a draft
of the beta DTD was completed. This draft version was modified slightly
after SAA's annual meeting in late August, and a "final" beta version
DTD became available in mid-September 1996. Several minor typographical
modifications occurred in late November 1996, resulting in a date change
to the September EAD files. A beta version tag library appeared in October
1996, and draft beta version application guidelines followed two months
later. The EAD Working Group decided that no further changes would be
made to the beta DTD for at least a year. The files remained stable to
permit implementation and full testing by EAD Working Group members and
participating institutions.
Beta Workshops, Orientation Sessions, and Outreach Efforts
Release of the beta DTD and its documentation greatly increased interest
in EAD among institutions that had previously worked with the alpha version
as well as from newcomers with little prior knowledge of the encoding
standard. The EAD Working Group successfully fostered this interest.
Kris Kiesling took a six-month leave of absence from her position at
the Harry Ransom Humanities Research Center to advise the Research Libraries
Group (RLG) on EAD-related issues and to assist RLG in creating a two-day
workshop to help its member institutions implement the beta DTD. Kiesling
and co-instructor Michael Fox developed the initial course with contributions
from Daniel Pitti, Alvin Pollock, and Tim Hoyer at the University of
California, Berkeley, and input from an advisory panel of early implementors,
including Lisa Browar (New York Public Library), Steven L. Hensen (Duke
University), Steven Mandeville-Gamble (Stanford University), Richard
Szary (Yale University), Richard Masters (British Library) and Susan
von Salis (Schlesinger Library), who joined Kiesling, Fox, Ricky Erway
(RLG), and the Berkeley trio for a three-day meeting in Mountain View,
California, held on May 23-25, 1996. Since the first workshop in July
1996, Kiesling and Fox have taught more than twenty workshops throughout
the United States and in Canada, the United Kingdom, and Australia. These
hands-on workshops have introduced more than four hundred archivists,
librarians, and systems administrators to EAD, first under the aegis
of RLG and since September 1997 as part of SAA's continuing education
program.
Complementing the RLG/SAA workshops were a number of other training
efforts undertaken by other working group members and early implementors:
Daniel Pitti has taught three one-week EAD classes at the Rare Book
School operated by the University of Virginia and three two-day workshops
at various other institutions in the United States, United Kingdom, and
Canada; Helena Zinkham, Janice Ruth, Mary Lacy, and Stephen Miller have
offered half-day and full-day EAD workshops at various regional archives
conferences; Wendy Duff, Anne Gilliland-Swetland, William Landis, Janice
Ruth, and others have delivered lectures about EAD as part of graduate
school courses in archives and library science; and numerous Working
Group members and implementors have presented papers and talks about
EAD at regional, national, and international conferences. In addition,
as more repositories have begun to work with EAD, staff from those institutions
have developed in-house orientation and training sessions to disseminate
information about the DTD and to develop internal standards for local
implementation. Within months of the beta release, Web sites quickly
emerged to provide access to EAD-encoded finding aids and to share information
about specific EAD applications and tools with the larger archival community.
Implementors have also exchanged tips, advice, questions, and suggestions
via the official EAD listserv maintained by the Library of Congress.
Solicitation and Review of Formal Comments
On June 23, 1997, nine months after releasing the beta version DTD,
the EAD Working Group invited the archival community to submit to the
EAD listserv formal comments and suggestions about changes to the beta
version DTD. Originally intended to span only three months, the comment
period was extended until mid-October to allow for fuller experimentation
and testing of the DTD. Through the generous financial support of the
Delmas Foundation, the SAA sponsored a three-day meeting, October 31-November
2, 1997, of the EAD Working Group to review and discuss the changes recommended
on the listserv by EAD implementors. At this meeting, held in Washington,
D.C., the EAD Working Group reviewed nearly fifty email messages sent
from beta testers around the world, including the National Archives of
Sweden, the British Public Record Office, the Bodleian Library, and
the Canadian National Archives. The majority of the messages proposed
potential changes to the DTD based upon experience with the beta version
or analysis of its relationship to other archival data structures or
content standards such as MARC, the General International Standard Archival
Description (ISAD(G)), and the Canadian Rules for Archival Description
(RAD). The merits of each proposed change were considered individually,
and group consensus was reached by considering among other factors the
global applicability of a proposed change, the amount of retrospective
conversion a desired change would require, and whether other changes
or existing DTD structures would achieve the same result more effectively.
The initial decisions made in Washington were compiled and in a few
instances revisited and revised in the two months following the meeting.
The Working Group prepared and released on January 30, 1998, two detailed
email messages to the EAD listserv outlining both the changes that it
had agreed to incorporate in the next release of the DTD (Version 1.0)
and the proposals that it had declined to enact. The rationale for each
decision was provided, and reaction from listserv readers was invited.
Preparation and Release of Version 1.0 of the EAD DTD
After notifying the archival community of its decisions, the EAD Working
Group set about the task of modifying the beta version DTD and totally
revising the existing beta tag library to reflect more accurately the
proposed Version 1.0 structure. Delays occurred when competing responsibilities
laid claim to team members' time and when the group decided that postponing
the release of Version 1.0 might permit greater compatibility with the
emerging Extensible Markup Language (XML) standard, which was just entering
the final stages of development. At the Washington meeting in early November,
the team had decided that if possible, it should make the EAD DTD compliant
with XML in order to facilitate easier Internet access to SGML-encoded
finding aids. As a more content-aware language than HTML, XML offers
the potential for forthcoming versions of Web browsers like Netscape
and Internet Explorer to display EAD-encoded finding aids in their native
SGML without requiring helper applications like Panorama. Although parts
of XML and its related standards XSL (Extensible Stylesheet Language)
and XLL (Extensible Linking Language) still remain unclear, the EAD Working
Group decided that XML development had reached sufficient stability to
proceed with releasing Version 1.0 of the EAD DTD at the end of August
1998 to coincide with the SAA annual meeting, held August 31-September
6, 1998, in Orlando, Florida.
Since the Version 1.0 DTD represents significant changes from the previous
beta version, implementors are encouraged to update their encoded documents
as quickly as their resources permit. Archivists new to EAD should be
mindful that institutions will be converting to Version 1.0 at different
rates and that Web sites, other than the official EAD site at the Library
of Congress, may contain documentation and finding aid examples that
reflect superseded beta and alpha versions of the EAD DTD.
Availability of Tag Library and Other Documentation
Accompanying Version 1.0 of the DTD is a completely revised and updated
EAD tag library, which was compiled by Working Group members during Spring
and early Summer 1998. A printed copy of this soft-cover, 262-page publication
is available for sale from SAA. An online version of the tag library,
with links to corrections and updated comments, will be mounted on the
official EAD Web site at the Library of Congress in Fall 1998.
Coinciding with the publication of the tag library and the Version 1.0
DTD are two special issues of SAA's quarterly journal, The American
Archivist, which are devoted entirely to EAD. The first issue, which
carries a publication date of Summer 1997, was mailed to SAA members
in late August 1998 and contains "six papers that explore the context
within which EAD was developed, the essentials of its structured approach
to encoding finding aid data, and the role that EAD is meant to play
in individual repositories and for the profession as a whole." The second
issue, mailed a week later and identified as Fall 1997, contains six
case studies on implementing and testing the beta version of the EAD
DTD. Plans are underway to republish these two special American Archivist issues
as a single hard-cover monograph, which will be marketed to a wider audience
than SAA members and will be sold at a cost lower than the purchase price
of two issues of the journal.
EAD 2002
With the arrival of the new millennium, the need to reconsider some
existing SGML/XML elements and to examine certain design aspects of the
EAD DTD had grown to the point where formal suggestions were solicited
from users of the DTD. A series of 67 suggestions for changes and additions
were received from users via a web-based suggestions form made public
on the EAD Web site. The suggestions were consolidated into a list that
was circulated internally and discussed during a special meeting of the
EAD Working Group, held in Washington, DC, April 27-29, 2001. The meeting
included representatives from Australia, Canada, France, the United Kingdom,
and the United States--bearing witness to the international importance
of this emerging standard.
The discussions resulted in the deprecation of only eight (8) EAD elements
that had been part of the Version 1.0 (1998) EAD DTD. Much of the need
to deprecate elements at all was due to a desire to keep the EAD DTD
compatible with provisions of the General International Standard Archival
Description (ISAD(G)). Changes were also the result of experimentation
with the EAD DTD that had occurred since the release of Version 1.0 in
1998. A few new elements were also added to the EAD DTD. The revision
of the DTD introduced some structural changes that unbundled certain
pieces of information in a finding aid, thus facilitating a more logical
arrangement of information. Restructuring also opened up certain existing
elements for use inside elements where they had not been allowed before.
The availability of the 2002 version of the EAD DTD comes at a time
when more and more users are moving from SGML to XML markup. The entire
suite of DTD and entity reference files was reengineered to meet the
needs of XML and related technologies that are currently in use. It is
hoped that the new 2002 version of the EAD DTD is even more stable and
useful than the popular 1998 version 1.0. As with related standards,
the EAD DTD will certainly continue to evolve to meet the needs of a growing
user base.
This document was revised in December 2002 by Randall K. Barry, U.S.
Library of Congress, to incorporate information about the EAD 2002 DTD
development and release. The revised document incorporates text about
the August 1996 beta version originally written in 1995 by Sharon
Gibbs Thibodeau, National Archives and Records Administration, in consultation
with Daniel V. Pitti, University of California, Berkeley. Sections on
Version 1.0 (1998) of the EAD DTD were written by Kris Kiesling, Harry
Ransom Humanities Research Center, and Robert Spindler, Arizona State
University.