Organizing the Global Digital Library Conference
Library of Congress
Digital Library Visitors' Center
Madison Building
December 11, 1995
Sarah Thomas, chair of the conference, welcomed the participants by
thanking the cosponsors of the event, the Council on Library Resources
and the National Digital Library Program of the Library of Congress. The
purpose of the conference was to form a consensus on a list of principles
and assumptions regarding the nature of organization in the digital library
of the future. The word "organizing" rather than
"cataloging" was deliberately chosen for the conference title.
As digital resources reach critical mass, we will have to decide
which resources warrant bibliographic "control" in
the traditional sense and which can be organized in ways that users can
access through gateways to Internet resources. Other goals of the meeting
were to identify the challenges and difficulties which impede progress;
promote a coalescence of current efforts regarding the organization of
digital resources by drawing on the collective expertise of those gathered;
identify the expectations for digital libraries in the future; and ideally,
develop an action plan for moving efforts forward. Although one of the
goals of the conference was to develop the librarian/technologist partnership
and identify areas for collaboration, the action items that emerged
are intentionally librarian-centric in order to provide avenues for the
library community to become more involved in this arena.
The twenty-three participants were a diverse group of computer scientists,
social scientists, consultants, representatives of funding agencies, and
librarians associated with research and national libraries, with numerous
connections to other task forces and organizations actively engaged in
resolving issues closely related to the topic of the meeting. Among the
groups or projects to which the participants
had close links were the Association for Library Collections & Technical
Services Task Force to Define Bibliographic Access in the Electronic Environment,
the Coalition for Networked Information's group preparing a white paper
on "Networked Information: Discovery and Retrieval," the Encoded
Archival Description (EAD) project initiated by the University of California
at Berkeley, the IFLA Study Group on the Functional Requirements of the
Bibliographic Record, the Internet Engineering Task Force, the Joint Steering
Committee for the Revision of AACR (i.e., the Anglo-American Cataloguing
Rules), the LC National Digital Library Program, the National Digital
Library Federation Planning Task Force, the National Science Foundation
digital library projects, the OCLC InterCat project, and the Research
Libraries Group's Digital Image Access Project. In addition, participants
had organized, attended, or spoken at the American Library Association's
preconference "AACR 2000," the Center for Electronic Texts in
the Humanities Text Encoding Initiative May 1994 conference, and the OCLC
Metadata Workshop. Representatives from the British Library and the National
Library of Canada provided an international perspective.
Assumptions Become Principles
In preparation for the conference, the participants had been invited
to submit lists of "digital assumptions" which would challenge
the thinking of the group and stimulate discussion. Some of these assumptions
appeared often enough to be considered consensus statements. Upon further
discussion and clarification of terminology, the following statements
were accepted as principles:
- Libraries exist to provide value-added services for a wide variety
of materials. These services include:
- selection
- organization
- access
- location information
- delivery, and
- preservation.
- Libraries will include a mix of traditional materials (print and
non-print) and digital resources indefinitely.
- Library collections will continue to be only subsets of the universe
of publications, resources, and information.
- Like traditional materials, digital resources will have more value
and utility if they are organized, making resources known and available.
- Libraries should integrate access to digital resources with access
to conventional materials.
- Genre is a more useful organizing principle than format.
- Information seekers benefit from self-indexing resources, producer-generated
access, and librarian-generated access.
- Librarians will continue to use judgment in applying varying levels
of description and access, as appropriate to each resource, in order
to provide retrieval of relevant resources in a cost-efficient manner.
Important Themes
Several major themes emerged during the course of the conference. Some
of the discussion points are identified below, although it should be noted
that there may not have been a consensus among the participants regarding
these points. Many elements of the themes overlap. They have been divided
into the following categories for ease of presentation:
Integration
Although the title of the conference included the words "global
digital library," it was noted that libraries in the future will
not be digital only, but an integration of traditional library materials
(print and non-print) with digital resources, and that the catalogs of
the future should integrate access to all materials. The links and relationships
between traditional and digital resources which librarians can provide
should enable users, be they sophisticated or naive, to retrieve all relevant
materials. These links are also essential in the context of developing
a dynamic bibliographic model (under investigation by IFLA's Bibliographic
Control Study Group on Functional Requirements of the Bibliographic Record)
and informing users of the existence of multiple versions. Integration
can also work in reverse, since some users may find segregating resources
to be an important approach. The library "catalog" of the future
will be a collection of traditional bibliographic records and a gateway
to networked information. What tools need to be developed to allow for
integration of existing MARC-based records with networked information?
Selection
Just as libraries have "collected" only a subset of the universe
of conventional library materials, they can reasonably be expected to
select only a subset of available digital resources to organize and integrate,
not only because the economic resources available for organizing are limited,
but also because these materials vary in their relevance to users and to the
collections of the different libraries with which they are being meshed.
As is true for print materials,
libraries will need to rely on producers of digital information in order
to identify resources for selection. Although concern is frequently heard
about the tremendous volume of resources on the Net, many of these resources
are of a type traditionally collected by archives, others are more
library-related, and some are intentionally ephemeral and not intended
to be preserved for future use. However, we do not know what the digital
resources of the future will look like, as new genres will continue to
develop. Librarians will need to select and organize some resources individually,
while "organization" for other materials may mean providing
links from the catalog to network tools, such as indexes of home pages.
Because there are likely to be more network resources than there were
traditional materials, items cataloged individually by a skilled cataloger
may make up a decreasing portion of the total materials.
Organizing for Access
Three major approaches to organizing emerged in this discussion:
- Augment cataloging data through human intervention. This value-added
service, which librarians can provide, is the approach most closely related
to traditional cataloging: it augments the catalog with added fields and/or
links and connections to resources. This approach enhances the traditional
catalog by making it an entry point, or gateway, that allows traditional
catalog records to be stitched together, or meshed, with finding
aids, tiered access, new software tools, and services. There is growing
consensus that the current cataloging tools (especially the Anglo-American
Cataloguing Rules, 2nd ed., rev.) may not be adequate to describe all
digital resources, but whether these tools will continue to "evolve"
to cover digital resources or whether they will need to be heavily re-engineered
is not yet apparent.
- Maximize use of Internet software tools. Libraries could improve
catalog-type access to Internet resources by providing input to the
software developers who are building tools such as those used to create,
manage, and monitor links. We should find out what is missing (such
as classification and controlled vocabularies) and provide expertise
in these areas. It will be a challenge to recognize and integrate these
tools.
- Find ways to expand the use of metadata that forms part of the digital
object. To increase the self-indexing data available for manipulation,
librarians should include metadata in digital resources and develop
mechanisms for integrating different forms of metadata (MARC, TEI, EAD,
etc.). Libraries should identify incentives (e.g., copyright, patent,
revenue, prestige) for creators to produce useful metadata and provide
feedback to those who develop and apply metadata. Although metadata
efforts are more advanced for digital text material (such as those employing
the TEI header), other digitized resources (such as bit-mapped images
of text) could also benefit from metadata schemes.
Collection/Archiving
The traditional concept of "collection" is changing in the
digital arena. Archiving digital resources is another important value-added
service libraries must provide to guarantee a future of enduring access
and to develop a culture of stability. The idea of cooperating in this
collection task is more important in the digital environment, as we may
need to entrust our digital future to the good will of those in the Internet
community to preserve and store digital resources. "Pools of
quiescence," where standards are emerging and practices are stabilizing,
are already being developed by libraries, organizations, and producers,
but the question of how to delegate archival responsibilities is one that
libraries need to resolve in cooperation with other stakeholders. As with
conventional materials, some digital resources will have lasting value, and
the archives need to be available to mine in the future, although current and future
uses may be different. Storage costs for digital archiving are dropping,
making it reasonable to collect "copies" of some resources.
Reformatting and refreshing digital resources, however, is an expensive
operation, and tools for it are scarce.
Volatility/Stability
The often-discussed volatile nature of the Internet and its resources
causes great alarm among librarians charged with cataloging these resources.
Volatility can take the form of changes to the content of information resources
(frequent iterations), although this can be beneficial in fields where researchers need to view
and communicate about research in progress or where the research front
advances rapidly or unevenly. Another form of volatility is the changing
location of information resources. One of the value-added services that
libraries should provide through selection, collection, and organization
is stability, in the sense of a mechanism to resolve the addresses of valued
Internet sites to ensure ready long-term access. Volatility is nothing
new to librarians (e.g., serial publications constantly change their names
and publication patterns); however, with on-line information, the previous
versions are sometimes deleted without warning and lost forever. Stability
and change are two separate qualities (e.g., a newspaper is a stable publication,
but it changes every day), and libraries should not eschew responsibility
for digital materials merely because of their dynamic nature.
Libraries must also make use of methods to archive iterations in some
digital resources in order to preserve these resources over time, and
should also cooperate with the producers of digital resources to maximize
stability. Libraries will have to take the lead in defining standards
for stability and working with producers to achieve them. Notification by producers
whenever a resource changes would be a useful addition to existing capabilities
and tools (such as taking and storing snapshots of data, web pages, etc.),
which libraries have not used extensively but should examine for possible
application. Fundamental issues related to "granularity"
and "versioning" need to be examined in order to determine what
constitutes a "work" or an "edition" in the digital
environment.
Issues for the Future
Those involved in digital library efforts need to plan for the digital
library of the future, in addition to addressing immediate challenges.
We don't necessarily know what digital resources will look like in the
future, as "genres" are still developing and a culture of unstable
transition needs to be considered in the context of the things libraries
will continue to do, such as collect and organize print materials. Given
that we will continue to have limited resources, how will we make the
best use of these resources? Libraries are going to need to make choices: is
there a right choice and a wrong choice? There was certainly consensus,
often repeated, that the "wrong" choice is to do nothing.
Most of the discussion focused on information created originally in
digital format ("digital resources" in the sense of Internet
information sites), but libraries are also playing a large role in "digitizing,"
that is, converting hard-copy materials to digital formats, including CD-ROM
digital resources. There is an unproven assumption that conversion
to digital form will increase access. Selection procedures to determine
what to digitize need to be developed, although they will vary institution
by institution depending on specific priorities. One suggestion is to
digitize certain resources "on demand" so as not to waste scarce
resources.
Action Items and Recommendations
Libraries should:
- Lead by working with non-library groups and producers of digital resources
to assure that developments progress with the needs of libraries in
mind, particularly as related to self-indexing concepts. To begin with,
libraries should contact the Internet Engineering Task Force (IETF)
and find out how they can be more involved. [http://www.ietf.cnri.reston.va.us/home.html]
- Review the work of the six projects in the NSF/NASA/ARPA Digital
Library Project. [http://www.grainger.uiuc.edu/dli/national.htm]
- Attend and participate in the Digital Libraries '96 conference sponsored
by the Association for Computing Machinery, to be held March 20-23,
1996, in Bethesda, MD. [http://fox.cs.vt.edu/DL96]
- Consider ways to make contact with the IETF planning groups and the
technical community. Plan a workshop (under the auspices of the Corporation
for National Research Initiatives [http://www.cnri.reston.va.us/])
involving librarians, those involved with the D-Lib group
[http://www.dlib.org], producers of
digital resources, and researchers, to examine how the traditional
skills of catalogers and developing computer tools can be coordinated.
- Participate in one of the Uniform Resource Name (URN) testbed projects
which are commencing in the next few months (LC-NDL is participating
in one of the testbeds). [contact urn@mordred.gatech.edu]
- Continue work to identify data elements which aid in machine-retrieval
for automated metaindex searching, such as those projects at the University
of Michigan [http://http2.sils.umich.edu/UMDL/HomePage.html],
Stanford University [http://diglib.stanford.edu/diglib],
and uses of the Dublin Core metadata set. Are new projects needed?
- Track the feedback on the field tests of current metadata schemes,
such as OCLC's Spectrum project and the work at the National Library of
Australia.
- Encourage libraries to experiment with the inclusion of metadata
in their electronic publications and projects, such as LC's National
Digital Library. Develop models to serve as requirements for digitizing contractors.
- Identify areas where libraries can capture data to apply metadata
schemes and/or other tools, such as:
- finding aids
- dissertations
- university press publications.
- Make use of OCLC's InterCat [http://www.oclc.org:6990]
database as a testbed to examine such things as URLs/URNs, MARC and
non-MARC mixes, and mechanisms for capturing user-supplied data. Structure
and implement user tests.
- Examine creative applications for displaying and/or mapping to MARC
format (e.g., capture Lycos records and create rudimentary MARC records
for display in catalogs, or SGML to MARC mapping).
- Agree upon methods for carrying metadata in digital resources (as
the TEI header does for some SGML documents). Possibly issue an informational
Request for Comments (RFC) on the topic and shepherd it through the IETF or
NISO. Also develop tools for applying the metadata schemes
and make them available to those willing to test them.
- Explore continued relevance and application of cataloging practices
and rules at meetings such as the OCLC InterCat Project Colloquium at
ALA Midwinter in San Antonio [http://www.oclc.org/oclc/man/catproj/announce.htm]
and the international conference of cataloging experts being planned
for 1997 by the Joint Steering Committee for the Revision of AACR.
- Monitor D-Lib Magazine [http://www.dlib.org]
as a good source of information on current digital library research.
Participants
Sarah Thomas (Chair)
Acting Director for Public Service Collections
Library of Congress
Duane Arenales
Chief, Technical Services Division
National Library of Medicine
Bill Arms
Corporation for National Research Initiatives
Ross Atkinson
Associate University Librarian
Cornell University
John Byrum
Chief, Regional & Cooperative Cataloging Division
Library of Congress
Alan Danskin
Head, Authority Control
British Library
Beth Davis-Brown
National Digital Library
Library of Congress
Peter Deutsch
President, Bunyip Information Systems Inc.
Stephen James
Chief, Humanities and Social Sciences Division
Library of Congress
Erik Jul
Manager, Customer Services
OCLC, Inc.
Glenn LaFantasie
Program Officer
Council on Library Resources
Sandy Lawson
Assistant to the Director for Public Service Collections
Library of Congress
David Levy
Member, Research Staff
Xerox Palo Alto Research Center
Clifford Lynch
Director, Library Automation
University of California
Carol Mandel
Deputy University Librarian
Columbia University
Susan Morris
Assistant to the Director for Cataloging
Library of Congress
Ingrid Parent
Director General, Acquisitions and Bibliographic Services
National Library of Canada
Brian Schottlaender
Associate University Librarian for Collections & Technical Services
University of California, Los Angeles
Barbara Tillett
Chief, Cataloging Policy and Support Office
Library of Congress
Linda West
Director for Member Services and Support
Research Libraries Group
Beacher Wiggins
Acting Director for Cataloging
Library of Congress
Jennifer A. Younger
Assistant Director for Technical Services and Liaison to the Regional
Campus Libraries
The Ohio State University
Helena Zinkham
Head, Processing Section, Prints and Photographs Division
Library of Congress
revised 01-16-95