ASSESSMENT OF FORMATS AND STANDARDS FOR THE CREATION, DISSEMINATION, AND PERMANENT ACCESSIBILITY OF ELECTRONIC GOVERNMENT INFORMATION PRODUCTS

PHASE I DELIVERABLES

SUBMITTED BY:

Marjory S. Blumenthal
Alan S. Inouye

Computer Science and Telecommunications Board
National Research Council

16 July 1997

This is a staff working paper. It has not been reviewed by the National Research Council and does not reflect the institutional views of the NRC in any way.


TABLE OF CONTENTS

Introduction

List of Acronyms and Terms

Study Framework

Statement of Work for Phase II

Resources


INTRODUCTION

This “Assessment of Formats and Standards for the Creation, Dissemination, and Permanent Accessibility of Electronic Government Information Products” is being conducted by the U.S. National Commission on Libraries and Information Science under an Interagency Agreement with the U.S. Government Printing Office that was approved by the Joint Committee on Printing. The Computer Science and Telecommunications Board (CSTB) of the National Research Council has been selected to participate throughout the study.

Information gathered in this Assessment is to be used to facilitate improved public access to electronic Federal Government information made available through the Federal Depository Library Program and could be used to facilitate improved public access to electronic Federal Government information in general. According to the Interagency Agreement, this Assessment is expected to identify formats most appropriate for dealing with electronic information products throughout their life cycles, evaluate the plans of agencies, assess the cost-effectiveness and usefulness of various electronic formats, and assess formats most conducive to maintaining permanent accessibility.

The study is divided into three phases. In Phase I, CSTB develops a detailed statement of work (presented in this document) that defines the data collection process required to conduct this Assessment. Contractors other than CSTB will be engaged to perform the research and data collection in Phase II. For Phase III, CSTB will draw upon experts to review the data and develop conclusions and recommendations.

As indicated, the detailed statement of work is contained within this document. However, the larger framework for this study also needed to be sufficiently developed so that the appropriate data is defined for collection in Phase II. This preliminary framework, which will be elaborated upon and refined in Phase III, is also contained in this document. The preliminary framework will be used to develop a prospectus for Phase III, a necessary step towards establishing a formal CSTB project. The list of resources identified throughout Phase I, including all consultations, concludes this document.

This document is a staff product. It is not an NRC report.


DEFINITION OF ACRONYMS AND TERMS

ASCII     American Standard Code for Information Interchange
CD-ROM    Compact Disk-Read Only Memory
CENDI     Commerce, Energy, NASA/NLM, Defense, Interior
DTIC      Defense Technical Information Center
ERIC      Educational Resources Information Center
FED-STDS  Federal Telecommunications Standards
FIPS      Federal Information Processing Standards
GILS      Government Information Locator Service
HTML      HyperText Markup Language
NTIS      National Technical Information Service
OMB       Office of Management and Budget
PDF       Portable Document Format
SDTS      Spatial Data Transfer Standard
SGML      Standard Generalized Markup Language
TIFF      Tagged Image File Format
WWW       World Wide Web

dBASE                   a relational database format
Electronic Government   discrete sets of Government information, either conveyed
Information Products    through tangible (i.e., physical) electronic media, or made
                        publicly accessible via a Government electronic
                        information service. Electronic Government information
                        products comprise one or more information containers
                        and may be made up of multiple formats.
Format                  a specification for organizing data or information, such
                        as TIFF or ASCII
Government              the Federal Government of the United States (all
                        three branches)
Information Container   information that is organized in some way so that it may be
                        interpreted by others.
                        Print examples include articles, brochures, and technical
                        reports. Electronic examples include an interactive map
                        on a WWW site and an article in PDF. Information
                        containers will generally have one or few formats.
Media                   the physical or electronic means by which information is
                        communicated, such as CD-ROM, floppy disk,
                        telecommunications channel, etc.
WWW Site                file or group of files organized under a home page that is
                        accessible through browser software on the WWW. The
                        home page is typically an index, welcome, or menu WWW
                        page for a distinctive activity or service. A WWW site
                        could be an agency’s entire presence on the WWW.


STUDY FRAMEWORK

This is a preliminary concept paper. It is not an NRC report.

Introduction

Advances in computer and telecommunications technology fundamentally alter information dissemination. Established concepts such as the definition of a book or directory become unclear in the context of analogues on the WWW. Users in the electronic environment need to know a lot more about the processes of information dissemination than users in the print world. In the print world, you can purchase a hardcover book with quality paper and binding, or an inexpensive paperback version produced with lower quality materials. Your “browser” is the pair of optical wonders located just above your nose, and it is very easy to use; you don’t need any documentation or training.

In the electronic world, however, a user needs to know a great deal about the technology associated with information dissemination. Can I read the format? Is it in PDF? Microsoft Word 6.0? Do I have enough memory? Hard disk space? Do I have the optimal WWW browser? Is my Internet connection fast enough? Why is my computer crashing when I try to read the document? Where can I print it out? Do I need special conversion or printer driver software? How much does it cost? And so on.

The purpose of this study is to examine some of the issues concerning access to electronic Government information and to derive conclusions and recommendations to improve public access. What are the different information containers (ways to package information, such as a magazine article), media, and formats that are used, and why? What are the issues concerning the evolution towards standards or other means of simplifying access, thereby easing the burden on information users?

The context for this study is the set of electronic Government information products available to federal depository libraries. Because the Federal Depository Library Program represents a diverse range of libraries (small and large, rural and urban, public and research/academic, those with minimal and extensive information technology resources), the findings are also likely to be generally applicable to the users of electronic Government information products.

There are a number of institutional issues. A key concern is cost. What are the costs involved in electronic dissemination over the life cycle of information? How do these costs vary with different types of information containers, media, and formats? What is the cost impact on the users of Government information of the transition to a primarily electronic environment?

There are also a number of policy issues regarding the roles of Government and private sector information intermediaries in the electronic environment. An important issue that requires clarification is ensuring the permanent accessibility of electronic Government information. What are the issues? How is permanent accessibility achieved? Who should be responsible? And at what cost?

What is an Electronic Government Information Product?

Familiar artifacts populate the world of paper-based or other non-electronic Government publications, such as books, newsletters, journals, articles, and technical reports. Government records constitute the evidence of governance and include both publications and other records that are typically not intended for general dissemination to others. An example of the latter could be the files describing the grant recipients of a federal matching grant program for cities.

In the transition to electronic information--tangible (i.e., physical information products such as CD-ROMs) or intangible (e.g., WWW)--some of the customary definitions of publications continue to be useful. For example, the concept of an article remains intact as an exploration of a specific topic that can be read within an hour or so (loosely speaking). Sections of an article might appear on individual WWW pages linked to the table of contents page, but the essence of the article as a conceptual whole persists.

Other publications from the non-electronic regime, however, may make less sense in electronic formats because of the improved alternatives that electronic technology makes possible. These new means of electronic information dissemination and the existing print and electronic publications may all be described more generally as information containers. An electronic Government information product comprises one or more electronic information containers.

The transition to electronic information has other implications. For example, the distinction between information dissemination (an intention to make information available to the public) and information disclosure (an agency response to a specific request) can become unclear when both activities may be conducted at the same WWW site. Those agencies that sell information to a limited constituency may find that electronic technology provides both new capabilities and new challenges (an example of the latter is the ease of electronic copying). In addition, the act of print publication implies a previous assessment that the information is worthy of dissemination, because the publication process requires significant effort. From a technological point of view, less effort is usually required to put up the same information on the WWW. Electronic information can also be easily revised, so preliminary or draft versions, which would not have been distributed in paper form, may often be made available on the WWW. This raises concerns about what information should be publicly available.

Question #1: What are the different types of non-electronic information products and information containers used by the Government?

Question #2: Of the information containers in a non-electronic world that are used by the Government, which ones have continuing applicability in the tangible electronic information world? In the intangible electronic information world?

Question #3: Are there new information containers in the electronic information world (tangible or intangible) that do not exist in the non-electronic world?

Question #4: What would be a taxonomy of information products and information containers for tangible and intangible electronic Government information?

Question #5: How much information is disseminated via each of the information containers?

Question #6: Describe the editorial process for on-line / WWW based electronic information dissemination and whether that process differs from the editorial process for print publications.

The Optimal Media Mix for Government Information Products

It is a fairly common view that Government information will and should be largely disseminated via on-line technology (e.g., WWW) in the not-so-distant future. The implication of this view is that paper, microfiche, CD-ROMs, and other tangible information products should be used less and less often for Government information dissemination. This proposition deserves closer scrutiny.

A transition to primarily electronic dissemination may affect access to Government information. Individuals and organizations obtain Government information from a variety of sources, ranging from articles in newspapers and segments on television news programs to visits to agency regional offices or federal depository libraries. Access also depends on the resources available to an individual or organization, such as proximity to libraries and other information institutions and the nature of the computing resources available. Some federal depository libraries will be able to offer improved services that technological innovation makes possible. Other federal depository libraries may encounter difficulty in securing the resources to provide their users with reasonable access to electronic Government information products. Thus, the increasing electronic dissemination of Government information may facilitate access for some and may deter access for others.

There is a paucity of analyses of the life cycle costs of electronic dissemination, and even fewer analyses compare those costs with the costs of paper or other dissemination alternatives. There is, however, no shortage of belief that electronic dissemination has much lower life cycle costs than print dissemination. The introduction of computers into organizations, whether as data processing, office automation, or management information systems, was often based upon cost savings arguments. It is unclear whether these cost savings have been realized. In the present instance, therefore, we should be wary of the cost savings argument in the absence of a robust analytical model and supporting data, a topic that will be addressed in greater depth in a subsequent section of this report.

The caveats above are not intended to minimize the considerable advantages of on-line access to Government information, but rather to frame the question so that other potentially viable media are considered. Media selection should facilitate access to Government information and so the needs and technological capabilities of the intended audience should be in harmony with the medium of dissemination. Insofar as it is feasible, media selection should also facilitate unintended use of Government information.

Question #7: What are the different kinds of media and the mix of media used for information dissemination by Government agencies?

Question #8: How much information is disseminated for each media type?

Question #9: What is the expected future mix?

Question #10: What are the criteria used in determining the mix?

Question #11: How do individuals and organizations gain access to Government information?

Question #12: What are the factors that affect an individual’s or organization’s easy access to Government information?

Question #13: How does the transition to primarily electronic dissemination affect access to Government information?

Information Formats and Standards

A wide variety of Government information--data/statistics, text, geo-spatial, graphics, multimedia, audio, video, and combinations thereof--is disseminated using a multitude of formats--such as HTML, PDF, or ASCII. It is left to the user to determine how to access the information of interest. Reducing the number of different formats will serve to facilitate access to Government information by users and streamline information exchange among agencies of the Government.

OMB Circular A-119, which addresses the use and adoption of voluntary standards, defines a standard as:

“A prescribed set of rules, conditions, or requirements concerned with the definition of terms; classification of components; delineation of procedures; specification of dimensions, materials, performance, design, or operations; measurement of quality and quantity in describing materials, products, systems, services, or practices; or descriptions of fit and measurement of size.”

Circular A-119, which applies to the Executive branch, encourages agencies to adopt voluntary or de facto standards already established by industry. In the absence of established relevant standards, agencies may engage in setting their own standards. The Government itself has a number of standards, managed by the National Institute of Standards and Technology (FIPS--Federal Information Processing Standards) and the National Communications System (FED-STDS--Federal Telecommunications Standards), which are mandated for use by agencies.

Technology standards have beneficial effects because fewer formats across information containers reduce costs for the users of information, thereby allowing for common solutions to common problems. However, too many technology standards can inhibit innovation by agencies and result in suboptimal performance through the use of inappropriate technologies. In addition, because of the scope and influence of the Government, standards adopted by the Government can affect the behavior in other sectors of society, for better or worse.

There may be layers of standards. Media may have physical standards and specifications for how data is organized (e.g., tracks, sectors, blocks). Data may be represented via certain standards (e.g., ASCII). Information may conform to standards such as PDF. Other types or levels of standards may also be pertinent.

Standards for locating information (i.e., metadata) may prove to be as important as standards for information formats. Well developed cataloging systems exist for the static and well-defined information containers of paper publications. Search tools such as the popular search engines on the WWW are primitive by comparison, although there have been some recent initiatives in the federal government (e.g., GILS) to try to improve locator services. Standards for coding data (so that the same data item has the same meaning in multiple sources) also facilitate data access.

In comparison to paper publications, electronic versions can be easily modified, with the resulting changes virtually undetectable. This raises the concern of assuring the integrity of an electronic publication against inadvertent changes, well-meaning intentional changes, or deliberate attempts to mislead. Various technologies, such as encryption and watermarks, can help to ensure the authenticity and integrity of an electronic publication.
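
One widely used building block for the integrity concern described above is a cryptographic hash, which is a different technique from the encryption and watermarks named in the text. The sketch below is illustrative only: it assumes that an agency publishes a SHA-256 digest alongside each document, and the function names and sample text are hypothetical. A matching digest indicates that the bytes are unchanged; it does not by itself establish who produced the document, which would require an additional mechanism such as a digital signature.

```python
import hashlib


def document_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a document's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, published_digest: str) -> bool:
    """Check a retrieved copy against the digest published with the original."""
    return document_digest(data) == published_digest


# Hypothetical sample content, used only to exercise the functions.
original = b"Report text as released by the agency.\n"
published = document_digest(original)  # assumed to be published with the document

# A one-character change yields a different digest and is therefore detected.
altered = b"Report text as released by the agency!\n"
print(is_unchanged(original, published))
print(is_unchanged(altered, published))
```

Under these assumptions, a depository library or an individual user could recompute the digest of a retrieved copy and compare it with the published value.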

Question #14: Describe the formats used in the creation and dissemination of electronic information by agencies.

Question #15: Are certain formats mandated for use throughout the agency? Accepted as standard agency practice? Recommended or suggested? How did these formats become formal standards or agency practice? On whose initiative? How successful are these formal standards or practices in facilitating access to Government information?

Question #16: What facilities are available to identify and locate an agency’s electronic information? What facilities are available to identify non-agency electronic information that is likely to be pertinent to a user of the agency’s electronic information?

Question #17: What are the issues in ensuring the authenticity and integrity of Government electronic information?

Question #18: What, if any, mechanisms are used to ensure the authenticity and integrity of publications disseminated electronically?

Question #19: What are the FIPS and FED-STDS that are relevant to this study? Do agencies comply with them? Are these standards helpful?

Question #20: What are agency plans with respect to Questions #14, 15, 16, and 18?

Question #21: What are the trends in standard formats in other sectors of society, particularly in state and local government, the computer and telecommunications industry, higher education, and the communications industries (e.g., media, publishing)?

Question #22: What are the characteristics and models of successful standards?

Question #23: When the Government adopts a particular standard, which sectors of society are most affected? What is the impact on these sectors?

Question #24: Are there certain types of information content or media selection for which standards have (or could have) a more beneficial effect?

Question #25: Based upon the findings from questions #14 through #24 above, what are the implications for federal depository libraries?

Performance Criteria for Formats

An alternative to specific technology standards is the adoption of performance criteria, which are desirable characteristics for information dissemination that are encouraged or mandated by agencies. Performance criteria are more generalized than specific information formats and are intended to serve as guidelines through numerous product life cycles. As with technology standards, the objective is to simplify information access for the users of Government information. Examples of performance criteria could include:

Except for the distribution of bulk data, the WWW is fast becoming the medium of choice for Government electronic information dissemination. Particular characteristics of WWW sites can facilitate ease of use--such as simple and consistent look and feel and link navigation.

Question #26: Describe the performance criteria that the agency uses in its information dissemination. How are these criteria communicated within the agency?

Question #27: Describe agency plans for the development of performance criteria.

Question #28: Are there typical performance criteria in other sectors of society, particularly in state and local government, the computer and telecommunications industry, higher education, and the communications industries (e.g., media, publishing)?

Technological Aspects of Permanent Accessibility

The preservation and permanent accessibility of Government paper publications and microfiche are well established through the holdings of the National Archives and federal depository libraries and their efforts to make information available to the public. Efforts to ensure that electronic information is also permanently accessible are in their infancy. The mechanisms used for paper publications cannot simply be applied to electronic information, because electronic information is qualitatively different.

The basic objective with paper-based archives is to keep what you have. For electronic information, this is a challenging objective because technology evolves: all technologies become obsolete, seemingly at an ever-increasing rate. One strategy is to retain the various old technologies indefinitely so that archived information can be properly interpreted.

There are other alternatives that involve technological migration. Information can be copied from old media to new media. Information can be converted from old formats to newer formats. Or new software applications can be designed to read old formats. Any conversion or re-interpretation of information allows for the possibility that the new version may differ from the original, raising the question of whether the new form of the information is officially or legally equivalent to the original.
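
The difference between copying to new media and converting to a new format can be made concrete with a small sketch. It is a minimal illustration under stated assumptions, not a test of legal or official equivalence: the function names are hypothetical, the sample is plain ASCII text, and "equivalent content" is assumed to mean identical text once line endings and trailing whitespace are normalized. A media copy can be checked byte for byte, whereas a format conversion changes the bytes by design and so needs a content-level comparison.

```python
def media_copy_is_faithful(original: bytes, copy: bytes) -> bool:
    """A copy to new media should reproduce the original bytes exactly."""
    return original == copy


def normalized_text(data: bytes) -> str:
    """Decode ASCII text and normalize line endings and trailing whitespace."""
    text = data.decode("ascii")
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip()


def conversion_preserves_text(original: bytes, converted: bytes) -> bool:
    """A converted file may differ in bytes yet carry the same text."""
    return normalized_text(original) == normalized_text(converted)


# Hypothetical sample: the same short document with two line-ending conventions.
old_file = b"Title\r\nBody line  \r\n"
new_file = b"Title\nBody line\n"

print(media_copy_is_faithful(old_file, new_file))
print(conversion_preserves_text(old_file, new_file))
```

For richer formats, the comparison would have to account for layout, fonts, images, and embedded data, which is where the equivalence question raised above becomes difficult.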

Question #29: What are the specific issues regarding the migration of Government electronic information to support permanent accessibility?

Question #30: How do we ensure that electronic information that is converted to a new format is equivalent to the original information--in terms of content and legal/official status?

Question #31: How are agency WWW pages and other on-line information managed to ensure permanent accessibility? What are agency plans?

Question #32: How does an agency determine or ascertain which subset of its electronic information merits permanent accessibility?

Question #33: What are the electronic information formats most conducive to permanent accessibility?

Question #34: What alternate mechanisms could be used to ensure that electronic documents made permanently accessible are authentic when disseminated, and that their integrity can be easily verified after dissemination?

Managing Access to Electronic Government Information

As previously discussed, standards for formats can facilitate access to Government electronic information. However, if it is not feasible to implement standards, an alternate process to arrive at the same outcome from the user’s perspective is to have an information intermediary convert information from various formats to one of several preferred formats. This process already occurs within some agencies (e.g., Department of Education’s WWW page--a central unit converts documents to one of a few standard formats before posting documents on the Department’s WWW page). The conversion proposal raises the issue of how much publishing activity should be undertaken by a Government information intermediary, rather than by private sector publishers. Where does “conversion” end and “value-added” begin?
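
The intermediary conversion process described above can be sketched as a simple routing rule. Everything in the sketch is a hypothetical assumption made for illustration: the set of preferred formats, the conversion table, and the function name do not describe the practice of the Department of Education or of any other agency.

```python
from typing import Optional, Tuple

# Hypothetical policy: formats that may be posted without conversion.
PREFERRED_FORMATS = {"HTML", "PDF", "ASCII"}

# Hypothetical conversion table: source format -> preferred target format.
CONVERSION_TARGETS = {
    "Microsoft Word 6.0": "PDF",
    "SGML": "HTML",
}


def plan_posting(source_format: str) -> Tuple[str, Optional[str]]:
    """Decide how an intermediary would handle a submitted document.

    Returns (action, target_format), where action is one of:
      "post"    - already in a preferred format
      "convert" - a routine conversion to a preferred format is defined
      "review"  - no routine conversion exists; a person must decide
    """
    if source_format in PREFERRED_FORMATS:
        return ("post", source_format)
    if source_format in CONVERSION_TARGETS:
        return ("convert", CONVERSION_TARGETS[source_format])
    return ("review", None)


for fmt in ["PDF", "SGML", "dBASE"]:
    print(fmt, plan_posting(fmt))
```

The "review" branch is where the boundary question arises: handling a format with no routine conversion may require restructuring the content, which begins to resemble value-added publishing rather than conversion.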

Accessing information implies identifying relevant information and ascertaining where it may be obtained. Insofar as agencies make their own decisions with respect to electronic dissemination, what mechanisms are needed to identify and locate information among agencies?

The electronic environment provides opportunities for new models of dissemination. For example, the Department of State, University of Illinois at Chicago, and the Government Printing Office entered into a partnership whereby the University of Illinois at Chicago manages the Department of State Foreign Affairs Network, while fulfilling the requirements for the Federal Depository Library Program. The National Library of Education has an Internet reference desk, askERIC. Reference questions are submitted via email and answered by staff at Syracuse University, or referred to one of fifteen content specialty clearinghouses.

Question #35: The conversion of electronic formats by an information intermediary has been suggested as an alternative to developing format standards among information producers. What are the key issues in such a proposal?

Question #36: Besides the development of format standards for information producers and the format conversion alternative, what are other ideas (with respect to formats) to simplify access to electronic Government information?

Question #37: What are the different models for cooperative arrangements?

Question #38: What are the various cooperative arrangements in existence? Could cooperative arrangements such as the Department of State/University of Illinois at Chicago/Government Printing Office serve as the model for permanent accessibility?

Question #39: How can locator services be further developed? What mechanisms within agencies are necessary to ensure that relevant materials can be reasonably identified and located?

Costs of Electronic Information Products

As suggested earlier, it is a popular belief that the electronic dissemination of information is less expensive than other modes, particularly print publication. In a simple short-run analysis, such a conclusion may seem plausible if, for example, one compares the cost of putting up a document on the WWW with the cost of printing a publication and distributing it to federal depository libraries. However, it is life cycle costs that matter, and their computation is complex.

Although all of the elements of life cycle costs for intangible (or tangible for that matter) electronic information dissemination cannot be articulated at this point, some of them may be. We shall use the WWW as an example. First, there are the costs associated with converting the information to one of the preferred WWW formats for the agency. There are the costs for the server and its associated software, maintenance, and staff support. There are also facilities, telecommunications, and overhead costs.

After a number of years, the server technology and information formats may become obsolete and need to be migrated to a new system, thus incurring new costs for hardware, software, and so forth. At some point, the agency may decide that the information does not require direct on-line access, so the information would be archived to a system for permanent accessibility. This system of permanent accessibility has its own set of costs, both initial and ongoing, and in time will become technologically obsolete itself, necessitating a migration to a successor system.

The assumptions of a life cycle cost analysis can greatly affect the outcome. One of the critical assumptions is the rate of decrease in the cost of technology. In addition, there are assumptions about the inflation rate and useful life of technology, and average use of the collection per year.
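
The sensitivity of the outcome to these assumptions can be illustrated with a deliberately simplified model. The sketch below is not the analytical model called for in this study: the cost structure, the parameter names, and every figure are hypothetical assumptions chosen only to show how the rate of decrease in technology cost, the inflation rate, the useful life of technology, and the average use per year enter the calculation.

```python
def life_cycle_cost(initial_system_cost: float,
                    annual_operating_cost: float,
                    horizon_years: int,
                    useful_life_years: int,
                    tech_cost_decline: float,
                    inflation_rate: float) -> float:
    """Total cost over the horizon under simplified, assumed rules.

    - A system is bought in year 0 and replaced every `useful_life_years`;
      each replacement costs less by `tech_cost_decline` per elapsed year.
    - Operating costs (staff, facilities, telecommunications) grow each
      year by `inflation_rate`.
    """
    total = 0.0
    for year in range(horizon_years):
        if year % useful_life_years == 0:
            total += initial_system_cost * (1 - tech_cost_decline) ** year
        total += annual_operating_cost * (1 + inflation_rate) ** year
    return total


def cost_per_use(total_cost: float, uses_per_year: float,
                 horizon_years: int) -> float:
    """Spread the total cost over the assumed use of the collection."""
    return total_cost / (uses_per_year * horizon_years)


# Illustrative figures only; none are drawn from agency data.
for decline in (0.10, 0.30):
    total = life_cycle_cost(100_000, 20_000, horizon_years=12,
                            useful_life_years=4, tech_cost_decline=decline,
                            inflation_rate=0.03)
    print(decline, round(total), round(cost_per_use(total, 5_000, 12), 2))
```

Running the loop with two different assumed rates of technology cost decline shows how a single assumption shifts both the total and the cost per use, which is why the assumptions need to be stated explicitly in any comparison with print.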

The life cycle cost model will vary with the different kinds of media. The model for WWW-based dissemination will differ from the model for initial dissemination via CD-ROM.

So far only life cycle costs for the suppliers (i.e., the Government) have been considered. Costs for the user or intermediary (e.g., a depository library) also vary based upon how electronic information is disseminated. For example, using a CD-ROM may not require Internet access, but because CD-ROM software is not standardized, staff resources may need to be allocated to supporting CD-ROM products, which as tangible information products imply costs for cataloging, storage, and ultimately media and/or format migration. Information disseminated via the WWW obviously requires access to a computer with Internet access. However, the ongoing storage costs in the CD-ROM case are absent in the WWW instance.

In the context of the Federal Depository Library Program, there is a long tradition of cost sharing between the Government and depository libraries. The Government provides the information at no cost, while the depository libraries assume the costs of storage, preservation, access, and reference services. There is evidence to suggest that the expenditure by depository libraries is considerably greater than that of the Government. In the transition from paper-based dissemination to electronically-based dissemination, the Government saves the cost of printing publications while the user incurs the costs of printing-on-demand. It is unclear whether the aggregate costs after this shift are higher, lower, or unchanged.

The cost of access to Government information also depends on an individual’s or organization’s particular circumstances. The factors that influence the access to Government information have costs associated with them (e.g., computing resources available, proximity to a library with relevant resources, time, effort and knowledge required).

Question #40: What are the elements of life cycle costs for the various kinds of media used in electronic dissemination?

Question #41: Of the various elements in life cycle costs, which ones are fairly stable or can be predicted with high confidence?

Question #42: What are the specific costs for the elements in life cycle costs for the various kinds of media used in electronic dissemination?

Question #43: What are the elements and costs associated with user access to electronic Government information?

Question #44: What is the cost impact to federal depository libraries in the transition to primarily electronic Government information dissemination?

The Larger Policy Context

The transition from a primarily print world to a primarily electronic world implies changes in the definition of information containers, information formats, models for information dissemination, permanent accessibility, and the structure of costs. These extensive changes in the process of information dissemination suggest that there may be new roles and responsibilities, particularly for intangible electronic information products, both for Government information intermediaries (those entities that facilitate the dissemination of Government information from information producers to information users) and for agencies that produce electronic information. The data and analyses from this study should serve to inform the debate on the larger policy issues concerning Government electronic information dissemination, although it is unlikely that these questions can be fully answered within the context of this study.

In some respects, the role of federal depository libraries could be unchanged for the foreseeable future. There is a large store of existing paper and other tangible information products in federal depository libraries. Additionally, although the Government’s production of tangible information products may decline in the coming years, a significant volume of tangible information products is still expected. For these information products, federal depository libraries could continue to serve as repositories and provide service to citizens who wish to access Government information. However, in the context of electronic Government information, especially intangible electronic information, what is the role of federal depository libraries?

Question #45: What are the Government information intermediaries? What are their missions and current activities?

Question #46: Which agency(ies) should have the responsibility to ensure that Government electronic information is reasonably locatable across agencies? What alternatives are there to achieve this outcome?

Question #47: Which agency(ies) should have the responsibility to ensure that the appropriate set of information is maintained for permanent accessibility? What are the roles of agencies that produce information as compared to Government information intermediaries?

Question #48: How does the role of the federal depository library change in the electronic environment?

Question #49: Based on the findings in this study, what are the potential changes in the role of private sector publishers in the electronic environment?
