This is the accessible text file for GAO report number GAO-02-586 
entitled 'Information Management: Challenges in Managing and Preserving 
Electronic Records' which was released on June 17, 2002.



This text file was formatted by the U.S. General Accounting Office 

(GAO) to be accessible to users with visual impairments, as part of a 

longer term project to improve GAO products’ accessibility. Every 

attempt has been made to maintain the structural and data integrity of 

the original printed product. Accessibility features, such as text 

descriptions of tables, consecutively numbered footnotes placed at the 

end of the file, and the text of agency comment letters, are provided 

but may not exactly duplicate the presentation or format of the printed 

version. The portable document format (PDF) file is an exact electronic 

replica of the printed version. We welcome your feedback. Please E-mail 

your comments regarding the contents or accessibility features of this 

document to Webmaster@gao.gov.



Highlights: Report to Congressional Requesters:



June 2002:



Information Management:



Challenges in Managing and Preserving Electronic Records:



GAO-02-586:



Highlights of GAO-02-586, a report to Congressional Requesters:



Why GAO Did This Study:



In the wake of the transition from paper-based to electronic processes, 

federal agencies are producing vast and rapidly growing volumes of 

electronic records. The difficulties of managing, preserving, and 

providing access to these records represent challenges for the National 

Archives and Records Administration (NARA) as the nation’s recordkeeper 

and archivist. GAO was requested to (1) determine the status and adequacy 

of NARA’s response to these challenges and (2) review NARA’s efforts to 

acquire an advanced electronic records archiving system, which will be 

based on new technologies that are still the subject of research.



What GAO Found: 



NARA has taken action to respond to the challenges associated with 

managing and preserving electronic records. In 2001, NARA completed 

an assessment of the current federal recordkeeping environment. This 

study concluded that although agencies are creating and maintaining 

records appropriately, most electronic records (including databases 

of major federal information systems) remain unscheduled (that is,

their value has not been assessed nor their disposition determined), 

and records of historical value are not being identified and provided 

to NARA for archiving. As a result, valuable electronic records may 

be at risk of loss. Part of the problem is that records management 

guidance is inadequate in the current technological environment of 

decentralized systems producing large volumes of complex records. 

Another factor is the low priority often given to records management 

programs and the lack of technology tools to manage electronic records. 

Finally, NARA does not perform systematic inspections of agency records 

management, and so it does not have comprehensive information on 

implementation issues and areas where guidance needs strengthening. 

Although NARA plans to improve its guidance and address technology 

issues, its plans do not address the low priority generally given 

to records management programs nor the inspection issue.



Recognizing the limitations of its technical strategies to support 

preservation, management, and sustained access to electronic records, 

NARA is planning to design, acquire, and manage an advanced electronic 

records archive; however, this project faces substantial risks. 

Although the electronic records archive project is in its initial 

stages, it is already falling behind schedule. Further, to acquire a 

major system of this kind, NARA needs to improve its information 

technology (IT) management capabilities, and although it has made 

progress in doing so, its efforts are not yet complete.



What GAO Recommends:



GAO recommends that the Archivist of the United States develop 

documented strategies to raise awareness of the importance of records 

management programs and for conducting systematic inspections of these 

programs. In addition, to reduce risks, GAO recommends that the 

Archivist reassess the schedule for acquiring the new archival system 

so that the agency can complete key planning tasks and address IT 

management weaknesses. In commenting on a draft of this report, the 

Archivist agreed with our recommendations and offered clarifications, 

which we have incorporated as appropriate.



Figure: Master Copies of Electronic Records in NARA’s Archives:



[See PDF for image]



Source: NARA.



[End of figure]



This is a test for developing highlights for a GAO report. The full 

report, including GAO’s objectives, scope, methodology, and analysis, is 

available at www.gao.gov/cgi-bin/getrpt?GAO-02-586. For additional 

information about the report, contact Linda Koontz, 202-512-6240. To 

provide comments on these test highlights, contact Keith Fultz 

(202-512-3200) or email HighlightsTest@gao.gov.



Contents:



Letter:



Results in Brief:



Background:



NARA Is Responding to Challenges of Electronic Records Management:



NARA’s Effort to Acquire Advanced Electronic Archival System Faces 

Risks:



Conclusions:



Recommendations for Executive Action:



Agency Comments and Our Evaluation:



Appendixes:



Appendix I: Objectives, Scope, and Methodology:



Appendix II: Approaches to Archiving Electronic Records Provide Partial

Solutions:



Appendix III: NARA’s Electronic Records Guidance Has Evolved:



Appendix IV: Agencies Are Managing Large Volumes of Important 

Electronic Records:



Appendix V: Comments from the National Archives and Records 
Administration:



Glossary:



Table:



Table 1: Timeline for ERA Program:



Figures:



Figure 1: Removable Hard Drives and Backup Devices Used by Independent 

Counsel Staff:



Figure 2: Master Copies of Electronic Records in NARA’s Archives:



Figure 3: OAIS Model and Its Components:



Figure 4: Sample of XML Version of State Department Telegram:



Figure 5: The Long Now Foundation Rosetta Disk Language Archive:



Figure 6: Internet Archive Collection of Presidential Candidate Web 

Sites:



Figure 7: Google’s Usenet Archive:



Abbreviations:



ASCII: American Standard Code for Information Interchange:



DARPA: Defense Advanced Research Projects Agency:



DOD: Department of Defense:



EAST: Examiners Automated Search Tool:



ERA: Electronic Records Archive:



GAO: General Accounting Office:



GIS: Geographic Information System:



GRS: General Records Schedule:



GSA: General Services Administration:



HTML: Hypertext Markup Language:



HUD: Housing and Urban Development:



IG: Inspector General:



IT: information technology:



NARA: National Archives and Records Administration:



NASA: National Aeronautics and Space Administration:



OAIS: Open Archival Information System:



OMB: Office of Management and Budget:



PMO: program management office:



POP: persistent object preservation:



PTO: U.S. Patent and Trademark Office:



SAS: State Archiving System:



SF: standard form:



VERS: Victorian Electronic Record Strategy:



WEST: Web Examiner Search Tool:



XML: Extensible Markup Language:



Letter June 17, 2002:



The Honorable Stephen Horn

Chairman, Subcommittee on Government Efficiency, 

    Financial Management and Intergovernmental Relations

Committee on Government Reform

House of Representatives:



The Honorable Ernest J. Istook, Jr.

Chairman, Subcommittee on Treasury, 

    Postal Service and General Government 

Committee on Appropriations

House of Representatives:



Agencies are increasingly moving to an operational environment in which 

electronic--rather than paper--records provide comprehensive 

documentation of their activities and business processes. Although this 

transformation has improved the way federal agencies work and interact 

with each other and with the public, it has also created the new 

challenge of managing and preserving vast and rapidly growing volumes 

of electronic records. Because these records document essential 

government functions and provide information necessary to protect 

government and citizen interests, their proper management is essential 

for ongoing government activities; further, the preservation of 

significant documents and other records is crucial for the historical 

record.



Overall responsibility for the government’s electronic records lies 

with the National Archives and Records Administration (NARA), which 

carries out a dual mission for the nation: oversight of records 

management, which governs the life cycle of records (creation, 

maintenance and use, and disposition), and archiving, which is the 

permanent preservation of documents and other records of historical 

interest. In carrying out these missions, NARA and agencies use a 

process known as scheduling to assess the value of records and 

determine their disposition.



The challenges associated with managing and preserving electronic 

records have long been recognized throughout government. Because of 

concern about these issues, you requested that we review electronic 

records management and preservation activities at NARA. Our objectives 

were to:



* determine the status of NARA’s efforts to respond to governmentwide 

electronic records management problems and the adequacy of its planned 

actions and:



* assess NARA’s efforts to acquire an archival system for electronic 

records.



As part of our assessment of NARA’s efforts to acquire an electronic 

records archiving system, you also asked that we identify alternative 

technologies under consideration for the long-term preservation of 

electronic records.



To address our objectives, we reviewed applicable guidance and other 

documentation; surveyed NARA’s appraisal archivists working with 

federal agencies; reviewed records management activities and obtained 

the views of records managers in selected federal agencies managing 

large volumes of electronic records; and reviewed legal challenges to 

federal electronic recordkeeping practices. We reviewed agency and 

contractors’ documentation for the electronic records archive program 

and assessed NARA’s effort to develop or enhance its information 

technology capabilities. Further details on our objectives, scope, and 

methodology are provided in appendix I.



Results in Brief:



NARA has taken action to respond to the challenges associated with 

managing and preserving electronic records. In 2001, NARA completed an 

assessment of the current federal recordkeeping environment; this study 

concluded that although agencies are creating and maintaining records 

appropriately, most electronic records (including databases of major 

federal information systems) remain unscheduled, and records of 

historical value are not being identified and provided to NARA for 

preservation in archives. As a result, valuable electronic records may 

be at risk of loss. Part of the problem is that records management 

guidance is inadequate in the current technological environment of 

decentralized systems producing large volumes of complex records. 

Another factor is the low priority often given to records management 

programs and the lack of technology tools to manage electronic records. 

Finally, NARA does not perform systematic inspections of agency records 

and records management programs, and so it does not have comprehensive 

information allowing it to identify records management implementation 

issues and areas where its guidance needs to be strengthened. NARA 

plans to improve its guidance and to address technology issues. 

However, NARA’s plans do not address the low priority generally given 

to records management programs nor the issue of systematic inspections.



Recognizing the limitations of its technical strategies to support 

preservation, management, and sustained access to electronic records, 

NARA is planning to design, acquire, and manage an advanced electronic 

records archive (ERA); however, this project faces substantial risks. 

NARA is behind schedule for the ERA system, largely because of flaws in 

how the schedule was developed. Further, to acquire a major system like 

ERA, NARA needs to improve its information technology (IT) management 

capabilities, and although it has made progress in doing so, its 

efforts are not yet complete.



Regarding alternative archiving technologies for electronic records, we 

found that archival organizations now rely on a mixture of evolving 

approaches that generally fall short of solving the long-term 

preservation problem. Appendix II provides a detailed discussion of 

these approaches.



In light of the continuing challenge of managing federal records, both 

electronic and otherwise, we are recommending that the Archivist of the 

United States develop a strategy for raising awareness of the 

importance of federal records management programs and for performing 

systematic inspections. In addition, to mitigate the risks associated 

with developing the new archival system, we are recommending that the 

Archivist reassess the schedule for this effort.



In commenting on a draft of this report, the Archivist stated that more 

must be done to address the enormous challenges in managing and 

preserving electronic records and agreed with the report’s 

recommendations. He also offered clarifications concerning records 

management priority, inspections, and the ERA schedule that we have 

incorporated as appropriate.



Background:



Advances in information technology and the explosion in computer 

interconnectivity brought about by the Internet are irreversibly 

changing the way we communicate and conduct business. Office automation 

applications and networked desktop computers are providing the 

capability to rapidly create and share electronic documents, use Web 

sites for executing business and financial transactions, and 

instantaneously communicate with individuals and groups. While the 

transformation from a paper-based to an electronic business environment 

has led to improvements in the way federal agencies do business, both 

with each other and with the public, it has also created the new 

challenge of managing and preserving electronic records, which must be 

approached differently from their paper counterparts. Unlike paper 

records, electronic records are not tangible, come in many formats, and 

depend on the hardware and software with which they were created.



NARA’s mission is to ensure “ready access to essential evidence” for 

the public, the President, the Congress, and the Courts. NARA’s 

responsibilities stem from the Federal Records Act,[Footnote 1] which 

requires each federal agency to make and preserve records that 

(1) document the organization, functions, policies, decisions, 

procedures, and essential transactions of the agency and (2) provide 

the information necessary to protect the legal and financial rights of 

the government and of persons directly affected by the agency’s 

activities. Effective management of these records is critical for 

ensuring that sufficient documentation is created; that agencies can 

efficiently locate and retrieve records needed in the daily performance 

of their missions; and that records of historical significance are 

identified, preserved, and made available to the public. According to 

NARA, without effective records management, the records needed to 

document citizens’ rights, actions for which federal officials are 

responsible, and the historical experience of the nation will be at 

risk of loss, deterioration, or destruction.



Under the act, NARA is responsible for oversight of records management 

and archiving. Records management--that is, the policies, procedures, 

guidance, tools and techniques, resources, and training needed to 

design and maintain reliable and trustworthy records systems--governs 

the life cycle of records from creation, through maintenance and use, 

to final disposition. Archiving is the permanent preservation of 

records documenting the activities of the government. NARA thus 

oversees agency management of temporary records used in everyday 

operations and ultimately takes control of permanent agency records 

judged to be of historic value.[Footnote 2] Of the total number of 

federal records, less than 3 percent are designated permanent.



NARA Is Responsible for Oversight of Records Management:



NARA is responsible for issuing records management guidance; working 

with agencies to implement effective controls over the creation, 

maintenance, and use of records in the conduct of agency business; 

providing oversight of agencies’ records management programs; and 

providing storage facilities for certain temporary agency records. The 

Federal Records Act also authorizes NARA to conduct inspections of 

agency records and records management programs.



NARA works with agencies to identify and inventory records, appraise 

their value, and determine whether they are temporary or permanent, how 

long the temporary records should be kept, and under what conditions 

both the temporary and permanent records should be kept. This process 

is called scheduling. No record may be destroyed unless it has been 

scheduled, and for temporary records the schedule is of critical 

importance because it provides the authority to dispose of the record 

after a specified time period. Records are governed by schedules that 

are specific to an agency or by a general records schedule, which 

covers records common to several or all agencies. According to NARA, 

records covered by general records schedules make up about a third of 

all federal records. For the other two thirds, NARA and the agencies 

must agree upon specific records schedules. Once a schedule has been 

approved, the agency must issue it as a management directive, train 

employees in its use, apply its provisions to temporary and permanent 

records, and evaluate the results.
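
The disposal rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a temporary record's retention period is expressed in whole years counted from a cutoff date; actual schedules may define retention differently, and the function and parameter names below are hypothetical rather than part of any NARA system.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def may_destroy(scheduled: bool, permanent: bool,
                retention_years: Optional[int],
                cutoff: date, today: date) -> bool:
    """Hypothetical check of whether a record may be destroyed."""
    if not scheduled:
        # No record may be destroyed unless it has been scheduled.
        return False
    if permanent:
        # Permanent records are preserved, not destroyed.
        return False
    if retention_years is None:
        # Assumption: a temporary record without a stated period is kept.
        return False
    # The schedule gives disposal authority only after the specified period.
    return today >= add_years(cutoff, retention_years)
```

For example, under these assumptions a scheduled temporary record with a 3-year retention period and a cutoff of January 1, 1995, could be destroyed on June 17, 2002, whereas an unscheduled record could not be destroyed at any date.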



While the Federal Records Act covers documentary material regardless of 

physical form or media, records management and archiving were until 

recently largely focused on handling paper documents. With the advent 

of computers, both records management and archiving have had to take 

into account the creation of records in varieties of electronic 

formats. NARA’s basic guidance for the management of electronic records 

is in the form of a regulation at 36 CFR Part 1234. This guidance is 

supplemented by the issuance of periodic NARA bulletins and a records 

management handbook, Disposition of Federal Records. NARA’s guidance 

has two basic requirements. First, agencies are required to maintain an 

inventory of all agency information systems. The inventory should 

identify (1) the system’s name; (2) its purpose; (3) the agency 

programs supported by the system; (4) data inputs, sources, and 

outputs; (5) the information content of databases; and (6) the system’s 

hardware and software environment. Second, NARA requires agencies to 

schedule the electronic records maintained in their systems. Agencies 

must schedule those records either under specific schedules, completed

through submission and approval of Standard Form 115 (SF 115), Request 

for Records Disposition Authority, or pursuant to a general records 

schedule. NARA relies on this combination of inventory and scheduling 

requirements to ensure the management of agency electronic records 

consistent with the Federal Records Act.
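
The two requirements above (an inventory entry with six elements, plus a schedule for each system's records) can be pictured as a simple data structure. The following is a sketch only: the class, field, and function names are hypothetical and are not drawn from 36 CFR Part 1234 or any NARA system, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SystemInventoryEntry:
    """One information system, with the six inventory elements listed above."""
    name: str                                # (1) the system's name
    purpose: str                             # (2) its purpose
    programs_supported: List[str]            # (3) agency programs supported
    inputs_sources_outputs: List[str]        # (4) data inputs, sources, and outputs
    database_content: str                    # (5) information content of databases
    hardware_software_environment: str       # (6) hardware and software environment
    # Hypothetical field: the authority under which the records are scheduled,
    # such as an approved SF 115 or a general records schedule; None if unscheduled.
    schedule_authority: Optional[str] = None

    def is_scheduled(self) -> bool:
        return self.schedule_authority is not None


def unscheduled_systems(inventory: List[SystemInventoryEntry]) -> List[str]:
    """Return the names of systems whose records have no schedule."""
    return [entry.name for entry in inventory if not entry.is_scheduled()]


# Invented sample data, for illustration only.
sample_inventory = [
    SystemInventoryEntry("Case Tracking", "Track cases", ["Enforcement"],
                         ["case filings"], "case histories", "mainframe",
                         schedule_authority="SF 115"),
    SystemInventoryEntry("Grants Database", "Manage grants", ["Grants"],
                         ["applications"], "award data", "client-server"),
]
```

Calling `unscheduled_systems(sample_inventory)` on the invented data returns only the second system, which has no schedule authority recorded.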



NARA has also established a general records schedule for electronic 

records. General Records Schedule 20 (GRS 20) authorizes the disposal 

of certain categories of temporary electronic records. It has been 

revised several times over the years in response to developments in 

information technology, as well as legal challenges. (App. III provides 

a discussion of the evolution of electronic records guidance and legal 

challenges to GRS 20.):



As it stands now, GRS 20 applies to electronic records created both in 

computer centers engaged in large-scale data processing and in the 

office automation environment. With regard to computer centers, GRS 20 

authorizes the disposal of certain types of scheduled electronic 

records associated with large database systems, such as inputs, 

outputs, and processing files. With regard to the office desktop 

environment, GRS 20 authorizes the deletion of the electronic version 

of records on word processing and electronic mail systems once a 

recordkeeping copy has been made. In addition, it authorizes deletion 

of electronically generated administrative spreadsheets and other 

administrative records that are included in recordkeeping systems that 

have been authorized for disposal by NARA. Since most agency 

“recordkeeping systems” are paper files, GRS 20 essentially authorizes 

agencies to destroy E-mail and word-processing files once they are 

printed. As already noted, records not covered by a general records 

schedule may not be destroyed unless authorized by a records schedule 

that has been approved by NARA.



GRS 20 does not address many common products of electronic information 

processing, particularly those that result from the now prevalent 

distributed, end-user computing environment. For example, although the 

guidance addresses the disposition of certain types of electronic 

records associated with large databases, it does not specifically 

address the disposition of electronic databases created by 

microcomputer users. In addition, while addressing word processing and 

E-mail records, GRS 20 does not address more recent forms of electronic 

records such as Web pages and portable document format (PDF) files.

[Footnote 3]:



NARA Archives Permanent Records of Historical Interest:



As the nation’s archivist, NARA accepts for deposit to its archives 

those records of federal agencies, the Congress, the Architect of the 

Capitol, and the Supreme Court that are determined to have sufficient 

historical or other value to warrant their continued preservation by 

the U.S. government. NARA also accepts papers and other historical 

materials of the Presidents of the United States, documents from 

private sources that are appropriate for preservation (including 

electronic records, motion picture films, still pictures, and sound 

recordings), and records from agencies whose existence has been 

terminated, including Offices of Independent Counsel (see fig. 1).



Figure 1: Removable Hard Drives and Backup Devices Used by Independent 

Counsel Staff:



[See PDF for image]



Source: NARA.



[End of figure]



NARA archives vast quantities of federal records in various formats. 

Its archival facilities (a network of regional archives) hold over 21 

million cubic feet of original textual materials, while its multimedia 

collections include nearly 300,000 reels of motion picture film; more 

than 5 million maps, charts, and architectural drawings; over 200,000 

sound and video recordings; about 9 million aerial photographs; nearly 

14 million still pictures and posters; and over 87,000 computer data 

sets stored on computer tapes and cartridges (see fig. 2).



Figure 2: Master Copies of Electronic Records in NARA’s Archives:



[See PDF for image]



Source: NARA.



[End of figure]



In addition to its archives, NARA also manages the archival holdings of 

10 presidential libraries, the Nixon presidential materials staff, and 

the Clinton presidential materials project. These include over 400 

million paper records, over 15 million feet of film, nearly 10 million 

still pictures, nearly 100,000 hours of audio and video recordings, and 

almost half a million museum objects.



The types of electronic records that NARA currently accepts for 

archiving are limited to those that are independent of specified 

hardware or software and are in text-based formats, such as databases 

and certain text-based geographic information system (GIS)[Footnote 4] 

files. NARA does not accept digital images, Web pages, word processor 

files, relational databases, or any records with complex 

structure.[Footnote 5] (Although NARA does not as yet accept such files 

for archiving, they must still be scheduled.):



Management and Preservation of Electronic Records Pose Major 

Challenges:



During the last four decades, archiving--the permanent preservation of 

information of enduring value for access by future generations--has 

undergone a major change. Before the advent of large bureaucracies 

supported by the now ubiquitous computer, archivists dealt with a 

scarcity of sources, with much of their efforts focused on tracking 

down unique manuscripts or recovering incomplete files.[Footnote 6] The 

archived records were relatively durable--clay tablets, stone, 

parchment, vellum, or rag paper. Albeit scarce and often incomplete, 

these records came down through the centuries relatively intact and 

could be preserved with little or no difficulty. The growth of the 

government, complex organizations, and advent of the electronic age 

have reversed the conditions facing today’s archives: rather than 

dealing with scarce sources, the archives are facing a flood of 

potentially valuable information stored on fragile materials, including 

pulp paper and computer tapes and disks.



While the preservation of information recorded on traditional materials 

such as paper or film requires significant resources, the current major 

archival challenge is the preservation of electronic records. Like 

traditional archival materials--books, papers, or film--electronic 

information is recorded on media that deteriorate with age. However, 

unlike the traditional archival materials, electronic records are 

stored in specific formats and cannot be read without software and 

hardware--sometimes the specific types of hardware and software on 

which they were created.



The rapid evolution of information technology makes the task of 

managing and preserving electronic records complex and costly. Agencies 

are increasingly moving to an operational environment in which 

electronic--rather than paper--records provide comprehensive 

documentation of their activities and business processes. Part of the 

challenge of managing electronic records is that they are produced by a 

mix of information systems, which vary not only by type but by 

generation of technology: the mainframe, the personal computer, and the 

Internet. Each generation of technology brought in new systems and 

capabilities without displacing the older systems.[Footnote 7] Thus, 

organizations have to manage and preserve electronic records associated 

with a wide range of systems, technologies, and formats.



The challenge of managing and preserving vast and rapidly growing 

volumes of electronic records produced by modern organizations is 

placing pressure on the archival community and on the information 

industry to develop a cost-effective long-term preservation strategy 

that would free electronic records of the straitjacket of proprietary 

file formats and software and hardware dependencies. This challenge is 

affected by several factors: decentralization of the computing 

environment, the complexity of electronic records, obsolescence and 

aging of storage media, massive volumes of electronic records, and 

software and hardware dependencies.



* Decentralization of computing environment: The challenge of managing 

electronic records significantly increases with the decentralization of 

the computing environment. In the centralized environment of a 

mainframe computer, it is relatively easy to identify, assess, and 

manage electronic records. This is not the case in the decentralized 

environment of agencies’ office automation systems, where every user is 

creating electronic files that may constitute a formal record and thus 

should be preserved.



* Complexity of electronic records: Electronic records have evolved 

from simple text-based files to complex digital objects that may 

contain embedded images (still and moving), drawings, sounds, 

hyperlinks, or spreadsheets with computational formulas. Some portions 

of electronic records, such as the content of dynamic Web pages, are 

created on the fly from databases and exist only during the viewing 

session. Others, such as E-mail, may contain multiple attachments, and 

they may be threaded (that is, related E-mail messages are linked into 

send-reply chains). These records cannot be converted to paper or text 

formats without the loss of context, functionality, and information.



* Obsolescence and aging of storage media: Storage media are affected 

by the dual problems of obsolescence and decay. They are fragile, have 

limited shelf life, and become obsolete in a few years. Few computers 

today have disk drives that can read information stored on 8- or 5¼-inch 

diskettes, even if the diskettes themselves remain readable.



* Massive volumes: Electronic records are increasingly being created in 

volumes that pose a significant technical challenge to our ability to 

organize and make them accessible. For example, among the candidates 

for archiving are military intelligence records comprising more than 1 

billion electronic messages, reports, cables, and memorandums, as well 

as over 50 million electronic court case files.



* Software and hardware dependency: Electronic records are created on 

computers with software ranging from word-processors to E-mail 

programs. As computer hardware and application software become 

obsolete, they may leave behind electronic records that cannot be read 

without the original hardware and software.



Past GAO Work Highlighted Electronic Records Challenges:



In July 1999, we reported that NARA and federal agencies were facing 

the substantial challenge of preserving electronic records in an era of 

rapidly changing technology.[Footnote 8] In that report we stated that 

in addition to handling the burgeoning volume of electronic records, 

NARA and the agencies would have to address several hardware and 

software issues to ensure that electronic records were properly 

created, maintained, secured, and retrievable in the future. We also 

noted that NARA did not have governmentwide data on the records 

management capabilities and programs of all federal agencies. As a 

result, we recommended that NARA conduct a governmentwide survey of 

agencies’ electronic records management programs and use the 

information as input to its efforts to reengineer its business 

processes. NARA’s subsequent efforts to assess governmentwide records 

management practices and study the redesign of its business processes 

are discussed later in this report.



Agencies Are Beginning to Automate Management of Electronic Records:



In response to the difficulty of manually managing electronic records, 

agencies are slowly turning to automated records management 

applications to help automate electronic records management life-cycle 

processes. The primary functions of these applications include 

categorizing and locating records and identifying records that are due 

for disposition, as well as storing, retrieving, and disposing of 

electronic records that are maintained in repositories. Also, some 

applications are beginning to be designed to automatically classify 

electronic records and assign them to an appropriate records retention 

and disposition category.



The Department of Defense (DOD), which is pioneering the assessment and 

use of records management applications, has published application 

standards and established a certification program.[Footnote 9] The DOD 

standard, endorsed by NARA, includes the requirement that records 

management applications acquired by DOD components after 1999 be 

certified to meet this standard.[Footnote 10] As of March 2002, DOD had 

certified 31 applications. NARA was testing one of the DOD-certified 

electronic records management applications, and it will be assessing 

the second version of the DOD standard to determine whether it can or 

should become a governmentwide standard.



Theory, Methods, and Model for Long-Term Preservation of Electronic 

Records Are Being Developed:



NARA is not alone in facing the challenges posed by electronic records, 

particularly long-term preservation. There is a general consensus in 

the archival community that a viable strategy for the long-term 

preservation and archiving of electronic records has yet to be 

developed. Accordingly, archives scholars, national archival and 

library institutions, and private industry representatives are 

collaborating on major initiatives to develop the theoretical and 

methodological knowledge needed for the permanent preservation of 

records created in electronic systems. These initiatives include the 

following:



* The International Research on Permanent Authentic Records in 

Electronic Systems project is a major two-phase international research 

project in which archival and computer engineering scholars, national 

archival institutions (including NARA), and private industry 

representatives are collaborating to develop the theoretical and 

methodological knowledge required for the permanent preservation of 

authentic records created in electronic systems. The first phase of the 

project, focusing on records generated in databases and document 

management systems, was recently completed; the second phase (2002 to 

2006) deals with the issues of authenticity, reliability, and accuracy 

of records produced in new digital environments.



* The Library of Congress’ National Digital Information Infrastructure 

and Preservation Program is a national cooperative effort led by the 

Library to develop the strategy and technical approaches needed to 

archive and preserve digital information; NARA is also participating in 

this effort. The program is in an early stage; completion is not 

expected until 2004 or 2005, when the Library will provide 

recommendations to the Congress.



* NARA is collaborating in a joint effort on electronic record 

archiving with the Defense Advanced Research Projects Agency (DARPA), 

the U.S. Patent and Trademark Office, the National Partnership for 

Advanced Computational Infrastructure, and the San Diego Supercomputer 

Center. Led by DARPA, the collaboration aims to develop and demonstrate 

architectures and technologies for electronic archiving and the 

development of persistent object preservation, a proposed technique for 

electronic archiving (discussed in app. II).



These initiatives are all in their early stages; none of them has yet 

yielded proof-of-concept prototypes demonstrating the viability of a 

long-term solution to preserving and accessing electronic records.



Progress has been made, however, in the development of a standard model 

for electronic archiving systems. The Open Archival Information System 

(OAIS) model, which is currently emerging as a standard in the archival 

community, was initially developed by the National Aeronautics and 

Space Administration (NASA) for archiving the large volumes of data 

produced by space missions. However, the model is applicable to any 

archive, digital library, or repository. As a standard framework for 

long-term preservation archives, the model defines the environment 

necessary to support a digital repository and the interactions within 

that environment. According to NASA, it also promotes the understanding 

and increased awareness of archival concepts needed for long-term 

digital information preservation and access, as well as for describing 

and comparing architectures and operations of existing and future 

archives.



Many institutions have already chosen to use the framework of the OAIS 

reference model to guide their digital preservation efforts, including 

the National Library of the Netherlands, NARA (in conjunction with the 

development of its electronic records archiving project), NASA’s 

National Space Science Data Center, and many commercial organizations.



The OAIS model (see fig. 3) breaks the archiving system down into six 

distinct functional areas: ingest, archival storage, data management, 

administration, preservation planning, and access.



* In the ingest area, systems accept information submitted from outside 

the framework and prepare the contents for storage. This functional 

area also includes systems to generate descriptive information to allow 

future management within the archive.



* In the archival storage area, systems pass the information, now 

called archival information packages, into a storage repository, where 

it is maintained until the contents are requested and retrieved.



* The data management area encompasses the services and functions for 

populating, maintaining, and accessing both descriptive information 

that identifies and documents archive holdings and administrative data 

used to manage the archive.



* The administration area provides the services and functions for the 

overall operation of the archive system.



* In the preservation planning area, systems monitor the environment of 

the OAIS and provide recommendations to ensure that the information 

stored in the OAIS remains accessible, even if the original computing 

environment becomes obsolete.



* The access area includes systems that allow a user to determine the 

existence, description, location, and availability of information 

stored in the OAIS, allowing information products to be requested and 

received.
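
The six functional areas listed above can be sketched in code. The
following is a toy illustration only: the class and method names are
assumptions made for this example and are not part of the OAIS
reference model, and only four of the six areas (ingest, archival
storage, data management, and access) are represented, each reduced to
a dictionary operation.

```python
from enum import Enum


class FunctionalArea(Enum):
    INGEST = "ingest"
    ARCHIVAL_STORAGE = "archival storage"
    DATA_MANAGEMENT = "data management"
    ADMINISTRATION = "administration"
    PRESERVATION_PLANNING = "preservation planning"
    ACCESS = "access"


class ToyArchive:
    """Illustrative only; administration and preservation planning are omitted."""

    def __init__(self):
        self.storage = {}  # archival storage: package id -> content
        self.catalog = {}  # data management: package id -> description

    def ingest(self, package_id, content, description):
        # Ingest: accept a submission and record descriptive information.
        self.storage[package_id] = content
        self.catalog[package_id] = description

    def find(self, term):
        # Access: determine the existence and description of holdings.
        return [pid for pid, text in self.catalog.items() if term in text]

    def retrieve(self, package_id):
        # Access: deliver the requested information.
        return self.storage[package_id]


archive = ToyArchive()
archive.ingest("AIP-1", b"raw mission data", "telemetry from a space mission")
matches = archive.find("telemetry")
```

The sketch deliberately says nothing about how content is kept
readable over time, which mirrors the model itself: the framework
defines the functional areas and leaves the preservation strategy to
the adopting institution.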



Figure 3: OAIS Model and Its Components:



[See PDF for image]



Source: Consultative Committee for Space Data Systems.



[End of figure]



The OAIS framework does not presume or apply any particular 

preservation strategy. This approach allows organizations that adopt 

the framework to apply their own strategies or combinations of 

strategies. The framework does assume that the information managed is 

produced outside the OAIS, and that the information will be 

disseminated to users who are also outside the system. Because the 

model is simplified to include only functions common to all 

repositories, it allows institutions to focus on the approaches 

necessary to preserve the information.



NARA Is Responding to Challenges of Electronic Records Management:



NARA is taking action to respond to long-standing problems associated 

with managing and preserving electronic records in archives. In 2001, 

NARA completed an assessment of governmentwide records management 

practices. This assessment concluded that although agencies are 

creating sufficient records and maintaining them appropriately, most 

electronic records remain unscheduled, and permanent records of 

historical value are not being identified and provided to NARA for 

preservation and archiving. As a result, potentially valuable records 

may be at risk.



According to the study, the problems in electronic records management 

appear to stem from (1) inadequate governmentwide records management 

guidance and (2) the low priority traditionally given to federal 

records management functions and a lack of technology tools to manage 

electronic records. To address these problems, NARA now plans to 

(1) analyze key policy issues related to the disposition of records and 

improve its guidance and (2) examine and redesign, if necessary, the 

scheduling and appraisal process and make this process more effective 

through the use of technology. NARA’s plans, however, do not address 

the low priority given to records functions. Further, these plans do 

not address the need to monitor performance of records management 

programs and practices on an ongoing basis.



NARA’s Assessment of Federal Records Practices Identifies Problems:



Records must be effectively managed throughout their life cycle, which 

includes records creation, maintenance and use, and scheduling and 

disposition. Agencies must create reliable records that meet the 

business needs and legal responsibilities of federal programs and (to 

the extent known) the needs of internal and external stakeholders who 

may make secondary use of the records. To maintain and use the records 

created, agencies are to create internal recordkeeping requirements for 

maintaining records, consistently apply these requirements, and 

establish systems that allow them to find records that they need. 

Scheduling is the means by which NARA and agencies identify federal 

records, determine time frames for disposition, and identify permanent 

records of historical value that are to be transferred to NARA for 

preservation and archiving. With regard particularly to electronic 

records, agencies are also to compile inventories of their information 

systems, after which the agency is required to develop a schedule for 

the electronic records maintained in those systems.



In 2001, NARA completed an assessment of governmentwide records 

management practices, as recommended in our prior work. The assessment 

included a recordkeeping study performed by a contractor--SRA 

International--and a series of records system analyses performed by 

NARA staff. The SRA study was based on a survey of federal employees 

representing over 150 federal government organizations and on 54 focus 

groups and interviews involving individuals from 18 agencies; the NARA 

staff’s records system analyses focused on records management practices 

for key business processes in 11 federal agencies.



The resulting NARA/SRA study identified problems in agency records 

management.[Footnote 11] Specifically, NARA’s assessment of records 

management for key processes in 11 agencies concluded the following.



* Records creation: In general, the NARA study showed that the 

processes that were studied appeared to generate adequate records 

documentation.



* Records maintenance and use: For the most part, recordkeeping 

requirements were adequate, documented, and consistently applied. In 

addition, employees were generally able to find the records that they 

needed.



* Records scheduling and disposition: The study identified significant 

problems in both records scheduling and disposition. According to the 

study, many significant records--as well as most federal electronic 

records--are unscheduled. In addition to the unscheduled records, NARA 

identified several significant records that had been improperly 

scheduled. The study concluded that records scheduling was clearly a 

problem area.



Our review at four agencies (Commerce, Housing and Urban Development, 

Veterans Affairs, and State) provides confirmation of this result, 

eliciting a collective estimate that less than 10 percent of mission-

critical systems were inventoried. The number of mission-critical 

systems at these four agencies was reported to be 907, according to 

information collected by the Office of Management and Budget in 

November 1999 as part of the federal government’s effort to assess the 

Year 2000 computing challenge.[Footnote 12] Thus for these four 

agencies alone, over 800 systems had not been inventoried and the 

electronic records maintained in them had not been scheduled. 

Scheduling the electronic records in a large number of major 

information systems presents an enormous challenge, particularly since 

it generally takes NARA, in conjunction with agencies, well over 6 

months to approve a new schedule.[Footnote 13]
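
The arithmetic behind the "over 800 systems" figure can be checked
directly. This short sketch treats "less than 10 percent" as an upper
bound on the share of inventoried systems; the variable names are
chosen for the example.

```python
TOTAL_SYSTEMS = 907           # mission-critical systems reported to OMB, November 1999
MAX_INVENTORIED_SHARE = 0.10  # "less than 10 percent" treated as an upper bound

# Upper bound on inventoried systems, lower bound on the remainder.
max_inventoried = TOTAL_SYSTEMS * MAX_INVENTORIED_SHARE
min_not_inventoried = TOTAL_SYSTEMS - max_inventoried

print(f"At most {max_inventoried:.1f} systems inventoried")
print(f"At least {min_not_inventoried:.1f} systems not inventoried")
```

With at most 90.7 systems inventoried, at least 816.3 remain, which is
consistent with the statement that over 800 systems had not been
inventoried.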



Failure to inventory systems and schedule records places these records 

at risk. The absence of inventories and schedules means that NARA and 

agencies have not examined the contents of these information systems to 

identify official government records, appraised the value of these 

records, determined appropriate disposition, and directed and trained 

employees in how to maintain and when and how to dispose of these 

records. As a result, temporary records may remain on hard drives and 

other media long after they are needed or could be moved to less costly 

forms of storage. In addition, there is increased risk that these 

records may be deleted prematurely while still needed for fiscal, 

legal, and administrative purposes.



The lack of scheduling presents particular risks to the preservation of 

permanent records of historic significance. NARA’s study of 11 agencies 

found instances where valuable permanent electronic records were not 

being appropriately transferred to NARA’s archives because these 

records had not been scheduled, appraised, identified as permanent, and 

placed under the control of the agency’s records program. This lack of 

management control places these valuable records at increased risk of 

loss, destruction, and deterioration.



NARA’s Records Management Guidance Has Not Kept Pace with the 

Challenges of Electronic Records:



The NARA/SRA study identified the lack of sufficient governmentwide 

guidance as one cause of records management problems. As NARA has 

acknowledged, its policies and processes on electronic records have not 

yet evolved to reflect the modern recordkeeping environment: records 

created electronically in decentralized processes.[Footnote 14] 

Despite repeated attempts to clarify its electronic records guidance 

through a succession of NARA bulletins, the current guidance remains 

incomplete and confusing. According to the study, for example, 

employees lack knowledge concerning how to identify electronic records 

and what to do with them once identified. The guidance does not provide 

disposition instructions for electronic records maintained in many of 

the common types of formats produced by federal agencies, including PDF 

files, Web pages, and spreadsheets. To support their missions, many 

agencies must maintain such records--often in large volumes--with 

little guidance from NARA (see app. IV for a discussion of the records 

management challenges faced by selected agencies).



The NARA/SRA study concluded that while agencies appreciate the 

specific assistance from NARA personnel, they are frustrated because 

they perceive that NARA is not meeting agencies’ broader needs for 

guidance and records management leadership. This study reported that 

agencies believe that NARA has a responsibility to lead the way in 

transitioning to an electronic records environment and to provide 

guidance and standards, as well as tools to enable agencies to follow 

the guidance. According to the study, some viewed NARA as leaving 

agencies to fend for themselves, sometimes levying impossible 

requirements that pressure agencies to come up with their own 

individual solutions.



Agency Records Management Programs Are Given Low Priority and Lack 

Technology Tools:



The NARA/SRA study identified another cause of records management 

difficulties: the low priority generally afforded to records management 

programs. The study states that records management is not even “on the 

radar scope” of agency leaders. Further, records officers have little 

clout and do not appear to have much involvement in or influence on 

programmatic business processes or the development of information 

systems designed to support them. New government employees seldom 

receive any formal, initial records management training. One agency 

told NARA that records management is “number 26 on our list of top 25 

priorities.” The study also noted that federal downsizing may have 

negatively affected records management and staffing resources in 

agencies.



Further, records management is generally considered a “support” 

activity. Since support functions are typically the most dispensable in 

agencies, resources for and focus on these functions are often limited. 

This finding was echoed by a recent review of archival practices of 

research universities, corporate research and development programs, and 

federal science agencies, which noted that “agency records management 

programs lack the resources to meet even the legally required standards 

of securing adequate documentation of their programs and activities.”

[Footnote 15]



As indicated by the NARA/SRA study, a related issue is the technical 

challenge of electronic records management: effective electronic 

records management may require more sophisticated and expensive 

information technology (such as automated electronic records management 

systems) than was previously necessary for paper-based records 

management programs. Because management tends not to focus on records 

management, priority has not been given to acquiring or upgrading the 

technology required to manage records in an electronic environment. The 

study noted that technology tools for managing electronic records do 

not exist in most agencies, and further, that agency information 

technology environments have not been designed to facilitate the 

retention and retrieval of electronic records. As a result, despite the 

growth of electronic media, agency records systems are predominantly in 

paper format rather than electronic.



The study further noted that agencies planning or piloting automated 

electronic records management systems perform better recordkeeping than 

those without such tools. Typically, such agencies are already 

performing better recordkeeping, and they tend to invest in electronic 

records management systems because of the value they place on good 

records management. According to the study, many agencies are either 

planning or piloting information technology initiatives to support 

electronic records management, but their movement to electronic systems 

is constrained by the level of financial support provided for records 

management.



Inspections of Federal Electronic Records Programs Are Limited:



A possible further cause of agency records management problems, not 

addressed in the NARA/SRA study, is the limited nature of NARA’s 

current inspection program. NARA is responsible, under the Federal 

Records Act, for conducting inspections or surveys of agency records 

and records management programs and practices. Its implementing 

regulations require NARA to select agencies to be inspected (1) on the 

basis of perceived need by NARA, (2) by specific request by the agency, 

or (3) on the basis of a compliance monitoring cycle developed by NARA.

[Footnote 16] In all instances, NARA is to determine the scope of the 

inspection. Such inspections provide not only the means to assess and 

improve individual agency records management programs but also the 

opportunity for NARA to determine overall progress in improving agency 

records management and identify problem areas that need to be addressed 

in its guidance.



Between 1996 and 2000, NARA performed 16 inspections of agency records 

management programs, or about 3 per year. These reviews were systematic 

and comprehensive, covering all aspects of an agency’s records program. 

However, only 2 of the 24 major executive departments or agencies were 

evaluated, with most of NARA’s evaluations focused on component 

organizations or independent agencies. Moreover, these evaluations 

frequently bypassed the issue of electronic records.



In 2000, NARA replaced agency evaluations with a new inspection 

approach--targeted assistance. NARA decided that its previous approach 

to inspections was basically flawed: besides reaching only a few 

agencies, it was often perceived negatively by agencies and resulted in 

a list of records management problems that agencies then had to resolve 

on their own. Under the targeted assistance approach, NARA enters into 

partnerships with federal agencies to provide them with guidance, 

assistance, or training in any area of records management. Services 

offered include expedited review of critical schedules, tailored 

training, and help in records disposition and transfer.



However, although this approach may improve records management in the 

targeted agencies, it is not a substitute for systematic inspections 

and evaluations of federal records programs. Because the targeted 

assistance program is voluntary and, according to NARA, initiated by a 

written request from the agency, relying on it exclusively could 

significantly limit NARA’s evaluations of federal recordkeeping. First, 

only agencies requesting targeted assistance--presumably those already 

having greater appreciation of the importance of records management--

are evaluated. Second, the scope and the focus of the targeted 

assistance are not determined by NARA but by the requesting agency.



NARA Is Addressing Records Management Problems, but Additional 

Opportunities Exist:



NARA has recognized that its policy and regulations for the management 

and disposition of electronic records must be revised to provide 

agencies with clear and comprehensive guidance encompassing all types 

and formats of electronic records. Having completed its assessment of 

federal records management practices, NARA now plans a two-phase project 

to (1) analyze key policy issues related to the disposition of records 

and improve governmentwide guidance, and (2) examine and redesign, if 

necessary, the scheduling and appraisal process and make this process 

more effective through the use of technology.



According to NARA, the purpose of the first phase of the project is to 

analyze and make decisions, as necessary, on key policy issues related 

to determining the disposition of records. NARA plans to evaluate 

current legislation, regulations, and guidance to determine if these 

are adequate in the current recordkeeping environment. NARA expects the 

outcome of the first phase, scheduled for completion by the end of 

fiscal year 2002, to be policy decisions that support the appropriate 

disposition of all government documentation in today’s multimedia 

environment.[Footnote 17] These results are also intended, as 

recommended in our prior work, to inform the redesign of the current 

scheduling and appraisal process planned for the second phase of the 

project, the development of electronic recordkeeping requirements, and 

improvements to records management guidance and assistance to agencies.



In the second phase, NARA plans to examine and redesign, if necessary, 

the process used by the federal government to determine the disposition 

of records. This is planned as a multiyear process (2003 to 2006) 

during which NARA intends to address the scheduling and appraisal of 

federal records in all formats. Currently, it takes NARA well over 6 

months to approve a new schedule. According to NARA, the extensive 

appraisal time delays action on the disposition of records and 

discourages agencies from submitting schedules, potentially putting 

essential evidence at risk. NARA has two goals for this project: 

(1) making the process for determining the disposition of records, 

regardless of medium, more effective and efficient and dramatically 

decreasing the amount of time it takes to get approval for the 

disposition of records from the Archivist of the United States, and 

(2) deciding how to appropriately apply technology to support the 

revised process for determining the disposition of records as part of 

managing records throughout their life cycle.



Although NARA’s plans address the need to improve guidance and 

determine how to use technology to support records management, these 

plans do not address another issue raised in its study: the low 

priority generally given to records management and the related lack of 

management commitment and attention to these functions. Without a 

strategy to establish senior-level agency commitment to records 

management and raise awareness of its importance to the federal 

government, these programs are likely to continue to be regarded by 

agency management and employees as low-priority “support” functions.



In addition, NARA’s plans do not address the issue of systematic 

inspections. While the results of its recent study provide a baseline 

of governmentwide records management practices, NARA’s targeted 

assistance approach does not provide systematic and comprehensive 

information to assess progress over time. Without this type of data, 

NARA will be impaired in its ability to determine if it is achieving 

results in improving agency records management. Further, NARA may not 

have the means to identify agency implementation issues and areas where 

its guidance needs to be clarified, augmented, and strengthened. The 

feedback provided by inspection is especially critical now as NARA 

plans to redesign the scheduling and appraisal process, and improve its 

guidance.



NARA’s Effort to Acquire Advanced Electronic Archival System Faces 

Risks:



Archiving--the final phase of records management for permanent 

records--presents a significant challenge when records are electronic. In light 

of the growth in the volume, complexity, and diversity of electronic 

records, NARA has recognized that its technical strategies to support 

preservation, management, and sustained access to electronic records 

are inadequate and inefficient. To address this challenge, the agency 

is pursuing two strategies. Its short-term strategy is to extend the 

useful life of its current systems and to create some new systems for 

archiving electronic records and for cataloging and displaying 

electronic records on-line. NARA’s long-term strategy, on which it is 

placing its primary focus, is to contract with a private sector firm to 

acquire (that is, obtain) an advanced electronic records archive (ERA).



However, NARA faces substantial risks in implementing its long-term 

strategy. NARA is not meeting its schedule for the ERA system, largely 

because of flaws in how the schedule was developed. As a result, the 

schedule will be compressed, increasing risks. Further, although NARA 

recognizes that to be successful it must improve its information 

technology (IT) management capabilities and has made progress in doing 

so, these efforts are not yet complete.



NARA Is Planning to Acquire an Advanced Electronic Records Archiving 

System:



NARA’s long-term strategic initiative is to develop an advanced 

electronic records archive. The agency’s goals for this system are to 

preserve and provide access to any kind of electronic record, free from 

dependency on any specific hardware or software, so that the agency can 

carry out its mission into the future.



Although the new archival system is not yet formally defined, agency 

documents, public presentations, and interviews with agency officials 

and staff indicate, in broad outline, how they envision this system. It 

will probably be a distributed system, allowing the storage and 

management of massive record collections at a variety of installations, 

with accessibility provided via the Internet. It may be based on 

persistent object preservation, an advanced form of file format 

conversion and encapsulation (described in app. II) that is the subject 

of research sponsored by NARA and other organizations. A leading 

candidate for performing this encapsulation and capturing the necessary 

information is the Extensible Markup Language (XML), which provides a 

means for “tagging” (annotating) information in a meaningful fashion 

that can be readily interpreted by disparate computer systems (XML is 

further discussed in app. II).
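
To make the idea of "tagging" concrete, the following minimal sketch
wraps a record's content and some descriptive information in XML using
Python's standard library. The element names (record, metadata, title,
created, content) are hypothetical and are not drawn from any NARA
schema or from the persistent object preservation research.

```python
import xml.etree.ElementTree as ET


def encapsulate(record_id, title, created, body):
    """Wrap record content and descriptive metadata in one XML element."""
    record = ET.Element("record", attrib={"id": record_id})
    metadata = ET.SubElement(record, "metadata")
    ET.SubElement(metadata, "title").text = title
    ET.SubElement(metadata, "created").text = created
    ET.SubElement(record, "content").text = body
    return ET.tostring(record, encoding="unicode")


xml_text = encapsulate("R-1", "Sample memo", "2002-06-17", "Body of the memo.")

# Any XML parser can recover the tagged fields without the software
# that originally produced the record.
parsed = ET.fromstring(xml_text)
recovered_title = parsed.findtext("metadata/title")
```

Because the tags travel with the content as plain text, a different
system can read the title and body back without the original
application, which is the property that makes XML a candidate for this
role.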



NARA has indicated that ERA will be a major system, and that it is 

likely that it will be developed and implemented in several phases (or 

“builds”), with each phase adding more functions to the system. 

According to NARA, its development will take several years, and it will 

involve a significant expenditure of resources on program management, 

research, and systems development activities.



NARA is planning to award the contract for the new electronic archival 

system in January 2004. Table 1 is a timeline showing key tasks for the 

program.



Table 1: Timeline for ERA Program:



Key ERA tasks: Develop vision statement; Completion dates: 

March 1, 2002[A].



Key ERA tasks: Develop concept of operations; Completion 

dates: April 1, 2002[B].



Key ERA tasks: Conduct market survey; Completion dates: June 

28, 2002.



Key ERA tasks: Perform analysis of alternatives; Completion 

dates: July 22, 2002.



Key ERA tasks: Develop cost estimates; Completion dates: 

August 19, 2002.



Key ERA tasks: Develop high-level conceptual and functional 

requirements; Completion dates: September 24, 2002.



Key ERA tasks: Develop business case/economic analysis; 

Completion dates: September 30, 2002.



Key ERA tasks: Develop final functional requirements; 

Completion dates: December 2, 2002.



Key ERA tasks: Issue Request for Information; Completion 

dates: January 13, 2003.



Key ERA tasks: Release Request for Proposal; Completion dates: 

August 4, 2003.



Key ERA tasks: Fiscal year 2004 budget for ERA in effect; 

Completion dates: October 1, 2003.



Key ERA tasks: Award ERA contract; Completion dates: January 

12, 2004.



[A] Completed April 18, 2002.

[B] Completed in draft on April 1, 2002.



[End of table]



To assist in this effort, NARA contracted with Integrated Computer 

Engineering (ICE), Incorporated,[Footnote 18] a private company 

experienced in systems development and acquisition. With the assistance 

of this contractor, NARA has been establishing the ERA program 

management office. Since July 2001, the program management office has 

been focused on developing the capability to manage the development and 

acquisition of the ERA system.



NARA is also funding two independent assessments of the research into 

the technology that is proposed for ERA. These two independent 

assessments, conducted by the National Academy of Sciences, will review 

research that NARA is now sponsoring, as well as alternative 

approaches. The first assessment is a technical review of the viability 

of persistent object preservation, the architecture for persistent 

archives of electronic records that is being researched by the National 

Partnership for Advanced Computational Infrastructure (see app. II). 

This assessment--scheduled for completion on January 31, 2003--will 

address the adequacy and soundness of the persistent object 

preservation architecture as a whole, as well as its major components, 

from the points of view of computer science, systems engineering, and 

archival sciences. NARA has stated that the assessment of the 

persistent object information management architecture and its technical 

validation should be completed before ERA is developed. In its fiscal 

year 2002 budget hearings, NARA referred to the articulation of the 

persistent object preservation architecture as the one “major 

dependency” in its strategy for acquiring an ERA system.



The second assessment will identify and evaluate alternative methods 

for digital preservation of records, examine the operational use of the 

Internet for digital archiving, and identify those aspects of the 

preservation of electronic records that cannot be adequately addressed 

either by state-of-the-art information technology or by technologies 

under development. It will also address the feasibility of 

commercializing new ideas from research. According to NARA, the second 

assessment is to be completed 6 to 9 months after the first.



ERA Schedule Faces Significant Risks:



Although the ERA project is still in its initial stages, it is already 

falling behind schedule. As shown in table 1, the initial deliverables 

for design and acquisition are late: the vision statement, due March 1, 

was not completed until April 18, and the concept of 

operations,[Footnote 19] due April 1, was delivered in draft form on 

that date and had not been finalized as of May 31. This lateness can be 

attributed to flaws in how the schedule was developed. In its tracking 

of ERA risks, NARA has acknowledged that the schedule for completion of 

tasks was based on incomplete work projections, and that its deadlines 

may not be achievable. Rather than constructing a plan based on 

estimates of the amount of work and resources required to complete each 

task, NARA constructed a “success oriented” schedule that was planned 

around ensuring that ERA was funded beginning in fiscal year 2004.



In addition, the ERA program management office is behind schedule on 

its efforts to develop the plans and guidance to strengthen its 

capability for managing the acquisition and deployment of ERA. In July 

2001, with the help of its systems development and acquisition 

contractor, the office began focusing on developing these plans and 

procedures. We tracked planned and actual completion dates for 13 

policy and planning documents that the program management office needs 

in order to develop and acquire a major system (according to NARA and 

its contractor). To date, however, only 7 of the 13 documents have been 

completed.[Footnote 20] The 7 that have been delivered were late by an 

average of over 2 months. The initially planned delivery dates of the 

other 6 documents have passed; on average these are late by almost 4 

months.[Footnote 21]



Besides the approach taken to constructing the schedule, another 

contribution to schedule slippage may be NARA’s slow start in hiring 

full-time government staff for the ERA program management office. For 

fiscal year 2002, NARA was authorized 16 positions for the ERA program 

office. However, as of April 2002, NARA had only 5 full-time staff on 

board.



NARA Is Strengthening IT Management Capabilities, but These Efforts Are 

Incomplete:



Acquiring a major IT system such as the planned electronic archival 

system is a significant challenge for a relatively small organization 

like NARA, whose IT management capabilities are relatively limited. In 

its fiscal year 2002 budget hearings, NARA indicated that it must 

strengthen its IT management capabilities and infrastructure to support 

the ERA program, and NARA is currently taking steps to do so in three 

key areas: IT investment management, enterprise architecture, and 

information security. None of these efforts, however, is yet complete.



Sound IT Management Capabilities Contribute to Success in Acquiring IT 

Systems:



IT investment management provides a systematic method for agencies to 

minimize risks while maximizing the return on investments. The Clinger-

Cohen Act requires agency heads to implement a process for maximizing 

the value and assessing and managing the risks of an agency’s IT 

investments. Our research of leading private and public sector 

organizations’ IT management practices indicates that effective 

investment management requires the use of defined and disciplined 

investment management processes.



An enterprise architecture provides a description--in useful models, 

diagrams, and narrative--of the mode of operation for an agency. It 

describes the agency in both (1) logical terms, such as interrelated 

business processes and business rules, information needs and flows, and 

work locations and users; and (2) technical terms, such as hardware, 

software, data, communications, and security attributes and standards. 

An enterprise architecture provides these perspectives both for the 

current environment and for the target environment, as well as a 

transition plan for sequencing from the current to the target 

environment. Managed properly, an enterprise architecture can clarify 

and help optimize the dependencies and relationships among an agency’s 

business operations and the underlying IT infrastructure and 

applications that support these operations.



Information security is an important consideration for any organization 

that depends on information systems to carry out its mission. Our study 

of security management best practices, as summarized in our 1998 

executive guide,[Footnote 22] found that leading organizations manage 

their information security risks through an ongoing cycle of risk 

management. This management process involves (1) establishing a 

centralized management function to coordinate the continuous cycle of 

activities while providing guidance and oversight for the security of 

the organization as a whole, (2) identifying and assessing risks to 

determine what security measures are needed, (3) establishing and 

implementing policies and procedures that meet those needs, 

(4) promoting security awareness so that users understand the risks and 

the related policies and procedures in place to mitigate those risks, 

and (5) instituting an ongoing monitoring program of tests and 

evaluations to ensure that policies and procedures are appropriate and 

effective.



NARA Is Improving Its IT Investment Management Processes:



The Clinger-Cohen Act of 1996 requires agencies to establish an IT 

investment process that provides the means for senior management to 

obtain timely information regarding the progress of investments in an 

information system, including a system of milestones for measuring 

progress in terms of cost, timeliness, quality, and the capability of 

the system to meet specified requirements. Weak IT investment 

management processes significantly increase the risk that agency funds 

and resources will not be efficiently expended.



The first step toward establishing effective investment management is 

putting in place foundational, project-level control and selection 

processes. These foundational processes allow the agency to identify 

variances in project cost, schedule, and performance expectations; to 

take corrective action, if appropriate; and to make informed, project-

specific selection decisions.
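
As a rough illustration of the project-level control described above, the following Python sketch computes cost and schedule variances against a baseline and flags a project for review. The class name, field names, figures, and the 10 percent review threshold are all hypothetical; they are assumptions made for this example and do not come from NARA, the Clinger-Cohen Act, or our guidance.

```python
from dataclasses import dataclass


@dataclass
class ProjectStatus:
    """Hypothetical baseline-versus-actual snapshot for one IT project."""
    name: str
    planned_cost: float   # baseline cost estimate, in dollars
    actual_cost: float    # cost incurred to date, in dollars
    planned_days: int     # baseline duration for the work completed
    actual_days: int      # elapsed duration for the same work


def variances(p: ProjectStatus) -> tuple:
    """Return (cost variance, schedule variance) as fractions of the baseline."""
    cost_var = (p.actual_cost - p.planned_cost) / p.planned_cost
    sched_var = (p.actual_days - p.planned_days) / p.planned_days
    return cost_var, sched_var


def needs_review(p: ProjectStatus, threshold: float = 0.10) -> bool:
    """Flag the project if either variance exceeds the assumed threshold."""
    cost_var, sched_var = variances(p)
    return cost_var > threshold or sched_var > threshold


# Illustrative figures only: 5 percent over cost, 50 percent over schedule.
demo = ProjectStatus("demo project", 100_000.0, 105_000.0, 60, 90)
```

In this sketch, a project that is within its cost tolerance can still be flagged on schedule alone, which is the kind of variance a foundational control process is meant to surface for corrective action.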



The second major step toward effective investment management is to 

continually assess proposed and ongoing projects as an integrated and 

competing set of investment options. This portfolio management approach 

enables the organization to consider the relative costs, benefits, and 

risks of new and previously funded investments and thereby identify the 

mix that best meets its mission, strategies, and goals.



NARA’s IT investment management policies and processes were assessed 

and reported on by its inspector general (IG) in April 2000. The report 

identified several strengths in NARA’s IT investment management 

processes, including having an IT investment board, a defined process 

for selecting projects, criteria to be applied in considering whether 

to undertake a particular IT investment, ratings of each investment’s 

breadth of impact, and a determination of the net benefits and risks of 

proposed investments. However, the IG also identified 

weaknesses and made 13 recommendations for strengthening NARA’s IT 

investment management processes. NARA concurred with all 

recommendations. While it has to date fully addressed only 2 of the 

recommendations, it plans to resolve the remaining 11 issues by 

September 30, 2002.



While NARA’s investment management process has several strengths and 

NARA continues to improve process weaknesses, NARA has yet to complete 

its efforts to establish a mature investment management capability. 

Lacking a fully mature investment management process increases the risk 

that the electronic archival system will not be implemented on time and 

within budget, and that crucial resources and funds for meeting the 

electronic records challenges will not be invested effectively and 

efficiently. Specifically, if NARA management’s oversight of the ERA 

program is not based on complete information (including comparisons of 

the actual cost and schedule to the estimated cost and schedule, as 

well as identification of project risks and benefits), the risk is 

increased that NARA management will not be able to determine whether 

the ERA program is having schedule or other problems and ensure that 

corrective actions are taken.



NARA Is Developing an Enterprise Architecture:



The importance of enterprise architecture development, implementation, 

and maintenance is a basic tenet of effective IT management. Used in 

concert with other IT management controls, an enterprise architecture 

can greatly increase the chances for optimal mission performance. We 

have found that attempting to modernize operations and systems without 

an enterprise architecture leads to operational and systems 

duplication, lack of integration, and unnecessary expense.



Over the past several years, NARA has taken action to develop an 

enterprise architecture. NARA has drafted a current architecture and is 

working on a target architecture, but this work is incomplete.[Footnote 

23] However, the process to develop the electronic archival system is 

well under way. Without an enterprise architecture to guide its 

development, NARA increases the risk that the planned electronic 

archival system will be incompatible with existing and future 

operations and systems, thus wasting resources and requiring that 

unnecessary interfaces be built to achieve integration.



NARA Is Improving Information Security, but Has Not Yet Completed Key 

Tasks:



NARA is currently strengthening its information security, having 

recognized that it has numerous weaknesses. Significant security 

weaknesses were identified by two IG assessments (conducted in fiscal 

years 2000 and 2001) and a NARA-initiated vulnerability assessment of 

its network (performed concurrently with the IG assessments). As a 

result of these assessments, the Archivist of the United States 

declared information security a material weakness in fiscal year 

2000.[Footnote 24] Actions taken by the Archivist to address these 

shortcomings and respond to recommendations identified in the reports 

include establishing an information security program, updating and 

developing new security policy documents, developing contingency plans 

and business recovery plans, and strengthening firewalls across the 

network to control inbound and outbound traffic. NARA said that it 

would implement the IG’s recommendations by June 28, 2002, and by the 

end of fiscal year 2002 it plans to have rectified the shortcomings 

that led to its information security being declared a material 

weakness.



However, although NARA is making progress in strengthening its 

information security, two additional weaknesses could affect the ERA 

program. First, NARA currently lacks a program for assessing agencywide 

information security risks. Federal guidance requires all federal 

agencies to establish comprehensive information security programs based 

on assessing and managing risks.[Footnote 25] Risk assessments provide 

a basis for establishing appropriate policies and selecting cost-

effective techniques to implement these policies. NARA intends to 

develop an agencywide risk assessment capability in fiscal year 2003, 

but it is not clear that this will allow vulnerability assessments to 

be completed before ERA is developed. Without a method to identify and 

evaluate risks, NARA cannot be assured that it has effective mechanisms 

for protecting its information assets: networks, systems, and 

information associated with ERA. Because a compromise of security in a 

single poorly secured system can undermine the security of multiple 

systems, NARA needs to complete vulnerability assessments of all 

systems that will interface with ERA.



Second, because NARA lacks an enterprise architecture, it may have 

difficulty addressing agencywide security. Federal guidance calls for 

agencies to make security controls for systems consistent with and an 

integral part of the enterprise architecture of the agency.[Footnote 

26] Without an enterprise architecture that addresses security issues 

agencywide, NARA cannot be sure that its current or future archiving 

systems are adequately protected.



These weaknesses may be particularly significant for ERA, because this 

system presents security issues that NARA has never before addressed, 

according to an initial assessment report on ERA prepared by NARA’s 

systems development and acquisition contractor.[Footnote 27] The 

proposed distributed structure of ERA introduces the security risks 

associated with the Internet--threats to the integrity of data and to 

data accessibility. According to the Federal Bureau of Investigation, 

Internet systems are threatened by hackers (who may be terrorists, 

transnational criminals, and intelligence services) using information 

exploitation tools such as computer viruses, worms, Trojan horses, 

logic bombs, and eavesdropping sniffers.[Footnote 28] As Internet usage 

increases, the Internet has become an increasingly tempting target, and 

the number of reported Internet-related security incidents is 

growing.[Footnote 29] The effect on ERA of the vulnerabilities of the 

Internet would have to be assessed and addressed.



Conclusions:



In response to the challenges associated with managing and preserving 

electronic records, NARA has performed an assessment of governmentwide 

records management--an important first step that identified several 

problems, including the inadequacy of guidance on electronic records, 

the low priority generally given to records management, and the lack of 

technology tools to manage electronic records. While NARA has plans to 

improve its guidance and address the need for technology, it has not 

yet formulated a strategy to deal with the stature of records 

management programs across government. Further, it has no strategy for 

acquiring the kind of comprehensive information on records management 

that would be provided by systematic inspections and evaluations of 

federal records programs. Without such a strategy, records management 

will likely continue to be considered a low-priority “support” activity 

lacking appropriate management attention, and NARA will not acquire 

information needed to address problems in agency records management and 

guidance. Inadequacies in records management put at risk records that 

may be valuable: records providing information on essential government 

functions, information that is necessary to protect government and 

citizen interests, and information that is significant for the 

historical record.



NARA’s effort to acquire an advanced electronic records archive is at 

risk. NARA is not meeting its schedule for the ERA system, largely 

because of flaws in how the schedule was developed. As a result, the 

schedule will be compressed, leaving less time for completing essential 

planning tasks. In addition, NARA has not yet improved IT management 

capabilities that would reduce the risks inherent in its effort to 

acquire ERA. Without these capabilities, NARA risks spending funds to 

acquire a system that does not meet mission needs and requirements, 

effectively work with existing systems, or provide adequate security 

over the information it contains.



Recommendations for Executive Action:



To address the low priority given to records management programs across 

government, we recommend that the Archivist of the United States 

develop a documented strategy for raising agency senior management 

awareness of and commitment to records management principles, 

functions, and programs. Further, we recommend that the Archivist 

develop a documented strategy for conducting systematic inspections of 

agency records management programs to (1) periodically assess agency 

progress in improving records management programs and (2) evaluate the 

efficacy of NARA’s governmentwide guidance.



To mitigate the risks associated with the acquisition of an advanced 

electronic archival system, we recommend that the Archivist reassess 

the ERA project schedule. A revised schedule should be developed, based 

on estimates of the amount of work and resources required to complete 

each task, that allows sufficient time for NARA to:



* complete essential planning tasks and:



* strengthen its IT management capabilities by (1) implementing an IT 

investment management process, (2) developing an enterprise 

architecture, and (3) improving information security.



Agency Comments and Our Evaluation:



In written comments on a draft of this report, which are reprinted in 

appendix V, the Archivist of the United States generally agreed with 

our recommendations but provided clarifications concerning records 

management priority, inspections, and the ERA schedule. NARA also 

provided technical comments, which we have incorporated as appropriate.



The Archivist agreed with our recommendation that NARA develop a 

strategy for raising agency senior management awareness of and 

commitment to records management principles, functions, and programs, 

adding that the responsibility for oversight of records management is 

not NARA’s alone, but is shared by the Office of Management and Budget 

(OMB), the General Services Administration (GSA), and the heads of 

federal agencies. Further, he acknowledged that more needs to be done 

to have a major effect on agency leadership. The Archivist, however, 

disagreed with our conclusion that NARA does not plan to address the 

low priority generally given to records management.



Our conclusion was not meant to imply that NARA does not intend to 

address the priority of records management. We acknowledge NARA’s past 

efforts to raise awareness of the importance of records management and 

its stated plans to further address this issue. Instead, our conclusion 

reflects the fact that NARA’s written plan to reform federal records 

management policies and practices--which NARA refers to as its Records 

Management Initiatives--does not currently address this issue. We 

believe that to be successful, NARA must document its plans to address 

the low priority of records management programs across government, 

including specific goals, strategies, and milestones. Such a plan is 

critical in ensuring concurrence on planned actions among the key 

players that NARA mentions, including federal agencies, GSA, and OMB; 

that appropriate resources are assigned; and that NARA has the means to 

track progress against its goals.



The Archivist also agreed with our recommendation that NARA develop a 

strategy for conducting systematic inspections of agency records 

management programs, but noted that continuing its past inspection 

program, as cited in the report, would not succeed. NARA disagreed with 

our conclusion that it has no plans to address the issue of records 

management inspections, noting that it plans to use risk management 

analysis while leveraging its inspection resources. The Archivist said 

that this approach would include an assessment of broad categories of 

important records across agencies, agency-specific interventions, and 

the use of NARA’s authority to report the results of evaluations of at-

risk records to OMB and the Congress.



We are not suggesting that NARA resurrect its past inspection program, 

which it concluded was basically flawed. However, we also do not 

believe that NARA’s current targeted assistance approach is an 

appropriate substitute for systematic inspections and evaluations of 

federal records programs. Our conclusion is again based on the fact 

that the written strategy for the Records Management 

Initiatives does not address the need for systematic inspections. We 

acknowledge NARA’s statement that it plans to use a risk-based approach 

to addressing this issue, but we reiterate the need for a documented 

plan with associated goals, strategies, and milestones.



In commenting on our recommendation that NARA reassess the ERA project 

schedule, the Archivist stated that such a reassessment is prudent and 

that NARA intends to conduct such reassessments repeatedly, both 

periodically from an overall program management viewpoint and on a 

continuing basis as part of its ERA risk management activity. The 

Archivist noted that NARA is currently reassessing the schedule as part 

of its refinement of the ERA acquisition strategy, and that this 

reassessment will address the issues raised in our report.



Regarding the schedule for the ERA system, the Archivist noted that 

while some program documentation was not completed on schedule, all 

items on the ERA project’s “critical path” have been completed on time, 

and NARA expects to meet all milestones on the critical path this year. 

We disagree. As discussed in our report, the development of key program 

documents--such as the ERA vision statement and the concept of 

operations--was affected by delays. For example, the ERA vision 

statement, planned for completion on March 1, 2002, was not completed 

until April 18, 2002, approximately 6 weeks late. Similarly, the 

concept of operations, due on April 1, 2002, and which NARA 

documentation shows as being on the critical path, was delivered in 

draft form on that date and had not been finalized as of May 31. 

Falling behind schedule in the initial stages presents risks to 

successful and timely completion of the ERA project and is one of the 

reasons we are recommending that the agency reassess its schedule.
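
The delays cited above can be checked with simple date arithmetic. The short Python sketch below uses only the dates given in this section; the function and variable names are ours, chosen for illustration. Because the concept of operations had not been finalized as of May 31, 2002, the second figure is a lower bound rather than a final delay.

```python
from datetime import date


def days_late(planned: date, actual: date) -> int:
    """Number of calendar days between the planned and actual dates."""
    return (actual - planned).days


# ERA vision statement: planned March 1, 2002; completed April 18, 2002.
vision_days_late = days_late(date(2002, 3, 1), date(2002, 4, 18))

# Concept of operations: due April 1, 2002; still not final on May 31, 2002,
# so this is the minimum delay as of that date.
conops_min_days_late = days_late(date(2002, 4, 1), date(2002, 5, 31))

# Express the vision statement delay as whole weeks plus remaining days.
vision_weeks, vision_extra_days = divmod(vision_days_late, 7)
```

Under this calculation the vision statement was 48 calendar days late (6 full weeks plus 6 days), and the concept of operations was at least 60 days past its due date.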



The Archivist also disagreed with our conclusion that if the results of 

the two National Academy of Sciences assessments are not fully 

reflected in the ERA requirements, there is added risk that the 

technical strategy underlying the development of the system will prove 

not to be optimal, and that alternatives will not have been considered. 

The Archivist noted that NARA should receive the first National Academy 

of Sciences report at a time when it expects to receive the industry’s 

response to NARA’s request for information, and that the report will 

provide an unbiased, expert view of the feasibility of building a 

system that is inherently evolutionary, addressing the core problem of 

digital preservation. According to the Archivist, NARA will factor both 

the scientific and the industry views into its articulation of a draft 

request for proposals. In regard to the second National Academy of 

Sciences report, the Archivist noted that its primary purpose is to 

provide input to NARA’s long-range plans for addressing the continuing 

evolution of information technology and electronic records, and that 

the report will be useful in revising the ERA research plan to address 

new problems and opportunities identified by the experts, and in plans 

for successive builds of the ERA system.



We acknowledge NARA’s clarification regarding the timing and use of the 

two NAS studies and believe this approach should assist in developing a 

system that will meet mission needs. Accordingly, we have revised our 

recommendation to reflect this.



We are sending copies of this report to the Ranking Minority Member, 

Subcommittee on Government Efficiency, Financial Management and 

Intergovernmental Relations, House Committee on Government Reform, and 

to the Ranking Minority Member, Subcommittee on Treasury, Postal 

Service and General Government, House Committee on Appropriations. We 

are also sending copies to the Archivist of the United States, the 

Secretary of Housing and Urban Development, the Secretary of State, the 

Secretary of Commerce, the Secretary of Veterans Affairs, and the 

Administrator of NASA. This report will also be available on GAO’s home 

page at http://www.gao.gov.



If you have any questions concerning this report, please call me at 

(202) 512-6240 or Mirko J. Dolak, Assistant Director, at 

(202) 512-6362. We can also be reached by E-mail at koontzl@gao.gov and 

dolakm@gao.gov, respectively. Key contributors to this report were 

Timothy Case, Barbara Collier, Jamey Collins, David Plocher, and Megan 

Savage.

Linda D. Koontz

Director, Information Management Issues:



Signed by Linda D Koontz:



Appendix I: Objectives, Scope, and Methodology:



Our objectives were to:



* determine the status of NARA’s efforts to respond to governmentwide 

electronic records management problems and the adequacy of its future 

plans and:



* assess NARA’s efforts to acquire an archival system for electronic 

records.



As part of our assessment of NARA’s efforts to acquire an electronic 

records archiving system, we were also asked to identify alternative 

technologies under consideration for the long-term preservation of 

electronic records.



To determine the status of NARA’s efforts to assess and respond to 

governmentwide electronic records management problems and the adequacy 

of its future plans, we reviewed federal legislation and NARA records 

management guidance, available studies, and reports; surveyed NARA’s 

appraisal archivists working with federal agencies; reviewed records 

management activities and obtained the views of records managers in 

selected federal agencies managing large volumes of electronic 

records--the Departments of State, Commerce, Housing and Urban Development 

(HUD), and Veterans Affairs (VA), as well as NASA and the Patent and 

Trademark Office; and reviewed legal challenges to federal electronic 

recordkeeping practices, including Public Citizen v. John Carlin and 

Scott Armstrong v. Executive Office of the President. We also reviewed 

NARA’s documentation of its effort to redesign its approach and 

guidance for the management of electronic records. As part of this 

effort, we investigated whether agencies are scheduling their major 

information systems and the related databases; to do so, we asked five 

major agencies--Commerce, HUD, VA, State, and NASA--what portion of 

their major information systems were scheduled and placed under the 

agency records management program. We based our assessment on the 

inventory of Year 2000 mission-critical systems reported by 24 major 

agencies to the Office of Management and Budget.[Footnote 30] In 

addition, to determine the status of the Library of Congress’ National 

Digital Information Infrastructure and Preservation Program and its 

relationship to NARA’s efforts to design and acquire an advanced 

electronic archival system, we discussed the program’s objectives and 

schedule with Library of Congress officials.



To assess NARA’s efforts to acquire an archival system for electronic 

records, we reviewed agency and contractors’ documentation for the 

electronic records archive (ERA) program, including program and project 

phasing; on the basis of federal requirements and information industry 

practice, we assessed NARA’s effort to develop or enhance its 

information technology capabilities, including information technology 

investment management, enterprise architecture, and information 

security.



To identify alternative technologies under consideration for the long-

term preservation of electronic records, we reviewed archival studies 

and literature, and we surveyed selected digital preservation 

approaches used by the information industry and selected national 

governments. In addition, we contacted the archives of three 

judgmentally selected foreign countries (Australia, Canada, and the 

United Kingdom) that had been identified by records management 

professionals as using advanced electronic records management and that 

we had previously reviewed.[Footnote 31] We also contacted the Public 

Record Office of Victoria, Australia; although this archive is not at 

the scale of a national archive, we included it because it has employed 

a unique technological approach to archiving electronic records.



We performed our work from June 2001 to May 2002 in accordance with 

generally accepted government auditing standards.



[End of section]



Appendix II: Approaches to Archiving Electronic Records Provide Partial 

Solutions:



The challenge of managing and preserving the vast and rapidly growing 

volumes of electronic records produced by modern organizations is 

placing pressure on archives and on the information industry to develop 

a cost-effective long-term preservation strategy that will free 

electronic records from the constraints of proprietary file formats and 

software and hardware dependencies. Part of this strategy will involve 

ways to capture and use information about the records to make them 

accessible, as information in card catalogs does in traditional 

libraries. After considerable research in this area, some agreement is 

being reached on the metadata (data about data) required for preserving 

electronic records, and some practical applications are using XML 

(Extensible Markup Language[Footnote 32]) for creating such metadata.



However, there is no current solution to the electronic records 

archiving challenge, and so archival organizations now rely on a 

mixture of evolving approaches that generally fall short of solving the 

long-term preservation problem. The four most common approaches--

migration, emulation, encapsulation, and conversion--are in use or 

under consideration by the major archives. NARA is supporting the 

investigation of a new approach involving records conversion (known as 

persistent object preservation), but this has yet to mature.



Recognizing that archival solutions may be some time off, companies in 

the information industry are relying on off-the-shelf technology for 

providing access to billions of electronic records. These commercial 

archives, however, concentrate on electronic records of types that are 

relatively uniform in comparison to those that a government archive 

must address.



Archiving Requires Documentation of Attributes and Relationships of 

Records:



Archives use catalogs of various types to capture information about 

records, information that is critical for sharing, storing, managing, 

and accessing records effectively--particularly in the context of 

millions of records. Because such information is data containing 

descriptive information about other data, it is referred to as 

metadata. Metadata are a central element of any approach to ensure that 

preserved records are functional. For electronic records, the metadata 

needed are often more extensive than information in traditional 

catalogs, including information that is important for preservation.



Metadata Provide Information Necessary to Describe Electronic 

Collections:



The creation of accessible software- and hardware-independent 

electronic records requires that all materials that are placed in 

archives be linked to information about their structure, context, and 

use history. Metadata to be associated with electronic records may 

include information about:



* the source of the record;



* how, why, and when it was created, updated, or changed;



* its intended function or purpose;



* how to open and read it;



* terms of access, and:



* how it is related to other software and records used by the 

originating organization.



These metadata must be sufficient to support any changes made to 

records through various generations of hardware and software, to 

support the reconstruction of the decisionmaking process, to provide 

audit trails throughout a record’s life cycle, and to capture internal 

documentation. Without an adequately defined metadata structure, an 

effective electronic archive cannot be constructed.
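
As a minimal sketch only, the metadata elements listed above could be
held in a simple structure such as the one below. The class name, field
names, and sample values are hypothetical and are not drawn from any
archive's actual metadata model; the sketch merely shows how an archive
might check a record for missing elements before accepting it.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class RecordMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    source: str                   # the source of the record
    creation_history: List[str]   # how, why, and when it was created or changed
    purpose: str                  # its intended function or purpose
    rendering_instructions: str   # how to open and read it
    access_terms: str             # terms of access
    related_items: List[str] = field(default_factory=list)  # related software and records

    def missing_elements(self) -> List[str]:
        """Return the names of elements that are empty."""
        return [name for name, value in asdict(self).items() if not value]


# Invented sample values, used only to exercise the check
example = RecordMetadata(
    source="Example Agency, Office of the Secretary",
    creation_history=["1999-03-02: drafted as a memorandum"],
    purpose="Record of a policy decision",
    rendering_instructions="Plain ASCII text; any text viewer",
    access_terms="",
)
print(example.missing_elements())
```

In this sketch, a record whose terms of access or relationships are
blank would be flagged for follow-up rather than silently accepted.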



Numerous research projects have examined the question of defining 

metadata that would be sufficient to ensure digital preservation. 

Although archives experts note that unresolved issues remain, the work 

on preservation metadata is beginning to move from the research area to 

practice. The Public Record Office Victoria (Australia), a state 

archive, has published standards for the management of electronic 

records that include a metadata model originally developed by the 

National Archives of Australia.



For incorporating metadata, the Victoria archive mandates the use of 

XML. XML is being actively considered by archives and researchers as a 

promising approach to generating metadata.



XML Enables Infrastructure-Independent Description of Electronic 

Records:



XML is a flexible, nonproprietary set of standards for annotating 

(“tagging”) data with semantically rich labels that permit computers to 

process files on the basis of their meaning.[Footnote 33] Like the more 

familiar HTML (Hypertext Markup Language) files used on the World Wide 

Web, XML files can be easily transmitted via the Internet, and with 

appropriate software, they can be displayed by Web browsers. The 

difference is that HTML is used only for telling computers how to 

display information for a human being to view, whereas the semantically 

based XML tags allow computers to automatically interpret and process 

XML files.



XML is called extensible because it is not a fixed format. Instead, XML 

is actually a “metalanguage”--a language for describing other 

languages--which allows the design of customized markup languages for 

limitless different types of documents. Thus, although in the beginning 

stages of adoption, XML is viewed as a promising format for a wide 

range of applications.[Footnote 34]



Several XML attributes make it attractive for archive applications. The 

semantic nature of XML tags makes XML suitable for recording metadata. 

Its extensibility would allow archives to expand their systems to 

accommodate evolving needs. As an open standard, it reduces the 

problems of proprietary software. Further, because they are basically 

text files, XML files can be readily interpreted by disparate computer 

systems. Even without the mediation of software, human beings can 

interpret an XML-tagged file, because XML tags are human readable (see 

fig. 4). This quality allows them to be preserved both on computer 

media and on paper (so that they would be readable both by human beings 

and automatically through optical character recognition).



Figure 4: Sample of XML Version of State Department Telegram:



[See PDF for image]



Source: San Diego Supercomputer Center.



[End of figure]



Figure 4 is an example of a text document--a World War II vintage 

telegram in the Franklin D. Roosevelt library--converted to XML 

format.[Footnote 35] The XML “tags” provide the means for identifying--

and retrieving--key pieces of information, such as date sent, 

addressee, and place of sender. If the file were viewed in an 

XML-compliant Web browser, the tags in the telegram would not be visible, 

and the telegram itself could be displayed in various ways for the 

convenience of the human reader. At the same time, the presence of the 

tags permits computer systems to perform powerful searches and exchange 

data.
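
The following sketch suggests how tagged fields of this kind might be
retrieved by software. Because the image in figure 4 is not reproduced
in this text file, the tag names (dateSent, addressee, placeOfSender,
body) and the sample content are invented for illustration and should
not be read as the markup used by the San Diego Supercomputer Center.

```python
import xml.etree.ElementTree as ET

# Invented sample; tag names and content are hypothetical, not those of figure 4.
TELEGRAM_XML = """<telegram>
  <dateSent>1942-01-15</dateSent>
  <addressee>Secretary of State</addressee>
  <placeOfSender>London</placeOfSender>
  <body>Sample message text.</body>
</telegram>"""

root = ET.fromstring(TELEGRAM_XML)


def field_value(tag: str) -> str:
    """Return the text of the first element with the given tag, or ''."""
    element = root.find(tag)
    if element is None or element.text is None:
        return ""
    return element.text


# Retrieval by meaning: software asks for a field by name
print(field_value("dateSent"))
print(field_value("placeOfSender"))

# Display for a human reader: the same content with the tags left out
print(" ".join(text.strip() for text in root.itertext() if text.strip()))
```

The same file serves both purposes described above: the tags support
searching, while a viewer can suppress them for display.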



XML is also used by the National Archives of Australia,[Footnote 36] 

which converts files from their native formats to XML versions, while 

retaining a copy of the original source file. The Australian archives 

has also developed a metadata model, but it has not yet determined its 

final preservation metadata requirements.



Electronic Archives Take Combinations of Approaches to Preservation:



For long-term preservation of electronic records, electronic archives 

must address the problems of obsolescence and aging of storage media, 

the dependence of electronic records on the software and hardware on 

which they were created, the complexity of electronic records, and the 

massive volumes of records created by often decentralized systems. 

According to one archival expert, a viable strategy for long-term 

preservation for electronic records would call for “a long-lived 

solution that does not require continual heroic effort or repeated 

intervention of new approaches every time formats, software, or 

hardware paradigms, document types, or recordkeeping practices 

change.”[Footnote 37]



Since no one solution is yet available that addresses all the problems, 

most archives and other institutions that preserve records use a 

variety of approaches, often in combination. The current approaches for 

dealing with the technical issues associated with long-term electronic 

archiving are:



* technology preservation--maintaining old technologies to allow access 

to old formats;



* emulation--using software running on new-technology platforms to 

mimic old technologies;



* migration--transferring digital materials from one hardware/software 

configuration to another, or from one generation of computer technology 

to a subsequent generation;[Footnote 38]



* encapsulation--grouping together a digital object with other 

information necessary to provide access to that object; and:



* conversion to standard formats--transforming records into objects 

that are relatively software and hardware independent.



The recent development of durable analog storage media (that is, media 

that preserve images of human-readable documents, much as microfiche 

does) suggests the possibility of approaches that combine those above 

with the use of analog rather than digital media.[Footnote 39]



Technology Preservation Is a Short-Term Solution Only:



Technology preservation refers to the practice of maintaining outdated 

equipment well after it is useful in everyday business processes. Under 

this approach, electronic files or records, which are saved in their 

native formats, continue to be accessible through the use of original 

hardware and software. In the short term, this is a simple and cost-

effective approach, and some organizations do maintain older 

information systems only to be able to access their records.[Footnote 

40]



However, this approach is at best an interim solution to the problem of 

the dependence of electronic records on the software and hardware on 

which they were created. The solution eventually fails, because 

maintaining the original technology grows increasingly difficult and 

costly with the passage of time. Further, it does not solve the problem 

of aging and obsolescent storage media, which would also grow more 

difficult if not impossible to replace. Issues of cataloging and 

metadata are also not addressed by this approach. With the seemingly 

endless introduction of new hardware and software, the sheer number of 

differing formats and applications, and the cost to maintain any and 

all systems, technology preservation is not a feasible strategy for the 

long term.



Emulation Is Currently More Theoretical Than Practical for Electronic 

Archiving:



A proposed approach to the problem of software and hardware dependence 

is emulation, which aims to preserve the original software environment 

in which records were created. Emulation software mimics the 

functionality of older software (generally operating systems) and 

hardware. Under the emulation approach, data files are stored along 

with copies of the creating software as well as software that emulates 

the hardware/operating system required to run the software.[Footnote 

41] This technique seeks to recreate a digital document’s original 

functionality, look, and feel by reproducing, on current computer 

systems, the behavior of the older system on which the document was 

created. In other words, an emulation strategy means that nothing is 

done to the original electronic file; rather, the original environment 

is recreated. Since the original file remains unaltered, emulation also 

offers a solution to the problem of preserving the original 

functionality and the “look and feel” of complex digital files.



Emulation has been in practical use on computer systems for many years:



* IBM mainframes emulate previous mainframes in order to support legacy 

systems and allow several generations of operating system versions to 

be run.



* Operating system emulators allow a single computer to provide more 

than one operating environment (such as Macintosh and Windows).



* Emulation software allows desktop computers to run video games 

written for legacy video gaming systems.



However, according to one archival expert, emulation has not yet been 

applied to preserving archival documents in any systematic way. 

Although emulation could in theory be part of a solution to the problem 

of hardware and software dependence, it is just beginning to be 

explored as an archival approach. Emulation is under consideration as 

one of various archiving approaches by the United Kingdom’s Public 

Record Office.[Footnote 42]



One problem unique to emulation is that intellectual property rights 

issues may be involved when either operating systems or applications 

are emulated.[Footnote 43] Even if the software and hardware are 

obsolete, their copyrighted specifications are not likely to be 

released for the benefit of archival integrity. Further, the use of an 

emulated operating system or application introduces outmoded programs 

into a modern environment, requiring users to understand how to use 

them; in other words, using the old software may require expert 

knowledge of the outdated systems--knowledge that is likely to 

disappear.



Other problems with emulation include the increasing possibility that 

software failures will occur as the old systems continue to age and the 

pool of expertise concerning them shrinks. Emulation assumes that the 

emulated software will continue to run without maintenance. As the year 

2000 date conversion problem showed, this is not a safe assumption: 

software may contain bugs that eventually cause 

catastrophic loss of information.[Footnote 44] Further, an emulation 

approach depends on several components working together (the emulation 

software, the original application, and the data); as the number of 

components increases, so does the risk of failure.



Migration of Both Media and File Formats May Preserve Records:



Migration refers to the periodic transfer of digital materials from one 

format configuration to another, or from one generation of computer 

technology to a subsequent generation. In the context of archiving, 

migration can refer both to the media on which information resides 

(conversion from older to newer media or forms of media) and to the 

formats in which it is encoded (conversion from one file format or 

system to another).



The first type of migration, media migration, has been so far 

unavoidable: it is the standard approach to the problem of media 

obsolescence and aging. In media migration, records are moved from 

older storage media to newer media, either to avoid the obsolescence or 

decay of an older medium or to upgrade to a more advanced medium (often 

to increase storage capacities while reducing cost). However, media 

migration alone does not ensure that the electronic records transferred 

to the new media continue to be accessible, especially if their format 

is obsolete. As new storage technologies evolve--including extreme-

longevity analog media such as the High Density Rosetta disk discussed 

later in this appendix--the migration process may become less frequent 

and more efficient.
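
A media migration step can be sketched as a copy followed by a
comparison of the old and new files. The sketch below is illustrative
only: the use of SHA-256 checksums is an assumption of this example
(the report does not describe how any archive verifies its copies),
and temporary directories stand in for the old and new media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, target_dir: Path) -> Path:
    """Copy a file to new storage and confirm the bits are unchanged."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"copy of {source.name} does not match the original")
    return target


# Temporary directories stand in for the old and new storage media
with tempfile.TemporaryDirectory() as workspace:
    old_medium = Path(workspace) / "old_medium"
    new_medium = Path(workspace) / "new_medium"
    old_medium.mkdir()
    record = old_medium / "record.dat"
    record.write_bytes(b"contents in an obsolete format")
    copy = migrate_file(record, new_medium)
    migrated_ok = copy.read_bytes() == record.read_bytes()
    print(migrated_ok)
```

A successful check shows only that the bits survived the move. As noted
above, it says nothing about whether the format of those bits can still
be read.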



The second type of migration, format migration, is a process of 

preservation by conversion: specifically, format migration is defined 

as rearranging the original sequence of structural and data elements of 

a file to conform to another configuration. Such migration occurs 

whenever older systems and formats are displaced by newer, often more 

advanced systems and formats. Many organizations have, for example, 

converted old database systems to newer systems, and in the process 

they have converted the formats of the records they contain.



The major difficulty with format migration is the risk of altering 

records during conversion from the source to the target format. For 

conversions to be successful, those performing the transition must have 

knowledge of the original application and data formats,[Footnote 45] 

and the more complex the file structure, the more important this 

knowledge is. Whether the application is commercial or generated in 

house, over time this knowledge may be lost and with it the ability to 

perform a successful migration. For such reasons, migration has been 

described as cost effective only for certain types of records that 

remain in operational use.[Footnote 46] For records in use, problems 

with imperfect conversion are more likely to be discovered by users, 

and organizational resources are more likely to be devoted to ensuring 

that these are resolved or mitigated.



Further, although format migration has occurred in many contexts in the 

past, it has not been extensively used in archiving. Most electronic 

archives are relatively new, so they are dealing with records in 

current formats created by systems that are still operational. Thus, 

they have not yet experienced the need to incorporate format migration 

into their processes. Rather, they treat migration as a future option 

for dealing with preserving the types of records that they are 

currently storing.



As a strategy for the long-term preservation of electronic records, 

relying on format migration is risky. Migration as a preservation 

strategy would have to be a continuous process, with conversions 

occurring whenever a new format needed to be introduced. With each 

format conversion, the possibility of loss would be increased, and the 

more complex the record, the greater the possibility of loss. Thus, 

migration is at best an imperfect solution as it can potentially lead 

to the loss of record integrity.
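
The way risk accumulates over repeated conversions can be illustrated
with a simple calculation. The sketch below assumes, purely for
illustration, that every conversion independently leaves a record
intact with the same probability; neither that assumption nor the 0.99
figure comes from the report or from any measured conversion process.

```python
def chance_of_intact_record(per_conversion_success: float, conversions: int) -> float:
    """Probability that a record survives every conversion unaltered,
    assuming each conversion succeeds independently with the same probability."""
    if not 0.0 <= per_conversion_success <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if conversions < 0:
        raise ValueError("conversions must not be negative")
    return per_conversion_success ** conversions


# 0.99 is an invented figure chosen only to show the trend
for count in (1, 5, 10, 20):
    print(count, round(chance_of_intact_record(0.99, count), 3))
```

Under these assumptions, a record that has a 99 percent chance of
surviving one conversion intact has roughly an 82 percent chance of
surviving 20 of them.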



Migration was selected by the United Kingdom’s Public Record Office as 

its current archival approach. In addition to migration, the Public 

Record Office is also considering using emulators and viewers to 

access archived files in their native formats.



Encapsulation Preserves Both Records and Information about Records:



Encapsulation is the combining of several elements to create a new 

single entity; in the context of archiving, the elements would be the 

records themselves, metadata identifying and describing the records, 

and possibly other elements (such as viewers enabling the records to be 

read).[Footnote 47]



Unlike migration, encapsulation does not necessarily involve a change 

in the original file format. If the format is unchanged, encapsulation 

would avoid the problem of loss of integrity that migration entails. 

Leaving records in their native formats would leave open the 

possibility of processing the objects with the original software, and 

it would also permit subsequent transformation of the encapsulated 

records using methods that were not available when the records were 

originally placed into the archives.[Footnote 48]
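
A minimal sketch of encapsulation follows. It wraps the unchanged bytes
of a record together with descriptive metadata in a single XML
document. The element names, the use of base64 encoding for the
content, and the sample values are assumptions made for this example;
they do not reproduce the format of any archive discussed in this
appendix.

```python
import base64
import xml.etree.ElementTree as ET


def encapsulate(content: bytes, metadata: dict) -> str:
    """Wrap a record's unchanged bytes and its metadata in one XML document.

    Metadata keys must be valid XML element names.
    """
    package = ET.Element("encapsulatedRecord")
    meta = ET.SubElement(package, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    payload = ET.SubElement(package, "content", encoding="base64")
    payload.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(package, encoding="unicode")


def extract_content(package_xml: str) -> bytes:
    """Recover the original bytes from an encapsulated record."""
    payload = ET.fromstring(package_xml).find("content")
    return base64.b64decode(payload.text or "")


# Invented sample record and metadata
original = b"\x00\x01 native-format bytes"
wrapped = encapsulate(
    original,
    {"source": "Example Agency", "format": "hypothetical native format"},
)
print(extract_content(wrapped) == original)
```

Because the content is stored byte for byte, the original file can be
recovered and later processed with the original software or with
conversion methods that do not yet exist.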



Encapsulation is currently being used by the Public Record Office 

Victoria in Australia.[Footnote 49] The Victoria archive uses XML to 

encapsulate records along with standardized metadata describing each 

record in a Victorian Electronic Record Strategy (VERS) 

format.[Footnote 50] The VERS format mandates the use of XML to 

describe and encapsulate records. However, the Victoria archive has 

only recently begun applying its process, and its electronic records 

collection is as yet small (described as “a few records”), so it is 

premature to judge its effectiveness for large-scale, long-term 

preservation.



Conversion to Standard Formats Makes Records Less Dependent on Hardware 

and Software:



Conversion transforms records into standard text formats such as 

ASCII[Footnote 51] or XML to increase their independence from hardware 

and software. This approach is currently used by the National Archives 

of Canada[Footnote 52] and by NARA (both of which accept databases in 

ASCII format), as well as the National Archives of Australia,[Footnote 

53] which converts files from their native formats to XML, while 

retaining a copy of the original source file.



The Victoria archive is using a combination of conversion and 

encapsulation in its preservation approach, because before 

encapsulating selected types of documents, it is requiring their 

conversion (where appropriate) to Adobe Systems’ Portable Document 

Format (PDF). PDF is a compact format that preserves all the fonts, 

formatting, graphics, and color of any source document, regardless of 

the software and hardware used to create it. Although PDF is a 

proprietary file format, PDF files can be shared, viewed, navigated, 

and printed exactly as intended by anyone with the freely distributed 

Adobe Acrobat Reader.



The primary shortcomings of the conversion approach are the limitations 

and the longevity of the selected standard.[Footnote 54] For example, 

converting databases to ASCII format limits their usefulness: the 

conversion of a relational database to flat ASCII database tables will 

eliminate the embedded information about the relationships among data 

elements.[Footnote 55] Conversion to XML, on the other hand, may 

involve fewer such limitations, but it depends on the XML standard 

remaining in use and accessible.
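
The loss of relationship information can be illustrated with a small
example. The sketch below builds a two-table relational database and
exports one table as flat, comma-separated ASCII text. The table names,
column names, and rows are invented for this example, and SQLite is
used only because it is bundled with Python.

```python
import csv
import io
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE report (
        id INTEGER PRIMARY KEY,
        title TEXT,
        agency_id INTEGER REFERENCES agency(id)
    );
    INSERT INTO agency VALUES (1, 'Example Agency');
    INSERT INTO report VALUES (10, 'Example Report', 1);
""")


def export_flat(table: str) -> str:
    """Dump one table as comma-separated ASCII text with a header row.

    The table name is interpolated directly, so pass only fixed, trusted names.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column[0] for column in cursor.description)
    writer.writerows(cursor)
    return buffer.getvalue()


flat_report = export_flat("report")
schema = connection.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'report'"
).fetchone()[0]

print(flat_report)
print("REFERENCES" in schema)       # the database definition records the link
print("REFERENCES" in flat_report)  # the flat export does not
```

The exported text still contains the agency_id value, but nothing in
the file states that the value points to a row in the agency table.
That knowledge would have to be preserved separately, for example in
accompanying documentation or metadata.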



NARA is investigating an advanced form of conversion combined with 

encapsulation known as persistent object preservation (POP). Under this 

approach, records are converted by XML tagging and then encapsulated 

with metadata. According to NARA, the persistent object preservation 

approach would make electronic records self-describing in a way that is 

independent of specific hardware and software. The architecture for POP 

is being developed through the National Partnership for Advanced 

Computational Infrastructure. The partnership is a collaboration of 46 

institutions nationwide (including NARA) and 6 foreign affiliates, with 

the San Diego Supercomputer Center serving as the technical resource.



According to NARA, persistent object preservation would accommodate 

preservation of persistent but evolving collections by providing the 

ability to dynamically reconstruct data collections on new technology. 

The result would be a system that could upgrade individual technical 

components and migrate media while safeguarding the archived records. 

POP would thus not only enable the use of future, advanced 

technologies, it would also reduce threats to integrity and 

authenticity, because POP would not require changes in the preserved 

data. However, POP may not be sufficiently mature to be translated into 

system design.



Migration to Durable Analog Media May Offer Hybrid Approach:



An archive that stores records digitally must use media migration as a 

preventive measure to avoid decay and obsolescence. However, the use of 

analog storage offers a possible alternative that may diminish the need 

for media migration. Whereas all current media now record digital 

information as 0’s and 1’s, analog storage of documents is suggested by 

a new product, called a High Density Rosetta, developed by Norsam 

Technologies (see fig. 5).



Figure 5: The Long Now Foundation Rosetta Disk Language Archive:



[See PDF for image]



Source: Rolfe Horn, courtesy of the Long Now Foundation.



[End of figure]



The nickel-plated disk, which has a life expectancy that is orders of 

magnitude longer than current electronic media,[Footnote 56] allows the 

analog storage of information and images that are readable via an 

electron or optical microscope. Such a medium could avoid the 

obsolescence created by software-reliant media. The plates are 

physically inscribed by an ion beam, through a process known as ion 

milling.[Footnote 57] This medium can store on each side of its 2-inch 

plate over 196,000 pages (with electron microscope retrieval) or 5,000 

to 18,000 pages (with optical microscope retrieval). Using a text-based 

coding system such as XML would permit both coded (software readable) 

and image (human readable) information to be stored on this long-lived 

medium. The migration issue would then arise if new software were to be 

adopted, but the image information would persist.



The High Density Rosetta is being used by the Long Now Foundation to 

create an extreme-longevity archive of selected languages.[Footnote 58] 

According to the foundation, 50 to 90 percent of the world’s languages 

are predicted to disappear in the next century, many with little or no 

significant documentation. As part of the effort to secure this 

critical legacy of linguistic diversity, the foundation initiated the 

Rosetta Project,[Footnote 59] an effort to develop a contemporary 

version of the historic Rosetta Stone. The project’s goal is the 

development of a permanent archive of 1,000 languages. For storage of 

this archive, the project is using the High Density Rosetta to micro-

etch text of archived languages at a scale readable by a 1,000-power 

optical microscope.



Information Technology Industry Relies on Off-the-Shelf Technologies to 

Provide Access to Electronic Collections:



While government and academic institutions are searching for a 

permanent solution to electronic records archiving problems, the 

private sector, also concerned about and affected by the potential loss 

of electronic records, relies on existing information architectures and 

off-the-shelf technologies to make accessible massive volumes of 

electronic records dating back over two decades. These archiving 

achievements do not meet the rigorous requirements for permanence and 

authenticity that are demanded by a government archive, nor are their 

owners required to process, store, and access the full range of complex 

file formats encountered by governments. However, they do illustrate 

the capability to provide storage and access to large quantities of 

data. Two of the most notable private sector efforts are the Internet 

Archives and the Google archive of Usenet messages.



Internet Archives:



The Internet Archives has created a digital library of Internet sites 

and other born-digital cultural artifacts. It is attempting to archive 

the entire publicly available Web, offering free access to researchers, 

historians, scholars, and the general public. Anyone with access to the 

Internet can, through the Internet Archives Web site,[Footnote 60] 

navigate the Web at any moment in time from 1996 to the present. This 

collection of Web pages contains over 100 terabytes, or 10 billion Web 

pages, and it is currently growing at a rate of 12 terabytes per month. 

The stored and accessible 100 terabytes is larger than the amount of 

data contained in the world’s largest libraries, including the Library 

of Congress, making it the largest known database in existence. Without 

the efforts of the Internet Archives, these 10 billion Web pages might 

have been lost. As it is, they provide a record of the origins and 

evolution of the Internet, as well as a reflection of societal 

interests and opinions at different moments in time. This is 

particularly true in the case of Web sites such as those of 

presidential candidates (see fig. 6) and of monumental events such as 

the September 11 attacks, both of which have prominence on the Internet 

Archives Web site as “Special Wayback Collections.”:



Figure 6: Internet Archive Collection of Presidential Candidate Web 

Sites:



[See PDF for image]



Source: Internet Archives.



[End of figure]



According to the Internet Archives, it has achieved inexpensive storage 

on a major scale: it uses off-the-shelf technology at a cost of about 

$4,000 per terabyte. As a preservation strategy, the Internet Archives 

currently uses media migration to avoid media obsolescence and take 

advantage of technological advances to reduce costs. As a safety 

measure, backup copies of a part of the collection are also created.
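
The figures reported above can be combined in a short back-of-the-
envelope calculation. The sketch below rounds "over 100 terabytes" to
100, treats a terabyte as 10**12 bytes, and assumes that the cost per
terabyte and the monthly growth rate stay constant. Those are
assumptions of this example, not statements made by the Internet
Archives.

```python
# Figures reported in the text
COLLECTION_TERABYTES = 100
WEB_PAGES = 10_000_000_000
GROWTH_TERABYTES_PER_MONTH = 12
COST_PER_TERABYTE_DOLLARS = 4_000

# Assumption: 1 terabyte = 10**12 bytes (decimal convention)
BYTES_PER_TERABYTE = 10**12

average_page_bytes = COLLECTION_TERABYTES * BYTES_PER_TERABYTE // WEB_PAGES
collection_cost = COLLECTION_TERABYTES * COST_PER_TERABYTE_DOLLARS
yearly_growth_terabytes = GROWTH_TERABYTES_PER_MONTH * 12
yearly_growth_cost = yearly_growth_terabytes * COST_PER_TERABYTE_DOLLARS

print(average_page_bytes)        # average bytes stored per Web page
print(collection_cost)           # dollars to store the current collection
print(yearly_growth_terabytes)   # terabytes added per year at the current rate
print(yearly_growth_cost)        # dollars per year for that growth
```

On these assumptions, the average stored page is about 10,000 bytes,
storage for the existing collection costs about $400,000, and a year of
growth adds 144 terabytes at a cost of about $576,000.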



Google:



Google claims to have the largest index of Web sites available on the 

World Wide Web and the industry’s most advanced search technology. 

Google’s Web site also contains an archive of Usenet messages that 

cover the past 20 years (see fig. 7).[Footnote 61] Usenet is a 

collection of text messages that are posted on Internet electronic 

bulletin boards. These bulletin boards--which existed before E-mail, 

Web browsers, and the Web itself--provide avenues for communication in 

an open forum, allowing others to read and reply. Some notable “posts” 

included in Google’s Usenet Archives are the first post mentioning 

Microsoft (1981), the first post mentioning a compact disc (1982), and 

the posts sent just after the September 11 attacks.



Figure 7: Google’s Usenet Archive:



[See PDF for image]



Source: Google.



[End of figure]



Google currently provides access to more than 700 million messages 

dating back to 1981, and this number is rapidly increasing. Google’s 

collection is by far the most complete collection of Usenet articles 

ever assembled. Before Google’s acquisition of the archive, posts 

without activity were usually deleted from the live discussion forums 

after a few days or weeks, and therefore they were not viewable or 

searchable by users. Some feel that Google’s Usenet archive is an 

irreplaceable and invaluable reference, representing “the human side 

of the Internet” through first-hand accounts of historical events.



[End of section]



Appendix III: NARA’s Electronic Records Guidance Has Evolved:



A review of the development of electronic records guidance issued by 

the National Archives and Records Administration (NARA) over the last 

several decades demonstrates the extent to which the rapid evolution of 

information technology has posed significant challenges for NARA in its 

role of providing guidance to federal agencies concerning the 

management of electronic records under the Federal Records 

Act.[Footnote 62]



NARA addresses electronic records management and disposition largely 

through two sets of guidance:



* the electronic records management regulation, which provides general 

responsibilities for agency management of electronic records;[Footnote 

63] and:



* the general records schedules, which provide disposal authorization 

for specific categories of temporary records common to most 

agencies.[Footnote 64]



The history of these two sets of guidance reflects the evolution of 

NARA’s electronic records guidance.



Electronic records management was given a formal role in 1968 when 

NARA, then the National Archives and Records Service (NARS) of the 

General Services Administration (GSA), established a unit to develop 

policies for selecting and preserving electronic records. This Data 

Archives Staff undertook to develop three sets of guidance: (1) 

inventory guidance--forms for inventorying magnetic tape files; (2) 

environmental guidance--recommendations for proper handling and 

storage of magnetic tape; and (3) GRS 20--a general records schedule 

for computerized records.



Of that guidance, GRS 20 emerged as NARA’s first significant electronic 

records guidance. It was intended to cover electronic records created 

by mainframe applications in the then-dominant agency data processing 

operations. The major purpose was to address the efficient disposition 

of those electronic records, including destruction of unneeded 

temporary records and transfer to NARS (NARA) of permanent records.



The 1972 GRS 20, entitled Data Automation Program Records, stated, 

“This schedule covers machine readable records, related documentation 

required for their servicing, and files related to the automatic data 

processing (ADP) procurement, operations, and management functions.” 

GRS 20 divided these records into categories that “correspond roughly 

to the typical organizational and functional structure found in most 

ADP installations and their parent organizations.”[Footnote 65]



According to recent NARA summaries, the 1972 GRS 20 was meant “to 

provide disposal authority for specific categories of temporary records 

associated with mainframe applications. Excluded from its coverage, and 

all subsequent revisions, were the types of records generated by large 

data systems that might have archival value.”[Footnote 66] The clear 

meaning of the 1972 GRS 20, however, was that it was not meant merely 

to identify and provide for efficient disposal of “ancillary materials 

common to most data processing operations.”[Footnote 67] Quite the 

contrary, the guidance identified a range of records that should be 

scheduled through filing of a Standard Form 115. These ranged from 

various temporary records to potentially permanent records, such as 

master data files.



GRS 20 was revised in 1977.[Footnote 68] While the 1977 revision 

restructured the 1972 electronic records categories, it retained the 

earlier purpose of providing disposition instructions for virtually all 

records associated with data processing operations--temporary and 

permanent, program and administrative.[Footnote 69]



In 1983, GSA issued Bulletin FPMR B-127, Archives and Records, which 

provided guidance on records created or maintained “using personal 

computers and electronic information storage or transmission equipment 

(electronic filing and electronic mail).”[Footnote 70] According to the 

bulletin, “The proliferation of personal computers in many Federal 

agencies and the implementation of sophisticated electronic filing and/

or mail systems has created a need for adaptation of traditional 

records management techniques for the control and disposal of records 

and information.” The bulletin then reiterated that the disposition of 

all records regardless of physical form is controlled by the Federal 

Records Act and instructed agencies to ensure “that appropriate 

internal controls are instituted to prevent the loss or alienation of 

official records created or acquired in electronic form.”



Two pieces of similar guidance followed in 1985. First, NARA issued 

Bulletin 85-2 to provide general guidance “on how to manage records 

created, stored, or transmitted using personal computers or other 

electronic office equipment including word processors.”[Footnote 71] 

This bulletin again rooted electronic records management in the 

fundamental requirements of the Federal Records Act: “The creation, 

maintenance, and disposition of all official records regardless of 

physical form is controlled by the provisions of [the Federal Records 

Act and implementing regulations].”



Two weeks after issuing Bulletin 85-2, NARA issued an ADP Records 

Management regulation.[Footnote 72] This rule was the first version of 

the regulation still found at 36 CFR 1234. The rule consolidated 

guidance consistent with the goals of the 1968 Data Archives Staff, 

requiring each agency (in very summary terms) to:



* establish a program for the management of ADP records, including 

classifying, preserving, and scheduling machine-readable records; and:



* ensure proper care, handling, and storage of magnetic computer tapes 

and disk packs.



The next major step in the evolution of NARA’s electronic records 

guidance occurred in the 1988 revision of two general records 

schedules: GRS 20, now entitled Electronic Records, and GRS 23, Records 

Common to Most Offices within Agencies.[Footnote 73] The revisions 

significantly modified the scope of both general records schedules and, 

for the first time, provided disposal authority for personal computer 

records in GRS 23.



With regard to GRS 20, the 1988 revision altered its scope, stating, 

“This schedule applies to disposable electronic records routinely 

stored on magnetic media by Federal agencies in central data processing 

facilities.” As opposed to the broad purpose of the 1972 and 1977 

versions, which had been to provide disposition guidance for all 

electronic records associated with data processing operations, the 1988 

GRS 20 discussed only disposable records. All references to scheduling 

records were removed. This change was not limited, however, to GRS 20. 

It reflected a NARA decision that all general records schedules should 

pertain only to disposable records. The intent was to rely on other 

guidance to provide instructions about scheduling and disposition of 

permanent records, such as the regulation at 36 CFR 1234 and the 

Appraisal Guidelines for Permanent Records, now published as an 

appendix in NARA’s Disposition of Federal Records handbook.



The second major change in 1988 was the GRS 23 treatment of records 

generated on personal computers. Like the 1988 GRS 20, the 1988 GRS 23 

was explicitly limited to disposable records: “The records covered by 

this schedule relate to routine internal administrative and 

housekeeping activities.” GRS 23 provided disposal authority for 

temporary administrative records generated by end-user applications on 

stand-alone or networked computers. This included word processing 

files, spreadsheets, and administrative databases. In addition to 

authorizing the destruction of administrative or housekeeping records 

when no longer needed, the 1988 GRS 23 authorized the deletion of 

electronic versions of records created after they were printed to hard 

copy, unless the records were maintained only in electronic form. If 

the electronic record was maintained only in electronic form, it could 

be deleted only after the expiration of the retention period authorized 

for the hard copy by the GRS or a NARA-approved SF 115. As NARA 

subsequently stated, its acceptance of paper recordkeeping for 

electronic records was based on the assessment that even with the 

growing use of computers, “agencies continued to maintain records 

produced with office automation applications in organized paper files, 

especially since end-user applications were not designed to classify, 

index, and maintain documents for their authorized retention period …” 

Thus, the revised GRS authorized deletion of word processing and E-mail 

records after they had been copied to paper or microform.[Footnote 74]



The 1988 revisions to GRS 20 and 23 were followed by the 1990 revision 

to NARA’s electronic records management regulation.[Footnote 75] This 

revision continued the purposes of the 1985 bulletins, but provided 

more detailed mandates for “procedures to manage electronic records, to 

provide for the selection and maintenance of electronic storage media, 

and to follow the legal requirements for the disposition of such 

records.” Agency requirements under this still valid and largely 

unchanged regulation include the following:



* develop and implement an agencywide electronic records management 

program;



* establish procedures for addressing records management requirements 

before approving new electronic records systems or enhancements to 

existing systems; and:



* specify the location, manner, and media in which electronic records 

will be maintained to meet operational and archival requirements, and 

maintain inventories of electronic records systems.



While NARA endeavored to create a comprehensive electronic records 

management scheme through the combination of affirmative guidance, such 

as the 1990 regulation, and the revised general records schedules, the 

GRS 20 principle that paper printouts could substitute for electronic 

records became the focus of controversy through a lawsuit challenging 

the 1989 destruction of White House E-mail tapes. The case, Armstrong 

v. Executive Office of the President, spanned several years and 

involved multiple issues and court rulings. In a 1993 ruling in that 

case, the U.S. Court of Appeals ruled that paper printouts of E-mail 

messages were not adequate substitutes for electronic versions stored 

on computer tapes because they “may omit fundamental pieces of 

information which are an integral part of the original electronic 

records, such as the identity of the sender and/or recipient and the 

time of receipt.”[Footnote 76] Thus, the court rejected the 

government’s argument that “electronic records are merely ‘extra 

copies’ of the paper versions,” and concluded that “since there are 

often fundamental and meaningful differences in content between the 

paper and electronic versions of these documents, the electronic 

versions do not lose their status as records and must be managed and 

preserved in accordance with the FRA.”:



Largely in response to the court’s findings, NARA revised GRS 20 in 

1995.[Footnote 77] First, as an organizational matter, it moved the 

electronic records instructions from GRS 23 into GRS 20 in order to 

have a single general schedule for all disposable electronic records. 

This resulted in combining instructions for the broad format categories 

of word processing files, electronic mail records, and electronic 

spreadsheets with those for specific functional categories of 

administrative records, such as backup files, finding aids, and systems 

operations records. Second, as a substantive matter, NARA now 

instructed agencies to “identify records created using office 

automation and to maintain them in a recordkeeping system that 

preserves their content, structure, and context for their required 

period.” According to the GRS,



“Only after the records have been properly preserved in a recordkeeping 

system will agencies be authorized by GRS 20 to delete the versions on 

the electronic mail and word processing systems. As indicated, most 

agencies have no viable alternative at the present time but to use 

their current paper files as their recordkeeping system. As the 

technology progresses, however, agencies will be able to consider 

converting to electronic recordkeeping systems for their records.”:



Thus, NARA stated in the 1995 GRS, “Program records that have been 

transferred to the recordkeeping system will not be affected by GRS 

20.” However, because NARA accepted the use of paper files as 

appropriate recordkeeping systems for electronic records, this logic 

permitted the disposal of electronic versions of records that required 

retention or permanent preservation. Accordingly, while GRS 20 did not 

authorize the destruction of program records, it did permit the 

destruction of electronic copies of those records.



In 1997, a Federal District court, in Public Citizen v. John Carlin, 

overturned the 1995 GRS 20, finding that it did not go far enough to 

direct agencies to protect electronic records.[Footnote 78] The court 

ruled that NARA should not have treated electronic records as 

disposable simply because they could be copied into another form:



“[The] differences between electronic and paper records illustrate the 

fact that the administrative, legal, research, and historical value of 

electronic records is not always fully captured--indeed, is usually not 

captured--by paper or microfiche copies. Electronic records therefore 

do not become valueless duplicates or lose their character as ‘program 

records’ once they have been printed on paper; rather, they retain 

features unique to their medium.”:



The court also found that NARA failed to perform its statutory duty to 

evaluate the value of records for disposal: “By categorically 

determining that electronic records possess no administrative, legal, 

research or historical value beyond paper print-outs of the same 

document or record, the Archivist has absolved both himself and the 

federal agencies he is supposed to oversee of their statutory duties to 

evaluate specific electronic records as to their value.”:



In response to the district court ruling, NARA established an 

Electronic Records Work Group to review the 1995 GRS 20 and make 

recommendations for revisions. It also issued a number of pieces of 

guidance to reflect the District Court’s ruling.[Footnote 79]



On August 6, 1999, the U.S. Court of Appeals for the D.C. Circuit 

upheld NARA’s GRS 20, reversing the District Court decision that had 

overturned the 1995 GRS 20.[Footnote 80] The Court of Appeals rejected 

the lower court’s reasoning that NARA had authorized destruction of all 

types of word processing and E-mail records without regard to content: 

“GRS 20 does not authorize disposal of electronic records per se; 

rather, such records may be discarded only after they have been copied 

into an agency recordkeeping system.”:



The court acknowledged that an electronic recordkeeping system would be 

superior to a paper recordkeeping system, but it also agreed with NARA 

that agencies should be free “to maintain their recordkeeping systems 

in the form most appropriate to the business of the agency.” Thus the 

court said,



“We agree with Public Citizen that electronic recordkeeping has 

advantages over paper recordkeeping, but our duty as a reviewing court 

is to ask only whether the Archivist’s policy choice is arbitrary or 

capricious; manifestly it is not. All agencies by now, we presume, use 

personal computers to generate electronic mail and word processing 

documents, but not all have taken the next step of establishing 

electronic recordkeeping systems in which to preserve those records. It 

may well be time for them to do so, but that is a question for the 

Congress or the Executive, not the Judiciary, to decide.”:



Finally, the court found that the 1995 GRS 20 met the Armstrong test of 

requiring that electronic records be stored in a manner that captures 

all relevant transmission data.



As a result of the Court of Appeals ruling, NARA instructed agencies to 

again use the 1995 GRS 20 to dispose of temporary electronic records 

after recordkeeping copies were filed in electronic, paper, or 

microform recordkeeping systems.[Footnote 81] NARA did say, however,



“We believe there may be better alternatives to GRS 20 for disposition 

authority for electronic copies of program records and expect to 

develop those alternatives as part of a comprehensive review of the 

policies and procedures for scheduling and appraisal of records in all 

formats. The Court decision provides the Government time to include 

electronic copies in this overall review. Our review may result in 

significant changes in the way that agencies schedule their records in 

the future. When we have completed this review, we will promulgate new 

guidance.”:



On October 10, 2001, NARA published a notice seeking public comment on 

a petition for rulemaking filed by the Public Citizen Litigation Group 

(a plaintiff in both Public Citizen v. John Carlin and Armstrong v. 

Executive Office of the President) requesting NARA to revise its 

electronic records management regulations.[Footnote 82] In this notice, 

NARA stated that it was currently “evaluating alternatives to GRS 20 

for disposition authority as part of a comprehensive review of the 

policies and procedures for scheduling and appraisal of records in all 

formats.” As of May 2002, this review was ongoing.



[End of section]



Appendix IV: Agencies Are Managing Large Volumes of Important 

Electronic Records:



Agencies are facing the complex challenge of managing electronic 

records and in some cases maintaining these records on a long-term 

basis. For example, because of their particular missions, NASA, the 

Patent and Trademark Office, Veterans Affairs (VA), and the State 

Department must each manage millions of electronic 

records, either long-term or permanently. In some instances, the 

volumes of electronic records that these agencies manage are far larger 

than the volumes of permanent electronic records that NARA currently 

archives. The experiences of these agencies highlight the challenges of 

electronic records management and the gaps in existing guidance.



National Aeronautics and Space Administration:



NASA is committed to the long-term preservation of massive volumes of 

electronic space science data and images of our solar system. The 

observational data sets from NASA missions record the continually 

changing aspects of our Earth and represent an asset that must be 

retained in a findable, accessible, and usable state. The agency 

proposed to permanently maintain these data within the agency in order 

to support future science usage. Presently, NASA’s National Space 

Science Data Center archives over 20 terabytes of digital space science 

data from past and present NASA missions, of which 3 terabytes are 

currently electronically accessible. In addition, the Hubble Space 

Telescope has created a data archive of over 7 terabytes of images of 

our solar system, and continues to archive an additional 3 to 5 

gigabytes every day. Archiving and ensuring data integrity of all these 

electronic records require periodic data renewal cycles, involving 

migration from old to new media, resource-intensive data reorganization 

and reformatting, or even recreation of related software.
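
As a rough illustration of the growth rate these figures imply, the 
following sketch converts the stated 3 to 5 gigabytes per day into 
terabytes per year. It assumes a constant daily rate, 365 days per 
year, and decimal units (1 terabyte = 1,000 gigabytes); the function 
name annual_growth_tb is hypothetical and chosen only for this example.

```python
def annual_growth_tb(gb_per_day: float, days_per_year: int = 365) -> float:
    """Convert a daily archive growth rate in gigabytes to terabytes per year.

    Assumes decimal units (1 TB = 1,000 GB) and a constant daily rate;
    both are illustrative assumptions, not figures from the report.
    """
    return gb_per_day * days_per_year / 1000.0


# Lower and upper bounds taken from the stated 3 to 5 gigabytes per day.
low = annual_growth_tb(3)
high = annual_growth_tb(5)
print(f"Estimated archive growth: {low:.1f} to {high:.1f} TB per year")
```

Under these assumptions, the archive would add roughly 1.1 to 1.8 
terabytes each year, which suggests why periodic migration to new media 
is a recurring cost rather than a one-time effort.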



Because these records are of permanent value and NARA has no means to 

archive them in any useful way, NASA retains custody of them. They 

accordingly fall into an undefined category: they are permanent records 

that NARA cannot archive. The current arrangement by which they are 

maintained is not covered by NARA guidance. Nor is NASA’s archiving 

approach covered by this guidance, which does not cover migration and 

archival formats (other than flat ASCII files on tape), management of 

digital images, or maintenance of electronic records in databases for 

extended periods of time.



U.S. Patent and Trademark Office:



The Patent and Trademark Office manages and indefinitely preserves 

millions of digitized patents and trademarks. Patent examiners must 

have access to a complete collection of the history of U.S. patents in 

order to research prior art before approving new patents. Recently, the 

office replaced the examiners’ collection of paper patents with EAST 

(Examiners Automated Search Tool) and WEST (Web Examiner Search Tool), 

which are complete electronic patent collections containing the full 

text of over 2.5 million U.S. patents and full images of over 6.5 

million U.S. patents and over 14.5 million foreign patents. In 

addition, the Patent and Trademark Office has digitized the text and 

images of over 2.7 million trademark applications and registrations. The 

Patent and Trademark Office has been using XML[Footnote 83] to develop 

and implement systems to support the filing, examination, publication, 

and archival storage of intellectual property documents in electronic 

format.



The Patent and Trademark Office’s digitization program has highlighted 

an issue that is not adequately addressed by NARA guidance: that is, 

when a record exists in many versions (electronic, paper, microform, 

etc.), which should be considered primary? Many of the patent files 

that have been digitized were originally paper files, and it has been 

argued that destroying the original paper versions after digitization 

has led to or risked loss of important information.[Footnote 84] Just 

as converting an electronic original to paper may lead to information 

loss, so may the reverse. NARA guidance does not address this issue, 

leaving agencies at risk of losing information.



Department of Veterans Affairs:



VA must manage and preserve, for 75 years, millions of electronic 

medical and benefit records. An integral part of VA’s enrollment 

process for each veteran applying for health benefits is the use of 

several Veterans Health Information Systems and Technology Architecture 

(VISTA) databases to enter and verify veteran eligibility information. 

This information must be maintained in the system and accessible for 

the life of the veteran in order to document entitlement to health care 

benefits, which VA has determined to be a maximum period of 75 years. 

One enrollment database alone contains information for 9 million 

veterans.



VA patient enrollment records present another instance of the confusion 

regarding scheduling requirements for electronic records and for 

records in multiple versions. Although VA is working toward a 

completely electronic process, enrollment records are initiated on 

paper because of current legal requirements for ink signatures. In 

general, VA does not schedule electronic records when it has 

scheduled the paper version. It is NARA policy, however, that 

electronic records must also be scheduled. According to VA, another key 

challenge that it faces is ensuring the validity and authenticity of 

electronic records, and it would like to see adequate guidance and 

standards about electronic signatures from NARA so that all government 

agencies are using the same approach.



Department of State:



State electronically preserves over 25 million diplomatic cables and 

more than 400,000 digital images of correspondence of the Secretary of 

State. The State Archiving System (SAS) is a repository for over 25 

million cables, from 1973 to the present, documenting the conduct of 

U.S. foreign policy. The cables are managed electronically for 25 years 

before they are due to be transferred to NARA. However, if the cable 

records in SAS had been transferred to NARA for archiving, they would 

no longer have been accessible to users.



NARA has responded to the State Department’s archiving and access needs 

by developing a new system (Access to Archival Databases), which is 

expected to be available in the summer of 2002. This system will allow 

NARA to provide on-line access to archived State Department cables. 

When the system is available, the cable records will be transferred to 

NARA for archiving.



In addition, the Secretariat Tracking and Retrieval System (STARS) 

tracks approximately 440,000 digital images of foreign policy memoranda 

and correspondence of the Secretary of State from 1986 to the present. 

Both STARS and SAS must not only preserve the records, but also 

maintain reliable and rapid access to the image data. As technologies 

change, preserving and providing access to the records present complex 

electronic records management challenges.



The State Department’s records management office has sole 

responsibility for maintaining SAS, and it has had to proceed with the 

long-term management and preservation of the system records--

periodically updating and migrating all the images to reflect new 

technologies--without guidance from NARA. NARA guidance does not 

address updating or migration of file formats.



[End of section]



Appendix V: Comments from the National Archives and Records 

Administration:



National Archives at College Park:



8601 Adelphi Road College Park, Maryland 20740-6001:



May 30, 2002:



Joel C. Willemssen Managing Director Information Technology Team 

General Accounting Office 441 G Street NW Washington, DC 20548:



Dear Mr. Willemssen:



Thank you for the opportunity to review and comment on the draft report 

on challenges in managing and preserving electronic records. The report 

recognizes the enormous challenges the Federal Government faces in 

managing and preserving electronic records and many of the actions the 

National Archives and Records Administration has taken to meet those 

challenges. Nevertheless, we agree that more must be done, and we 

support the report’s recommendations. We would like to clarify several 

points in the report, however, and have suggested some technical 

corrections in an attachment.



Records Management:



The report recommends that we develop a strategy for raising agency 

senior management awareness of and commitment to records management 

principles, functions, and programs. We certainly agree with this 

recommendation, and are active on a number of fronts to raise senior 

management awareness of and commitment to records management in Federal 

agencies. Such activities include:



* The Deputy Archivist of the United States and I along with senior NARA 

program officials have held a series of meetings with agency heads on 

the importance of records management and specific agency records 

issues.



* The Deputy Archivist and I speak at agency conferences to emphasize 

the importance of records management. For example, in April I addressed 

senior leadership at the Treasury Department’s records management 

conference.



* NARA has developed tools (e.g., PowerPoint presentations) that 

agencies can use to do their own management briefings. These have been 

popular with agency records management officers.



* NARA developed specific guidance for senior agency management, 

Documenting Your Public Service, which was distributed to all senior 

officials at the start of the Administration.



* NARA works with the Office of Management and Budget (OMB) to include a 

records management emphasis or implications in new guidance to agencies 

such as the OMB Circular A-130 revision, annual OMB Circular A-11 

revisions, and the Government Paperwork Elimination Act.



* NARA is the managing partner for the Electronic Records Management 

E-Government Initiative, which involves a coalition of Federal agencies 

working together to develop policies and tools to improve electronic 

records management.



Despite all of these activities, however, we agree that more needs to 

be done to have a major effect on agency leadership. Effective records 

management must be a partnership, a concept reflected in the U.S. Code. 

As laid out in 44 U.S.C. chapters 29 and 35, the responsibility for 

oversight of records management is shared by NARA, the Office of 

Management and Budget (OMB), and the General Services Administration. 

Of equal importance, the head of each Federal agency is charged with 

the responsibility to make and preserve records (44 U.S.C. 3101) and 

“establish and maintain an active, continuing” records management 

program (44 U.S.C. 3102, emphasis added).



Federal agency management will not take an interest in records 

management unless it can help them meet their business needs. The 

recent Report on Current Recordkeeping Practices within the Federal 

Government, which we commissioned, found that when agencies have a 

strong business need for good recordkeeping, such as for legal or 

operational needs, their recordkeeping practices are better. As part of 

the strategy we are developing for our Records Management Initiatives, 

we plan to create incentives for agencies to work with us in a 

“virtuous cycle” where our records management program adds value to the 

agencies’ business processes and as a result records are kept long 

enough to protect rights, ensure accountability, and document the 

national experience. We disagree with the GAO report’s conclusion that 

NARA does not plan to address the low priority generally given to 

records management. Our whole approach is predicated on the assumption 

that records and records management are integral aspects of agencies’ 

business architectures. In addition, our plans recognize that we need 

to show more leadership with Federal agencies and the Congress on 

records management issues.



The GAO report also recommends that we develop a strategy for 

conducting systematic inspections of agency records management 

programs. While we agree with the thrust of the recommendation, 

continuing our past inspection program as cited in the report will not 

succeed. When NARA undertook the Records Management Initiatives to 

rethink completely how we do records management in the Federal 

Government, we put our evaluation program on hold, pending changes to 

the program, because it was clear we needed to do things differently. 

For example:



* The evaluation program could at best conduct 3 agency evaluations a 

year, meaning it would take at least 60 years to cover the major 

agencies of the Federal Government.



* Each evaluation was extremely labor intensive, involving staff from 

multiple units (headquarters and field) for up to a year.



* Because the evaluations were of records management programs, 

responsibility for responding to them fell to records management staff, 

not the program staff who actually managed the records. Where records 

management is not closely identified with the business process, it will 

not be effective.



* Many of the recommendations were broad, could take years to implement, 

and could be extremely resource intensive. Frequently agencies lost 

interest in the issues, especially if there was a change in records 

officer before the action plan was completed.





* Program effectiveness was very uneven. A few agencies (e.g., IRS) 

completed their action plans in a timely fashion. Yet even though we 

have not started a new evaluation in several years, there are a number 

of agencies that have not completed their action plans.



In addition, the Report on Current Recordkeeping Practices within the 

Federal Government concluded that while NARA should work with 

individual agencies, “given the availability of resources ... NARA may 

wish to carefully consider which agencies should be selected for 

assessment. The situational factors at some agencies may limit the 

likelihood that specific, or any, intervention options can improve 

RM.” While heeding this caution, we plan to make evaluations, surveys, 

and inspections part of the strategy we are developing to assess how 

well records are managed in agencies as a result of our Records 

Management Initiatives. We disagree with the GAO report’s conclusion 

that NARA has no plans to address the issue of records management 

inspections. Using risk management analysis while leveraging our 

inspection resources, our approach will include looking systematically 

at broad categories of important records across agencies as well as 

undertaking agency-specific interventions. We also plan to make more 

use, as necessary, of our authority to report the results of 

evaluations to OMB and the Congress, especially on issues related to 

at-risk records.



Electronic Records Archives:



The GAO report recommends that we reassess the Electronic Records 

Archives (ERA) project schedule. We believe that such reassessment is 

prudent and intend to conduct such reassessments repeatedly, both 

periodically from an overall program management viewpoint and on a 

continuing basis as part of our ERA risk management activity. We are 

currently reassessing the schedule as part of our refinement of the ERA 

acquisition strategy. This reassessment will address the issues that 

the report raised, and we will report the results of our reassessment 

to both GAO and our Congressional committees.



We would, however, like to clarify two points of special importance 

related to the ERA project schedule. First, the report states that 

“NARA is not meeting its schedule for the ERA system....” Although some 

program documentation deliverables have not been completed on schedule, 

all items on the “critical path” have been completed on time, and we 

expect to meet all milestones on the critical path this year.



Second, the report suggests aligning the project schedule for 

deliverables from the study NARA is sponsoring by the Computer Science 

and Technology Board of the National Academy of Sciences (NAS) with the 

system acquisition schedule. The NAS study is divided into two parts, 

with a separate report to be issued at the end of each. The division of 

this study into two parts reflects the fact that the preservation of 

electronic records is an open-ended, evolving challenge for which there 

can be no one-time solution. NARA has both near-term and long-term 

needs to preserve electronic records. The critical near-term need is to 

stem and prevent the loss of valuable electronic records of the Federal 

Government by developing the capability to preserve and provide access 

to them. The long-term need must incorporate the expectation of 

continuing and often unpredictable change into NARA’s long-range 

planning.



The first part of the NAS study will assess the technical 

recommendations NARA has received from research we cosponsored, with 

the National Science Foundation, in the National Partnership for 

Advanced Computational Infrastructure (NPACI). It will focus on the 

information management architecture proposed by NPACI for persistent 

archives of digital information. The most basic requirement for any 

digital archives is for a solution that is sustainable in the face of 

continuing and ultimately unpredictable change in information 

technology. Otherwise, the solution itself will come to embody, in a 

relatively short time, the very problems it purports to solve.



Thus, as the GAO report correctly notes, the infrastructure 

independence of the basic architecture is a “major dependency” for the 

acquisition of the ERA system. The NAS report on this topic “is 

expected to address the adequacy and soundness of the architecture as a 

whole and its major components.” But the GAO report asserts, “NARA’s 

planning has left little opportunity for the assessment results to be 

reflected in the ERA design without disrupting the acquisition process 

and increasing the risk to the ERA schedule.” We disagree with this 

conclusion, which assumes NARA will make design decisions about the ERA 

system prior to receipt of the NAS report. In fact, NARA will not even 

begin to address design until well after the NAS report is received. 

The delivery of the first report, projected for January 2003, is timed 

to fit into the schedule for development of the ERA system.



NARA should receive the first NAS report in the same time frame that we 

receive industry’s responses to our planned request for information. 

Those two information sources will be complementary. The NAS report 

will provide an unbiased, expert view of the feasibility of building a 

system that is inherently evolutionary, addressing the core problem of 

digital preservation. The industry responses will indicate how close 

the market is to supporting the development of a system that is 

independent of infrastructure. NARA will factor both the scientific and 

the industry views into its articulation of a draft request for 

proposals.



The GAO report also asserts, “If these results [of the two NAS reports] 

are not fully reflected in the requirements, there is added risk that 

the technical strategy underlying the development of the system will 

prove not to be optimal, and that alternatives will not have been 

considered.” We disagree with this conclusion. NARA is articulating 

requirements to reflect its mission needs and the interests and needs 

of its stakeholders. The requirements will state what the system must 

do, not how it should accomplish these goals. Rather than dictate a 

solution, NARA will ask industry to propose the optimal methods of 

satisfying our requirements. Any other approach would create 

unnecessary and inappropriate barriers to acquiring the best possible 

solution that the market can provide. In this context, it should be 

noted that the NPACI architecture for persistent archives is a notional 

architecture. It does not specify any particular hardware, software or 

network architecture. Furthermore, after contract award for design of 

the system, the ERA Program will enter a requirements definition and 

refinement stage ending with a System Requirements Review, 

collaborating with the contractor to finalize what is to be built. 

This will be another opportunity to fold in additional research-related 

information.



With respect to the second part of the NAS study, the GAO report 

states, “By this date [October 1, 2003], the Request for Proposals for 

the electronic archival system will be released, leaving little or no 

opportunity for the results of the second assessment to influence the 

first build of the system.” The primary purpose of the second NAS 

report, however, is to provide input to NARA’s long range plans for 

addressing the continuing evolution of information technology and 

electronic records. As stated in the NAS contract statement of work, 

the second part of the study “will provide a more comprehensive 

discussion of the digital archiving and preservation issues and options 

confronting the National Archives and Records Administration.” The 

second NAS report will be useful in revising the ERA research plan to 

address new problems and opportunities identified by the experts, and 

in plans for successive builds of the ERA system. Even for the initial 

build, we intend to provide the second NAS report to the contractors 

developing designs for the ERA system. Given that 

design work will start only after award of the contract, the contractor 

will be able to take the NAS assessment into account in developing its 

design, and NARA will be able to use it in evaluating the design.



Thank you for considering our comments. As your report recognizes, we 

face enormous challenges in managing and preserving electronic records, 

and we welcome the perspective GAO brings to these issues. Work we 

already have underway will be instrumental in meeting the report’s 

recommendations, and we will be pleased to report to you and the 

Congress regularly about our progress.



If you have any questions, please contact Lori Lisowski, Director of 

Policy and Communications, at 301-837-1850.



John W. Carlin Archivist of the United States:



Signed by John W. Carlin:



Enclosure:



[T] SRA International, Inc., Report on Current Recordkeeping Practices 

within the Federal Government, December 10, 2001, p. 32. Emphasis in 

original.



[End of section]



Glossary:



administrative records:



Records created by several or all federal agencies in performing common 

facilitative functions that support the agency’s mission activities, 

but do not directly document the performance of mission functions. 

Administrative records relate to activities such as budget and finance, 

human resources, equipment and supplies, facilities, public and 

congressional relations, and contracting. Administrative records are 

temporary and are covered by general record schedules.



business process:



A collection of related, structured activities--a chain of events--that 

produce a specific service or product for a particular customer or 

customers.



data architecture:



The framework for organizing and defining the interrelationships of 

data in support of an organization’s missions, functions, goals, 

objectives, and strategies. Data architectures provide the basis for 

the incremental, ordered design and development of systems or subject 

databases based on successively more detailed levels of data modeling.



electronic record:



In the context of the federal government, any information that is 

recorded by or in a format that only a computer can process and that 

satisfies the definition of a federal record in 44 U.S.C. 3301.



electronic recordkeeping system:



An electronic system in which records are collected, organized, and 

categorized to facilitate their preservation, retrieval, use, and 

disposition.



enterprise architecture:



An institutional systems blueprint that defines in both business and 

technology terms an organization’s current and target operating 

environments and provides a road map for moving between the two.



Extensible Markup Language (XML):



A flexible, nonproprietary set of standards for tagging information so 

that it can be transmitted using Internet protocols and readily 

interpreted by disparate computer systems.



federal records:



In the context of federal recordkeeping, all books, papers, maps, 

photographs, machine-readable materials, or other documentary 

materials, regardless of physical form or characteristics, made or 

received by an agency of the U.S. government under federal law or in 

connection with the transaction of public business, and preserved or 

appropriate for preservation by that agency or its legitimate successor 

as evidence of the organization, functions, policies, decisions, 

procedures, operations, or other activities of the government or 

because of the informational value of the data in them.



metadata:



Data containing descriptive information about other data.



office automation records:



Electronic records created by means of office automation software, such 

as word processors, spreadsheets, other desktop applications, or 

electronic mail.



office automation:



The techniques and means used for the automation of office activities, 

in particular, the processing and communication of text, images, and 

voice.



permanent records:



Records that NARA appraises as having sufficient value to warrant 

continued preservation by the federal government as part of the 

National Archives of the United States.



Portable Document Format (PDF):



A proprietary de facto standard for electronic document distribution 

worldwide. Created by Adobe Systems, the portable document file format 

preserves all the fonts, formatting, graphics, and color of any source 

document, regardless of the application and platform used to create it.



program records:



Records created by each federal agency in performing the unique 

functions that stem from the distinctive mission of the agency. The 

agency’s mission is defined in enabling legislation and further 

delineated in formal regulations. Program records may be temporary or 

permanent; they must be scheduled.



record:



See federal records.



recordkeeping system:



A manual or automated system in which records are collected, organized, 

and categorized to facilitate their preservation, retrieval, use, and 

disposition.



recordkeeping:



The act or process of creating and maintaining records.



records management:



The planning, controlling, directing, organizing, training, promoting, 

and other managerial activities involved in records creation, 

maintenance and use, and disposition in order to achieve adequate and 

proper documentation of the policies and transactions of the federal 

government.



records management application:



The term used by the Department of Defense’s Design Criteria Standard 

for Electronic Records Management Software Applications 

(DOD 5015.2-STD) for software that manages records. The primary management 

functions of such software are categorizing and locating records and 

identifying records that are due for disposition.
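
The two functions named above (categorizing and locating records, and identifying records due for disposition) can be sketched roughly as follows. This is only an illustration: the record fields, categories, and retention periods are hypothetical and are not taken from DOD 5015.2-STD or from any certified product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    identifier: str
    category: str          # hypothetical file-plan category
    cutoff: date           # date on which the retention period starts
    retention_years: int   # hypothetical retention period from a schedule


def locate(records, category):
    """Return the records filed under one category."""
    return [r for r in records if r.category == category]


def disposition_date(record):
    """Return the date on which the retention period ends."""
    c = record.cutoff
    try:
        return c.replace(year=c.year + record.retention_years)
    except ValueError:
        # A Feb. 29 cutoff that lands in a non-leap year falls back to Feb. 28.
        return c.replace(year=c.year + record.retention_years, day=28)


def due_for_disposition(records, today):
    """Return the records whose retention period has ended as of 'today'."""
    return [r for r in records if disposition_date(r) <= today]


# Illustrative data only.
records = [
    Record("R-001", "budget", date(1995, 9, 30), 6),
    Record("R-002", "budget", date(2000, 9, 30), 6),
    Record("R-003", "contracts", date(1994, 12, 31), 7),
]
as_of = date(2002, 6, 17)
due_ids = [r.identifier for r in due_for_disposition(records, as_of)]
print(due_ids)
```

In this sketch a record is treated as due once its cutoff date plus its retention period has passed; an actual application would also need to handle holds, transfers, and approval steps that are not modeled here.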



records schedule:



A document providing mandatory instructions for what to do with records 

no longer needed for current business, with provision of authority for 

the final disposition of recurring and nonrecurring records.



technical reference model:



A taxonomy that provides a consistent set of service areas, interface 

categories, and relationships to address interoperability and open 

systems; part of an enterprise architecture.



temporary records:



Records appraised as having temporary or limited value and approved for 

destruction either immediately or after a specific period of time.



Usenet:



An Internet-based worldwide distributed discussion system. Usenet 

consists of a set of “newsgroups” with names that are classified 

hierarchically by subject. “Articles” or “messages” are “posted” to 

these newsgroups by people on computers with the appropriate software; 

these articles are then broadcast to other interconnected computer 

systems via a wide variety of networks.



XML:



See Extensible Markup Language.



XML document:



A text document marked up with hierarchically arranged descriptive tags 

and attributes conforming to the XML standard. An XML document can also 

begin with declarations that refer to other files providing further 

instructions for interpreting and displaying data elements.
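
As an illustration of this definition, the following minimal sketch reads a small XML document with Python's standard library. The element names, attribute names, and values are hypothetical and do not represent any NARA or agency schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record description. The first line is an XML declaration;
# the remaining lines are hierarchically arranged tags with attributes.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<record id="example-1">
  <title>Sample memorandum</title>
  <creator agency="Example Agency">J. Doe</creator>
  <disposition>temporary</disposition>
</record>"""

root = ET.fromstring(xml_bytes)

# Each child element carries its own descriptive tag, so the data can be
# read without knowing which application produced the document.
fields = {child.tag: child.text for child in root}
creator_agency = root.find("creator").get("agency")

print(root.tag, root.attrib["id"])
print(fields)
print(creator_agency)
```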



FOOTNOTES



[1] 44 U.S.C. chapters 21, 29, 31, and 33.



[2] NARA’s regulations implementing the Federal Records Act are found 

at 36 CFR 1200-1280.



[3] PDF is a proprietary format of Adobe Systems, Inc., that preserves 

the fonts, formatting, graphics, and color of any source document, 

regardless of the application and platform used to create it.



[4] A geographic information system is a computer system for capturing, 

storing, checking, integrating, manipulating, analyzing, and 

displaying data related to positions on the Earth’s surface. Typically, 

a GIS is used for handling maps of one kind or another. These might be 

represented as several different layers where each layer holds data 

about a particular kind of feature (e.g., roads). Each feature is 

linked to a position on the graphical image of a map.



[5] In January 2001, NARA directed agencies to provide a one-time 

“snapshot” of their public Web sites as they existed on or before 

January 20, 2001.



[6] National Research Council, Preservation of Historical Records, 

National Academy Press (Washington, D.C.: 1986).



[7] International Council on Archives, Guide for Managing Electronic 

Records from an Archival Perspective (Paris: February 1997).



[8] U.S. General Accounting Office, National Archives: Preserving 

Electronic Records in an Era of Rapidly Changing Technology, 

GAO/GGD-99-94 (Washington, D.C.: July 19, 1999) 

(http://www.gao.gov/archive/1999/gg99094.pdf).



[9] Department of Defense, Design Criteria Standard for Electronic 

Records Management Software Applications, DOD 5015.2-STD (November 

1997) (http://www.dtic.mil/whs/directives/corres/html/50152std.htm).



[10] DOD 5015.2-STD requires that records management applications be 

able to manage records regardless of their media.



[11] SRA International, Inc., Report on Current Recordkeeping Practices 

within the Federal Government (Dec. 10, 2001) 

(http://www.nara.gov/records/rkreport.html). Both the SRA study and the 

NARA staff analyses were reported within this document.



[12] The 24 major agencies reported 6,435 mission-critical systems. 

Subcommittee on Government Management, Information, and Technology, 

House Committee on Government Reform, Federal Government Earns B+ on a 

Final Y2K Report Card, news release (Washington, D.C.: Nov. 22, 1999).



[13] According to NARA, its current goals for schedule processing are 

180 days for simple schedules and 365 days for complex schedules. In FY 

2001 the median time for completing schedules was 237 days.



[14] National Archives and Records Administration, An Overview of Three 

Projects Relating to the Changing Federal Recordkeeping Environment 

(January 2001) (http://www.nara.gov/records/rmioverview.html).



[15] Center for History of Physics, American Institute of Physics, AIP 

Study of Multi-institutional Collaborations: Final Report--Highlights 

and Project Recommendations, College Park, MD (2001) 

(http://www.aip.org/history/pubs/collabs/highlights.html).



[16] 36 CFR 1220.54 (a).



[17] NARA expects the policy review phase to be completed by the end of 

2002, but according to NARA, all new or revised policies will not be in 

place by that date. The entire project will not be complete until 2006.



[18] On January 15, 2002, American Systems Corporation (ASC) announced 

its acquisition of ICE, Inc. According to the ERA project manager, this 

change does not affect the status of NARA’s contract with ICE, Inc.



[19] A concept of operations is a document that describes 

characteristics of the system from the user’s viewpoint.



[20] The seven completed documents were the acquisition strategy, 

configuration management plan, risk management plan, quality assurance 

plan, life-cycle model, requirements management plan, and technology 

research plan.



[21] The six uncompleted documents were the revised program management 

office (PMO) organization, PMO billet roles/responsibilities, metrics 

plan, PMO training needs assessment, ERA PMO training plan, and program 

management plan.



[22] U.S. General Accounting Office, Information Security Management: 

Learning from Leading Organizations, GAO/AIMD-98-68 (Washington, D.C.: 

May 1998).



[23] NARA’s effort to develop an enterprise architecture includes a 

separate effort to develop a data architecture.



[24] Fiscal Year 2000 Federal Managers’ Financial Integrity Assurance 

(FMFIA) Report to the President.



[25] Chapter 35 of title 44, section 1061, subchapter II--Information 

Security, United States Code.



[26] Office of Management and Budget, Incorporating and Funding 

Security in Information Systems Investments, Memorandum 00-07 

(Washington, D.C.: Feb. 28, 2000).



[27] Integrated Computer Engineering, Inc., Electronic Records Archives 

Initial Assessment Final Report, version 1.2 (Oct. 18, 2001).



[28] Virus: a program that “infects” computer files, usually executable 

programs, by inserting a copy of itself into the file. These copies are 

usually executed when an infected file is loaded into memory, allowing 

the virus to infect other files. Unlike the computer worm, a virus 

requires human involvement (usually unwitting) to propagate. Worm: an 

independent computer program that reproduces by copying itself from one 

system to another across a network. Unlike computer viruses, worms do 

not require human involvement to propagate. Trojan horse: a computer 

program that conceals harmful code. A Trojan horse usually masquerades 

as a useful program that a user would wish to execute. Logic bomb: in 

programming, a form of sabotage in which a programmer inserts code that 

causes the program to perform a destructive action when some triggering 

event occurs, such as termination of the programmer’s employment. 

Sniffer or packet sniffer: a program that intercepts routed data and 

examines each packet in search of specified information, such as 

passwords.



[29] For example, the number of incidents handled by Carnegie-Mellon 

University’s Computer Emergency Response Team (CERT) Coordination 

Center has increased from 1,334 in 1993 to 8,836 during the first two 

quarters of 2000. Similarly, the Federal Bureau of Investigation 

reports that its caseload of computer-intrusion-related cases is more 

than doubling every year.



[30] Subcommittee on Government Management, Information, and 

Technology, House Committee on Government Reform, Federal Government 

Earns a B+ on Final Y2K Report Card, news release (Washington, D.C.: 

Nov. 22, 1999).



[31] U.S. General Accounting Office, National Archives: Preserving 

Electronic Records in an Era of Rapidly Changing Technology, 

GAO/GGD-99-94 (Washington, D.C.: July 19, 1999) 

(http://www.gao.gov/archive/1999/gg99094.pdf).



[32] XML is a simplified subset of the Standard Generalized Markup 

Language (SGML) used to define portable document formats.



[33] Tagging data in a standard way allows any system that recognizes 

the standard to readily understand and process data that conform to 

that standard. In tagging, a standard format is used to label each 

element of a data set with metadata that clarify what kind of 

information is being provided. Common tagging systems for electronic 

information--also known as markup languages--use labels set off by 

angled brackets to show where data elements begin and end; in each pair 

of tags, the second tag includes a slash to indicate that it is a 

closing tag.
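
A minimal sketch of tagging, using Python's standard library, is shown here. The labels "record," "title," and "year" and their values are hypothetical and are chosen only to show an opening tag and a closing tag around each data element.

```python
import xml.etree.ElementTree as ET

# Hypothetical data set: each value is labeled with a tag that says what
# kind of information it is.
data = {"title": "Sample report", "year": "2002"}

root = ET.Element("record")
for label, value in data.items():
    ET.SubElement(root, label).text = value

# Serialize to text; each element is wrapped in <label> ... </label>.
tagged = ET.tostring(root, encoding="unicode")
print(tagged)
```

Any system that recognizes the same labels could read the printed text back and recover which value is the title and which is the year.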



[34] U.S. General Accounting Office, Electronic Government: Challenges 

to Effective Adoption of the Extensible Markup Language, GAO-02-327 

(Washington, D.C.: Apr. 5, 2002).



[35] Amarnath Gupta, Preserving Presidential Library Websites, San 

Diego Supercomputer Center, SDSC TR-2001-3 (Jan. 18, 2001).



[36] National Archives of Australia (http://www.naa.gov.au/).



[37] Jeff Rothenberg, Avoiding Technological Quicksand: Finding a 

Viable Technical Foundation for Digital Preservation, Council on 

Library and Information Resources (January 1999) 

(http://www.clir.org/pubs/reports/rothenberg/contents.html).



[38] Task Force on Archiving of Digital Information, Preserving Digital 

Information (May 1, 1996) (http://www.rlg.org/ArchTF/).



[39] HD-Rosetta Archival Preservation Services 

(http://www.norsam.com/hdrosetta.htm).



[40] Andrew Waugh, Ross Wilkinson, Brendan Hills, and Jon Dell’oro, 

Preserving Digital Information Forever, Commonwealth Scientific and 

Industrial Research Organisation (CSIRO) Mathematical and Information 

Sciences (undated) 

(http://pigfish.vic.cmis.csiro.au/~ajw/PresDigitInfoL.pdf).



[41] Jeff Rothenberg, Using Emulation to Preserve Digital Information, 

Position Paper, NSF Workshop on Data Archiving & Information 

Preservation (Mar. 26, 1999) 

(http://cecssrv1.cecs.missouri.edu/NSFWorkshop/ppaper3.html).



[42] The Public Record Office is the national archive of England, 

Wales, and the United Kingdom (http://www.pro.gov.uk/).



[43] Jeff Rothenberg, Using Emulation to Preserve Digital Documents, 

Rand-Europe, Koninklijke Bibliotheek (The Hague: July 2000).



[44] See footnote 40.



[45] See footnote 40.



[46] See footnote 40.



[47] Encapsulation, Preserving Access to Digital Information (PADI) 

(http://www.nla.gov.au/padi/topics/20.html).



[48] Ken Thibodeau, “Building the Archives of the Future: Advances in 

Preserving Electronic Records at the National Archives and Records 

Administration,” D-Lib Magazine (February 2001) 

(http://www.dlib.org/dlib/february01/thibodeau/02thibodeau.html).



[49] Public Record Office Victoria 

(http://www.prov.vic.gov.au/welcome.htm).



[50] The metadata are based on a model developed by the National 

Archives of Australia.



[51] The ASCII character set of 128 characters includes the familiar 

letters, numbers, and punctuation of the roman alphabet, along with 

certain other characters such as spaces, tabs, and carriage returns.
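
As a small illustration of the 128-character limit, the following sketch checks whether a piece of text stays within the ASCII range. The function name and sample strings are hypothetical.

```python
def is_ascii(text):
    """Return True if every character has a code point from 0 through 127."""
    return all(ord(ch) < 128 for ch in text)


# Letters, digits, punctuation, a tab, and a carriage return/line feed
# are all part of the ASCII set.
plain = "Records, 2002:\tpreserved\r\n"

# Accented letters fall outside the 128-character set.
accented = "r\u00e9sum\u00e9"

ascii_bytes = plain.encode("ascii")  # one byte per character
print(is_ascii(plain), is_ascii(accented), len(ascii_bytes))
```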



[52] National Archives of Canada (http://www.archives.ca/).



[53] National Archives of Australia (http://www.naa.gov.au/).



[54] See footnote 40.



[55] A relational database allows the definition of data structures and 

storage and retrieval operations. In such a database the data and 

relations between them are organized in tables. A table is a collection 

of records and each record in a table contains the same fields. Certain 

fields may be designated as keys, which means that searches for 

specific values of that field will use indexing for increased speed. 

Interdependencies among these tables are expressed by data values.
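
The description above can be sketched with the SQLite engine bundled with Python. The table names, field names, and rows are hypothetical; the sketch only shows tables whose records share the same fields, a key field, an index, and a relationship expressed through matching data values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agency (
    agency_id INTEGER PRIMARY KEY,  -- key field
    name      TEXT NOT NULL
);
CREATE TABLE series (
    series_id INTEGER PRIMARY KEY,  -- key field
    agency_id INTEGER NOT NULL REFERENCES agency(agency_id),
    title     TEXT NOT NULL
);
-- Index so that searches on agency_id do not scan the whole table.
CREATE INDEX idx_series_agency ON series(agency_id);
""")

conn.executemany(
    "INSERT INTO agency VALUES (?, ?)",
    [(1, "Example Agency A"), (2, "Example Agency B")],
)
conn.executemany(
    "INSERT INTO series VALUES (?, ?, ?)",
    [(10, 1, "Budget files"), (11, 1, "Contract files"), (12, 2, "Personnel files")],
)

# The two tables are related only by the shared agency_id values.
rows = conn.execute(
    """
    SELECT agency.name, series.title
    FROM series JOIN agency ON series.agency_id = agency.agency_id
    WHERE agency.agency_id = ?
    ORDER BY series.title
    """,
    (1,),
).fetchall()
print(rows)
conn.close()
```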



[56] The manufacturer claims a life expectancy of at least 1,000 years 

and a temperature threshold of 500° C.



[57] Ion milling is an etching process in which high-energy gallium 

ions produced by a focused ion beam machine knock atoms from the 

surface and micro-engrave into any given medium.



[58] The Long Now Foundation (http://www.longnow.org).



[59] The Rosetta Project (http://www.rosettaproject.org:8080/live).



[60] Internet Archive (http://www.archive.org/).



[61] Google Groups (http://www.google.com/grphp?hl=en).



[62] 44 U.S.C. chapters 21, 29, 31, and 33.



[63] 36 CFR Part 1234. This rule is supplemented by NARA’s Records 

Management Handbook and periodic guidance on specific issues, e.g., 

NARA Bulletin No. 2000-02 (Dec. 27, 1999).



[64] GRS 20 (August 1995).



[65] GRS 20, Data Automation Program Records, FPMR 101-11.4 (Apr. 28, 

1972).



[66] GRS 20 (August 1995).



[67] History of General Records Schedule 20, Electronic Records 

(www.nara.gov/records/grs20/20hist.html).



[68] GRS 20, Machine-Readable Records, FPMR 101-11.4 (Feb. 16, 1977).



[69] Administrative records are those created in the performance of 

common facilitative functions that support an agency’s mission 

activities, but do not directly document the performance of mission 

functions. Administrative records are temporary. Program records are 

those created in the performance of the unique functions that stem from 

an agency’s mission. Program records may be temporary or permanent; 

they must be scheduled.



[70] GSA Bulletin FPMR B-127 (June 17, 1983).



[71] NARA Bulletin No. 85-2 (June 18, 1985).



[72] 36 CFR 1234, 50 FR 26939 (June 28, 1985).



[73] GRS 20 (June 1988); GRS 23, Records Common to Most Offices within 

Agencies (June 1988).



[74] GRS 20 (August 1995).



[75] Electronic Records Management, 55 FR 19216 (May 8, 1990).



[76] Armstrong v. Executive Office of the President, 1 F.3d 1274 (Aug. 

13, 1993).



[77] GRS 20 (August 1995).



[78] Public Citizen v. John Carlin, 2 F. Supp. 2d 1 (D.D.C. 1997).



[79] See, e.g., NARA, Disposition of Electronic Records, Bulletin 98-02 

(Mar. 10, 1998); U.S. General Accounting Office, National Archives: 

Preserving Electronic Records in an Era of Rapidly Changing Technology, 

GAO/GGD-99-94 (Washington, D.C.: July 1999).



[80] Public Citizen v. John Carlin, 184 F.3d 900 (D.C. Cir. 1999).



[81] NARA Bulletin 2000-02 (Dec. 27, 1999).



[82] 66 FR 51739 (Oct. 10, 2001).



[83] Extensible Markup Language (XML) is discussed further in appendix 

II.



[84] The potential problem of information lost during the conversion 

from paper to electronic patents was identified in a recent 

Congressional hearing: when searching electronic patent databases for 

prior art, patent searchers miss relevant patents. As noted in 

testimony by an association representing patent researchers, this is 

due to a unique problem related to how an invention is described: “in 

many, if not most, cases the invention is never fully described ‘in the 

words.’ The patent law requires only that the specification, including 

the drawings, together be understandable and enabling to one of 

ordinary skill in the art to make and use the invention. ‘The words,’ 

in many if not most cases, merely ‘flesh out’ what is shown in the 

drawings and do not replicate ‘in words’ what is in the drawings, but 

are ancillary thereto. Thus, in a patent database electronic search one 

is often presented the additional problem of ‘searching’ for ‘words’ 

which were never there to begin with.” --Testimony of James F. Cottone, 

President, National Intellectual Property Researchers Association, 

Oversight Hearing on the U.S. PTO of the Subcommittee on Courts and 

Intellectual Property of the House Judiciary Committee (Thursday, Mar. 

9, 2000) (http://www.house.gov/judiciary/cottone.htm).



GAO’s Mission:



The General Accounting Office, the investigative arm of Congress, 

exists to support Congress in meeting its constitutional 

responsibilities and to help improve the performance and accountability 

of the federal government for the American people. GAO examines the use 

of public funds; evaluates federal programs and policies; and provides 

analyses, recommendations, and other assistance to help Congress make 

informed oversight, policy, and funding decisions. GAO’s commitment to 

good government is reflected in its core values of accountability, 

integrity, and reliability.



Obtaining Copies of GAO Reports and Testimony:



The fastest and easiest way to obtain copies of GAO documents at no 

cost is through the Internet. GAO’s Web site (www.gao.gov) contains 

abstracts and full-text files of current reports and testimony and an 

expanding archive of older products. The Web site features a search 

engine to help you locate documents using key words and phrases. You 

can print these documents in their entirety, including charts and other 

graphics.



Each day, GAO issues a list of newly released reports, testimony, and 

correspondence. GAO posts this list, known as “Today’s Reports,” on its 

Web site daily. The list contains links to the full-text document 

files. To have GAO e-mail this list to you every afternoon, go to 

www.gao.gov and select “Subscribe to daily E-mail alert for newly 

released products” under the GAO Reports heading.



Order by Mail or Phone:



The first copy of each printed report is free. Additional copies are $2 

each. A check or money order should be made out to the Superintendent 

of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or 

more copies mailed to a single address are discounted 25 percent. 

Orders should be sent to:



U.S. General Accounting Office

441 G Street NW, Room LM

Washington, D.C. 20548



To order by Phone:

Voice: (202) 512-6000

TDD: (202) 512-2537

Fax: (202) 512-6061



To Report Fraud, Waste, and Abuse in Federal Programs:



Contact:



Web site: www.gao.gov/fraudnet/fraudnet.htm

E-mail: fraudnet@gao.gov

Automated answering system: (800) 424-5454 or (202) 512-7470



Public Affairs:



Jeff Nelligan, managing director, NelliganJ@gao.gov (202) 512-4800

U.S. General Accounting Office, 441 G Street NW, Room 7149 

Washington, D.C. 20548