Sustainability of Digital Formats
 Planning for Library of Congress Collections

Related Resources for Digital Format Sustainability

Listed here are selected resources that pertain to
• the assessment of digital formats
• efforts to provide gateways and/or listings of format documentation and transcoding tools
• aspects of other digital preservation activities that include recommendations relating to digital content in particular formats.

A separate resource page provides access to freely downloadable specifications for digital formats either directly from the Library of Congress or from standards organizations.

Resources related to particular digital formats will be found in individual Format Descriptions.


View the Library of Congress disclaimer regarding external links.

Recommendations for additional items are encouraged. [Contact]

Digital format sustainability: analyses and descriptions

DELOS, File Formats Typology and Registries for Digital Preservation (2004)
This project was sponsored by the European Commission and carried out at the Università degli Studi di Urbino. The 54-page document categorizes formats, discusses "digital longevity," and suggests format-assessment criteria for use by archives.
URL: http://www.dpc.delos.info/private/output/DELOS_WP6_d631_finalv2(5)_urbino.pdf

Denmark, the State and University Library and the Royal Library, Handling File Formats (2004)
Lars R. Clausen's document categorizes formats, identifies the aspects important for sustainability, and suggests strategies for preservation.
URL: http://netarchive.dk/publikationer/FileFormats-2004.pdf

Diffuse Standards and Specifications List
The Diffuse project was sponsored by the European Commission. Its primary result, a valuable source for standards documents and specifications, including data representation, is no longer maintained or accessible at its former URL. The Digital Curation Centre in the United Kingdom is currently repurposing and updating the information and remounting it as DCC Diffuse. Entries are being added as prepared. As of January 2008, the Data Representation section, which deals with digital formats, has five entries.
Former URL: http://www.diffuse.org/standards.html
Via Internet Archive Wayback Machine: http://web.archive.org/web/20030622190406/http://www.diffuse.org/
DCC Diffuse at: http://www.dcc.ac.uk/diffuse/

Leeds, University of, Survey and Assessment of Sources of Information on File Formats and Software Documentation
Report from the Representation and Rendering Project at the University of Leeds (UK, n.d., ca. 2003). Describes the publicly available sources of information on file formats and software, with some comments on their quality and completeness.
URL: http://www.jisc.ac.uk/uploaded_documents/FileFormatsreport.pdf

National Library of the Netherlands (KB), Evaluating File Formats for Long-term Preservation
The KB has developed a quantifiable file format risk assessment method, based on seven sustainability criteria that are assigned weights by importance. The KB observes consensus on the criteria but recognizes that the weights assigned to criteria must be guided by institutional policy. With this paper, the KB hopes to inspire other cultural heritage institutions to define their own quantifiable file format evaluation method.
URL: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf
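A weighted assessment of this kind can be sketched as a weighted mean of per-criterion scores. The Python below is only an illustration under stated assumptions, not the KB's published method: the criterion names, the weights, the 0-to-2 scoring scale, and both example formats are hypothetical placeholders that an institution would replace according to its own policy.

```python
from typing import Dict

# Hypothetical weights per criterion (higher = more important).
# Names and numbers are illustrative and may not match the KB's own list.
WEIGHTS: Dict[str, float] = {
    "openness": 3,
    "adoption": 2,
    "complexity": 1,
    "technical_protection": 5,
    "self_documentation": 1,
    "robustness": 2,
    "dependencies": 4,
}


def weighted_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted mean of per-criterion scores.

    Assumed scale for this sketch: 0 (worst) to 2 (best) per criterion.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Invented scores for two unnamed formats, for demonstration only.
format_a = {criterion: 2 for criterion in WEIGHTS}
format_b = dict(format_a, openness=0, dependencies=1)

print(weighted_score(format_a))
print(weighted_score(format_b))
```

Changing only the weights reorders formats without touching the per-criterion scores, which mirrors the paper's distinction between broadly agreed criteria and policy-dependent weights.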

PIN, Groupe Pérennisation des Informations Numériques
An initiative of Association Aristote and the French space agency (CNES) for long-term preservation of digital information. Developing methodology for evaluating formats for preservation. Web site is entirely in French.
URL: http://pin.cnes.fr/
URL: http://www.ssd.rl.ac.uk/ccsdsp2/mon04/methodology_for_format_evaluation.ppt [presentation on format evaluation methodology]
URL: http://www.ssd.rl.ac.uk/ccsdsp2/mon04/long_term_preservation_criteria.doc [Criteria for evaluating data formats in terms of their suitability for ensuring information long term preservation, C. Huc]

Format registries: in production, under development

Global Digital Format Registry
An activity with funding from the Andrew W. Mellon Foundation to build a registry that "will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats." Data models and design documents are available.
URL: http://www.formatregistry.org/
URL: https://collaborate.oclc.org/wiki/gdfr/index.php/Published:Home [current design and architecture documents]
URL: https://collaborate.oclc.org/wiki/gdfr/documents.html [earlier documents]

PRONOM Digital Format Database
From The National Archives of the United Kingdom (formerly the Public Record Office), the PRONOM database provides information about file formats and the application software needed to open them.
URL: http://www.nationalarchives.gov.uk/pronom/

Representation Information Registry Repository
An activity with support from the UK's Digital Curation Centre and the CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) project to build a registry that "curates OAIS reference model (ISO:14721:2002) defined Representation Information which is intended to add meaning to data and aid its long-term preservation." The intent is to build a repository designed to hold representation information interpreted more broadly than technical information about digital formats.
URL: http://registry.dcc.ac.uk/

NGDA Format Registry (from National Geospatial Digital Archive)
This format registry (in beta as of January 2007) is aimed at experts in digital formats for geospatial data. It uses a community-based wiki in a moderated fashion to develop records for formats using a standard template. See http://ngda.library.ucsb.edu/format/index.php/Help:The_Process. Once an entry for a format is judged ready for archiving by the moderator, the information in the wiki is exported and stored, with specifications and supplementary documentation if available, in the NGDA Archive. The information in the wiki template is transferred into an equivalent XML instance for archiving. The archived entry is intended to hold information that will be useful to future users of objects in the format, in particular to provide information that allows future programmers to build software for rendering or migration, when the format is no longer in current use and if no such tools exist.
URL: http://ngda.library.ucsb.edu/format/index.php/Main_Page
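The transfer of template fields into an XML instance might look roughly like the following Python sketch. It is not the NGDA's actual export code or schema: the root element name, the field names, and the sample record are all hypothetical, and the sketch assumes that every field name is already a valid XML element name.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def record_to_xml(record: Dict[str, str],
                  root_tag: str = "formatRecord") -> str:
    """Serialize a flat template record as one XML element per field."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical wiki template content for an invented format.
wiki_record = {
    "name": "Example Raster Format",
    "status": "ready for archiving",
    "specification": "not available",
}

print(record_to_xml(wiki_record))
```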

Format registries: demonstration projects

FOCUS [Format Curation Service, a demonstration]
FOCUS was one element of a research project at the Institute for Advanced Computer Studies, University of Maryland (UMIACS), funded by NDIIPP under the DIGARCH program. The intent was to demonstrate that a scalable and secure environment for a global digital format registry could be built using proven web technologies, such as LDAP and web services. The key functionality of FOCUS is to identify software tools for rendering, editing, converting, and validating the formats in the registry. The demonstration system is populated with some of the most common formats and applications and offers a validation service based on JHOVE.
URL: http://www.umiacs.umd.edu/research/adapt/focus/

Format registries: related tools

Magic Database File Formats (from magicdb.org)
One page on this site is generated from a simple database of "standard" file formats, with links to documentation if available online. Also available from the magicdb.org site is a more extensive list of formats and the associated magic number "database." The text-format database is designed for use with a command-line (Windows or DOS) tool for identifying files; the tool is distributed at no charge by Optima SC, Inc.
URL: http://www.magicdb.org/stdfiles.html
URL: http://www.magicdb.org/magic.html
URL: http://www.magicdb.org/magic.db

NIST National Software Reference Library
From the National Institute of Standards and Technology, a project to collect software and to incorporate file profiles computed from this software into a reference data set that can be used by law enforcement, government, and industry to identify files found on a computer.
URL: http://www.nsrl.nist.gov
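Identification against such a reference data set can be sketched as a hash lookup. The Python below assumes that a file "profile" is a SHA-1 digest used as the lookup key; the reference set, its label, and the sample bytes are invented for illustration and do not come from the NSRL data.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Optional


def sha1_profile(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks and return an uppercase hex SHA-1."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest().upper()


def lookup(stream: BinaryIO, reference: Dict[str, str]) -> Optional[str]:
    """Return the reference label for a known file, or None if unknown."""
    return reference.get(sha1_profile(stream))


# Build a tiny made-up reference set from sample bytes, then query it.
known_bytes = b"example installer contents"
reference_set = {
    sha1_profile(io.BytesIO(known_bytes)): "Example Installer 1.0 (hypothetical)",
}

print(lookup(io.BytesIO(known_bytes), reference_set))
print(lookup(io.BytesIO(b"some other file"), reference_set))
```

To profile a file on disk, pass a handle opened in binary mode, for example `open(path, "rb")`, in place of the `io.BytesIO` objects.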

Wikipedia index to digital data storage formats
Wikipedia has information on many digital formats. These can be reached as a body using Wikipedia categories as an index.
URL: http://en.wikipedia.org/wiki/Category:Digital_data_storage_formats

FILExt - The File Extension Source
FILExt is a database of file extensions and the various programs that use them.
URL: http://filext.com/

File Signatures Table from Gary Kessler
A table of file signatures (aka "magic numbers"). The author declares it a work-in-progress. The compilers of this resource found that it had been updated within a month when checked in January 2008.
URL: http://www.garykessler.net/library/file_sigs.html
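Signature-based identification compares the first bytes of a file against a table of known magic numbers. The Python sketch below uses a handful of widely documented signatures; the table is a small illustrative subset chosen for this example, not a copy of the table described above, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (leading bytes, format name); a small illustrative subset only.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the first format whose signature prefixes the header bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


print(identify(b"%PDF-1.4\n"))
print(identify(b"plain text with no signature"))
```

A matching signature is only a hint: some formats share leading bytes with others, and some have no signature at all, so a match does not show that a file is valid.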

Wotsit.org
Wotsit.org describes itself as a programmer's file and data format resource. The site contains information on hundreds of different file types, data types, and hardware interface details, along with other programming information such as algorithms, source code, and specifications.
URL: http://www.wotsit.org/

Lists of formats supported by archival institutions or projects

Florida Center for Library Automation (FCLA) Digital Archive
URL: http://www.fcla.edu/digitalArchive/daInfo.htm
Includes Recommended Data Formats for Preservation Purposes, intended to help Florida university administrators develop guidelines for submitting files: a listing of formats categorized as preferred, acceptable, and bit-level preservation only.
URL: http://www.fcla.edu/digitalArchive/pdfs/recFormats.pdf

MIT DSpace statement concerning format support
From the DSpace implementation at MIT, a listing of formats categorized as supported, known, and unsupported.
URL: http://libraries.mit.edu/dspace-mit/build/policies/format.html

ProQuest (UMI Dissertation Publishing) Preparing Your Manuscript Guide
The thesis is expected to be in PDF. The guidelines include a list of acceptable formats for multimedia files.
URL: http://wwwlib.umi.com/dissertations/about_etds

Library and Archives Canada Guidelines for Computer File Types, Interchange Formats and Information Standards
This document identifies computer file types, interchange formats, and information standards that Library and Archives Canada (LAC) recommends to facilitate the interoperability of digital information in the Government of Canada (GoC). The recommended file types and interchange formats are also those that LAC prefers for the transfer of digital information to its control after its operational business value to an organization has ceased.
URL: http://www.collectionscanada.gc.ca/government/products-services/007002-3017-e.html

Digital records in the National Archives of Australia
The NAA strategy for digital preservation is to "convert proprietary file formats into open, fully-specified, standards-based formats, most of which are XML-based." The description of the software they have developed includes a list of supported formats.
URL: http://www.naa.gov.au/records-management/secure-and-store/e-preservation/at-NAA/software.aspx
URL: http://www.naa.gov.au/records-management/secure-and-store/e-preservation/at-NAA/index.aspx

Formats for different content categories

Resources related to particular digital formats will be found in individual Format Descriptions.

Alternative File Formats for Storing Master Images of Digitisation Projects
A study of alternative formats for storing master files of digitisation projects of the Koninklijke Bibliotheek (KB) took place in the context of reviewing the KB's storage strategy. The formats reviewed as alternatives to uncompressed TIFF were JPEG, JPEG 2000, PNG, and TIFF with LZW compression.
URL: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/Alternative%20File%20Formats%20for%20Storing%20Masters%202%201.pdf
Citation: Robèrt Gillesse, Judith Rog, and Astrid Verheusen (National Library of the Netherlands, 2008).

Graphics File Formats, 2nd Edition
Reasonable overviews of many formats, albeit lacking the detail needed for certain types of digital archaeology.
Citation: James D. Murray and William vanRyper (Sebastopol, CA: O'Reilly & Associates, 1994).

Graphics File Formats FAQ (Part 3 of 4): Where to Get File Format Specifications
Web site created by James D. Murray that offers links to about 200 sites for graphics file formats. Intended as a mechanism for updating information in the printed Graphics File Formats, 2nd Edition. When consulted in April 2003, the last modified tag reported a date of "20Jan97."
URL: http://www.faqs.org/faqs/graphics/fileformats-faq/part3/preamble.html

Resources that include discussion of digital formats

Building an Electronic Records Archive at the National Archives and Records Administration: Recommendations for Initial Development (Pre-publication Draft, 2003)
Report of study by a committee under the auspices of the National Research Council of the National Academies; see especially the section titled "Data Types and Obsolescence," pp. 5-3 to 5-5.
URL: http://books.nap.edu/books/0309089476/html/index.html

eDAVID (Belgium)
eDAVID, an activity that builds on research on preserving electronic records done for the city of Antwerp, has produced a handbook, Digital archiving: the new challenge? Chapter 3 is about file formats.
URL: http://www.expertisecentrumdavid.be/docs/digitalarchiving_manual.pdf
URL: http://www.expertisecentrumdavid.be/eng/edavid.php

Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving email
From the Digital Preservation Testbed project of the Dutch National Archives, this 2003 document is part of a larger report in progress that will also cover text documents and spreadsheets. It discusses email in terms of authenticity (as an official record) and assesses the various preservation strategies that may be applied.
URL: http://www.digitaleduurzaamheid.nl/index.cfm?paginakeuze=185&categorie=2

NLM Journal Archiving and Interchange Tag Suite
From the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM), created with the intent of providing a common format in which publishers and archives can exchange journal content.
URL: http://dtd.nlm.nih.gov/

KB/IBM. Authenticity in a Digital Environment
From IBM and the National Library of the Netherlands (KB), this report (December 2002) discusses a framework for defining what is meant by an authentic digital object. Includes an approach for analyzing content in terms of its makeup; for example, see pp. 16-18.
URL: http://www.kb.nl/hrd/dd/dd_onderzoek/reports/2-authenticity.pdf

KB/IBM. Preservation Requirements in a Deposit System
From IBM and the National Library of the Netherlands (KB), this report (n.d., ca. 2002) presents the requirements for the preservation subsystem of the Digital Information Archiving System (DIAS) under development at the KB. Appendix C (p. 35) lists 35 recognized file types and subtypes to be implemented in the first release of DIAS.
URL: http://www.kb.nl/hrd/dd/dd_onderzoek/reports/3-preservation.pdf

Permanent pixels: Building blocks for the longevity of digital surrogates of historical photographs. René van Horik.
This report was written as a dissertation, defended on 1 November 2005 at Delft University of Technology, based on research from 2000-2004. The report is published by Data Archiving and Networked Services (DANS).
URL: http://www.knaw.nl/publicaties/pdf/20051103.pdf

Preservation of Word Processing Documents. Barnes, Ian, Australian National University.
Analysis and recommendations on preserving word-processing documents by the developer of the Digital Scholar's Workbench. Includes strong recommendations for structured rather than visual formats, for open non-binary formats, and in particular for use of the DocBook XML standard.
URL: http://www.apsr.edu.au/publications/preservation_of_word_processing_documents.html

Preservation of TeX/LaTeX Documents. Barnes, Ian, Australian National University.
Analysis and recommendations on preserving TeX/LaTeX documents by the developer of the Digital Scholar's Workbench. As of 2006, no conversion tools that avoid information loss had been identified. The current recommendation is to keep the files in their original format.
URL: http://www.apsr.edu.au/publications/LaTeX-preservation.pdf

Best practice guidelines for particular classes of content

Collaborative Digitization Program, Digital Audio Working Group
The Collaborative Digitization Program, formerly the Colorado Digitization Program, has working groups on best practices for digitization from analog materials.
URL: http://www.cdpheritage.org/cdp/workinggroups/audio/index.html
The Digital Audio Working Group has published guidelines as "Digital Audio Best Practices."
URL: http://www.cdpheritage.org/digital/audio/documents/cdpdabp_1-2.pdf

Yawah.com, developer of eRez Imaging Server
"Working with Digital Master Images." Recommendations for how to create a "digital master" to ensure the highest level of reuse and re-expression. Describes a master file that could be functional for an imaging server application.
URL: http://erez3.yawah.com/erez4/html/Working%20with%20Digital%20Master%20Images.html

Other lists of resources related to preserving digital content

PADI - Preserving Access to Digital Information
The PADI Web site is a comprehensive gateway to digital preservation resources.
URL: http://www.nla.gov.au/padi/

Digital Library Reference Center [accessible to Library of Congress staff only]
A collection of items is held in hard copy in LA300 for reference. These are items that have surfaced as staff have pursued a variety of digital preservation and life cycle management issues. The catalog also includes some external online resources.
LC Access Only: Digital Library Reference Center catalog.


Last Updated: Monday, 15-Sep-2008 14:51:39 EDT