Sustainability of Digital Formats
 Planning for Library of Congress Collections

Related Resources for Digital Format Sustainability

Listed here are selected resources that pertain to
• the assessment of digital formats
• efforts to provide gateways and/or listings of format documentation and transcoding tools
• aspects of other digital preservation activities that include recommendations relating to digital content in particular formats.

A separate resource page provides access to freely downloadable specifications for digital formats either directly from the Library of Congress or from standards organizations.

Resources related to particular digital formats will be found in individual Format Descriptions.


View the Library of Congress disclaimer regarding external links.

Recommendations for additional items are encouraged. [Contact]

Digital format sustainability: analyses and descriptions

DELOS, File Formats Typology and Registries for Digital Preservation (2004)
This project was sponsored by the European Commission and carried out at the Università degli Studi di Urbino. The 54-page document categorizes formats, discusses "digital longevity," and suggests format-assessment criteria for use by archives.
URL: http://www.dpc.delos.info/private/output/DELOS_WP6_d631_finalv2(5)_urbino.pdf

Denmark, the State and University Library and the Royal Library, Handling File Formats (2004)
Lars R. Clausen's document categorizes formats, identifies the aspects important for sustainability, and suggests strategies for preservation.
URL: http://netarchive.dk/publikationer/FileFormats-2004.pdf

Diffuse Standards and Specifications List
The Diffuse project was sponsored by the European Commission. Its primary result, a valuable source for standards documents and specifications, including data representation, is no longer maintained or accessible at its former URL. The Digital Curation Centre in the United Kingdom is currently repurposing and updating the information and remounting it as DCC Diffuse. Entries are being added as prepared. As of January 2008, the Data Representation section, which deals with digital formats, has five entries.
Former URL: http://www.diffuse.org/standards.html
Via Internet Archive Wayback Machine: http://web.archive.org/web/20030622190406/http://www.diffuse.org/
DCC Diffuse at: http://www.dcc.ac.uk/resources/standards/diffuse/

Leeds, University of, Survey and Assessment of Sources of Information on File Formats and Software Documentation
Report from the Representation and Rendering Project at the University of Leeds (UK, n.d., ca. 2003). Describes the publicly available sources of information on file formats and software, with some comments on their quality and completeness.
URL: http://www.jisc.ac.uk/uploaded_documents/FileFormatsreport.pdf

National Library of the Netherlands (KB), Evaluating File Formats for Long-term Preservation
The KB has developed a quantifiable file format risk assessment method, based on seven sustainability criteria that are assigned weights by importance. The KB observes consensus on the criteria but recognizes that the weights assigned to criteria must be guided by institutional policy. With this paper, the KB hopes to inspire other cultural heritage institutions to define their own quantifiable file format evaluation method.
URL: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf
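The weighted-criteria idea described above can be sketched in a few lines of Python. This is an illustration only: the criterion names, the weights, and the 0-2 scoring scale are assumptions made for the example and are not taken from the KB paper, which should be consulted for the actual method.

```python
# Hypothetical weights; an institution would derive these from its own policy.
WEIGHTS = {
    "openness": 3,
    "adoption": 2,
    "complexity": 3,
    "technical_protection": 5,
    "self_documentation": 1,
    "robustness": 2,
    "dependencies": 4,
}
MAX_SCORE = 2  # assumed scale: each criterion is scored 0 (worst) to 2 (best)


def weighted_score(scores):
    """Return a 0-100 score computed from per-criterion scores and WEIGHTS."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    best = MAX_SCORE * sum(WEIGHTS.values())
    return 100.0 * total / best


# Made-up scores for an imaginary format, for demonstration only.
example_format = {
    "openness": 2,
    "adoption": 1,
    "complexity": 1,
    "technical_protection": 2,
    "self_documentation": 0,
    "robustness": 1,
    "dependencies": 2,
}

print(weighted_score(example_format))  # 77.5
```

Because the weights live in one table, two institutions could score the same format identically on each criterion and still reach different totals, which is the point the KB makes about institutional policy.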

PIN, Groupe Pérennisation des Informations Numériques
An initiative of Association Aristote and the French space agency (CNES) for long-term preservation of digital information. One area of activity is evaluating formats for preservation. Web site is entirely in French.
URL: http://pin.association-aristote.fr/doku.php/public/formats
URL: http://www.ssd.rl.ac.uk/ccsdsp2/mon04/methodology_for_format_evaluation.ppt [presentation on format evaluation methodology]
URL: http://www.ssd.rl.ac.uk/ccsdsp2/mon04/long_term_preservation_criteria.doc [Criteria for evaluating data formats in terms of their suitability for ensuring information long term preservation, C. Huc]

Format registries: in production, under development

Global Digital Format Registry
An activity with initial funding from the Andrew W. Mellon Foundation to build a registry that "will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats." The GDFR activity has merged into a new international activity, UDFR, based on the experience with GDFR and PRONOM. Data models and design documents are available.
URL: http://www.gdfr.info/
URL: http://www.gdfr.info/docs.html#data_model. Current design and architecture documents.
URL: http://www.udfr.org/

PRONOM Digital Format Database
From The National Archives of the United Kingdom (formerly the Public Record Office), the PRONOM database provides information about file formats and the application software needed to open them.
URL: http://www.nationalarchives.gov.uk/pronom/

Representation Information Registry Repository
An activity with support from the UK's Digital Curation Centre and the CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) project to build a registry that "curates OAIS reference model (ISO:14721:2002) defined Representation Information which is intended to add meaning to data and aid its long-term preservation." The intent is to build a repository designed to hold representation information interpreted more broadly than technical information about digital formats.
URL: http://registry.dcc.ac.uk/

Unified Digital Format Registry
The UDFR activity will build on the existing PRONOM registry to support requirements and use cases from GDFR.
URL: http://www.udfr.org/
URL: https://bitbucket.org/udfr/main/wiki/Home

Format registries: demonstration projects

FOCUS [Format Curation Service, a demonstration]
FOCUS was one element of a research project at the University of Maryland Institute for Advanced Computer Studies (UMIACS) funded by NDIIPP under the DIGARCH program. The intent was to demonstrate that a scalable and secure environment for a global digital format registry could be built using proven web technologies, such as LDAP and web services. The key functionality of FOCUS is to identify software tools for rendering, editing, converting, and validating the formats in the registry. The demonstration system is populated with some of the most common formats and applications and offers a validation service based on JHOVE.
URL: https://wiki.umiacs.umd.edu/adapt/index.php/Focus:Main

Format registries: related tools

Magic Database File Formats (from magicdb.org)
One page on this site is generated from a simple database of "standard" file formats, with links to documentation if available online. Also available from the magicdb.org site is a more extensive list of formats and the associated magic number "database." The text-format database is designed for use with a command line (Windows or DOS) tool for identifying files; the tool is distributed at no charge by Optima SC, Inc.
URL: http://www.magicdb.org/stdfiles.html
URL: http://www.magicdb.org/magic.html
URL: http://www.magicdb.org/magic.db

NIST National Software Reference Library
From the National Institute of Standards and Technology, a project to collect software and to incorporate file profiles computed from this software into a reference data set that can be used by law enforcement, government, and industry to identify files found on a computer.
URL: http://www.nsrl.nist.gov
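Matching files against a reference data set of this kind can be sketched as a hash lookup. The sketch below assumes that a file profile is a cryptographic digest of the file's contents and uses SHA-1 from the Python standard library; the reference table and its single entry are hypothetical stand-ins, not NSRL data, and the function names are invented for the example.

```python
import hashlib


def sha1_of_bytes(data):
    """Return the upper-case hexadecimal SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest().upper()


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


# Hypothetical reference set: digest -> description of the known file.
KNOWN_FILES = {
    sha1_of_bytes(b"contents of a known file"): "example-known-file.bin",
}


def lookup(data):
    """Return the description of a known file, or None if it is not listed."""
    return KNOWN_FILES.get(sha1_of_bytes(data))


print(lookup(b"contents of a known file"))  # example-known-file.bin
print(lookup(b"something else"))            # None
```

A lookup of this kind identifies only exact copies: changing a single byte of a file produces a different digest and therefore no match.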

Wikipedia index to computer file formats
Wikipedia has information on many formats. These can be reached as a body using Wikipedia categories as an index.
URL: http://en.wikipedia.org/wiki/Category:Computer_file_formats

File Signatures Table from Gary Kessler
A table of file signatures (aka "magic numbers"). The author declares it a work-in-progress. The compilers of this resource found that it had been updated within a month when checked in January 2008.
URL: http://www.garykessler.net/library/file_sigs.html
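Signature tables like the ones listed in this section are typically used by comparing the first bytes of a file against known values. The following is a minimal Python sketch of that technique; the handful of signatures shown are widely documented ones chosen for illustration, the function names are invented for the example, and real tables are far longer and often include offsets and trailing markers that this sketch ignores.

```python
# A few well-known leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header):
    """Return a format label for the leading bytes of a file, or None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def identify_file(path):
    """Read only as many bytes as the longest signature requires."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as handle:
        return identify(handle.read(longest))


print(identify(b"%PDF-1.4\n"))   # PDF
print(identify(b"plain text"))   # None
```

A match on leading bytes is only a hint. Many formats share a container signature (the ZIP signature above, for example, also begins many office document and archive formats), so signature checks are usually combined with deeper validation.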

FILExt - The File Extension Source
FILExt is a database of file extensions and the various programs that use them.
URL: http://filext.com/

File-Extensions.org
A file extension resource.
URL: http://www.file-extensions.org/

Wotsit.org
Describes itself as a programmer's file and data format resource. This site contains information on hundreds of different file types, data types, hardware interface details and all sorts of other useful programming information; algorithms, source code, specifications, etc.
URL: http://www.wotsit.org/

DotWhat!?
A file extension resource.
URL: http://dotwhat.net/

Lists of formats supported by archival institutions or projects

Florida Center for Library Automation (FCLA): Florida Digital Archive File Preservation Strategies by Format
Provides action plans for dealing with a selection of formats that the archive will attempt to preserve.
URL: http://fclaweb.fcla.edu/fda_format_landing_page
Also includes Table of FDA-supported File Formats. This list categorizes formats using "level of confidence" (high, medium, low) as a measure of suitability for long-term preservation.
URL: http://fclaweb.fcla.edu/node/795

MIT DSpace statement concerning format support
From the DSpace implementation at MIT, a listing of formats categorized as supported, known, and unsupported.
URL: http://libraries.mit.edu/dspace-mit/build/policies/format.html

ProQuest (UMI Dissertation Publishing) Preparing Your Manuscript Guide
Theses are expected to be submitted as PDF. The guidelines include a list of acceptable formats for multimedia files.
URL: http://www.il.proquest.com/assets/downloads/products/UMI_PreparingYourManuscriptGuide.pdf

Library and Archives Canada Guidelines for Computer File Types, Interchange Formats and Information Standards
This document identifies computer file types, interchange formats, and information standards that the Library and Archives Canada (LAC) is recommending to facilitate the interoperability of digital information in the Government of Canada (GoC). Recommended file types and interchange formats are also those that are preferred by the LAC for the transfer of digital information to its control after its operational business value to an organization has ceased.
URL: http://www.collectionscanada.gc.ca/digital-initiatives/012018-2200-e.html

Dissecting the Digital Preservation Software Platform. Version 1.0 (RKS: 2009/4026)
The National Archives of Australia (NAA) strategy for digital preservation is to "convert proprietary file formats into open, fully-specified, standards-based formats, most of which are XML-based." The document describing the software they have developed to support digital preservation includes a list of supported formats.
URL: http://www.naa.gov.au/Images/Digital-Preservation-Software-Platform-v1_tcm16-47139.pdf
URL: http://naa.gov.au/records-management/agency/preserve/e-preservation/at-naa/process/index.aspx

Formats for different content categories

Resources related to particular digital formats will be found in individual Format Descriptions.

Alternative File Formats for Storing Master Images of Digitisation Projects
A study of alternative formats for storing master files of digitisation projects of the Koninklijke Bibliotheek (KB) took place in the context of reviewing the KB's storage strategy. The formats reviewed as alternatives to uncompressed TIFF were JPEG, JPEG 2000, PNG, and TIFF with LZW compression.
URL: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/alternative_file_formats_for_storing_masters_2_1.pdf
Citation: Robèrt Gillesse, Judith Rog, and Astrid Verheusen (National Library of the Netherlands, 2008).

CODECS Database
Web site associated with Video Inspector tool. Lists CODECs in order of use as recognized by the tool.
URL: http://www.codecsdb.com/

FME Supported Geospatial Formats
Web site associated with FME (Feature Manipulation Engine) tool from the Canadian company Safe Software. Each of the URLs below lists the more than 200 geospatial formats supported by the FME tool, together with explanatory information.
URL: http://www.safe.com/fme/format-search/
URL: http://docs.safe.com/fme/html/FME_ReadersWriters/Default.htm
URL: http://docs.safe.com/fme/pdf/FMEReadersWriters.pdf

Graphics File Formats, 2nd Edition
Reasonable overviews of many formats, albeit lacking the detail needed for certain types of digital archeology.
Citation: James D. Murray and William vanRyper (Sebastopol, CA: O'Reilly & Associates, 1994).

Graphics File Formats FAQ (Part 3 of 4): Where to Get File Format Specifications
Web site created by James D. Murray that offers links to about 200 sites for graphics file formats. Intended as a mechanism for updating information in the printed Graphics File Formats, 2nd Edition. When consulted in April 2003, the last modified tag reported a date of "20Jan97."
URL: http://www.faqs.org/faqs/graphics/fileformats-faq/part3/preamble.html

Preserving Geospatial Data
Full title: Technology Watch Report: Preserving Geospatial Data, from the Digital Preservation Coalition in the UK, May 2009. Excellent, comprehensive overview of geospatial formats and their characteristics.
URL: http://www.dpconline.org/component/docman/doc_download/363-preserving-geospatial-data-by-guy-mcgarva-steve-morris-and-gred-greg-janee

Geospatial Multistate Archive and Preservation Partnership (GeoMAPP)
GeoMAPP has produced documents offering analysis and recommendations on metadata, container formats, and archiving ESRI GeoDatabase content. Also available is a spreadsheet analysing geospatial formats.
URL: http://www.geomapp.net/docs/GIS_OAIS_Archival_Metadata_v1.0_FINAL_20110921.pdf
URL: http://www.geomapp.net/docs/ContentPackaging_v1.0_final_20111202.pdf
URL: http://www.geomapp.net/docs/Geodatabase_Report_v1.0_final_20111206.pdf
URL: http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls
URL: GeoMAPP website

Federal Geographic Data Committee (FGDC)
Lists from FGDC of standards developed or endorsed by FGDC for geospatial data and metadata:
URL: http://www.fgdc.gov/standards/fgdc-endorsed-external-standards/index_html
URL: http://www.fgdc.gov/standards/projects/FGDC-standards-projects/fgdc-endorsed-standards

Resources that include discussion of digital formats

Building an Electronic Records Archive at the National Archives and Records Administration: Recommendations for Initial Development (Pre-publication Draft, 2003)
Report of study by a committee under the auspices of the National Research Council of the National Academies; see especially the section titled "Data Types and Obsolescence," pp. 5-3 to 5-5.
URL: http://books.nap.edu/openbook.php?isbn=0309089476

eDAVID (Belgium)
eDAVID, an activity that builds on research on preserving electronic records done for the city of Antwerp, has produced a handbook, Digital archiving: the new challenge? Chapter 3 is about file formats.
URL: http://www.expertisecentrumdavid.be/docs/digitalarchiving_manual.pdf
URL: http://www.expertisecentrumdavid.be/eng/edavid.php

Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Email (2003)
From the Digital Preservation Testbed project of the Dutch National Archives. Discusses email in terms of authenticity (as an official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-email-en.pdf

Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Spreadsheets (2003)
From the Digital Preservation Testbed project of the Dutch National Archives. Discusses spreadsheets in terms of authenticity (as an official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-spreadsh-en.pdf

Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Databases (2003)
From the Digital Preservation Testbed project of the Dutch National Archives. Discusses databases in terms of authenticity (as an official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-databases-en.pdf

Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Text Documents (2003)
From the Digital Preservation Testbed project of the Dutch National Archives. Discusses text documents in terms of authenticity (as an official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-textdocs-en.pdf

NLM Journal Archiving and Interchange Tag Suite
From the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM), created with the intent of providing a common format in which publishers and archives can exchange journal content.
URL: http://dtd.nlm.nih.gov/

KB/IBM. Authenticity in a Digital Environment
From IBM and the National Library of the Netherlands (KB), this report (December 2002) discusses a framework for defining what is meant by an authentic digital object. Includes an approach for analyzing content in terms of its makeup; for example, see pp. 16-18.
URL: http://www.kb.nl/hrd/dd/dd_onderzoek/reports/2-authenticity.pdf

KB/IBM. Preservation Requirements in a Deposit System
From IBM and the National Library of the Netherlands (KB), this report (n.d., ca. 2002) presents the requirements for the preservation subsystem of the Digital Information Archiving System (DIAS) under development at the KB. Appendix C (p. 35) lists 35 recognized file types and subtypes to be implemented in the first release of DIAS.
URL: http://www.kb.nl/hrd/dd/dd_onderzoek/reports/3-preservation.pdf

Permanent pixels: Building blocks for the longevity of digital surrogates of historical photographs. René van Horik.
This report was written as a dissertation, defended on 1 November 2005 at Delft University of Technology, based on research from 2000-2004. The report is published by Data Archiving and Networked Services (DANS).
URL: http://www.knaw.nl/publicaties/pdf/20051103.pdf

Preservation of Word Processing Documents. Barnes, Ian, Australian National University.
Analysis and recommendations on preserving word-processing documents by the developer of the Digital Scholar's Workbench. Includes strong recommendations for structured rather than visual formats, for open non-binary formats, and in particular for use of the DocBook XML standard.
URL: http://www.apsr.edu.au/publications/preservation_of_word_processing_documents.html

Preservation of TeX/LaTeX Documents. Barnes, Ian, Australian National University.
Analysis and recommendations on preserving TeX/LaTeX documents by the developer of the Digital Scholar's Workbench. As of 2006, no conversion tools that avoid losing information had been identified. The current recommendation is to keep the files in their original format.
URL: http://www.apsr.edu.au/publications/LaTeX-preservation.pdf

Best practice guidelines for particular classes of content

NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
Provides information for Data Providers, including "Best Practices for Preparing Environmental Data Sets to Share and Archive."
URL: http://daac.ornl.gov/PI/pi_info.shtml
URL: http://daac.ornl.gov/PI/BestPractices-2010.pdf

Federal Agencies Digitization Guidelines Initiative
Developing guidelines for digitizing still images, including pictures and page images, and audio-visual materials.
URL: http://www.digitizationguidelines.gov/

Other lists of resources related to preserving digital content

PADI - Preserving Access to Digital Information
The PADI Web site is a comprehensive gateway to digital preservation resources.
URL: http://www.nla.gov.au/padi/

Digital Library Reference Center [accessible to Library of Congress staff only]
A collection of items is held in hard copy in LA300 for reference. These are items that have surfaced as staff have pursued a variety of digital preservation and life cycle management issues. The catalog also includes some external online resources.
LC Access Only: Digital Library Reference Center catalog.


Last Updated: Thursday, 26-Jul-2012 18:49:42 EDT