Related Resources for Digital Format Sustainability
Listed here are selected resources that pertain to
• the assessment of digital formats
• efforts to provide gateways and/or listings of format documentation and transcoding tools
• aspects of other digital preservation activities that include recommendations relating to digital content in particular formats.
A separate resource page provides access to freely downloadable specifications for digital formats either directly from the Library of Congress or from standards organizations.
Resources related to particular digital formats will be found in individual Format Descriptions.
View the Library of Congress disclaimer regarding external links.
Recommendations for additional items are encouraged. [Contact]
Digital format sustainability: analyses and descriptions
DELOS, File Formats Typology and Registries for Digital Preservation (2004)
This project was sponsored by the European Commission and carried out at the Università degli Studi di Urbino. The 54-page document categorizes formats, discusses "digital longevity," and suggests format-assessment criteria for use by archives.
URL: http://www.dpc.delos.info/private/output/DELOS_WP6_d631_finalv2(5)_urbino.pdf
Denmark, the State and University Library and the Royal Library, Handling File Formats (2004)
Lars R. Clausen's document categorizes formats, identifies the aspects important for sustainability, and suggests strategies for preservation.
URL: http://netarchive.dk/publikationer/FileFormats-2004.pdf
Diffuse Standards and Specifications List
The Diffuse project was sponsored by the European Commission. Its primary result, a valuable source for standards documents and specifications, including data representation, is no longer maintained or accessible at its former URL. The Digital Curation Centre in the United Kingdom is currently repurposing and updating the information and remounting it as DCC Diffuse. Entries are being added as prepared. As of January 2008, the Data Representation section, which deals with digital formats, has five entries.
Former URL: http://www.diffuse.org/standards.html
Via Internet Archive Wayback Machine: http://web.archive.org/web/20030622190406/http://www.diffuse.org/
DCC Diffuse at: http://www.dcc.ac.uk/resources/standards/diffuse/
Leeds, University of, Survey and Assessment of Sources of Information on File Formats and Software Documentation
Report from the Representation and Rendering Project at the University of Leeds
(UK, n.d., ca. 2003). Describes the publicly available sources of information on
file formats and software, with some comments on its quality and completeness.
URL: http://www.jisc.ac.uk/uploaded_documents/FileFormatsreport.pdf
National Library of the Netherlands (KB), Evaluating File Formats for Long-term Preservation
The KB has developed a quantifiable file format risk assessment method, based on seven sustainability criteria that are assigned weights by importance. The KB observes consensus on the criteria but recognizes that the weights assigned to criteria must be guided by institutional policy. With this paper, the KB hopes to inspire other cultural heritage institutions to define their own quantifiable file format evaluation method.
URL: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf
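The weighted-criteria idea described in this entry can be illustrated with a minimal sketch. The function name, the criterion names, and every weight and score below are hypothetical placeholders chosen for illustration; they are not the KB's published criteria or values, and an institution would substitute figures from its own policy.

```python
from typing import Dict


def weighted_format_score(scores: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Return the weighted average of per-criterion scores.

    Both mappings must use the same criterion names. All names and
    numbers passed in are the caller's own (hypothetical) policy values.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical institutional policy: two placeholder criteria only.
example_weights = {"criterion_a": 3.0, "criterion_b": 1.0}
# Hypothetical assessment of one format on a 0-2 scale.
example_scores = {"criterion_a": 2.0, "criterion_b": 0.0}

print(weighted_format_score(example_scores, example_weights))
```

Changing only the weights while holding the scores fixed shows how two institutions that agree on the criteria can still rank the same format differently.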
PIN, Groupe Pérennisation des Informations Numériques
An initiative of Association Aristote and the French space agency (CNES) for long-term preservation of digital information. One area of activity is evaluating formats for preservation. Web site is entirely in French.
URL: http://pin.association-aristote.fr/doku.php/public/formats
URL: http://www.ssd.rl.ac.uk/ccsdsp2/mon04/methodology_for_format_evaluation.ppt [presentation on format evaluation methodology]
URL: http://www.ssd.rl.ac.uk/ccsdsp2/mon04/long_term_preservation_criteria.doc [Criteria for evaluating data formats in terms of their suitability for ensuring information long term preservation, C. Huc]
Format registries: in production, under development
Global Digital Format Registry
An activity with initial funding from the Andrew W. Mellon Foundation to build a registry that "will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats." The GDFR activity has merged into a new international activity, UDFR, based on the experience with GDFR and PRONOM. Data models and design documents are available.
URL: http://www.gdfr.info/
URL: http://www.gdfr.info/docs.html#data_model [current design and architecture documents]
URL: http://www.udfr.org/
PRONOM Digital Format Database
From the Public Records Office of the United Kingdom, the PRONOM database provides information about file formats and the application software needed to open them.
URL: http://www.nationalarchives.gov.uk/pronom/
Representation Information Registry Repository
An activity with support from the UK's Digital Curation Centre and the CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) project to build a registry that "curates OAIS reference model (ISO:14721:2002) defined Representation Information which is intended to add meaning to data and aid its long-term preservation." The intent is to build a repository designed to hold representation information interpreted more broadly than technical information about digital formats.
URL: http://registry.dcc.ac.uk/
Unified Digital Format Registry
The UDFR activity will build on the existing PRONOM registry to support requirements and use cases from GDFR.
URL: http://www.udfr.org/
URL: https://bitbucket.org/udfr/main/wiki/Home
Format registries: demonstration projects
FOCUS [Format Curation Service, a demonstration]
FOCUS was one element of a research project at the University of Maryland Institute for Advanced Computer Studies (UMIACS) funded by NDIIPP under the DIGARCH program. The intent was to demonstrate that a scalable and secure environment for a global digital format registry could be built using proven web technologies, such as LDAP and web services. The key functionality of FOCUS is to identify software tools for rendering, editing, converting, and validating the formats in the registry. The demonstration system is populated with some of the most common formats and applications and offers a validation service based on JHOVE.
URL: https://wiki.umiacs.umd.edu/adapt/index.php/Focus:Main
Format registries: related tools
Magic Database File Formats (from magicdb.org)
One page on this site is generated from a simple database of "standard" file formats, with links to documentation if available online. Also available from the magicdb.org site is a more extensive list of formats and the associated magic number "database." The text-format database is designed for use with a command line (Windows or DOS) tool for identifying files; the tool is distributed at no charge by Optima SC, Inc.
URL: http://www.magicdb.org/stdfiles.html
URL: http://www.magicdb.org/magic.html
URL: http://www.magicdb.org/magic.db
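As an illustration of how identification by magic number works in general, the following minimal sketch compares the first bytes of a file against a short table of widely documented signatures. It does not read the magicdb.org database format or reproduce the Optima SC tool; the table, the function names, and the 16-byte read length are assumptions made for this example only.

```python
from typing import Optional

# A small, illustrative table of well-known leading byte signatures.
# Real signature databases are far larger and also match at offsets.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and try to identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"%PDF-1.4\n"))
print(identify(b"plain text"))
```

A signature match only suggests a format. Container formats such as ZIP share one signature across many document types, so a match is a starting point for identification rather than a validation of the file.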
NIST National Software Reference Library
From the National Institute of Standards and Technology, a project to collect
software and to incorporate file profiles computed from this software into a
reference data set that can be used by law enforcement, government, and industry
to identify files found on a computer.
URL: http://www.nsrl.nist.gov
Wikipedia index to computer file formats
Wikipedia has information on many formats. These can be reached as a body using Wikipedia categories as an index.
URL: http://en.wikipedia.org/wiki/Category:Computer_file_formats
File Signatures Table from Gary Kessler
A table of file signatures (aka "magic numbers"). The author declares it a work-in-progress. The compilers of this resource found that it had been updated within a month when checked in January 2008.
URL: http://www.garykessler.net/library/file_sigs.html
FILExt - The File Extension Source
FILExt is a database of file extensions and the various programs that use them.
URL: http://filext.com/
File-Extensions.org
A file extension resource.
URL: http://www.file-extensions.org/
Wotsit.org
Describes itself as a programmer's file and data format resource. This site contains information on hundreds of different file types, data types, hardware interface details and all sorts of other useful programming information: algorithms, source code, specifications, etc.
URL: http://www.wotsit.org/
DotWhat!?
A file extension resource.
URL: http://dotwhat.net/
Lists of formats supported by archival institutions or projects
Florida Center for Library Automation (FCLA): Florida Digital Archive File Preservation Strategies by Format
Provides action plans for dealing with a selection of formats that the archive will attempt to preserve.
URL: http://fclaweb.fcla.edu/fda_format_landing_page
Also includes Table of FDA-supported File Formats. This list categorizes formats using "level of confidence" (high, medium, low) as a measure of suitability for long-term preservation.
URL: http://fclaweb.fcla.edu/node/795
MIT DSpace statement concerning format support
From the DSpace implementation at MIT, a listing of formats categorized as supported, known, and unsupported.
URL: http://libraries.mit.edu/dspace-mit/build/policies/format.html
ProQuest (UMI Dissertation Publishing) Preparing Your Manuscript Guide
Theses are expected to be in PDF. The guidelines include a list of acceptable formats for multimedia files.
URL: http://www.il.proquest.com/assets/downloads/products/UMI_PreparingYourManuscriptGuide.pdf
Library and Archives Canada Guidelines for Computer File Types, Interchange Formats and Information Standards
This document identifies computer file types, interchange formats, and information standards that Library and Archives Canada (LAC) is recommending to facilitate the interoperability of digital information in the Government of Canada (GoC). Recommended file types and interchange formats are also those that are preferred by LAC for the transfer of digital information to its control after its operational business value to an organization has ceased.
URL: http://www.collectionscanada.gc.ca/digital-initiatives/012018-2200-e.html
Dissecting the Digital Preservation Software Platform. Version 1.0 (RKS: 2009/4026)
The National Archives of Australia (NAA) strategy for digital preservation is to "convert proprietary file formats into open, fully-specified, standards-based formats, most of which are XML-based." The document describing the software they have developed to support digital preservation includes a list of supported formats.
URL: http://www.naa.gov.au/Images/Digital-Preservation-Software-Platform-v1_tcm16-47139.pdf
URL: http://naa.gov.au/records-management/agency/preserve/e-preservation/at-naa/process/index.aspx
Formats for different content categories
Resources related to particular digital formats will be found in individual Format Descriptions.
Alternative File Formats for Storing Master Images of Digitisation Projects
A study of alternative formats for storing master files of digitisation projects of the Koninklijke Bibliotheek (KB) took place in the context of reviewing the KB's storage strategy. The formats reviewed as alternatives to uncompressed TIFF were JPEG, JPEG 2000, PNG, and TIFF with LZW compression.
URL: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/alternative_file_formats_for_storing_masters_2_1.pdf
Citation: Robèrt Gillesse, Judith Rog, and Astrid Verheusen (National Library of the Netherlands, 2008).
CODECS Database
Web site associated with Video Inspector tool. Lists CODECs in order of use as recognized by the tool.
URL: http://www.codecsdb.com/
FME Supported Geospatial Formats
Web site associated with the FME (Feature Manipulation Engine) tool from the Canadian company Safe Software. Each of the URLs below lists the more than 200 geospatial formats supported by the FME tool, together with explanatory information.
URL: http://www.safe.com/fme/format-search/
URL: http://docs.safe.com/fme/html/FME_ReadersWriters/Default.htm
URL: http://docs.safe.com/fme/pdf/FMEReadersWriters.pdf
Graphics File Formats, 2nd Edition
Reasonable overviews of many formats, albeit lacking the detail needed for certain types of digital archeology.
Citation: James D. Murray and William vanRyper (Sebastopol, CA: O'Reilly & Associates, 1994).
Graphics File Formats FAQ (Part 3 of 4): Where to Get File Format Specifications
Web site created by James D. Murray that offers links to about 200 sites for graphics file formats. Intended as a mechanism for updating information in the printed Graphics File Formats, 2nd Edition. When consulted in April 2003, the last modified tag reported a date of "20Jan97."
URL: http://www.faqs.org/faqs/graphics/fileformats-faq/part3/preamble.html
Preserving Geospatial Data
Full title: Technology Watch Report: Preserving Geospatial Data, from the Digital Preservation Coalition in the UK, May 2009. Excellent, comprehensive overview of geospatial formats and their characteristics.
URL: http://www.dpconline.org/component/docman/doc_download/363-preserving-geospatial-data-by-guy-mcgarva-steve-morris-and-gred-greg-janee
Geospatial Multistate Archive and Preservation Partnership (GeoMAPP)
GeoMAPP has produced documents offering analysis and recommendations on metadata, container formats, and archiving ESRI GeoDatabase content. Also available is a spreadsheet analysing geospatial formats.
URL: http://www.geomapp.net/docs/GIS_OAIS_Archival_Metadata_v1.0_FINAL_20110921.pdf
URL: http://www.geomapp.net/docs/ContentPackaging_v1.0_final_20111202.pdf
URL: http://www.geomapp.net/docs/Geodatabase_Report_v1.0_final_20111206.pdf
URL: http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls
URL: GeoMAPP website
Federal Geographic Data Committee (FGDC)
Lists from FGDC of standards developed or endorsed by FGDC for geospatial data and metadata:
URL: http://www.fgdc.gov/standards/fgdc-endorsed-external-standards/index_html
URL: http://www.fgdc.gov/standards/projects/FGDC-standards-projects/fgdc-endorsed-standards
Resources that include discussion of digital formats
Building an Electronic Records Archive at the National Archives and Records Administration: Recommendations for Initial Development (Pre-publication Draft, 2003)
Report of study by a committee under the auspices of the National Research
Council of the National Academies; see especially the section titled "Data Types
and Obsolescence," pp. 5-3 to 5-5.
URL: http://books.nap.edu/openbook.php?isbn=0309089476
eDAVID (Belgium)
eDAVID, an activity that builds on research on preserving electronic records done for the city of Antwerp, has produced a handbook, Digital archiving: the new challenge?. Chapter 3 is about file formats.
URL: http://www.expertisecentrumdavid.be/docs/digitalarchiving_manual.pdf
URL: http://www.expertisecentrumdavid.be/eng/edavid.php
Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Email (2003)
From the Digital Preservation Testbed project of the Dutch National Archives.
Discusses email in terms of authenticity (as an
official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-email-en.pdf
Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Spreadsheets (2003)
From the Digital Preservation Testbed project of the Dutch National Archives.
Discusses spreadsheets in terms of authenticity (as an
official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-spreadsh-en.pdf
Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Databases (2003)
From the Digital Preservation Testbed project of the Dutch National Archives.
Discusses databases in terms of authenticity (as an
official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-databases-en.pdf
Dutch National Archives, From Digital Volatility to Digital Permanence: Preserving Text Documents (2003)
From the Digital Preservation Testbed project of the Dutch National Archives.
Discusses text documents in terms of authenticity (as an
official record) and assesses the various preservation strategies that may be applied.
URL: http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-textdocs-en.pdf
NLM Journal Archiving and Interchange Tag Suite
From the National Center for Biotechnology Information (NCBI) of the National
Library of Medicine (NLM), created with the intent of providing a common
format in which publishers and archives can exchange journal content.
URL: http://dtd.nlm.nih.gov/
KB/IBM. Authenticity in a Digital Environment
From IBM and the National Library of the Netherlands (KB), this report
(December 2002) discusses a framework for defining what is meant by an
authentic digital object. Includes an approach for analyzing content in terms of
its makeup; for example, see pp. 16-18.
URL: http://www.kb.nl/hrd/dd/dd_onderzoek/reports/2-authenticity.pdf
KB/IBM. Preservation Requirements in a Deposit System
From IBM and the National Library of the Netherlands (KB), this report (n.d., ca.
2002) presents the requirements for the preservation subsystem of the Digital
Information Archiving System (DIAS) under development at the KB. Appendix
C (p. 35) lists 35 recognized file types and subtypes to be implemented in the first
release of DIAS.
URL: http://www.kb.nl/hrd/dd/dd_onderzoek/reports/3-preservation.pdf
Permanent pixels: Building blocks for the longevity of digital surrogates of historical photographs. René van Horik.
This report was written as a dissertation, defended on 1 November 2005 at Delft University of Technology, based on research from 2000-2004. The report is published by Data Archiving and Networked Services (DANS).
URL: http://www.knaw.nl/publicaties/pdf/20051103.pdf
Preservation of Word Processing Documents. Barnes, Ian, Australian National University.
Analysis and recommendations on preserving word-processing documents by the developer of the Digital Scholar's Workbench. Includes strong recommendations for structured rather than visual formats, for open non-binary formats, and in particular for use of the DocBook XML standard.
URL: http://www.apsr.edu.au/publications/preservation_of_word_processing_documents.html
Preservation of TeX/LaTeX Documents. Barnes, Ian, Australian National University.
Analysis and recommendations on preserving TeX/LaTeX documents by the developer of the Digital Scholar's Workbench. As of 2006, no conversion tools that avoided loss of information had been identified. The current recommendation is to keep the files in their original format.
URL: http://www.apsr.edu.au/publications/LaTeX-preservation.pdf
Best practice guidelines for particular classes of content
NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
Provides information for data providers, including "Best Practices for Preparing Environmental Data Sets to Share and Archive."
URL: http://daac.ornl.gov/PI/pi_info.shtml
URL: http://daac.ornl.gov/PI/BestPractices-2010.pdf
Federal Agencies Digitization Guidelines Initiative
An initiative developing guidelines for digitizing still images, including pictures and page images, and audio-visual materials.
URL: http://www.digitizationguidelines.gov/
Other lists of resources related to preserving digital content
PADI - Preserving Access to Digital Information
The PADI Web site is a comprehensive gateway to digital preservation resources.
URL: http://www.nla.gov.au/padi/
Digital Library Reference Center [accessible to Library of Congress staff only]
A collection of items is held in hard copy in LA300 for reference. These are items that have surfaced as staff have pursued a variety of digital preservation and life cycle management issues. The catalog also includes some external online resources.
LC Access Only: Digital Library Reference Center catalog.