LHNCBC: Document Abstract
Year: 2005
LHNCBC-2005-020
Automated Metadata Extraction to Preserve the Digital Contents of Biomedical Collections
Thoma GR, Mao S, Misra D
Proc. 5th IASTED International Conference on Visualization, Imaging and Image Processing (VIIP 2005). September 2005. Benidorm, Spain; 214-19
The long-term preservation of digital objects, a growing problem as libraries and archives acquire them, requires appropriate systems, standards, and institutional policies. A key requirement is the acquisition of metadata about the objects to enable future access and use, as well as the migration of digital files from obsolete formats to newer ones. Metadata is data about data: it typically consists of information about the intellectual content of a digital object, the data required for its appropriate digital representation and interpretation, security or rights management information, and its relation to other digital objects. Recording these metadata elements manually is highly labor-intensive, so automated means of doing so are key to successful preservation. This paper introduces a prototype system for digital preservation and describes its main functions, highlighting the strategies adopted in designing the system to provide these functions in a modular and cost-effective manner. It then presents an automated metadata extraction subsystem that uses string matching and machine learning techniques to minimize manual entry, and gives preliminary performance assessments.