JHOVE and the Development of JHOVE2
September 4, 2008 -- In late 2003, engineers at Harvard University Library and JSTOR developed an open-source tool called JHOVE (the JSTOR/Harvard Object Validation Environment) to validate file formats. Since its release, JHOVE has gradually gained acceptance worldwide as an essential tool for format validation.
Format validation is crucial to digital preservation and access. If you don’t know what a file is, or if its integrity is damaged, you may not be able to read, hear, or see its content.
JHOVE was designed to process a digital object and determine what the object claims to be (identification), whether the object conforms to the requirements of that format (validation), and what the object's properties are (characterization). When JHOVE finds a file that it cannot validate, it flags the file. Though the process is automated, only a human can decide whether to accept the file as is or try to get a better version.
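The three steps can be illustrated with a minimal sketch. This is not JHOVE's code or API: the function names, the `Report` class, and the deliberately shallow checks (a PNG signature and header, a PDF header and end-of-file marker) are assumptions chosen only to show how identification, validation, and characterization differ, and how a file that fails validation is flagged for a person to review.

```python
# Illustrative sketch only. None of these names come from JHOVE itself.
from dataclasses import dataclass, field

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass
class Report:
    claimed_format: str                              # identification result
    valid: bool                                      # validation result
    properties: dict = field(default_factory=dict)   # characterization result
    flagged: bool = False                            # needs human review


def identify(data: bytes) -> str:
    """Identification: what does the object claim to be?"""
    if data.startswith(PNG_SIGNATURE):
        return "PNG"
    if data.startswith(b"%PDF-"):
        return "PDF"
    return "unknown"


def validate(fmt: str, data: bytes) -> bool:
    """Validation: a shallow conformance check (real validators go much deeper)."""
    if fmt == "PNG":
        # The first chunk after the 8-byte signature must be IHDR.
        return len(data) >= 24 and data[12:16] == b"IHDR"
    if fmt == "PDF":
        # A PDF should end with an %%EOF marker near the end of the file.
        return b"%%EOF" in data[-1024:]
    return False


def characterize(fmt: str, data: bytes) -> dict:
    """Characterization: report a few internal properties of the object."""
    if fmt == "PNG":
        return {
            "width": int.from_bytes(data[16:20], "big"),
            "height": int.from_bytes(data[20:24], "big"),
        }
    if fmt == "PDF":
        return {"version": data[5:8].decode("ascii", "replace")}
    return {}


def process(data: bytes) -> Report:
    fmt = identify(data)
    valid = validate(fmt, data)
    props = characterize(fmt, data) if valid else {}
    # Anything that fails validation is flagged; a person decides what to do next.
    return Report(fmt, valid, props, flagged=not valid)
```

A truncated file that still begins with a PNG signature would be identified as PNG but fail validation, so it would be flagged rather than silently accepted.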
JHOVE is easy to install and run. Some users embed the JHOVE Java code into their existing system and integrate it into their digital-preservation workflow.
As adoption spread, so did awareness of the original tool’s limits. “We came to realize a number of shortcomings,” said Stephen Abrams of the California Digital Library, one of the developers. “Some things we now know we could’ve done better and some things we just didn’t have the opportunity to do.”
Equipped with a new set of requirements, and with support from the Library, Abrams and colleagues at Portico and Stanford began work on JHOVE2. Their goals are to:
- Change the JHOVE architecture to improve performance, simplify system integration, and encourage third-party development and enhancement
- Provide significant new functions
- Implement existing and new functionality
The team is in the requirements gathering and design phase.
The terminology in JHOVE2 has changed slightly from JHOVE. Identification and validation keep their names, but characterization is now called feature extraction, which Abrams explains as “being able to examine formatted objects and extract and report on their salient internal properties.”