“Volun-peers” Help Liberate Smithsonian Digital Collections

Scan of Chamaenerion latifolium. US National Herbarium, Smithsonian.

The Smithsonian Transcription Center (TC) creates indexed, searchable text by means of crowdsourcing, or, as Meghan Ferriter, project coordinator at the TC, describes it, “harnessing the endless curiosity and goodwill of the public.” As of the end of the current fiscal year, 7,060 volunteers at the TC have transcribed 208,659 pages.

The scope, planning and execution of the TC’s work, from the in-house coordination among the Smithsonian’s units to the external coordination of volunteers, is staggering to think about. The Smithsonian Institution is composed of 19 museums, archives, galleries and libraries; nine research centers; and a zoo. Fifteen of the Smithsonian units have collections in the TC, which is run by Ching-hsien Wang, Libraries and Archives System Support Branch manager with the Smithsonian Institution Office of the Chief Information Officer.

Ferriter said, “To manage a project of this scope, one must understand and troubleshoot the system and unit workflows as well as work with unit representatives as they select content and set objectives for their projects. Neither simply building a tool nor merely inviting participation is enough to sustain and grow a digital project, whatever the scale.”

The TC benefits from the Smithsonian’s online collections. Though individual units may have their own databases, they all link to a central repository, the Smithsonian’s “Enterprise Digital Asset Network,” or EDAN, which is searchable from the Smithsonian’s Collections Search Center. The TC leverages the capabilities of EDAN and builds on the foundation of data and collections-management systems supported by the Office of the Chief Information Officer. In some cases, for example, a unit may have digitized a collection and the TC arranges for volunteers to add metadata.

Ching-hsien Wang.

Each unit has a different goal for its digital collections. The goal for one project might be to transcribe handwritten notes; the goal for another project might be to key in text from a scanned document. A project might call for geotagging or adding metadata from controlled vocabularies (pre-set tags, used to avoid ambiguities or sloppy mistakes). But the source for each TC project is always a collection of digital files that a volunteer can access online.
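To make the idea of a controlled vocabulary concrete, here is a minimal sketch in Python. The vocabulary terms, the function names and the sample tags are all hypothetical illustrations, not the TC's actual data or software; the point is only that free-typed tags are checked against a pre-set list before they are accepted.

```python
# Hypothetical controlled vocabulary; a real project would load its terms
# from a unit's own authority list.
CONTROLLED_VOCABULARY = {"herbarium specimen", "field notebook", "letter"}


def normalize_tag(raw_tag, vocabulary=CONTROLLED_VOCABULARY):
    """Return the matching vocabulary term, or None if the tag is not allowed."""
    # Lowercase the tag and collapse stray whitespace before comparing.
    candidate = " ".join(raw_tag.strip().lower().split())
    return candidate if candidate in vocabulary else None


def split_tags(raw_tags, vocabulary=CONTROLLED_VOCABULARY):
    """Separate volunteer-entered tags into accepted terms and rejects."""
    accepted, rejected = [], []
    for raw in raw_tags:
        term = normalize_tag(raw, vocabulary)
        if term is None:
            rejected.append(raw)
        else:
            accepted.append(term)
    return accepted, rejected
```

Under these assumptions, a misspelled tag such as "leter" would be set aside for correction rather than stored, which is the kind of ambiguity or sloppy mistake a pre-set list is meant to prevent.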

Sharing data across the Smithsonian’s back end is an impressive technological feat, but it’s only half of this story. The other half is the relationship between the TC and the volunteers, and the pivotal component that enables the two sides to engage effectively is trust.

The TC’s role at the Smithsonian is as an aggregator, making bulk data available for volunteers to process and directing the flow of volunteer-processed data to the main repository. So, more than just trafficking in data, the TC nurtures its relationships with volunteers by means of technical fail-safe resources and down-to-earth, sincere human engagement.

Ferriter shows her respect for the volunteers when she refers to them as “volunpeers.” Ferriter said, “‘Volunpeers’ indicates the ways unit administrators and Smithsonian staff experience the TC along with volunteers. ‘Volunpeers’ underscores the values articulated by volunteers describing their activities and personal goals on the TC, including to learn, to help and to give back to something bigger…. Establishing a collaborative space that uses peer-review resources brings to the foreground what is being done together rather than exclusively highlighting what is being done by particular individuals.”

TC staff made a crucial discovery when they figured out that what motivated people to volunteer was a sincere desire to help. Wang said, “Volunteers feel privileged and take the responsibility seriously. And they like that the Smithsonian values what they do.”

Meghan Ferriter.

Ferriter said, “Volunteers indicated they were seeking increased behind-the-scenes access as a reward for participating, rather than receiving discounts or merchandise from Smithsonian vendors.” So TC staff developed a close relationship with the volunteers, and they remain in constant contact by means of social media.

“Communicating in an authentic way is central to my strategy,” Ferriter said. “Being authentic includes being vulnerable and expressing real enthusiasm. It also entails revealing my lack of knowledge while learning alongside volunteers. My strategy incorporates an inclusive attitude with the intent of shortening the distance of institutional authority and public positioning.”

Institutional authority, or the perception of it, can be an obstacle to finding volunteers. Wang said that several years ago the Smithsonian, like other staid old institutions, was perceived to have an image problem. She said that research indicated, “People think it’s nothing but old white men scientists.” Wang and Ferriter do not suggest that the solution is for the TC to appear young and hip and “with it.” Rather, the TC demonstrates its inclusiveness in a very real and sincere way: by reaching out to any and all volunteers and treating them with appreciation and respect.

Volunteers are always publicly credited for their work. They can download and review PDFs of what they’ve done once a project is finished. Ferriter said, “I advise Smithsonian staff members who want to be part of the Transcription Center, ‘You need to understand that there is a commitment that you’re making to participate in this project, which requires you to be involved with communicating with the public, to answer their questions, to tell them specific details about projects, to be prepared to provide a behind-the-scenes tour.’”

Scan of handwritten document from “The Legend of Sgenhadishon.” National Anthropological Archives, the Smithsonian.

Each project includes three steps: transcription, review and approval. One of the remarkable results of the TC/volunteer relationship is that the review process has become so thorough and consistently reliable, and volunteers behave so professionally and responsibly, that there is often little change required during the approval phase. This trust in the reviewers, which the reviewers earn and deserve, saves a significant amount of staff time for the Smithsonian in the approval phase.
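The three steps can be pictured as a simple linear workflow. The following Python sketch is illustrative only: the `Stage` and `Page` names, the `advance` method and the participant labels are assumptions made for this example and do not describe the TC's actual software.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    TRANSCRIPTION = "transcription"
    REVIEW = "review"
    APPROVAL = "approval"
    COMPLETE = "complete"


# Each stage hands off to exactly one following stage.
NEXT_STAGE = {
    Stage.TRANSCRIPTION: Stage.REVIEW,
    Stage.REVIEW: Stage.APPROVAL,
    Stage.APPROVAL: Stage.COMPLETE,
}


@dataclass
class Page:
    """One page of a project moving through the three steps (hypothetical model)."""
    text: str = ""
    stage: Stage = Stage.TRANSCRIPTION
    history: list = field(default_factory=list)

    def advance(self, who):
        """Record who finished the current stage, then move to the next one."""
        if self.stage is Stage.COMPLETE:
            raise ValueError("page is already complete")
        self.history.append((self.stage.value, who))
        self.stage = NEXT_STAGE[self.stage]
```

In this sketch a page would be advanced once by a transcriber, once by a reviewer and once by a staff approver, and its `history` keeps a record of who handled each step, in the spirit of publicly crediting volunteers for their work.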

Another remarkable result of the volunteers’ dedication is that TC staff have found the volunteers’ manual transcriptions to be far superior to optical character recognition (OCR) output, which tends to be “dirty” and requires additional time and labor to correct.

Ferriter said that as successful as the Transcription Center is, as evidenced by the number of digital collections it has made keyword searchable, there remain further opportunities to look at the larger picture of inter-related data. “The story may be more than merely what is contained within the TC project,” Ferriter said. “There are opportunities to connect the project to its significance in history, science and other related SI and cultural heritage collections.”

When those opportunities arise, the volunpeers will no doubt help make the connections happen.
