Introducing Beyond Words

As part of the Library of Congress Labs release last week, the National Digital Initiatives team launched Beyond Words. This pilot crowdsourcing application was created in collaboration with the Serial and Government Publications Division and the Office of the Chief Information Officer (OCIO) at the Library of Congress. In our first week and a half, we’ve hosted nearly 1,300 volunteers and marked over 30,000 pictures in historic newspaper pages. In this post, we explore the goals, background, and workflows of Beyond Words, along with its possibilities and our progress so far.

Beyond Words Goals and Background

You’ll find Beyond Words in the Experiments section of our recently launched labs.loc.gov. As a pilot, the main goal of Beyond Words is to identify and caption pictures in newspaper pages to create public domain data for researchers to use. The crowdsourced data that volunteers collaboratively generate in Beyond Words are released into the public domain and are available for download as JSON data and for exploration in a public gallery.

Beyond Words Picture Gallery – Search and Filter (screenshot of an editorial cartoon in the gallery)

Our secondary goal is to generate feedback about the workflow, instructions, and resulting data. Beyond Words may change quickly and will continue to serve as an experimental application. The pilot is also an opportunity to keep learning from, and applying lessons from, other cultural heritage institutions with established transcription programs, such as the U.S. National Archives and Records Administration Citizen Archivist and the Smithsonian Institution Transcription Center, as well as from Library projects such as Flickr Commons. Beyond Words also allows us to observe volunteer activity and pain points as we begin designing our forthcoming transcription and tagging platform.

Building Beyond Words

Beyond Words is a web-based application that was developed as an Innovator-in-Residence project by Library of Congress OCIO developer Tong Wang. It is an open source crowdsourcing pilot built as an instance of Scribe, the NEH-funded framework created through a collaboration between the New York Public Library and Zooniverse. You can learn more about our implementation of Scribe on GitHub and watch there for updates.

The newspaper pages that are marked and transcribed in Beyond Words are selected from Chronicling America, a dynamic project that currently includes over 12 million newspaper pages from 40 states, with new papers added every day. Because we designed Beyond Words as a pilot, we needed to home in on a focused set of newspapers. We targeted the centennial commemoration of World War I and limited our range to the period from the U.S. declaration of war through the cessation of hostilities, 06 April 1917 to 11 November 1918. Since new pages are added each day, we also limited our data set to pages in that date range that were available in Chronicling America as of 14 September 2017.
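The selection rule above comes down to two date comparisons per page. The following Python sketch assumes a hypothetical page record carrying the issue date and the date the page became available in Chronicling America; the field names are illustrative and are not the actual Chronicling America or Beyond Words schema. It also assumes that both ends of the war period and the 14 September 2017 cutoff are inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

# Hypothetical page record: field names are illustrative only and are not
# taken from the Chronicling America or Beyond Words data model.
@dataclass
class Page:
    page_id: str
    issue_date: date  # date printed on the newspaper issue
    added_on: date    # date the page became available in Chronicling America

WAR_DECLARED = date(1917, 4, 6)   # U.S. declaration of war
ARMISTICE = date(1918, 11, 11)    # cessation of hostilities
SNAPSHOT = date(2017, 9, 14)      # only pages available by this date are kept

def in_pilot_set(page: Page) -> bool:
    """Return True if the page is in the pilot range and snapshot (bounds assumed inclusive)."""
    in_war_period = WAR_DECLARED <= page.issue_date <= ARMISTICE
    available_by_snapshot = page.added_on <= SNAPSHOT
    return in_war_period and available_by_snapshot

def select_pilot_pages(pages: Iterable[Page]) -> List[Page]:
    """Keep only the pages that qualify for the pilot data set."""
    return [p for p in pages if in_pilot_set(p)]
```

Keeping the snapshot cutoff separate from the war-period bounds means the same filter could be rerun against a later snapshot by changing a single date.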

Jumping in: Tasks & Tips 

How does Beyond Words work? First, no login is required! Second, you’ll need to know what we’re looking for. We ask that you mark pictures and transcribe the title, caption, and cutline when present; you’ll also categorize the picture type and note the artist, if present. In the instructions, we use the word “pictures” to include photographs, illustrations, editorial cartoons, comics, and maps. However, we are excluding advertisements from this pilot newspaper set, despite their often enduring interest.

On Beyond Words, you can get started right away by selecting one of three steps: mark, transcribe, or verify. At least two volunteers must agree on the task at each step; matching marks and transcriptions skip the verify step. If inconsistencies emerge, volunteers select the best transcription, category, and artist (if present) in the verify step. Our tutorial shows how to separate the title, caption, and cutline; watch for all three, plus the category and the artist, as you verify.
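The agreement rule can be illustrated as a simple vote over transcriptions. This minimal Python sketch is an illustration under stated assumptions, not Scribe’s actual consensus logic: it treats two transcriptions as matching when they are identical after a hypothetical whitespace-and-case normalization, accepts a value once at least two volunteers agree, and otherwise routes the distinct candidates to the verify step. Comparing marks (regions drawn on a page) would need a geometric overlap test rather than string matching, which this sketch does not attempt.

```python
from collections import Counter
from typing import List, Optional, Tuple

AGREEMENT_THRESHOLD = 2  # "at least two people must agree"

def normalize(text: str) -> str:
    """Hypothetical normalization: collapse whitespace and ignore case."""
    return " ".join(text.split()).casefold()

def consensus(submissions: List[str], threshold: int = AGREEMENT_THRESHOLD) -> Optional[str]:
    """Return an agreed transcription if enough volunteers match, else None."""
    if not submissions:
        return None
    counts = Counter(normalize(s) for s in submissions)
    agreed_form, votes = counts.most_common(1)[0]
    if votes < threshold:
        return None
    # Keep the first volunteer's original wording of the agreed form.
    return next(s for s in submissions if normalize(s) == agreed_form)

def next_step(submissions: List[str]) -> Tuple[str, object]:
    """Route a picture: matching transcriptions skip verify; others go to verify."""
    agreed = consensus(submissions)
    if agreed is not None:
        return ("done", agreed)
    # No agreement: offer each distinct candidate to volunteers in the verify step.
    return ("verify", sorted(set(submissions)))
```

For example, two volunteers typing “Capt. Wickersham” and “capt.  wickersham” would agree under this normalization, while “Capt. Wickersham” and “Capt. Wickerham” would both be sent to verify.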

Transcribing Captain Wickersham’s Promotion (transcription window with a photograph of Captain Wickersham)

We ask that you take your time as you work to carefully identify the pictures. Pages without images should be marked “Done.” Some of the older photographs may look like illustrations, and illustrations and maps are easy to mix up. Also keep in mind that the artist’s name is often printed in very small type. Common photographers include Underwood & Underwood and Harris & Ewing. You’ll see comics from A.D. Condo, Hop, and W. R. Allman.

Want additional hints? This application works best on a desktop or laptop with a mouse. Zoom in using your keyboard or the zoom tool. You can also choose to begin your Beyond Words activity with newspapers from a preferred state, using the home page. Reminders of the instructions are available in the “View A Tutorial” section, as well as in the FAQ. Want to transcribe a picture right after you mark it? Select “Transcribe this page now!” And at any point in any of the three steps, you can view the original page in Chronicling America.

We invite you to have fun and do your best; the newspapers are fascinating, but marking and transcribing aren’t always easy. Remember to take breaks and send us feedback! If you are inspired by what you are learning while using Beyond Words, you can explore the Library of Congress World War I collections.

Doors to Discoveries

What might a volunteer discover while marking, transcribing, and verifying newspaper pictures? You’ll certainly see many of the social and cultural changes that marked the Great War era. On 05 January 1918, a paper in Ogden, Utah, ran “Women Performing Hard Tasks of Men in Big Chemical Plants” and “Capable Women and their doings.” Another page reveals a significant victory for Florence Ellinwood Allen: successfully defending a women’s suffrage amendment to the charter of East Cleveland before the Supreme Court of Ohio.

Verifying Miss Allen’s Victory before the Ohio Supreme Court (verify window with a photograph of Miss Florence Allen)

There are also glimpses into African American papers such as the Nashville Globe, which was established in response to the extension of Jim Crow to Nashville’s city transportation system. The paper began as a means of documenting black business owners and their attempts to establish an alternative streetcar system, and it ran from 1906 to 1960.

Conclusion

We continue to seek and receive feedback on Beyond Words, including on text formatting, improving accessibility, extending the volunteer experience, identifying artists more precisely, and more. We hope that educators, researchers, and artists will take advantage of the ability to group image collections by time frame, for example by identifying all historic cartoons that appeared in World War I era newspapers. If you create something with the data set, tweet us and use the hashtag #BuiltwithLC.

With over 1,200 images waiting to be verified, we could use your help! Thanks in advance for joining us and for your feedback; we’ll share what we’re learning again soon.
