Mapping Libraries: Creating Real-time Maps of Global Information

The following is a guest post by Kalev Hannes Leetaru, a data scientist and Senior Fellow at George Washington University Center for Cyber & Homeland Security. In a previous post, he introduced us to the GDELT Project, a platform that monitors the news media, and presented how mass translation of the world’s information offers libraries enormous possibilities for broadening access. In this post, he writes about re-imagining information geographically.

Why might geography matter to the future of libraries?

Information occurs against a rich backdrop of geography: every document is created in a location, intended for an audience in the same or other locations, and may discuss yet other locations. The importance of geography in how humans understand and organize the world (PDF) is underscored by its prevalence in the news media: a location is mentioned every 200-300 words in the typical newspaper article of the last 60 years. Social media embraced location a decade ago through transparent geotagging, with Twitter proclaiming in 2009 that the rise of spatial search would fundamentally alter how we discovered information online. Yet the news media has steadfastly resisted this cartographic revolution, continuing to organize itself primarily through coarse editorially-assigned topical sections and eschewing the live maps that have redefined our ability to understand global reaction to major events. Using journalism as a case study, what does the future of mass-scale mapping of information look like and what might we learn of the future potential for libraries?

What would it look like to literally map the world’s information as it happens? What if we could reach across the world’s news media each day in real time and put a dot on a map for every mention in every article, in every language of any location on earth, along with the people, organizations, topics, and emotions associated with each place? For the past two years this has been the focus of the GDELT Project and through a new collaboration with online mapping platform CartoDB, we are making it possible to create rich interactive real-time maps of the world’s journalistic output across 65 languages.

Leveraging more than a decade of work on mapping the geography of text, GDELT monitors local news media from throughout the globe, live translates it, and performs “full-text geocoding” in which it identifies, disambiguates, and converts textual descriptions of location into mappable geographic coordinates. The result is a real-time multilingual geographic index over the world’s news that reflects the actual locations being talked about in the news, not just the bylines of where articles were filed. Using this platform, this geographic index is transformed into interactive animated maps that support spatial interaction with the news.

What becomes possible when the world’s news is arranged geographically? At the most basic level, it allows organizing search results on a map. The GDELT Geographic News Search allows a user to search by person, organization, theme, news outlet, or language (or any combination therein) and instantly view a map of every location discussed in context with that query, updated every hour. An animation layer shows how coverage has changed over the last 24 hours and a clickable layer displays a list of all matching coverage mentioning each location over the past hour.

Figure 1 - GDELT's Geographic News Search showing geography of Portuguese-language news coverage during a given 24 hour period

Figure 1 – GDELT’s Geographic News Search showing geography of Portuguese-language news coverage during a given 24 hour period

Selecting a specific news outlet like bbc.co.uk or indiatimes.com as the query yields an instant geographic search interface to that outlet’s coverage, which can be embedded on any website. Imagine if every news website included a map like this on its homepage that allowed readers to browse spatially and find its latest coverage of rural Brazil, for example. The ability to filter news at the sub-national level is especially important when triaging rapidly-developing international stories. A first responder assisting in Nepal is likely more interested in the first glimmers of information emerging from its remote rural areas than the latest on the Western tourists trapped on Mount Everest.

Coupling CartoDB with Google’s BigQuery database platform, it becomes possible to visualize large-scale geographic patterns in coverage. The map below visualizes all of the locations mentioned in news monitored by GDELT from February to May 2015 relating to wildlife crime. Using the metaphor of a map, this list of 30,000 articles in 65 languages becomes an intuitive clickable map.

Figure 2 - Global discussion of wildlife crime

Figure 2 – Global discussion of wildlife crime

Exploring how the news changes over time, it becomes possible to chart the cumulative geographic focus of a news outlet, or to compare two outlets. Alternatively, looking across global coverage holistically, it becomes possible to instantly identify the world’s happiest and saddest news, or to determine the primary language of news coverage focusing on a given location. By arraying emotion on a map it becomes possible to instantly spot sudden bursts of negativity that reflect breaking news of violence or unrest. Organizing by language, it becomes possible to identify the outlets and languages most relevant to a given location, helping a reader find relevant sources about events in that area. Even the connections among locations in terms of how they are mentioned together in the news yields insights into geographic contextualization. Finally, by breaking the world into a geographic grid and computing the topics trending in each location, it becomes possible to create new ways of visualizing the world’s narratives.

Figure 3 - All locations mentioned in the New York Times (green) and BBC (yellow/orange) during the month of March 2015

Figure 3 – All locations mentioned in the New York Times (green) and BBC (yellow/orange) during the month of March 2015

Figure 4 – Click to see a live animated map of the average “happy/sad” tone of worldwide news coverage over the last 24 hours mentioning each location

Figure 5 - Click to see a live animated map of the primary language of worldwide news coverage over the last 24 hours mentioning each location

Figure 5 – Click to see a live animated map of the primary language of worldwide news coverage over the last 24 hours mentioning each location

Figure 6 - Interactive visualization of how countries are grouped together in the news media

Figure 6 – Interactive visualization of how countries are grouped together in the news media

Turning from global news to domestic television news, these same approaches can be applied to television closed captioning, making it possible to click on a location and view the portion of each news broadcast mentioning events at that location.

Figure 7 - Mapping the locations mentioned in American television news

Figure 7 – Mapping the locations mentioned in American television news

Turning back to the question that opened this post – why might geography matter to the future of libraries? As news outlets increasingly cede control over the distribution of their content, they do so not only to reach a broader audience, but to leverage more advanced delivery platforms and interfaces. Libraries are increasingly facing identical pressures as patrons turn towards services (PDF) like Google Scholar, Google Books, and Google News instead of library search portals. If libraries embraced new forms of access to their content, such as the kinds of geographic search capabilities outlined in this post, users might find those interfaces more compelling than those of non-news platforms. The ability of ordinary citizens to create their own live-updating “geographic mashups” of library holdings opens the door to engaging with patrons in ways that demonstrate the value of libraries beyond as a museum of physical artifacts and connecting individuals across national or international lines. As more and more library holdings, from academic literature to the open web itself, are geographically indexed, libraries stand poised to lead the cartographic revolution, opening the geography of their vast collections to search and visualization, and making it possible for the first time to quite literally map our world’s libraries.

How to Participate in the September 2015 NDSA New England Regional Meeting

The following is a guest post by Kevin Powell, digital preservation librarian at Brown University. On September 25th, UMass Dartmouth will host the National Digital Stewardship Alliance New England Regional Meeting with Brown University. We enthusiastically encourage librarians, archivists, preservation specialists, knowledge managers, and anyone else with an interest in digital stewardship and preservation to […]

DPOE Interview with Danielle Spalenka of the Digital POWRR Project

The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress. This post is part of a series about digital preservation training informed by the Library’s Digital Preservation Outreach & Education (DPOE) Program. Today I’ll focus on an exceptional individual, Danielle Spalenka, Project Director for the Digital POWRR Project. […]

Keeping Up With the Joneses: The New Recommended Formats Statement

The following post is by Ted Westervelt, head of acquisitions and cataloging for U.S. Serials in the Arts, Humanities & Sciences section at the Library of Congress. Issuing the Recommended Format Specifications When the Recommended Format Specifications were issued last summer, the Library of Congress was making an attempt to come to grips with the […]

We Welcome Our Email Overlords: Highlights from the Archiving Email Symposium

This post is co-authored with Erin Engle, a Digital Archivist in the Office of Strategic Initiatives. Despite the occasional death knell claims, email is alive, well and exponentially thriving in many organizations. It’s become an increasingly complex challenge for collecting and memory institutions as we struggle with the same issues: How is email processed differently […]