The Why and What of Web Archives

For someone who thinks about web archiving almost every day, it’s sometimes hard to explain to people outside the digital library community why archiving web sites is worth doing. “They archive themselves,” some say. “Why would you want to save what’s on the Internet?” they wonder. Instead of launching into explanations about cultural heritage, dynamic publishing streams and comprehensive collection policies, I can now point to two recent and fun examples that show why we should be archiving the web and what archiving the web looks like.

Why?

perma.cc

NPR’s Weekend Edition Sunday ran a story about a project called perma.cc, which is a perfect example of why preserving websites is important. URLs are often given in citations and bibliographies to direct readers and researchers to source materials. The project addresses the problem of “link rot,” the broken links that show up in the citations of legal articles and arguments.

We’ve all come across a 404 error or a URL that doesn’t exist anymore. The people behind perma.cc studied the problem and found that over 70% of the links used in the citations of a sample of legal journals (published between 1999 and 2011) and 50% of the links cited in Supreme Court opinions are now dead or go to the wrong place. This link rot puts the basic information supporting our legal system at risk.
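For readers curious what measuring link rot might look like in practice, here is a minimal Python sketch. It is not the method the perma.cc team used; the function names (`classify_status`, `fetch_status`, `rot_rate`) and the status-code groupings are my own assumptions for illustration. It only catches links that fail outright, not links that still load but now point to the wrong content.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a rough label.

    The groupings here are an assumption made for this sketch.
    """
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "alive"
    if status in (404, 410):
        return "dead"
    return "error"


def fetch_status(url, timeout=10):
    """Return the final HTTP status for url, or None if no response arrives."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def rot_rate(statuses):
    """Fraction of checked links whose label is anything other than 'alive'."""
    labels = [classify_status(s) for s in statuses]
    if not labels:
        return 0.0
    return sum(label != "alive" for label in labels) / len(labels)
```

To survey a list of cited URLs, one could collect `fetch_status(url)` for each and pass the results to `rot_rate`. Some servers reject HEAD requests, so a real survey would likely need a fallback to GET and a human check for pages that load but no longer match the citation.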

To address this problem, perma.cc works with law libraries and legal authors to build a system where authors can create links to archived versions of their journals. A number of other projects and services are working on this problem as well. Perma.cc is a recent addition to the scene, and it provides clear evidence that the web does not archive itself. Librarians, archivists and researchers need to take action to ensure these resources are fully available in the future.

What?

Space Jam website

The NBA playoffs are a good excuse to bring in this next example, which originally went viral in late 2010. Space Jam is a 1996 Warner Brothers movie starring cartoon characters and 1990s basketball stars like Michael Jordan and Charles Barkley. In a conversation about web archiving, a friend mentioned that this movie’s website is still in its “original format.” And indeed, the screen capture here is from the live website for the movie and is identical to the website captured by the Internet Archive in November of 2003, when the website was first saved.

I can’t say why this website continues to exist, but it is unique. Other popular movies from 1996 such as Independence Day, Scream and Fargo have no trace of a website; one may never have existed. The live website for Space Jam is not an example of a web archive, but it’s a good example of what web archives are filled with.

Other sites from this era are only available in web archives, and this site’s surprising existence points to how digital content created in new forms and formats is often at risk.
