
- Alexis Madrigal is a senior editor at The Atlantic, where he oversees the Technology channel. He's the author of Powering the Dream: The History and Promise of Green Technology.

The New York Observer calls Madrigal "for all intents and purposes, the perfect modern reporter." He co-founded Longshot magazine, a high-speed media experiment that garnered attention from The New York Times, The Wall Street Journal, and the BBC. While at Wired.com, he built Wired Science into one of the most popular blogs in the world. The site was nominated for best magazine blog by the MPA and best science Web site in the 2009 Webby Awards. He also co-founded Haiti ReWired, a groundbreaking community dedicated to the discussion of technology, infrastructure, and the future of Haiti.

He's spoken at Stanford, Caltech, Berkeley, SXSW, E3, and the National Renewable Energy Laboratory, and his writing was anthologized in Best Technology Writing 2010 (Yale University Press).

Madrigal is a visiting scholar at the University of California at Berkeley's Office for the History of Science and Technology. Born in Mexico City, he grew up in the exurbs north of Portland, Oregon, and now lives in Oakland.

How Much of the Web Is Archived? Truth Is, We Don't Really Know

By Alexis C. Madrigal

Somewhere between 35 and 90 percent of the web has at least one archived copy. That's a pretty big range.

Photo: Yosemite James/Flickr

Here's the challenge: new Internet is being made all the time. Oftentimes, these new pages are added to existing networks like Tumblr, Facebook, Twitter, or LiveJournal. But other times, someone fires up a web server that's off the standard map, and the web's crawlers, try as they might, may not find that page for a while, if ever.

That means some percentage of the web is not being archived by anyone (or anything, really), not even the Internet Archive's invaluable Wayback Machine.

And certainly, few sites are being archived with any kind of regularity, even those (like TheAtlantic.com) that are changing constantly. So, how much of the web is humanity missing?

Researchers took a step toward answering that question in a paper submitted to the arXiv repository late last month. They found two things for sure: 1) the Internet has a memory problem, and 2) we don't know how big it is.

"The results from our sample sets indicate that range from 35%-90% of the Web has at least one archived copy," they write. Think about how different those two numbers are. Either we're capturing almost all the web or we're capturing barely more than a third of it.

I can tell you one thing: the archiving of the public web can and should be better. And there's basically one way that's going to happen: the Internet Archive gets more money.

You missed their big fundraising push at the end of last year, but that's no reason not to donate now. If you have any doubts about the people or their commitment to public service, just check out this profile of Brewster Kahle. This is a serious civilizational endeavor, and I hope it gets funded like one.


