Jason Scott's Archive Team Is Saving The Web From Itself (And Rescuing Your Stuff)

On Feb. 15, the Archive Team, a loose collective of programmers and netizens, received its equivalent of a 911 call: The founder of Posterous, a blogging platform, announced the site was shutting down -- and taking its users' content down with it.

After years spent convincing people to trust Posterous as the repository for their baby photos, recipes, musings and travelogues, the company gave its more than 15 million users just ten weeks to back up their information before it would be permanently deleted.

A handful of Archive Team volunteers quickly convened in a chatroom to figure out -- like they had many times before in similar situations -- how to save Posterous' millions of posts from disappearing with the site itself.

The Internet never forgets, goes the oft-quoted saying. But it erases with ruthless efficiency, and the Archive Team is frequently the only group that will intervene to prevent years of information from being lost.

"[I]t is exceedingly easy for digital objects to become collateral damage as tech companies change and grow," Robin Camille Davis, the emerging technologies and distance services librarian at the Lloyd Sealy Library, wrote in an email. "They [the Archive Team] do an enormous service on behalf of Internet users."

The last two decades have seen an explosion of sites that try to accumulate as much of people's lives as quickly as possible. Yet as the demise of once-humming blogging and photo-sharing services like Posterous and Webshots has revealed, companies often feel little compulsion to preserve the materials they've so effectively extracted. Spooked users are realizing that "forever" often means "for now," and the Archive Team, by backing up sites' data and putting it online for anyone to access, is helping to save individual memories and rescue the history of the web.

The group's work underscores a contradiction in the web world: Decreasing storage costs, coupled with the growing quantity of personal information posted online, make it increasingly feasible to store everything, indefinitely, about a moment in time. All that data shouldn't have to be discarded -- but it is, often with little thought and even less recourse.

Doug Reside, a digital curator at the New York Public Library, notes that websites are currently treated as something disposable -- the same way old manuscripts and artifacts, now valued as important historical documents, were seen in their day. He calls the Archive Team's work "essential."

“So much of our cultural experience … happens online and is recorded in a digital form that we need to have people who are taking the preservation of that work seriously,” said Reside.

Jason Scott is the creator and outspoken public face of the Archive Team. Depending on the day, he describes his role as “mascot,” “archivist” or “loudmouth.” He's said the group, which has no official status as a business or nonprofit, operates by three virtues: “Rage, paranoia and kleptomania.”

"I'm a bit of a chaos agent in the world," says Scott, who has thick black mutton chops and has been known to show up for conference keynotes in a top hat and tails or, more recently, a medieval leather vest and cape, with a flask of Red Bull strapped to his chest.

Since 2009, Scott and the Archive Team's international group of volunteers, many of whom have never met in person, have been backing up sites just before they're erased. When Yahoo pulled the plug on GeoCities, for example, the Archive Team raced to download a decade's worth of fan sites and photos (Scott calls it a "cultural artifact that needed to be saved"). The Archive Team has rescued 498 terabytes of information in total, more than all the web archive data collected by the Library of Congress. Because of privacy concerns, the Archive Team copies only web pages that are publicly available.

A core group, dubbed the "Golden Twenty," leads most of the efforts -- though "tourists" can also briefly lend their bandwidth or disk space to the backups. Hundreds of people sometimes contribute to preserving a site, and Scott observes it's "almost like the Red Cross or Burning Man." Once a site has been downloaded, it's re-uploaded to the Internet Archive, a nonprofit organization that's creating a kind of Library of Alexandria of the web.

In deciding what sites to save, "We'll ask ourselves, 'What out there, if it went away, would seriously wreck people?'" explains Scott.

Though the demise of Google Reader has seriously wrecked people, the Archive Team hasn't yet found a way to access the data necessary to create a backup of its information. In Scott's view, Google is guilty of more than disappointing its users: In the process of building Google Reader, it effectively crushed rivals and now leaves few alternatives in its wake.

"It's like, 'Thanks for free stuff, but you are murdering markets by doing it,'" Scott said.

There's a lot that bothers Scott about the way Internet companies behave.

He's dissatisfied with what he perceives as a general disregard for preserving web history and people's personal data. As he sees it, users remain "the most ignored factor in a website."

He's irked by the cheeriness with which entrepreneurs announce that because of an acquisition or change in strategy, terabytes of user data will be deleted. A friend of the Archive Team recently created a Tumblr, “Our Incredible Journey,” that highlights companies' attempts to spin their closure as a blessing for all involved -- though the culmination of a "fun and exciting ride" can mean mass erasure of personal information.

"It sounds like you're holding hands with your userbase on the beach and walking with them into the sunset, when in fact you're choking them to death in the ocean," says Scott. "There's the fake civility written in shutdown messages that reflects people trying to act like somebody who cares."

And Scott especially isn't crazy about startups, which he sees as the Archive Team's enemy number one. He says the feeling is mutual.

While "thinking like a startup" has become a mantra inside companies large and small, and startups are cheered for their founders' risk-taking ways, Scott is one of the few dissenting voices troubled by the constant reinvention such a mindset entails. The decreasing costs of building and running sites mean new products can be born faster. At the same time, they can die off more quickly, too.

"The startup world does not like people like me because their attitude is 'fail often, fail frequently, sell quickly,' and I don't come from that world and I don't like that world," explains Scott. "Startups are not made with a long-term goal. Their goal is to make something that can be sold, and that attitude pervades everywhere: 'Do it for a year and if it's not working, kill it.' And I just don't like that because it leads to these unannounced shutdowns and the loss of user content."

Though he notes the Archive Team doesn't frequently face resistance from the sites it tries to download, Scott recently clashed with the founder of Punchfork, a recipe-sharing site, over the Archive Team's move to copy user data before the service's closure. Punchfork's Jeff Miller tweeted at Scott asking the Archive Team to back off. Miller was "archiving my own user's data for them [sic]," he said, though he hadn't specified as much in announcing the startup's sale and pending shutdown.

The Archive Team went ahead with its work. Neither Miller nor Punchfork responded to a request for comment.

Privacy and policy experts see little cause for immediate concern in the Archive Team's work. Julian Sanchez, a research fellow at the Cato Institute, a libertarian think tank, notes that copyright issues or privacy complaints could theoretically arise -- but says the group's efforts so far have provided a helpful service.

"You could imagine situations where individuals might object to hypothetical instances of archiving in the future," said Sanchez. "But I'd assume that the much more common response would just be that users of these sites -- and everyone else who gets benefit from that information -- would regard it as a great boon to not suddenly lose access to years' worth of material they anticipated would be available indefinitely."

Scott, despite his grievances, is optimistic that companies will eventually be required to take better care of individuals' digital property.

"The fact is that for user data, we're still smoking in the baby's face. Eventually people will recognize that you can't do this to people," says Scott. “I would love to be put out of business."