After Trump win, Internet Archive to back up data in Canada


The San Francisco-based Internet Archive, creator of the Wayback Machine, has announced that it aims to create a copy of its archive in another country. Prompted by the election of Donald Trump and his militant (if uninformed) views on controlling the internet, the Internet Archive plans to build an Internet Archive of Canada, a move no doubt welcomed by the liberal Canadian Prime Minister Justin Trudeau.

Read part of Lucas Mearian’s report below, or the full piece at Computerworld.

“This year, we have set a new goal: to create a copy of Internet Archive’s digital collections in another country. We are building the Internet Archive of Canada because, to quote our friends at LOCKSS, ‘lots of copies keep stuff safe,’” Internet Archive Founder Brewster Kahle said in a blog post today. LOCKSS is an open-source, peer-to-peer network that allows libraries to collect and share Web-based data.

As an organization, the Internet Archive has been a proponent of a free and open Internet, which it believes may be in jeopardy.

Kahle said an Internet Archive of Canada would help keep its cultural materials safe, private and perpetually accessible.

“It means preparing for a Web that may face greater restrictions. It means serving patrons in a world in which government surveillance is not going away; indeed it looks like it will increase,” Kahle wrote. “Throughout history, libraries have fought against terrible violations of privacy—where people have been rounded up simply for what they read. At the Internet Archive, we are fighting to protect our readers’ privacy in the digital world.”

The Internet Archive, which also houses the Wayback Machine web-page repository, is home to more than 15 petabytes (15 million gigabytes) of online data. It is asking the public for donations to build the Internet Archive of Canada, which it said will cost millions of dollars.

“On November 9th in America, we woke up to a new administration promising radical change. It was a firm reminder that institutions like ours, built for the long-term, need to design for change,” Kahle stated. “For us, it means keeping our cultural materials safe, private and perpetually accessible. It means preparing for a Web that may face greater restrictions.”

The Internet Archive’s Wayback Machine, which went live in 2001, is a digital time capsule that stores more than 150 billion archived versions of Web pages (growing by about 750 million a week) dating back to 1996.

Based in the Presidio in San Francisco, the Internet Archive and its Wayback Machine use an algorithm that repeats a Web crawl every two months in order to add new Web page images to its database. The algorithm first performs a broad crawl that starts with a few “seed sites,” such as Yahoo’s directory. After snapping a shot of the home page, it moves on to any pages reachable within the site until there are no more pages to capture. If there are any links on those pages, the algorithm automatically opens them and archives that content as well.
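The crawl described above can be illustrated with a minimal Python sketch. This is not the Internet Archive's actual crawler: the function name `crawl`, the `TOY_WEB` link map and the example.com-style URLs are all hypothetical, and an in-memory dictionary stands in for fetching real pages. It only shows the order of work the report describes: start from seed sites, capture every page within a site, then follow the outbound links found along the way.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy "Web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://seed.example/": ["http://seed.example/a", "http://other.example/"],
    "http://seed.example/a": ["http://seed.example/b", "http://seed.example/"],
    "http://seed.example/b": [],
    "http://other.example/": ["http://other.example/x", "http://seed.example/a"],
}


def crawl(seeds, links):
    """Return URLs in the order they would be archived.

    Assumption: `links` replaces real page fetching and link extraction.
    """
    archived = []
    seen = set(seeds)       # never capture the same URL twice in one crawl
    pending = deque(seeds)  # entry points of sites still to be crawled

    while pending:
        entry = pending.popleft()
        host = urlparse(entry).netloc
        site_queue = deque([entry])

        # Exhaust every reachable page on the current site first.
        while site_queue:
            url = site_queue.popleft()
            archived.append(url)  # stand-in for taking a snapshot
            for target in links.get(url, []):
                if target in seen:
                    continue
                seen.add(target)
                if urlparse(target).netloc == host:
                    site_queue.append(target)  # same site: capture now
                else:
                    pending.append(target)     # other site: follow later

    return archived


if __name__ == "__main__":
    for page in crawl(["http://seed.example/"], TOY_WEB):
        print(page)
```

Running the sketch on the toy map captures the three seed.example pages before moving on to other.example. A production crawler would also need real HTTP fetching, politeness delays, robots.txt handling and persistent storage, none of which is modeled here.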

*Image of the Internet Archive building via Mental Floss