Google's news archive project comes to a stop
Jinfo Blog

6th June 2011

By Penny Crossland

The Google News Archive, an ambitious project originally launched in 2006, has been mothballed by its owners; that is to say, content is no longer being added to the archive. As reported by LiveWire in 2009 and 2010, Google’s project was seen as the most comprehensive historical news archive on the web: the original plans were to scan and index up to 250 years’ worth of newspapers’ microfilm. Apparently the archive to date consists of around 60 million pages from 2,000 newspapers.

Reaction to the end of the project has been mixed. While many applauded Google for trying to digitise history in the first place (after all, there are few companies that could have embarked on such a mission), others, such as the Technologizer blog, have wondered why Google did not do more to publicise the news archive. In fact, the archive search page is hidden away and takes some searching to find. Quite why Google has pulled the plug on this project is not clear. All the company has said on the matter is that it wants to concentrate its efforts on projects that help newspaper publishers sell content over the internet, such as its Google One Pass facility, described in this LiveWire posting.

Apparently, Google intends to hand over all archived content to its original owners, and publishers with their own digital archives will be able to add their content to the Google news archive via sitemaps. The bottom line, however, is that Google will no longer be spending any of its money on the venture.

I have to confess that I never searched the news archive, assuming that old content would appear on a general news search. Of course, it is not too late to investigate the archive.

Meanwhile, national newspaper collections, such as the British Library’s Colindale collection, are being digitised at pace. It was a year ago that LiveWire reported on the BL’s newspaper project, which aims eventually to make 40 million news pages available online. One year on, The Guardian has reported that around 500,000 pages have now been scanned; the plan is to launch a searchable site by the time 1.5 million pages have been scanned and uploaded. At the moment, the British Library is avoiding the thorny issue of copyright by concentrating its efforts on 18th- and 19th-century news items. Once it gets to the 20th century, no doubt we shall be hearing from the likes of News International on the subject. It was almost a year ago that James Murdoch expressed concern about the BL’s digitisation project and the matter of providing access to Times articles.