Thursday, January 21, 2010

Consultation on the New Zealand Web Harvest 2010

In October 2008 the National Library conducted its first whole-of-domain web harvest, collecting 4 terabytes of data from over 100 million URLs.

While the Library was pleased with the final outcome of the harvest, some readers will remember that it didn’t go entirely smoothly, though we tried hard to make amends.

This year we’re working hard to improve our communications, and hope to work with site owners, administrators and other stakeholders to lessen the impact of our harvesting activity.

Our proposed timeline for the 2010 web harvest is:

  • January – Consultation with stakeholder groups.
  • February – Technical planning.
  • March – Communications and notifications about the upcoming harvest.
  • April – The harvest.

We are beginning by seeking feedback on options we’ve identified to address concerns raised during the 2008 harvest, particularly:

  • Notification: The harvest was initiated without prior notification to affected parties.
  • Robots policy: The harvester was configured to ignore the robots.txt convention unless the website owner contacted the Library to request that it be honoured.
  • Location of the harvester: The harvest was operated by the Internet Archive from the United States, and some website owners are charged more for international traffic.
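For readers less familiar with the robots.txt convention mentioned above, here is a minimal Python sketch of the difference between a crawler that honours it and one that ignores it. This is not the harvester's actual configuration: the robots.txt content, the user agent name `example-harvester`, the example URLs and the `may_fetch` helper are all hypothetical, made up for illustration. It uses only Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a site owner might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str, honour_robots: bool) -> bool:
    """Return True if a crawler with this policy would fetch the URL.

    honour_robots=False models a crawler configured to ignore robots.txt;
    honour_robots=True models one that follows the site's rules.
    """
    if not honour_robots:
        # The robots.txt rules are never consulted.
        return True
    parser = RobotFileParser()
    # Parse the rules from text rather than fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "example-harvester"  # hypothetical user agent name
    for url in (
        "http://example.org.nz/private/page.html",
        "http://example.org.nz/public/page.html",
    ):
        print(url, "honoured:", may_fetch(ROBOTS_TXT, agent, url, True),
              "ignored:", may_fetch(ROBOTS_TXT, agent, url, False))
```

In this sketch the page under `/private/` is skipped only when the convention is honoured, while the page under `/public/` is fetched under either policy.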

You can read the full announcement or download the Options Paper on the National Library website.

Feedback should be sent by email to web-harvest-2010 AT natlib.govt.nz by 9am Monday 8 February.

Questions can also be sent to that address. We can answer your question individually and privately, but we're also planning to publish a weekly update of answers to the questions we've received, both on the National Library website and here.

If you're at the NZNOG 2010 conference later this month, Gordon is giving a short presentation about the plans and taking feedback.

We really encourage your comments on the options we've prepared, and it would be great if you could help us spread the word about this consultation.

The outcome of the consultation will be published on this page on the National Library website, and we'll republish the information here.

Gordon Paynter (Programme Manager Digitisation) and Courtney Johnston (Web Manager) are the New Zealand Web Harvest 2010 team. You can contact us via web-harvest-2010 AT natlib.govt.nz.
