Thursday, October 22, 2009

Redirecting moved URLs - surely it's not forever?

Our Find service has been up for four months now and we're about to start moving some of our other services into it. This means some of our old platforms (and their URLs) will be retired.

We want to make sure anyone can continue to use the old URLs for accessing the services, but we don't want to have to maintain URL redirects forever (we're an archival institution: when we say forever, we really mean it). We have thousands of pages, one for each of our online collection items, so one-to-one URL redirects involve datasets and programming scripts, all of which need maintaining. We also want to do everything possible to make it easy for website managers to update any links they have made to our services.

So we're planning on using a three-phase approach.

Phase one - Full automatic redirection service

All page requests return an HTTP 301 (moved permanently) message pointing to the URL for the equivalent page in the new site - most browsers will automatically follow it to the new location. As a fallback, the response also delivers a web page saying that the page has moved and giving the new URL as a clickable link; that page redirects automatically after 10 seconds (using a 'meta' tag refresh).

Some information pages are being removed during the migration. If requests for these pages just redirected automatically to the new homepage, it could be confusing. Instead, requests for those pages will return an HTTP 404 (page not found) page explaining that the site has changed and providing a link to the new homepage.
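To make the phase one behaviour concrete, here is an illustrative sketch in Python. It is not the production redirector: the mapping, the example.org URLs and the function name are all made up for the example, and it assumes that any old path without an entry in the mapping is treated as a removed page.

```python
from html import escape

# Hypothetical mapping from old paths to their new URLs (illustrative only).
REDIRECTS = {
    "/item/123": "https://find.example.org/items/123",
}
# Hypothetical address of the new homepage.
NEW_HOME = "https://find.example.org/"


def phase_one_response(path):
    """Return (status, headers, body) for a request to an old URL."""
    target = REDIRECTS.get(path)
    if target is not None:
        link = escape(target, quote=True)
        # Fallback page: clickable link plus a 10-second 'meta' refresh,
        # for clients that do not follow the Location header.
        body = (
            "<html><head>"
            f'<meta http-equiv="refresh" content="10; url={link}">'
            "</head><body>"
            f'<p>This page has moved to <a href="{link}">{link}</a>.</p>'
            "</body></html>"
        )
        headers = {"Location": target, "Content-Type": "text/html"}
        return 301, headers, body

    # Assumption: anything not in the mapping is a removed page, so explain
    # the change and link to the new homepage without redirecting.
    home = escape(NEW_HOME, quote=True)
    body = (
        "<html><body>"
        "<p>This site has changed and this page is no longer available.</p>"
        f'<p>Please start from the <a href="{home}">new homepage</a>.</p>'
        "</body></html>"
    )
    return 404, {"Content-Type": "text/html"}, body
```

In this sketch a request for "/item/123" gets a 301 with a Location header and the fallback page, while any other path gets the 404 explanation with no automatic redirect.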

Phase two - Notification-only service

All page requests return an HTTP 404 (page not found) page. This page explains that the site has changed, gives the expiry date for the old URL, and includes the URL for the equivalent page in the new site. There is no automated redirection (but the new URL can be clicked on).

This alerts website managers that they need to take action, but gives them the exact information they need and the deadline.
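A phase two notification page could look something like the sketch below. Again this is only an illustration: the function name, the page wording, and the URL and date in the usage line are assumptions. The point is that the status is 404, the new URL is a plain link, and there is neither a Location header nor a 'meta' refresh.

```python
from datetime import date
from html import escape


def phase_two_page(new_url, expiry):
    """Build the (status, body) pair for a phase two notification page."""
    link = escape(new_url, quote=True)
    body = (
        "<html><body>"
        "<p>This site has changed.</p>"
        f'<p>The new address for this page is <a href="{link}">{link}</a>.</p>'
        f"<p>The old address will stop working on {expiry.isoformat()}.</p>"
        "</body></html>"
    )
    # No redirect of any kind: the visitor has to click the link.
    return 404, body


# Hypothetical new URL and expiry date, for illustration only.
status, page = phase_two_page("https://find.example.org/items/123", date(2010, 11, 1))
```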

Phase three - Switch off the domain

Hopefully all bookmarks will have been updated to the new locations. At this point the domain name will be retired. Any attempts to access it will return 'DNS Host name resolution failed'.

An alternative is to keep the domain alive but not provide specialised redirects - any request for any page on the domain is automatically redirected (HTTP 301) to only the new homepage.

Timing

We are planning on running the redirects for 12 months - 6 months in phase 1, 6 months in phase 2, then the domain name will be retired.
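The timetable can be written down as a small date calculation. This is a sketch under assumptions: the start date used in the usage line is hypothetical (no switch-over date has been fixed here), "6 months" is read as calendar months, and a date before the start is reported as phase 0.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted by whole calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def phase_on(today, start):
    """Return which phase (1, 2 or 3) applies on a given day; 0 before the start."""
    if today < start:
        return 0
    if today < add_months(start, 6):
        return 1  # full automatic redirection
    if today < add_months(start, 12):
        return 2  # notification only
    return 3      # domain retired


# Hypothetical start date, for illustration only.
current_phase = phase_on(date(2010, 6, 15), date(2009, 11, 1))
```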

Does this seem reasonable?

2 comments:

Con said...

Hey Douglas!

Why not just run the redirector forever, and forget phases 2 and 3?

It just seemed to me that if you're going to the bother to set up a comprehensive and user-friendly redirection service in the first place (good on you for that!), then you may as well just leave it going. Surely it would be less work than returning to it later to progressively disable it?

In several years you will be doing the same thing again with your new URL space, so it's not like the redirection exercise is a one-off.

Douglas Campbell said...

Thanks Con,

'In perpetuity' is our middle name; we don't commit to anything without considering whether we can continue doing it for at least 100 years. In our experience, perpetuity is a really long time... and it costs.

As you say, this won't be our last migration, so each future one would then need to adjust all the previous redirects too. Scripts would need to be adjusted as new versions of programming languages come out and deprecate certain functions, it would all need migrating to the latest platforms, etc., etc.

Plus, this redirection policy is likely to be applied in the future to all our digital assets, which number in the millions - I don't fancy having to maintain multiple datasets (mapping each from old location to new) like that.

I will happily make the effort to go back and disable a service that will drain resources through this kind of ongoing maintenance. Especially if in 20 years' time no-one will be hitting the service with today's URLs anyway.

The obvious solution is to assign persistent identifiers. Then we would only need to maintain the current internal location for a given URL. We are working on this, but in the meantime I'm trying not to create a rod for our own back. But I'm happy to hear if this really is being unreasonable or has too short a timeframe.