Monday, November 23, 2009

How beta is a beta release? Quality versus Delivery

This week our DDI development programme has been brought to you by the letters P and M - of Project Management.

So the big drama this last week has been deadlines and prioritisation - the stuff that gives project managers job security. We really want to get the upgraded Timeframes preview out ASAP, but we've found there's more work to be done than there is time available - I'm sure we're the first people in the world to strike this problem ;-)

But is the problem that we're slow developers or that we're bad at estimating how long the work will take? From my observations, at the end of the day we're basically good old-fashioned, optimistic Kiwis. There's that software development saying of "Take your estimate [of the time it will take to do the work], then double it, then double it again, then you're starting to get close to knowing how long it will take". Well, we apply that rule, then say "nah, surely it can't take thaaaat long, it'll only be half that... she'll be right".

Mind you, estimating accurately is doubly-hard when you're working on a system that you are still learning. We've found Primo is well optimised for university and public libraries, but as a national library we have additional needs - both in the content we have and how we present it. This means we've been sidetracked into developing a number of 'workarounds', though thankfully Primo has a plugin architecture so it hasn't been too mind-stretching.

We've now got something that mostly works, but isn't quite how we'd really like it to be (or how we know it could be). We're running up against our self-imposed deadline but some of the ducks aren't in the row yet. What to do?

It's at this point that all project/product managers hit that age-old dilemma: which would users prefer - almost reasonable today or really good next week? You only get one chance at first impressions. Users usually say they'd prefer 'anything' now, but then grumble when it wasn't what they hoped. I think I'd like to phone a friend on this one.

Given it's only a beta release, how much incompleteness and confusingness in the interface will users put up with? The Google (et al) Labs and 'perpetual beta' phenomena mean people are open to the idea of using a site that is a work-in-progress, but we haven't quite set up our environment that way yet so we're on the back foot. While delivering something early may reduce the development workload over the week, it increases the 'comms' workload - communicating what state it's in, known issues, tips for completing tasks, etc., and handling increased user queries (from users that didn't read all that carefully crafted comms stuff).

Not an easy decision... Mainly because there's no 'right' answer.

Separately we've been working on a service model for managing our digital services (remind me to cover that in a later post). While the customers' needs are paramount, we believe there are three perspectives to balance:

  • Business needs
  • Customer needs
  • Technology needs.
This was a good opportunity to give the model a whirl in a field-test. We brought together representatives from all three perspectives and reviewed and prioritised each of their needs. We gained a shared understanding of each other's needs and came to a mutual decision. Since this is only a preview (running in parallel with the production system) we decided really there are only two 'must-haves', and once those are ready we'll release it. We'll then move on to tidying up the remaining high priority needs before we cut over the old site - after all, the tidy-up won't take long based on our estimates (!!).

Surprisingly we received no feedback (positive, neutral, or negative) for the recent cutovers, so that bodes well for this preview, right??

Stay tuned.

Friday, November 20, 2009

The Source: news about digital libraries and library innovations from around the web

Introducing The Source


Report of the Task Force on (Harvard) University Libraries
(Note: PDF)

From the Office of the Provost, Harvard University website

Harvard’s library system now includes 73 separate libraries with 1,200 full-time employees, 16.3 million volumes, 12.8 million digital files, over 100,000 serial titles, and millions of manuscripts, photographs, musical recordings, films, and artefacts of all kinds, making it by far the largest university library in the world.

Statement on the Report of the Task Force on University Libraries (Note: PDF)

The Core Recommendations of the Task Force are:
  • Establish and implement a shared administrative infrastructure
  • Rationalise and enhance information technology systems
  • Revamp the financial model for the Harvard libraries
  • Rationalise system for acquiring, accessing, and developing materials for a “single university” collection
  • Collaborate more ambitiously with peer libraries and other institutions

Making the case for European research libraries (Note: PDF)

From the Ligue des Bibliothèques Européennes de Recherche (LIBER) website

The Ligue des Bibliothèques Européennes de Recherche (LIBER) Strategic Plan 2009-2012 provides a framework for the LIBER Strategy in the coming years. In 2009-2012 LIBER will give priority to the following areas:
  • Scholarly communication
  • Digitisation and resource discovery
  • Heritage collections and preservation
  • Organisation and human resources
  • LIBER Services

Social isolation and new technology: how the internet and mobile phones impact Americans’ social networks (Note: PDF)

From the Pew Internet & American Life Project website

This survey is the first ever that examines the role of the internet and cell phones in the way that people interact with those in their core social network. Key findings challenge previous research and commonplace fears about the harmful social impact of new technology.

Wednesday, November 18, 2009

Papers Fast

I've just finished writing up a project we finished earlier this year: Papers Fast.

Some background

Papers Past was re-launched in 2007 with a new look and new features -- particularly search -- and quickly became the National Library’s most popular website. In the first year the number of visits per month increased 20-fold, and then it kept growing. But even when it was re-launched, Papers Past was not a fast website. And as time passed, and the number of users grew, and the number of pages increased, we noticed it was becoming slower and slower.

To start with we had an easy solution: when we noticed the site was slowing down, we added another web server to share the load. We started with three web servers. By the time we got to eight this approach had stopped working: adding new web servers did not make Papers Past any faster. Worse, we had built up a backlog of almost half a million pages of searchable text that we could not put online because we were worried the whole system would grind to a halt.

Drastic action was necessary.

So the Papers Fast project was launched. Its goal: to make Papers Past fast.

What’s the problem?

After talking to people who might know, we identified four factors that might be causing problems:

  1. Application. As far as we know, Papers Past is the biggest and most-used Greenstone installation in the world. Maybe Greenstone cannot scale up far enough?
  2. CPU. Papers Past was running on old Sun SPARC servers that were due for a refresh. Maybe new servers would do the trick?
  3. NFS. Most of the Papers Past data is served up using the Network File Service protocol. Is this a good choice for Greenstone?
  4. Network. The Papers Past data is stored on a different part of the network from the web servers, behind a firewall. Is this a problem?
Which was it?

To find out, we borrowed a massive computer with 24 terabytes of disk from GEN-i, copied over all our digitised newspaper data, and asked DL Consulting to install a fresh copy of Greenstone, setting up an entirely separate copy of Papers Past.

Then we built a fake collection with 2.5 million searchable pages, used JMeter and our Apache logs to put the test system under twice as much load as we'd ever seen before, and watched to see what would happen.
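For anyone curious how Apache logs can drive a load test, here is a minimal Python sketch of the general idea. It is not the JMeter setup we actually used: the log lines, the function names and the `load_factor` parameter are all made up for illustration. It reads GET requests out of access-log lines and squeezes the gaps between them so the same requests arrive at twice the original rate.

```python
import re
from datetime import datetime

# Matches the start of an Apache common/combined log line:
# host ident user [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*"'
)

def parse_requests(lines):
    """Return (timestamp, path) pairs for the GET requests in the log lines."""
    requests = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("method") != "GET":
            continue  # skip malformed lines and non-GET requests
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        requests.append((ts, match.group("path")))
    return requests

def replay_schedule(requests, load_factor=2.0):
    """Return (offset_seconds, path) pairs with time compressed by load_factor."""
    if not requests:
        return []
    ordered = sorted(requests, key=lambda r: r[0])
    start = ordered[0][0]
    return [
        ((ts - start).total_seconds() / load_factor, path)
        for ts, path in ordered
    ]

# Hypothetical log lines, invented for this example.
sample_log = [
    '10.0.0.1 - - [01/Jun/2009:10:00:00 +1200] "GET /search?q=gold HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/Jun/2009:10:00:04 +1200] "GET /page/123 HTTP/1.1" 200 20480',
    '10.0.0.3 - - [01/Jun/2009:10:00:10 +1200] "POST /feedback HTTP/1.1" 302 0',
]
schedule = replay_schedule(parse_requests(sample_log))
```

A load tool would then fire each path at its offset. Halving the offsets is only one way to double the load; replaying the log from two clients at once is another.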

We found the problem was... all of the above.

So what to do?


The first fix was to upgrade Papers Past search to use Apache Solr instead of Apache Lucene. The second was to replace our eight aging webservers with two new Sun Blade Servers with AMD CPUs. Third, we switched to local disk for the metadata and indexes (we'll upgrade to a fibre-attached SAN by the end of the year).

Then we built a new fully-searchable collection (including three new titles) and re-launched on 22 June 2009, two days ahead of schedule!

And no technology project would be complete without a little scope creep. In this case, we had to support the METS/ALTO journal profile so we could add Kai Tiaki: the Journal of the Nurses of New Zealand to the collection, and extend the image server to support new titles digitised in greyscale. DL Consulting made these changes, and a few more, along the way.

Did it work?


Yes. We've been serving more traffic, and response times have been faster.

For Papers Past, we track traffic from Google separately from everyone else (it's a long story, but the core problem is that we serve so much data to Google that our aging web statistics package can't crunch the numbers).

So here's the number of hits we served to everyone other than Google for four weeks before and eight weeks after the launch.

And here's the number of pages we serve up to Google (via Google Webmaster Tools).

You can see that requests from everyone are way up -- especially Google, who have slurped up about 700,000 pages per day lately, peaking at over a million. Before the upgrade, we had a lot of trouble getting Papers Past fully indexed in Google News Archive, but now it is pretty much all there.

Despite this increased traffic, Papers Past response times are much improved. We have been monitoring response times since 2007, and set out very clear performance targets before we kicked off Papers Fast. Here are the performance targets, and the times we observed before and since the changes were made. (All times are in milliseconds.)

Performance measure | Target | Before changes | Since changes
Average response time for generated page request, measured by Google Webmaster Tools | 1000 | 3000-5000 | 600-800
Average response time for generated page request, measured by the Library | 1000 | > 3500 | 402
Average response time for search page request, measured by the Library | 1500 | 6639 | 1055
Average response time for image server request, measured by the Library | 6000 | 11158 | 3574
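To put the figures above in proportion, here is a small Python sketch that checks each of the Library's own measurements against its target and works out the percentage reduction. The "> 3500" figure is treated as 3500 (my assumption), so that row's reduction is a lower bound. The variable and function names are just for illustration.

```python
# (measure, target_ms, before_ms, since_ms), taken from the Library's own
# measurements above. "> 3500" is treated as 3500, so the reduction for
# that row is a lower bound.
measurements = [
    ("generated page", 1000, 3500, 402),
    ("search page", 1500, 6639, 1055),
    ("image server", 6000, 11158, 3574),
]

def summarise(rows):
    """Return (measure, target_met, percent_reduction) for each row."""
    results = []
    for name, target, before, since in rows:
        reduction = round(100 * (before - since) / before, 1)
        results.append((name, since <= target, reduction))
    return results

for name, target_met, reduction in summarise(measurements):
    status = "met" if target_met else "missed"
    print(f"{name}: target {status}, response time down {reduction}%")
```

All three measures come in under their targets, with response times down by roughly 88%, 84% and 68% respectively.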


Let's take a look at the changes graphically. Here's our internal tracking of response times.

Here's how the response times were tracked by Google.

It's quite a change.

Finally, it has made a big difference for our infrastructure. Here's how the NFS traffic to one of our fileservers changed when we moved the Papers Past metadata and search indexes away. It's also freed up corresponding network capacity.

Summary

On 22 June 2009 Papers Past users not only got half a million more searchable pages, they got a big speed bump. Traffic is up since then, but response times have remained low, and we have a plan to handle more data (the SAN) and more users (extra front-end servers).

Saturday, November 14, 2009

Three website upgrades

Following Courtney's challenge, I'm gonna take a crack at weekly updates on our current major website developments.

As the Digital Service Manager for Find, which is the poster child for a larger internal programme called DDI (Discover, Deliver, Interact), I'm supposed to hold it all together. We'll see if I can hold it together for a weekly update on progress...

Anyway, that's the end of my intro/disclaimer/apology if these posts peter out. Where are we at?

We've been migrating a lot of our metadata records to the new Primo software platform, and we released our first cut in July as the new Find search service. Our main priority has been migrating services off older software which has reached the end of its life.

Last Monday (the 9th) we cut over three of our websites to their (much faster) upgraded versions:

  • The GLAM organisations who are members of Matapihi have most of their content loaded into the growing giant DigitalNZ, so it made sense to move Matapihi's back end to the DigitalNZ engine. We also conveniently have all the Matapihi content loaded in Find
  • findNZarticles contributed content is also in Find, so the back end has been migrated to the Primo platform; it continues to have its own website
  • PublicationsNZ content (a.k.a. the National Bibliography) is also in Find, but it is effectively a subset of our National Library catalogue, so it no longer has its own website; instead there is a PublicationsNZ entry page on Find.
There's still some tidy up work to do, but these seem to be running reasonably well at their new locations.

It took us quite a while to come to terms with Primo's internal 'PNX' record format and how metadata records are converted during import. It loves MARCXML and simple Dublin Core records, but coughs loudly when you throw more complex XML (especially with namespaces) at it. We're finally starting to understand how to wrangle it. There's also a hugely complex maze of mapping/lookup tables - slowly we're piecing together the chains of lookup codes and documenting their inter-relationships so it's easier to maintain.
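To give a feel for the namespace problem, here is a minimal Python sketch of one possible pre-processing step: flattening a namespaced XML record into simple name/value fields before handing it to an importer. This is not Primo's import pipeline or our actual workaround. The sample record, the `ex` namespace and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def flatten_record(xml_text):
    """Return {local element name: [text values]} for every leaf element."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for element in root.iter():
        has_children = len(element) > 0
        text = (element.text or "").strip()
        if has_children or not text:
            continue  # keep only leaf elements that carry text
        fields.setdefault(local_name(element.tag), []).append(text)
    return fields

# Hypothetical record, invented for this example.
sample = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:ex="http://example.org/hypothetical-schema">
  <dc:title>Kai Tiaki</dc:title>
  <ex:details>
    <dc:subject>Nursing</dc:subject>
    <dc:subject>New Zealand</dc:subject>
  </ex:details>
</record>
"""
flat = flatten_record(sample)
```

Flattening like this throws away nesting and namespace information, so it only suits records where the local element names are unambiguous on their own.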

Our eyes are now focussed on migrating two remaining services - Timeframes and Discover. We are planning on releasing previews for these before the end of November. You can check the current timetable on our Online Services Changes page.

Friday, November 13, 2009

The Source: news about digital libraries and library innovations from around the web

Introducing The Source


Copycats? Digital consumers in the on-line age (Note: PDF)

From the Strategic Advisory Board for Intellectual Property Policy (SABIP) website


Huge economic losses are being sustained due to large-scale unauthorised downloading, generated by widespread confusion about copyright law in the online world. This UK report examines online consumer behaviour in the UK and its potential impact on business and government policy. It is the first piece of research to look at evidence from across the copyright industries and across all age ranges.
The report has two further objectives:
  • To inform a SABIP workshop at which a selected group of attendees with a direct interest in the issue will consider the implications of consumer behaviour on IP and make recommendations for further areas of SABIP research
  • To highlight any further SABIP research that is required to ensure that all agencies of Government have the fullest understanding of the issues
Key findings:
  • The world of the digital consumer is an environment, indeed a series of ‘eco-systems’, subject to rapid change; change that means many predictions about the future of the Internet and digital convergence (and how these are ‘consumed’) made even two, and certainly five and ten years ago seem quaintly dated – a fact that should be held in mind as predictions are made for the future of not just ‘Digital Britain’, but also the ‘Digital World’
  • Within ten years we have seen the widespread domestic use of high-speed broadband and multichannel (and often High Definition) digital television with the facility to time-shift, copy and view programmes on other devices, and to upload these files to websites such as YouTube; the arrival of wi-fi in the high-street, the library, the office, university and the home; the rapid expansion of open source and Creative Commons publishing; at least four iterations of file-sharing technologies; the birth of mainstream blogging as a broad social phenomenon; the arrival of social media as a significant medium of authorship, sharing, and communication; the shift by the younger digital consumer towards the mobile phone as not just an aural communication tool, but also a medium for text messaging, music and video consumption, and as a gateway to post messages, photographs and other types of content to social media websites
  • Most recently the large expansion in use of ‘microblogging’, to websites such as the text-based Twitter and the image-based Tumblr, has once again surprised many who suspected these services were a fad. Finally, the recent successful launch of the BBC’s authorised programme-streaming service, iPlayer, and the music streaming service, Spotify, has demonstrated that new forms of business models may be possible in the world of ‘free things’. Unsurprisingly, the literature review we undertook does not grasp the enormity and the speed of these changes. Each impacts centrally on intellectual property
  • The challenge for IP policy makers is to judge and, where possible, measure the changing social behaviours and attitudes brought about by the myriad rapidly evolving technologies and networks of the digital revolution, and map this against their economic, political and social objectives

'Authentic' learning experiences: What does this mean and where is the literacy learning? (Note: PDF)

From the aWAy with Words Conference website

Teachers are challenged to adopt practices that facilitate the development of “necessary” skills and strategies for learners. For many, however, what is required in policy and curricula is increasingly obscured and even confusing as teachers are bombarded with jargon prescribing seemingly similar (yet apparently different) approaches such as “rich tasks”, “big questions” and “fertile questions” that are to be "relevant”, “authentic” and “engaging” for the learner. Barton and Hamilton (2000) argue that literacy learning should take the learner beyond the transmission of technical skills in the classroom to an understanding of its role within a community’s cultural practices. These literacy practices are mediated by literacy events and it is engagement with these events and their diverse demands that allows learners to make strong connections to their own literacy practices.
Reported in this paper are the interpretations of four experienced primary school teachers as they plan, programme and facilitate authentic literacy experiences in their classrooms. These are examined within the framework of the principles of authentic learning, which is useful in gaining insight into the ways that experienced teachers make sense of the complex jargon associated with their profession for the development of deep and flexible knowledge that can be applied in a range of community settings. Evident in these teachers’ stories are the understandings, beliefs, contexts and competing tensions that underpin the conceptualisation, design and implementation of these experiences. The teachers’ stories reveal the complexity of teaching as they consider:
  • the individual contexts of their schools
  • their students’ own communities
  • the expectations of stakeholders in a child’s education
  • the availability of resources

Public libraries and the Internet 2008-2009: Issues, implications, and challenges

From the First Monday website

This paper presents an overview of methods, findings, issues, and implications from the 2008 ‘Public Libraries and the Internet’ national survey, including comparisons to data from previous studies. Since 1994, these surveys have chronicled the expansion of the Internet as a primary library service. The 2008 survey includes key data about the many facets of public libraries as community Internet access, training, and service centres, from the number of workstations and connection speeds available to the most common Internet services and training. The findings from the 2008 survey reveal impacts of the global recession on public libraries and their ability to meet the needs and expectations of patrons, communities, and all levels of government.

Wednesday, November 11, 2009

Engage Your Community - Social Media Workshop

On Friday 13 November I'm giving a workshop on social media at the Engage Your Community conference.

Workshop Format

As I'm not sure what the level of experience is across the people in the workshop, I've broken it into five sections. Each of these sections can be expanded or contracted, depending on the level of detail we need to go into. I'm hoping for loads of experience-sharing from the people in the workshop.

Introductions
How do we all use the web? How many of us are running personal social media accounts? How many are running accounts on behalf of their organisation? What happens when personal and professional use start to overlap?

This section is designed to get people talking, and to give me a chance to assess how familiar people are with social media tools. That will help me pitch the following sections at the right level.

Observations from Day 1
A few quick points from the presentations given by Colin Jackson, Nathalie Hofsteede and Chris Brown.

A tour of the social web
What's out there that people could be using?

- Listening in (RSS feeds, Google Alerts)
- Joining in (Twitter, Flickr, blogging)
- Community & collaboration (Facebook, wikis, Ning)
[All with examples from the not-for-profit sector]

The golden rules of social media
Things to ask yourself before embarking on any social media adventure (and certainly before picking a social media tool):

- Why do you want to do this?
- What are you offering?
- Who is this for?
- Who will be doing this?

And (numerous) steps for a successful launch.

Planning exercises
Depending on how much time we have, I've prepared an activity for people to break into small groups and plan a social media 'campaign' for a specific scenario.

Hopefully all this gives a bit more context for my slides



Resources & examples

I've also prepared a rather lengthy handout which I'm now going to reproduce here for ease of use.

Introductions to different kinds of social media

It’s hard to beat the team at Common Craft http://www.commoncraft.com, who make short, straightforward videos about all manner of web (and non-web) things.

These are all available on the Commoncraft YouTube channel

Listening in

Twitter search | http://search.twitter.com

Google Alerts | www.google.com/alerts

Google blog search:
  • Google your search terms
  • From options at top left of results page, choose Blogs from the ‘More’ drop-down menu
  • Scroll to the bottom of the search results

Useful reading
- Social media monitoring (State Services Commission)

Joining in

Blogger | http://www.blogger.com

Wordpress
http://wordpress.com (basic account)
http://wordpress.org (to do your own hosting)

Twitter | http://twitter.com

Flickr | http://www.flickr.com

Examples used:
- Whangarei SPCA blog
- Get in on! Twitter
- Rainbow Youth Flickr

Useful reading
- Twitter case study (National Library)
- Mashable’s Twitter Guidebook
- Twitter for non-profits (Mashable)
- Fundraising potential for Twitter (TechCrunch)
- Darren Rowse’s blogging lessons

Community & collaboration

Ning | http://www.ning.com

Wikis
http://www.wetpaint.com
http://pbworks.com
http://www.mediawiki.org

Facebook | http://www.facebook.com

Examples used
- Mt Cook Mobilised wiki
- Museums 3.0 Ning group
- Cancer Society’s Daffodil Day campaign

Useful reading:
- Case study on Daffodil Day campaign (Ideashop)
- Managing Facebook groups (Mashable)
- Wikis when and why (Nina Simon)

Community management

If you’re going to start spending time with your community online, you’re effectively becoming a community manager. This elderly post from Jeremy Owyang is still relevant if you’re trying to figure out if this is your new line of work.

Like any job, there are some personal qualities you’ll need to bring out in yourself, and some tactics you might find useful.

- A case study from the Brooklyn Museum
- A case study from (the early days of) Flickr
- My notes from Heather Champ and Derek Powazek’s 2009 ‘Designing and sustaining creative communities’ workshop

Planning

One of the most important things you need to ask yourself is – how much time do I (or my team of people) have available? How much time does Web 2 take (Nina Simon)

You’re likely to need some simple policies around how you/your team use social media sites in a professional capacity or on behalf of your organisation. I’m a big fan of the very simple guidelines from the State Services Commission, which were written for government, but which translate over well

The Guardian’s community standards are also helpful if you’re thinking about things like comment moderation

And this page aggregates links to social media policies

One piece of advice: these are your policies. Don’t try to second-guess everything that might go wrong & plan against them, or you’ll become paralysed. Read some of the material above, write some useful & sensible guidelines (aimed at helping the people doing your social media outreach to understand what’s okay and what’s not so okay, both in terms of their own behaviour and that of others) and then update as time goes by and circumstances change.

Generally useful, sometimes even inspiring, reading

Beth Kanter’s blog ‘How nonprofits can use social media’ (the title pretty much explains it)
- http://beth.typepad.com/beths_blog

Nina Simon’s Museums 2.0 blog (Nina is interested in people’s participation in museums & galleries, and frequently writes about social media projects)
- http://museumtwo.blogspot.com

The Community section on A List Apart (but don’t stop there, please, this site is full of delicious reading)
- http://www.alistapart.com/topics/content/community

The Pew Internet & American Life Project regularly issues reports on people’s online activities and behaviour
- http://pewinternet.org/Data-Tools.aspx


Friday, November 6, 2009

The Source: news about digital libraries and library innovations from around the web

Introducing The Source

© the way ahead: A Copyright Strategy for the Digital Age

From the Intellectual Property Office (IPO) website

The aim of copyright is to encourage authors’ creativity and make their works available widely. It is a global system that provides incentives for authors and investors, while allowing access to works for educators, researchers, cultural institutions and users of all sorts, both in business and in the home. Copyright engenders strong emotions. It is about authors’ livelihoods and recognition and about financial rewards for rights holders. But it is also about access to the copyright works, which are essential to our values, our cultures and to the way we spend our work and our leisure time.
This work looks ahead to how copyright can tackle the challenges of the digital age, drawing on previous work including Digital Britain and the Gowers Review of Intellectual Property, on international perspectives including the European Commission’s and on discussions and submissions from stakeholders.


Digitisation of special collections: Mapping, assessment, prioritisation (Note: PDF)

From the Joint Information Systems Committee (JISC) website

Traditionally, digitisation has been led by supply rather than demand. While end users are seen as a priority they are not directly consulted about which collections they would like to have made available digitally or why. This can be seen in a wide range of policy documents throughout the cultural heritage sector, where users are positioned as central but where their preferences are assumed rather than solicited. Post-digitisation consultation with end users is equally rare. How are we to know that digitisation is serving the needs of the Higher Education community and is sustainable in the long-term?
Key Findings:
  • The communities of both intermediary and end users are willing to express their view on prioritising digitisation of special collections; the participation in the project was a matter of good will and the good response makes evident that there is definitely interest of the professional communities to express their opinion on the matter of digitisation needs. It should be noted here that the community of intermediaries sees collections on a finer level of granularity; end users often refer to super-collections such as the holdings of an institution
  • The top user-driven priority criteria that emerged from consultation with both intermediaries and end users are: Improve access; Enhance impact on research and/on studies; Enhance impact on teaching; Allow for collaboration; Improve access outside
  • The geographic and institutional boundaries of collections nominated for digitisation are wider – this study was aimed at the higher education institutions in the UK, but 14% of the nominated collections were from institutions outside of the higher education sector, and 6% were from overseas
  • The complementarity of collections is strongly favoured by both users’ communities
  • The criteria for digitisation nominated by intermediary and end users include general criteria but also a number of criteria where metrics can be applied; thus allowing to establish a ranking mechanism

Integrated Library System Platforms on Open Source / Stephen Abram (Note: PDF)

From Stephen's Lighthouse (Stephen Abram) blog


Stephen Abram discusses what he (and SirsiDynix) see happening when libraries get into talks about moving their Integrated Library Systems to open source platforms. What has been found is that they are often not aware of the significant drawbacks of open source systems at this point in time. To help buyers become aware of the limitations of open source, he has set out to clarify what open source is, how it is different from proprietary software platforms, and why Integrated Library Systems (ILS) are not ready for open source at this point.


Testing the accessibility of Web 2.0

From the University of Southampton, School of Electronics and Computer Science website

Dr Mike Wald and E.A. Draffan are leading a project funded by JISC (Joint Information Systems Committee) TechDis which looks at how well people with disabilities can access web services such as blogs and wikis and social networking sites. The team have built an accessibility tool kit, which will enable users to test the accessibility of web 2.0 services. The accessible pen drive offers freely available assistive technologies that can be used to help with this evaluation.
Web2Access, part of the toolkit, provides an online checking system for any interactive web-based services such as Facebook. “We developed it because nowadays users contribute, as well as read, information and so you cannot just click on a button to see if websites are accessible and easy to use”, said E.A. Draffan.