Thursday, December 16, 2010

Join the search terms word cloud map mashup

Do you work in a library which has either an online search or OPACs with a catalogue search, or similar?


I’ve started a Google Map with links to word clouds of users’ search keywords. The map so far (http://bit.ly/dE3hrh) has just one set of search keyword clouds – it would be great to have more from around New Zealand (and beyond).


What you need:


  • Any kind of search tool or catalogue which produces a log of search keywords entered by users.

  • To be able to nominate a geographic location for the dataset.

  • Ideally – web statistics which include a list of the search keywords and (useful but not essential) their frequency. But as long as the data exists in some format (eg log files, or even just a list) it will still work.

If you’d like to contribute just email me (rebecca.cox @ Natlib) or comment here.


We collect web server log files and feed these into our web statistics software (Urchin, a version of Google Analytics which is installed and managed in-house). From here you can export data in Excel format. I’ve cleaned this up, selected 500 terms from the top and bottom of the list, and created word clouds at www.wordle.net.
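If you would rather script the clean-up than do it by hand in a spreadsheet, here is a minimal Python sketch of the same idea. It assumes the search terms have already been exported as a plain list of raw strings; the function names and the small sample list are hypothetical and are not part of Urchin or Wordle.

```python
from collections import Counter


def count_terms(raw_terms):
    """Normalise raw search terms (case, extra spaces) and count each one."""
    cleaned = (" ".join(term.lower().split()) for term in raw_terms)
    # Skip entries that are empty after clean-up.
    return Counter(term for term in cleaned if term)


def top_and_bottom(counts, n=500):
    """Return the n most frequent and the n least frequent (term, count) pairs."""
    ranked = counts.most_common()  # sorted from most to least frequent
    return ranked[:n], ranked[-n:]


if __name__ == "__main__":
    # Hypothetical sample data standing in for an exported keyword list.
    sample = ["Papers Past", "papers  past", "maps", "Maps ", "", "wellington"]
    top, bottom = top_and_bottom(count_terms(sample), n=2)
    print("Most searched:", top)
    print("Least searched:", bottom)
```

With a real export you would read the terms from the file first and leave n at 500 to match the selection described above.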


Web stats give access to a wealth of data and can help identify audiences and behaviour which are not otherwise visible.


A while back, I checked the web stats for Papers Past to see how much “brand aware” search traffic the site was getting, and discovered there’s a significant number of people who appear to be searching the site for specific content using external search engines, eg site:paperspast.natlib.govt.nz “anti-opium association” or papers past deaths ashburton 1921.
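For anyone wanting to try a similar check on their own referral keywords, a rough Python sketch follows. It assumes you already have the external search queries as plain strings; the category labels, the function name and the default domain and brand values are illustrative choices, not a feature of any web statistics package.

```python
import re

# Matches a "site:" operator and captures the domain that follows it.
SITE_OPERATOR = re.compile(r"\bsite:\s*(\S+)", re.IGNORECASE)


def classify_query(query, site_domain="paperspast.natlib.govt.nz", brand="papers past"):
    """Label an external search query by how it refers to the site."""
    q = query.lower()
    match = SITE_OPERATOR.search(q)
    if match and site_domain in match.group(1):
        return "site-operator"
    if brand in q:
        # Anything left over after removing the brand name is the content sought.
        remainder = q.replace(brand, "").strip()
        return "brand-plus-content" if remainder else "brand-only"
    return "other"


if __name__ == "__main__":
    for example in [
        'site:paperspast.natlib.govt.nz "anti-opium association"',
        "papers past deaths ashburton 1921",
        "Papers Past",
    ]:
        print(classify_query(example), "<-", example)
```

Counting how many queries fall into each label gives a quick sense of how much traffic is brand aware and how much of that is looking for specific content.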


You can look deeper by segmenting web stats by a range of criteria, from the number of words visitors use in their searches, to visitor domains (eg break out all the traffic from domains ending in .ac.nz), frequency of visit, number of pages viewed per visit, and more. For more on this, see Seb Chan’s Continuous Refinement and Data Driven Dynamic Personas from Webstock this year.
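Two of these segments are easy to sketch in Python if you can get the raw data out of your stats package. The example below assumes a list of search strings and a list of visit records with a "domain" field; both data shapes and all names here are hypothetical.

```python
from collections import Counter


def segment_by_word_count(queries):
    """Count how many searches used one word, two words, and so on."""
    return Counter(len(q.split()) for q in queries if q.strip())


def visits_from_suffix(visits, suffix=".ac.nz"):
    """Keep only the visits whose visitor domain ends with the given suffix."""
    return [v for v in visits if v["domain"].lower().endswith(suffix)]


if __name__ == "__main__":
    # Hypothetical sample data.
    queries = ["maps", "papers past", "deaths ashburton 1921", "wellington"]
    visits = [
        {"domain": "library.example.ac.nz", "pages": 4},
        {"domain": "isp.example.co.nz", "pages": 1},
    ]
    print(segment_by_word_count(queries))
    print(visits_from_suffix(visits))
```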


Heatmaps are another form of web visitor stats. They give a visualisation of where users are clicking on a web page (try Clickdensity or Clickheat). Here’s a heatmap showing the activity on our new homepage for the first few days after it went live.


National Library of New Zealand homepage heatmap

Tuesday, December 14, 2010

Adding Closed Captions to YouTube

We’ve recently had our first go at adding closed captions to our YouTube videos. Closed captions help hearing-impaired users understand the content of our videos and are extremely helpful for users who don’t have sound enabled on their computers. Captions for prerecorded video are also required under Success Criterion 1.2.2 of the Web Content Accessibility Guidelines (WCAG) 2.0.
The process is actually quite straightforward and less time-consuming than I would have thought. It does help if you’re provided with a transcript of the original content though.
Closed Captions are expected to describe all significant audio content including non-speech information such as the identity of speakers and their manner of speaking as well as music and sound effects. In this particular case it was fairly simple as the audio was largely only a voice over describing the content of the video.
YouTube currently supports two format options for closed captions, either .SRT or .SBV.
The .SBV format is YouTube’s own format and is slightly simpler than .SRT, so we have used it for this example.
An .SBV file is just a basic text file. Each caption begins with a start time and an end time in the format hour:minutes:seconds.milliseconds, separated by a comma. These are followed by a line break and then the text to be displayed during this time. A blank line (two line breaks) indicates the end of the caption and the start of the next time code.
Here’s how the first twenty seconds of the closed caption file looks:
0:00:01.000,0:00:02.000
Hi, I'm David Reeves

0:00:02.000,0:00:05.000
I'm the Associate Chief Librarian at the Alexander Turnbull Library in Wellington

0:00:05.000,0:00:14.000
We've undertaken a huge project to digitise a number of our photographic collections during 2010 and 2011

0:00:14.000,0:00:18.000
while the National Library building has been undergoing some major refurbishment

0:00:18.000,0:00:22.000
we've been able to dedicate around 20 staff to this special project.
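If you already have your timings noted down, a few lines of Python can write the file for you. This is a minimal sketch based only on the layout described above; the function names and the (start, end, text) tuple structure are my own assumptions, not anything YouTube provides.

```python
def sbv_timestamp(seconds):
    """Format a time in seconds as h:mm:ss.mmm, as used in the sample file."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_sbv(captions):
    """Build .SBV text from (start_seconds, end_seconds, text) tuples."""
    blocks = [
        f"{sbv_timestamp(start)},{sbv_timestamp(end)}\n{text}"
        for start, end, text in captions
    ]
    # A blank line separates one caption from the next.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    captions = [
        (1, 2, "Hi, I'm David Reeves"),
        (2, 5, "I'm the Associate Chief Librarian at the Alexander Turnbull Library in Wellington"),
    ]
    print(to_sbv(captions), end="")
```

Save the output with a .sbv extension and upload it alongside the video.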
Getting the captions aligned to the right time code can be slightly tricky. I found that it was easiest to play through the video and pause every now and then to pick the best start and finish time for each time code. It’s also important to keep line lengths reasonable, as otherwise YouTube can cut off the captioning text. I found that no more than 15 words per line worked as a rough guide. This often means that you’ll need to break up longer sentences into several shorter time codes.
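Breaking up a long sentence can also be roughed out in code. The Python sketch below splits a caption into pieces of at most 15 words and shares the original time span between the pieces in proportion to their word counts. The proportional timing is purely an assumption to give a starting point; I chose the actual timings by ear, and you will probably still want to adjust them against the video.

```python
def split_timed_caption(start, end, text, max_words=15):
    """Split one caption into pieces of at most max_words words.

    Returns (start_seconds, end_seconds, text) tuples. The time span is
    divided in proportion to the number of words in each piece.
    """
    words = text.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    pieces = []
    cursor = start
    for chunk in chunks:
        duration = (end - start) * len(chunk) / len(words)
        pieces.append((cursor, cursor + duration, " ".join(chunk)))
        cursor += duration
    return pieces


if __name__ == "__main__":
    sentence = " ".join(f"word{i}" for i in range(1, 21))  # 20 placeholder words
    for piece in split_timed_caption(0, 10, sentence):
        print(piece)
```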
If you find that directly editing a .SBV text file is too much work then there are also sites out there such as Caption Tube which help make the captioning process easier.
Here’s how our original video looked:
And with closed captions (CC) turned on:

Have a look at the completed Pictures Online video on our YouTube channel.

Matt O'Reilly

Monday, December 13, 2010

Handwritten newspapers

One of the distinguishing features of a newspaper is that it is printed (on newsprint). So you may be surprised to learn that two of the 300,000 newspaper issues in Papers Past are in fact handwritten.

The Victoria Times of 15 September 1841

Many of you will know that five hundred copies of the first issue of the Victoria Times were published in Wellington on 15 September 1841. These were lithographed, rather than letter-pressed like most newspapers.

The first three pages are handwritten text, and the last is a fascinating plan of Wellington in 1841. Note that Lambton Quay is actually a quay (i.e. constructed along the edge of a body of water) and that Basin Reserve is a “proposed basin” linked to the water by a “proposed canal”. In some issues (but not ours) the map was hand-coloured. This was not an economical way to run a newspaper, apparently, as the first issue was also the last.

It is such an interesting issue, however, that we scanned it in colour to make the handwriting more legible. In Papers Past, it is one of two publications displayed in shades of grey rather than simple black and white (the other is Kai Tiaki: the Journal of the Nurses of New Zealand). You can't OCR a handwritten document, but Planman (our OCR vendor) were able to transcribe it for us in a format we can load into Papers Past. We've provided the full issue PDF file in colour: the map on page four in particular looks fantastic (6 MB PDF).

The Oamaru Times and Waitaki Reporter (a.k.a. North Otago Times) of 21 April 1864

Our other handwritten newspaper has an even more unusual provenance. A few years ago, when we checked a batch of digitised newspapers, we found one issue where the OCR accuracy was basically zero. This was unusual, so we took a look, and found that the 21 April 1864 issue of the North Otago Times (which was known as the Oamaru Times and Waitaki Reporter at the time) is a carefully created collage, reconstructing what the original issue must have looked like. Take a look at page 1 for example:


We didn't know what to make of this, so we went back to the original scans, and found that this really is what the pages look like on the preservation microfilm. So we went back to the source: Dunedin Public Library. Here's what they reported:

I have just taken a good look at that 21 April 1864 issue of the "Otago Times".

Yes, it is a transcription but with some bits (part of title, etc., coats of arms, picture of ship, picture of Singer sewing machine) meticulously cut from another issue and pasted in.

On the front of the cover of the original binding for the 1864 issues it is noted that No. 9 (i.e. 21 April) is missing. The person who has made the transcription and who gave us all the 1864 copies was W.H.S. (William Henry Sherwood) Roberts. He must have located another copy of No. 9 and transcribed it. There is a pencil note on the inside of the original cover stating that No. 1 was given to Roberts on Nov 8 1908 and the other 11 (i.e. 2-8, 10-13) numbers on 4 April 1901.

We have a lot of material that came from W.H.S. Roberts, including many scrapbooks and this is undoubtedly in his hand. One thing that won't show up on the microfilm is that on the last page of the transcription he has used red ink to make an x in two places - to indicate where he had omitted some text in the first instance and where he was noting a mistake in the original in the second instance (though he has copied the original complete with mistake). He was very exact in his copying and has clearly tried to maintain the layout as it was printed.
Mystery solved.

The Nokomai Herald of 1871

These handwritten newspapers were brought back to mind earlier this month when our friends at New Zealand Micrographic Services pointed out an article in Papers Past, from the Mataura Ensign of 18 May 1897, about the handwritten 'Nokomai Herald' of 1871:


Ironically, the Nokomai Herald was apparently published for about a year (1871-1872). We have no plans to add this paper to Papers Past, but there are apparently a few paper copies around, and you can see a scan of the first page of the first issue on rootschat.

Friday, December 10, 2010

A gentle introduction to UX & Usability

This is the first of a series of posts which will cover topics from my UX & Usability session at LIANZA Conference 2010. I’m hoping these will be a little tidier than the session, which was rewritten continuously as I listened to the other presentations! I’ll also include some links to blogs, meetups and other conferences which may be useful to anyone new to the UX community.


UX translates as “user experience” and has a somewhat fluid meaning. It may be seen to encompass activities such as user research, design, and ongoing customer support. It overlaps with marketing and market research, and with much of the day to day work librarians do with customers. Use of social media to connect with customers might also be seen as a form of user research, with a continuous feedback loop in action.


User research has two important outcomes:



  1. It enables the people who design, build and operate a service to step away from their vantage point and see the service from their customers’ point of view.


    At the Wellington UX Barcamp last month, Nick Bowmast gave presentations about the role of the user researcher in enabling a design team to empathise with the users of a product, and about how he’s been presenting the research in a visual format rather than as written reports, with great results (example at the end of this post).


  2. It produces and/or analyses data about usage of the service and the outcomes of this usage, which can be used to support decision making (e.g. new features or content to be added, budget allocations, support requirements).


    If you missed it, Carol Tenopir’s LIANZA keynote “Sharpening the Value Edge of Academic Libraries” took us through measuring usage, outcomes (for example the effect reading articles provided by a library has on research output), and ROI. Download the presentation from the LIANZA website, or see LIBvalue for further information.



The next posts in this series will cover exploratory user research such as interviews and surveys, usability testing, and using web statistics, all in relation to people’s experience of searching and browsing library collections online.


More on UX and user research:



Finally, below is one of Nick Bowmast’s graphical presentations of user research. This is from a 2008 study looking at how early adopters of technology were finding, viewing, storing and sharing digital media content. Presenting this visually allowed complex information to become digestible and approachable as a basis for discussion with stakeholders.


How early adopters of technology were finding, viewing, storing and sharing digital media content - graphic

The Source: news about digital libraries and library innovations from around the web

Introducing The Source


The Memento Project - Time Travel for the Web - wins major international award for digital preservation

From the Digital Preservation Coalition website


The Institute for Conservation and the Digital Preservation Coalition (DPC) have announced that the Memento Project, led by Herbert Van De Sompel and colleagues of Los Alamos National Laboratory, and Michael Nelson and colleagues of Old Dominion University, USA, has won the Digital Preservation Award 2010.
The Digital Preservation Award is one of five awards organised by a working party of the Institute for Conservation (ICON), known collectively as The Conservation Awards. Each award celebrates a different aspect of the highest standards of conservation skills, innovation and research, collections care and digital preservation. The Awards, which were launched in 1991, are supported by Icon and sponsored by The Pilgrim Trust, the Digital Preservation Coalition (DPC), and the Anna Plowden Trust. Since 2005, the Awards have also been generously supported by Sir Paul McCartney.


Technology developments in the digital economy (Note: PDF)

From the Australian Communications and Media Authority (ACMA) website

This government report looks at recent developments in the three key communications/information technology areas of Infrastructure, Smart Technology and Digital Community. Subtopics include diverse but related topics such as home network technologies, digital identity management, smart-phones, ICT energy efficiency, location-aware communities, mobile payment and mobile coupon technologies, augmented reality and social media influence. A useful glossary is included, as well as numerous links to further readings. This is a useful overview to help keep up to date with big picture developments in ICT, as well as to plan future strategic library services.


The size distribution of open access publishers: A problem for open access?

From the First Monday website

I stumbled across the question of publisher size while preparing for an earlier article. From the viewpoint of an economist, the size distribution of open access publishers looked inefficient. In this article I first explore reasons to be sceptical to a situation with a large number of small publishers. Then I go through the numbers from the Directory of Open Access Journals, also discussing problems inherent in the material. The results are then compared to similar data about toll access publishing. A conclusion is that, even though numbers may lack in exactitude, there seems to be a need for institutions to look at how they organize their publishing activities.


The impact of open access outside European universities (Note: PDF)

From the Knowledge Exchange website


The potential impact of open access is understood in many communities but requires a greater volume of open access content to be available for the full potential to be realised. The Open Access movement has encouraged the availability of publicly-funded research papers, data and learning content for barrier-free use of that content without payment by the user. The impact of increasing availability of content to researchers in European universities is understood in terms of easier access to previous research and greater exposure for new research results, bringing benefits to the research community itself. A new culture of informal sharing is evident within the teaching and learning communities and to some extent also within the research community, but as yet the growth in informal sharing has not had a major effect upon the use of formal publication choices.
This briefing paper explores the impact of open access upon potential users of research outputs outside the walls of research-led European universities, where the economic value of open access may be even greater than the academic value within universities. The potential impact of open access is understood in many communities but requires a greater volume of open access content to be available for the full potential to be realised. More open access content will become available as the opportunities in open, internet-based digital scholarship are understood.


Factors affecting the frequency and amount of social networking site use: Motivations, perceptions, and privacy concerns

From the First Monday website

The purpose of this study is to explore the factors that affect the use of social networking websites. In doing so, this investigation focuses on two dimensions of social networking site use: frequency (i.e., how often people use social networking sites) and amount (i.e., how much time people spend on social networks). Integrating the technology acceptance model with uses and gratification and other consumer characteristics, this study found that interpersonal utility, perceived ease of use, privacy concerns, and age predict the frequency of social networking site use. Interpersonal utility motive, escape motive, and Internet experience explain the time spent on social networking sites.