Friday, March 27, 2009

The Source: news about digital libraries and library innovations from around the web

Introducing The Source


Going Grey? Comparing the OCR Accuracy Levels of Bitonal and Greyscale Images / Tracy Powell and Gordon Paynter

From the D-Lib Magazine website

Newspaper collections are the subject of an increasing number of large-scale digitisation projects. In Papers Past, a collection of over a million newspaper pages, the introduction of full-text search has made a wealth of information findable that was previously hidden. The search feature is dependent on text extracted from the newspaper page images with Optical Character Recognition (OCR), so any improvement in OCR accuracy will add value to the collection by improving our users' chances of finding useful information.
Accepted wisdom is that greyscale digitisation produces higher OCR accuracy than bitonal digitisation. To test this assumption, we digitised three reels of microfilmed historic newspapers in both bitonal and greyscale, had them OCRed, and carried out a hand-count of the OCR accuracy on a random set of text samples. The experiment had a clear and surprising outcome: using our existing business processes, there was no evidence of any improvement in OCR accuracy from greyscale digitisation.


Deep Packet Inspection Puts Open Internet at Risk (Note: PDF)

From the Free Press website

The uncertainty surrounding Net Neutrality has given rise to a technology known as Deep Packet Inspection (DPI) that offers Internet service providers unprecedented control over Internet content. A recently released paper argues that the use of DPI technology by Internet service providers should raise serious concerns for both users and lawmakers. The paper asserts that the emerging DPI business model, marketed for its ability to monitor, control and ultimately charge subscribers for every use of an Internet connection, poses a major threat to the open Internet. In just one of many examples, DPI manufacturer Allot describes how its DPI product “enables service providers to project potential revenues and profits from setting up a tiered service infrastructure” and allows providers to “reduce the performance of applications with negative influence on revenues (e.g. competitive VoIP services).”


Towards Implementation of Library 2.0 and the e-Framework (TILE) (Note: PDF)

From the Joint Information Systems Committee (JISC) website

A new briefing paper has just been published by JISC, which informs dialogue around the current and future role of the library in a rapidly changing technological landscape. The paper resulted from JISC’s TILE project (Towards the Implementation of Library 2.0 and the eFramework), a programme of work with two key aims. Firstly, TILE investigated how libraries have incorporated web 2.0 applications and services into what they already do. Secondly, it sought to develop a draft conceptual framework (Library Domain Model) based on services it has specified for the international e-Framework. The project also makes recommendations on how the library community could make the best use of web 2.0 approaches.


On trusting your socks to find each other (Note: PDF)

From the Yahoo! Research website

This article addresses design issues that may arise as a result of the deployment of networks of devices that will constitute the “Internet of Things”. In particular, it addresses issues around the trustworthiness of information exchange and transparency in such networks.


Mashing up research and connecting with learners through social media (Note: Podcast)

From the Joint Information Systems Committee (JISC) website

Ewan McIntosh, Digital Commissioner for 4iP, talks with Rebecca O’Brien from JISC via Skype about mobile gaming, how thinking from the learner’s perspective is key, and how universities have a vital role in mixing logical thinking with inspiring creativity.


Profiling Social Networks: A Social Tagging Perspective

From the D-Lib Magazine website

The web is rapidly becoming both more open and more social through the provision of technologies that make it easier for end users to access resources and join in social networks. Social networks have pioneered online communities, allowing users to contribute to collective knowledge by tagging online resources. Tagging behaviour increased dramatically between 2005 and 2007. This article reports on an investigation of social tagging using data gathered from Delicious, Flickr and YouTube for the years 2005, 2006 and 2007. Preliminary findings indicate both that it is possible to profile a social network through the analysis of tagging data and that Delicious is a more representative venue for analysing the social tagging behaviour of users than either Flickr or YouTube.

Friday, March 20, 2009

The Source: news about digital libraries and library innovations from around the web

UC [California]-eLinks Direct Linking Usability Report (Note: PDF)

From the California Digital Library website

In the past, the research process began with two distinct phases: discovery and access. After determining a topic, a researcher would enter the discovery phase, in which he or she would look through library catalogs and article indices to identify resources that might pertain to his or her research. The goal of the researcher during the subsequent access phase was to get a physical copy of the resource. The Internet and advancements in search engine technology and library information systems have made research easier in some ways and more difficult in others. The change that has the greatest implications for UC-eLinks – and for library services in general – is the collapsing of the discovery and access phases into a single workflow.


Measuring the Internet Economy - The ICT Development Index (Note: PDF)

From the International Telecommunication Union website

This report, published in 2009 by the ITU, compares developments in information and communication technologies (ICT) in 154 countries over a five-year period from 2002 to 2007. The Index combines 11 indicators into a single measure that can be used as a benchmarking tool globally, regionally and at the country level. These are related to ICT access, use and skills, such as households with a computer, the number of Internet users and literacy levels.


Australia: Internet access in public libraries survey 2008 (Note: PDF)

From the Australian Library and Information Association website


This report is a follow-up to similar survey reports in 2002 and 2005 and a more specialised report on internet filtering in 2007 and provides current information on how public library internet services are managed, delivered and used in responding libraries. This report of internet services in public libraries in Australia is made possible by the voluntary participation of a significant number of public libraries across Australia.


Economic analysis of literary publishing in Australia (Note: PDF)

From the Australia Council for the Arts website

This Australia Council-commissioned research report is a study of the economics of the Australian literary publishing sector. A priority area for the Australia Council is arts marketing and audience development to create a higher level of demand from arts consumers and to develop new audiences and readership. In turn, it is intended to help authors earn a living wage from their work. While there are reports into the nature of Australian publishing in general and the ABS provides statistics on the Australian publishing industry, there is very little specific data or information relating to literary publishing.


The Swinburne national technology and society monitor 2008 (Note: PDF)

From the Swinburne University of Technology website

The Swinburne National Technology and Society Monitor (SNTSM) provides an annual 'snapshot' of public perceptions of new technologies, science and technological change. The main findings include:
* Australians are comfortable with the rate of technological change in general, but the degree of comfort varies for specific technologies
* Australians trust scientific institutions and the non-commercial media for information about new technologies. They have little trust in government institutions, major companies or the churches. They have the least trust in the commercial media

Thursday, March 19, 2009

Notes from Joshua Porter's Webstock workshop on Social Design: from Strategy to Interface

Joshua Porter is the founder of Bokardo Design, an interface design and strategy shop focusing exclusively on social web applications. He recently wrote the book Designing for the Social Web. I was lucky enough to attend Josh's recent workshop at Webstock 'Social Design: From Strategy to Interface'. Here are my notes (all images grabbed from Josh's slides):

Strategy

Most businesses have strategy:

  • Corporate Strategy: concerned with the overall purpose and scope of the business to meet stakeholder expectations
  • Business unit strategy: concerned with how a business competes successfully within a specific market
  • Operational strategy: concerned with how each part of the business is organized to deliver value to the rest of the company
In a nutshell: how to gain competitive advantage and make money easier/faster than others.

So, instead of this (where customers must first sign up before creating their own page):

Social Design - user happy only after signup

Aim for this:
Social Design - user happy before signup
Characteristics of design strategy

  • Long-term thinking: create value for people over time, and they’ll give you their business
  • User experience first: the user’s experience is the most important thing of all, it drives all other decision making
  • Optimize for use: instead of planning how to make money, plan for how to increase regular, positive use of design
In a nutshell: how to gain competitive advantage by making software/products that people love to use.

Signs that you have a strategy:
  • The strategy doesn’t change much (target doesn’t move)
  • You have a long-term plan for success.
  • You know what not to do.
  • You know what to focus on.
  • You know why your design is different from your competitors.
  • You know what success looks like.
  • The strategy is the litmus test for everything you do; every activity supports the strategy.
Problems with strategies:
  • competing interests
  • political infighting
  • short-term thinking
  • buzzword bingo
  • no ongoing evaluation
  • fake strategies
Classic Question: Who are your users?

Better Question: What are your users doing?

Alternatives:
What do people have to do to make you successful?
What are you making people better at?
What are your users passionate about?

Examples:
1. Amazon
Primary Activity: Finding Good Products
Social design strategy: “through accepting preferences of customers and then observing their purchase behavior over time, so that you can get that individualized knowledge of the customer and use that individualized knowledge of the customer to accelerate their discovery process.”

2. Google
Primary Activity: Search
Social design strategy: By analyzing linking and behavioural patterns of web users, Google provides relevant search results (and relevant advertisements to display alongside)

3. Patients Like Me
Primary Activity: Treating disease
Social design strategy: By designing an application for people to catalog, monitor, & share their treatments and symptoms with each other, PatientsLikeMe can make living with and treating a disease easier.

4. Netflix
Primary Activity: Finding great movies
Social design strategy: By designing an application for people to record their movie watching preferences and aggregate them with others, Netflix provides a better way for people to find great movies.

5. Facebook
Primary Activity: Staying up to date/keeping in touch
Social design strategy: By designing an application where people can connect, record their activity, while seeing each other’s activity, Facebook provides an easy way for friends to stay up to date with one another

Define your social strategy - use The Commander's Intent (from Made to Stick: why some ideas survive and others die by Chip Heath and Dan Heath)
“if we do nothing else, we must...”
“the single, most important thing we must do is...”
and, importantly,
“no plan survives contact with the enemy”
Defining primary activity & describing social design strategy:
  • What is the one thing that people need to do to make you successful?
  • How are you leveraging the social interaction of your users?
  • Do people have to change their behaviour much?
The AOF Method
Activities, Objects & Features
  • What is your primary activity?
  • What are the objects people use in that activity?
  • What do people do with the objects? (what are the verbs?)
A social object influences social interaction - it changes the way people behave. It mediates ties between people, eg the iPhone, Flickr for photos, YouTube for video – commenting, favouriting etc
The term “social networking” makes little sense if we leave out the objects that mediate the ties between people. Think about the object as the reason why people affiliate with each specific other and not just anyone.
- Jyri Engeström
Research methods
  • Interviews: good to find out motivations and a general overview of what people do
  • Usability testing: good for testing existing interfaces with real people
  • Contextual Inquiry: good for diving deep into the details of activity, much of what you learn is unspoken
  • Boards, feedback, etc: good for finding the current temperature of things, how your software is being used and liked/disliked
  • Self observation: good for design in that you know a lot about what the issues are, but can be myopic at times
Find your verbs!

Social Design - find your verbs
Social websites = Objects + Verbs
Social features = When the verbs involve more than one person eg share, invite, add etc

For example, on Amazon's Product page there are up to 16 social features:
Product ratings, Share Your own product images, Add to wish/registry lists, tell a friend, People who viewed this...buy this, Amazon sales rank (social proof), Submit a product manual, Customers who bought this also bought..., Help others find this item, Tag this item, Rate this item, Customer reviews, Customer discussions, Offsite reviews, Listmania, So you'd like to...

Questions to consider when designing your main application screen:
  • What activities, objects, and verbs are you dealing with?
  • What verbs make obvious features?
  • What are you *not* going to support?
  • Are there features that you think might be better added later?
Metrics
Optimising your strategy by measuring progress & success

90/9/1 Rule - study by Yahoo found approx 90% of users lurk, 9% participate, 1% are leaders (create groups, invite users etc)

The relevancy of advertising = Holy Grail eg Facebook connect – can use FB login for different sites allowing more targeted advertising

Do not make the assumption that people will use your site every day. What is the reasonable return time to site? How many per month?

Do your own metrics – cross-site comparisons can be unreasonable/unhelpful as the niche is different.
Create your own baseline and work from there.

Unless we monitor how we are doing, we don’t know how we are doing – makes design decisions difficult – need to design so can get metrics

Social Design - The Usage Lifecycle

Metrics for Pirates (Dave McClure) - the AARRR model

Acquisition
How do you get people using your web application?
1. In-linking
2. SEO/SEM
3. Affiliate marketing (referral programs)
4. Mentions on blogs, in news articles
5. Social media/networks
6. Invites from passionate users (referral)
7. Emails to existing customers

Follow user behaviour – how many mentions in blogs/twitter etc, where do users come from?

Activation
What do people do the first time they use your app?
1. Sign-up/Register
2. Click on something!
3. Add/Invite friends
4. Respond to existing member
5. Watch tutorials/getting started content
6. Hopefully create something of value eg http://www.geni.com – signup process initiates family tree creation

Good example = freshbooks.com – signup is called ‘One Time Setup’, phone number visible on page

The drop off in users who register & never come back = HUGE

Retention
How do users come back?
1. Automated emails
2. RSS/news feeds
3. Bookmarklets/bookmarks
4. Desktop app/3rd party app integration

Cohort analysis = good tool ie how many users who sign up return after one month, two months etc
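To make the cohort idea concrete, here is a minimal sketch in Python. It is my own illustration, not something shown in the workshop: the user names, dates and the `cohort_retention` function are all invented, and it assumes you can export a signup date and a list of later visit dates for each user.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (Jan -> Mar is 2)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def cohort_retention(users):
    """Group users by signup month and report, for each later month,
    the fraction of that cohort who came back at least once."""
    cohorts = {}
    for signup, visits in users.values():
        cohort = cohorts.setdefault(signup.strftime("%Y-%m"),
                                    {"size": 0, "returned": {}})
        cohort["size"] += 1
        # Count each user once per month offset, however many visits they made.
        for offset in {months_between(signup, v) for v in visits}:
            if offset > 0:
                cohort["returned"][offset] = cohort["returned"].get(offset, 0) + 1
    return {
        month: {off: n / c["size"] for off, n in sorted(c["returned"].items())}
        for month, c in cohorts.items()
    }

# Hypothetical data: user -> (signup date, dates of later visits).
users = {
    "ana": (date(2009, 1, 5), [date(2009, 2, 3), date(2009, 3, 9)]),
    "ben": (date(2009, 1, 20), [date(2009, 2, 14)]),
    "cat": (date(2009, 1, 28), []),
    "dov": (date(2009, 2, 2), [date(2009, 3, 1)]),
}

retention = cohort_retention(users)
for month, rates in retention.items():
    print(month, {off: f"{rate:.0%}" for off, rate in rates.items()})
```

Each row is one signup month; each column is "months since signup". Reading across a row shows how quickly that cohort drops away, which is easier to act on than a single site-wide return rate.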

Referral
How do users refer others?

1. Email invite
2. Shared via Delicious/Digg/stumbleupon
3. Widget/embeds
4. Word of mouth

The viral loop - the loop of activity that happens between the time when a person becomes a new user and they invite others to join.
Methods:
· Word of mouth
· Embed a widget
· Mimic an action eg friends see new application added by a contact on Facebook & try
· Forced Sign-up eg can’t see my pictures on MySpace unless you register
· Direct invite

The final R is for Revenue, which we didn't cover.

Facebook white paper

Facebook have just released a white paper (not publicly available yet AFAIK - if you know where it is, please link in the comments) on the engagement of new users. They wanted to track all users who created an account on the same day. The paper highlighted that certain design elements increase behaviour & interaction.

How do people learn about new software? They came up with 4 hypotheses:
  1. Social learning – observe what someone does & copy the behaviour
  2. When people are singled out by their friends – eg when photos are uploaded – would people be more likely to upload their own photos if they had already been tagged?
  3. Feedback – if feedback is given does it drive more usage?
  4. Distribution – if a picture has wide distribution does it drive more usage?
The study found social learning had a positive effect on both early uploaders (those who uploaded photos within the first 2 days) & non-early uploaders

Singling out worked for the non-early uploaders

Feedback was a significant factor for early uploaders

Distribution was a significant factor for early uploaders

Social Design - the viral coefficient
The Conversion Funnel

The conversion funnel is the general rate of user movement through a known sequence. These are hypothetical numbers but roughly true! For a free service, only approx 8% of users will become regular users.
Social Design - the conversion funnel
Use this when there are discrete steps (eg registration form, landing page conversions, trials, sharing) in a known sequence, preferably within one session.
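A funnel like this is easy to work out yourself once you have a count of users at each step. The sketch below is my own, not from Josh's slides: the step names and counts are invented (chosen so the overall figure lands on the roughly 8% mentioned above), and `funnel_rates` is a hypothetical helper.

```python
def funnel_rates(steps):
    """steps: ordered list of (name, user_count), top of the funnel first.
    Returns a list of (name, rate_from_previous_step, rate_from_top)."""
    top = steps[0][1]
    previous = top
    rows = []
    for name, count in steps:
        step_rate = count / previous if previous else 0.0
        overall_rate = count / top if top else 0.0
        rows.append((name, step_rate, overall_rate))
        previous = count
    return rows

# Invented counts for illustration only.
steps = [
    ("visited landing page", 1000),
    ("registered", 400),
    ("created something", 200),
    ("regular user", 80),
]

rows = funnel_rates(steps)
for name, step_rate, overall_rate in rows:
    print(f"{name:22} step {step_rate:6.1%}   overall {overall_rate:6.1%}")
```

The "step" column shows where people are dropping out between two adjacent screens; the "overall" column shows how much of the original audience is left. The first run gives you the baseline to measure later changes against.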

Tips for Improving the Funnel:

1. Create an engaging experience first!
2. Create a baseline from which to measure
3. Remove any unnecessary levels (screens)
4. Start at the top of the funnel
5. Go down the funnel level by level

Develop a one-page business model - pick a user type. What is that user's Activation, Retention and Revenue?

Social Design - One Page Business Model

The problem with metrics is that you get what you optimize for.


Questions to consider when defining metrics & their priorities:

· What activities make you money (keep you in business)?
· Does one metric rest on another, more important one?
· What sources will be valuable for acquiring users?
· Are there different types of users? (lurkers, contributors, passionates?)

Reputation
Your reputation is equal to the sum of your past actions within (a) community.
- Bryce Glass, interaction design lead for Yahoo Reputation Platform
The profile must fit the domain ie different profiles between Amazon, LinkedIn, Facebook etc. Don’t ask for information irrelevant to your community

Have multiple indicators eg friends, reviews, fans, number of people who find reviews useful etc, as on yelp.com

Optimise for value-added behaviour – just sending an invite is NOT adding value to the system. Can promote competition initially but remove it when it gets too much.

Allow for reciprocity eg comment on user reviews at Amazon, x of x people found this review helpful.

Create a community-specific identity e.g. Amazon’s ‘Real Name’ for reviews carries more authority. On eBay – reputation is independent of identity & based on data mining (history of activity)

What can’t you do? e.g. on Amazon you can’t rate a profile from the user profile page, only in the context of the product page. Try & stop the ability to game the system.

Reputation is an evolutionary process. Your site will be somewhere on the competitive spectrum – how competitive will your users be, or is it just helping each other out?

Competitive Spectrum = Caring -> Collaboration -> Cordial -> Competitive -> Combative

Problems with reputation

Social problems don’t have technical solutions (trolling)
Cumulative reputation can annoy
‘Competitive’ cumulative reputation can be good in the beginning stages of a community, though after some time it will be impossible for newbies to reach the heights of original members => dissatisfaction. Often best to just stop e.g. Digg’s removal of Top Diggers, or recalculate e.g. Amazon vs Harriet Klausner

Questions to consider

- What makes for reputation within your community?
- How valuable is knowing the history of someone?
- Are there different levels of reputation?
- How do you accrue reputation?

Finally, an excellent resource for all kinds of design patterns & code is the Yahoo Design Pattern Library. It's well worth a look!

Friday, March 13, 2009

The Source: news about digital libraries and library innovations from around the web


Industry that pays, and art that doesn't (Note: PDF)

From the Griffith University website

This paper considers how and why we must build a resilient creative society capable of nurturing artists beyond the boundaries of the creative industries.


Cultural and linguistic inclusion? Literature review on social inclusion, cohesion and culture (Note: PDF)

From the National Ethnic Disability Alliance (NEDA) website

This paper explores the definitions and measures of social exclusion, social inclusion and social cohesion through an analysis of literature and indicators from Australia and the United Kingdom. It also explores how measures of cultural diversity can be built into understanding and measuring social inclusion in Australia.


An Awfully Big Adventure: Strathclyde's Digital Library Plan

From the Ariadne website

Derek Law describes how the University of Strathclyde is choosing to give priority to e-content and services instead of a new building.


Time to Change Our Thinking: Dismantling the Silo Model of Digital Scholarship

From the Ariadne website

Stephen G. Nichols argues that humanists need to replace the silo model of digital scholarship with collaborative ventures based on interoperability and critical comparison of content.


Supporting eResearch: The Victorian eResearch strategic initiative

From the Ariadne website

This article reports on VeRSI and its aim to accelerate and coordinate the uptake of eResearch in universities, government departments and research organisations within the State of Victoria.


Innovative learning measures for older workers (Note: PDF)

From the European Centre for the Development of Vocational Training (Cedefop) website

Addressing the issue of an ageing European workforce not only requires public socioeconomic measures to promote the employment of people over their life course, but also the commitment of workplaces to ‘age management’. From an organisational perspective, continuous learning and development are necessary for survival in increasingly competitive markets but they also have an impact on the quality of working life and its attractiveness from the point of view of workers.


First look at new 'green' library

From the BBC website

Funky furniture, listening hubs, a grass roof and a mini grand piano - no it's not the latest Big Brother house but Cardiff's new library. The six storey building is a world away from the dark, dank and dusty libraries of old and has already been rewarded for its green design credentials. The idea, according to Cardiff council, is to make the library as attractive and welcoming as possible to visitors.


2009 Digital Music Report

From the International Federation of the Phonographic Industry (IFPI) website

The International Federation of the Phonographic Industry has released its latest report on the state of digital technology in the music industry. While the report finds that the industry has succeeded in changing its business models, its biggest challenge is still illegal music downloads.


Building Australia’s Research Capacity (Note: PDF)

From the Parliament of Australia website


High quality research training is essential for a sound innovation system in Australia. This inquiry aimed to identify the key flaws in the current research training system and this report suggests measures to remedy those flaws.


Finding Context: What Today's College Students Say about Conducting Research in the Digital Age (Note: PDF)

From the Project Information Literacy website

A report of preliminary findings and analysis from student discussion groups held on 7 U.S. campuses in Fall 2008, as part of Project Information Literacy. Qualitative data from discussions with higher education students across the country suggest that conducting research is particularly challenging. Students’ greatest challenges are related to their perceived inability to find desired materials. Students seek “contexts” as part of the research process. A preliminary typology of the research contexts is developed and introduced. Our findings also suggest that students create effective methods for conducting research by using traditional methods, such as libraries, and self-taught, creative workarounds, such as “presearch” and Wikipedia, in different ways.


The Science Commons

From the Digital Curation Centre website

Many readers will be familiar with Creative Commons, its ethos and the suite of licences it provides. An organisation they may be less familiar with is Science Commons, a branch of Creative Commons that aims to make the Web work for science the way that it currently works for culture. It is a non-profit organisation aimed at accelerating the research cycle which they define as "the continuous production and reuse of knowledge that is at the heart of the scientific method." Its work is of relevance to anyone within the scientific cycle looking to reduce legal and technical barriers to research and discovery.


New Opportunities: fair chances for the future (Note: PDF)

From the Official Documents website, UK House of Commons

The truly global economy of the 21st century brings new opportunities and new risks. The way the financial crisis has swept across every economy in just a few months has underlined how interconnected our world now is. But beyond today’s global slowdown lies a world of new opportunities for which we must prepare. If we put in place the right foundations now, the prize is not just a richer country but also a fairer society. By positioning the UK successfully to grasp these opportunities, we can generate a new surge in social mobility, characterised by more and better jobs being available and everyone having a fair chance to access these jobs and fulfil their talent.


Classifying Tags using Open Content Resources (Note: PDF)

From the Yahoo! Research website

Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Being able to automatically classify tags into semantic categories allows us to understand better the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data.

Tuesday, March 10, 2009

Streaming our photos to your screensaver or desktop

Today is my very sad last day, and while tidying my desk I realised that there was a capability in Manuscripts & Pictorial that had never been explored, and if I didn't blog about it right now it might slip beneath the waving goodbye and be lost forever.

The capability is humble enough:

Every time you do a search in Manuscripts & Pictorial, you can retrieve your results as a Media RSS file by clicking the 'Get these results as RSS' link at the bottom of the search page.
What use is this? I hear you ask. Well it turns out that there are screen savers and desktop backgrounds that will happily take this RSS feed and display the images on your screen, as a montage or in sequence, for your enjoyment and education.

The URL looks like
http://mp.natlib.govt.nz/rss/?numResults=100&f=collection%24Heritage+Images&q=sea
where sea is the query term you would like images from.

You can, of course, put any word or words in place of sea based on your interest. With about 60,000 images, you'll be unlucky not to find any results.
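If you'd rather build these feed URLs in a script than edit them by hand, a small Python sketch follows. The URL pattern is the one shown above; the function name `media_rss_url` and the idea of varying `numResults` or the collection name are my own assumptions, and I haven't checked which other values the service accepts.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in this post.
BASE = "http://mp.natlib.govt.nz/rss/"

def media_rss_url(query, num_results=100, collection="Heritage Images"):
    """Build a Media RSS URL for a search term (hypothetical helper).
    urlencode escapes '$' as %24 and spaces as '+', matching the example."""
    params = {
        "numResults": num_results,
        "f": f"collection${collection}",
        "q": query,
    }
    return f"{BASE}?{urlencode(params)}"

print(media_rss_url("sea"))
print(media_rss_url("sailing ship"))
```

The first line printed is the same as the example URL above; the second shows that multi-word queries are escaped for you, so the result can be pasted straight into a screensaver's feed setting.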

For users of Windows XP and Vista, I recommend the free Google Photos Screensaver which is part of the Google Pack.

If you want a slide show that cycles through your results in the background for a kiosk or presentation, the full screen mode in CoolIris is a good bet. Just do a search in M&P, click the CoolIris icon in the top right corner, double-click an image and click the play button.

Sunday, March 8, 2009

Designing & Sustaining Creative Communities - notes from the Webstock workshop

So, I meant to get this out during Webstock week, but late is hopefully better than never. Here are some notes & thoughts from Heather Champ and Derek Powazek's workshop 'Designing and Sustaining Creative Communities' (with thanks to Douglas, who has shared his notes with me).

[Any misinterpretations or muck-ups are my own].

Right now, the National Library is more involved in what I'd call communities-within-communities, like our @NLNZ twitter account and taking part in The Commons on Flickr, than occupied in setting up communities ourselves. However, I think it's safe to say that the immediate future will see our involvement in community sites growing.

Before we kick off though, here are two of my favourite articles about community wrangling:

Community: From Little Things, Big Things Grow, by George Oates

Building an online community at Brooklyn Museum: A timeline, by Nicole J. Caruth and Shelley Bernstein

A definition

Web communities happen when people are given tools to use their voice in a public and immediate way, forming intimate relationships over time.

The essential questions to ask yourself

  • Who is the site for?
  • What can they do?
  • Why will they want to do it? Without offering benefits for the people who join, why would a vibrant, long-term community grow?

The building blocks of community sites

Aside from the contributions from the workshop attendees (an unusually vocal bunch for a Kiwi crowd; I think we're getting braver) this was the most valuable part of the session for me, so I'm going to focus on it here.

The workshop gave me a real appreciation for the hard thinking you need to do before you unleash your site on the world. These are the foundation garments of your community: not sexy, perhaps, but underpinning and holding up everything that's layered on top of them.

1. Privacy policies and Terms of Use/Service

The Twitter privacy policy was recommended as a thorough policy with a good human interface.

Heather and Derek noted that as people are getting more experienced online, they're reading privacy policies and terms of service statements much more closely and outcry can follow if people are unhappy with what they see (or, I guess, think they see).

Creating these policies means finding a balance between protecting your assets and respecting the people who are choosing to contribute to your site. You will need to get lawyers involved, you will need to advocate for your users, and you will need to keep revisiting these over time.

You must also make these policies easily available to people who are visiting your site but who are not (yet!) members. People should never have to log in before seeing this kind of information.

In her presentation at the conference Heather expanded on these points - check out this article for some of the detail.

2. Copyright and ownership

Ten years ago, people thought that everything on the web was "free". Now, people are much more savvy and will read ToS closely to see what you're doing with their content (and remember, this covers everything from photos they upload to comments they leave) and what others are allowed to do with it.

An ongoing issue is that people will share material that they don't own the copyright to. Have clear take-down policies that make it easy for people to make a complaint and enhance trust on the site. Part of the trust is vetting the requests for take-downs, as this system can be abused.

Another part of the trust is understanding that there can be a difference between what's 'legal' and what's 'right': what you can do and what you should do. Which leads nicely into ...

3. Community Guidelines

When a site is small, it's relatively easy to model behaviour. As it gets bigger, this gets harder - and that's where community guidelines can set the tone of the site and act as the human face of Terms of Service and Terms of Use statements.

I've long admired Flickr's community guidelines, especially the immortal "Don't be creepy". It's pointedly more of a "do" list than a "don't do" list, and it has won the community over, to the point where Flickr members enforce the guidelines themselves. Heather and Derek emphasised that you can't make guidelines that will foresee every possible permutation of behaviour, and that's why it's important to treat these as evolving documents.

4. Abuse Grid

This gem by itself made attending the workshop worthwhile for me. Before you launch, sit down and think of all the things you will not tolerate on the site. Now draw these up in a spreadsheet.

-> The first column is the "bad thing".

-> The next column is a detailed description of said "bad thing" that people can use to identify whether the thing they suspect is bad is, indeed, that same bad thing.

-> The final column sets out exactly what site administrators should do when they spot a bad thing. This includes whether warnings are given, what happens to the content, what happens to the account, what changes if it's a repeat violation, and whether any external agencies need to be notified of this behaviour.

-> Bear in mind you may need to run this past a lawyer.

An abuse grid has two major benefits. Firstly, it means that if a site has a number of administrators, they're all following the same system of identifying and reacting to badness. Secondly, it prevents you from being forced to make policy on the fly during a crisis.
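
To make the grid concrete, here is a minimal sketch of it as a data structure rather than a spreadsheet. The rows, field names and the response_for helper are all hypothetical, made up for illustration; they are not from the workshop, and a real grid would need the legal review mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbuseRule:
    """One row of the grid: the bad thing, how to spot it, what to do."""
    bad_thing: str
    description: str
    first_offence: str
    repeat_offence: str
    notify_external: bool = False


# Hypothetical rows for illustration only.
ABUSE_GRID = {
    rule.bad_thing: rule
    for rule in [
        AbuseRule(
            bad_thing="spam",
            description="Repeated commercial links unrelated to the thread.",
            first_offence="Remove content and send a warning.",
            repeat_offence="Suspend the account.",
        ),
        AbuseRule(
            bad_thing="copyright violation",
            description="Material posted without the rights holder's permission.",
            first_offence="Take down content after vetting the complaint.",
            repeat_offence="Close the account.",
        ),
    ]
}


def response_for(bad_thing: str, previous_violations: int) -> str:
    """Return the agreed action so every administrator reacts the same way."""
    rule = ABUSE_GRID[bad_thing]
    return rule.repeat_offence if previous_violations > 0 else rule.first_offence


print(response_for("spam", previous_violations=0))
```

The point of writing it down this way is the same as the spreadsheet's: the decision is looked up, not invented in the middle of a crisis.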

Design and structure

Design can set the tone and use of the site (it's true! Derek cited all sorts of studies - check his site for links) so you need to fit your design to the tasks you're trying to encourage and support. You've got to keep reminding yourself that you're designing a tool, not just web pages.

Barriers to entry

Derek noted that there's a strong drive in web design towards inclusiveness, but that communities by their very nature are exclusive (I guess maybe that's the difference between "population" and "community"). The boundaries can be set using design.

We need to think carefully about where we set barriers to entry in all elements of the design, from demanding certain versions of certain browsers be used to the amount of information you ask for during site registration. There's a temptation to set the barrier very low, for the sake of user-friendliness and also to get stuff happening on a site; however, this risks inviting a pretty crappy level of interaction. Set the barrier higher and you're likely to get higher quality contributions, but you might put people off along the way. (I had an experience this year where I was asked to hand over my full name, email address and date of birth in order to even see the homepage of a site that I was being encouraged to join.) There's also the fact that community sites evolve: you may set the barrier lower when the site is new and your community managers have more scope to individually interact with new members, setting the tone through human contact; as the site grows and the ability to be this hands-on decreases, the barrier may go higher.

The wisdom of crowds - with a nod to James Surowiecki

Surowiecki suggests 4 elements that define wise crowds ...

  • Diversity (of people, opinion and input: homogeneous groups often fail)
  • Independence (you offer up your own thoughts, and don't feel compelled to agree with the group)
  • Decentralisation (there isn't a top-down authority that drives the group)
  • Aggregation (where community sites often fail - not enough is done to find commonalities).
... and interface is everything when it comes to encouraging these elements on your community site.

To help yourself out:

Give people small, discrete tasks to make crowd-sourcing work

Derek gave the example of the Assignment Zero project that tried to crowd-source articles for an "experiment in pro-am journalism" (read a full recap here). After a disappointing result, the site's managers realised they had asked too much of the community, and tried a new tack, asking for suggestions for a list of people to interview for the project, and then asking people to carry out the interviews - a task that was more enthusiastically picked up.

Avoiding groupthink

When site members put the group's needs and opinions ahead of their own, they stop speaking in their own voices and definitely stop saying or doing anything that might rock the boat. This can cause a site to stultify (and, if you work at NASA, might lead to a shuttle disaster).

Online communities are self-selecting: like tends to attract like. To prevent a site from getting too homogeneous you need to design for a diversity of members; to bring in new members and to support minority opinions.

Design for selfishness

Derek noted that people tend to participate for selfish reasons, but that this can be good in a wisdom-of-the-crowds fashion. For example, people don't create hyperlinks for altruistic reasons, but when aggregated the links support Google's PageRank algorithm. Likewise, people tag in Flickr for 'selfish' reasons, but when aggregated these tags become a powerful tool.

For this reason, asking yourself "what is the selfish reason for participating in this site" is a key early question in the creation of your site.

Scores create games

Once you assign a score to an action or a judgement, you create a game; and once you create a game people will want to play to win, in ways that may be detrimental to the overall health and enjoyability of the site. Design decisions can encourage or discourage gaming behaviour.

Derek used the Heisenberg uncertainty principle as a metaphor for the challenge of surfacing interesting things happening within the community without unduly influencing behaviour, and suggested design solutions that can be used to prevent this.

Favrd has a very fast decay, which stops the leaderboard from being hijacked.

Online polls reveal tallies only after you've voted; Threadless withholds voting tallies until the voting period is finished.

Introduce some randomness (Flickr's 'interestingness' algorithm) to prevent gaming.

Allow yourself some curatorial control, and bring back the human element to presenting content (think of the Flickr blog; or conversely, the often disappointing nature of 'most viewed' and 'most shared' lists of articles on newspaper sites).
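
Two of the ideas above, fast decay and withheld tallies, are easy to sketch in code. This is only an illustration under stated assumptions: the workshop didn't describe how Favrd or any poll actually implements these, so the exponential half-life formula, the six-hour default and the BlindPoll class are all hypothetical.

```python
from collections import Counter


def decayed_score(favourites: int, age_hours: float, half_life_hours: float = 6.0) -> float:
    """Halve an item's score every half_life_hours (an assumed value)."""
    return favourites * 0.5 ** (age_hours / half_life_hours)


def leaderboard(items):
    """Rank (name, favourites, age_hours) tuples by decayed score, highest first."""
    return sorted(items, key=lambda item: decayed_score(item[1], item[2]), reverse=True)


class BlindPoll:
    """Poll that shows tallies only to people who have already voted."""

    def __init__(self, options):
        self._options = set(options)
        self._votes = {}  # voter id -> chosen option

    def vote(self, voter, option):
        if option not in self._options:
            raise ValueError(f"unknown option: {option!r}")
        if voter in self._votes:
            raise ValueError("each voter may vote only once")
        self._votes[voter] = option

    def tallies_for(self, voter):
        """Return the tallies, or None if this voter has not voted yet."""
        if voter not in self._votes:
            return None
        return dict(Counter(self._votes.values()))


# A day-old hit with 100 favourites drops below a fresh item with 20.
print(leaderboard([("old hit", 100, 24.0), ("fresh item", 20, 1.0)]))
```

With a short half-life, yesterday's winner can't sit on top of the leaderboard, and with blind tallies nobody can vote just to pile on to whatever is already winning.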

The Brooklyn Museum's Click! exhibition (an experiment in crowd-sourcing and crowd-selecting photographs for a show, which both Derek and James Surowiecki consulted on) is a fascinating attempt to minimise influence. I really recommend this series of posts on the project (kicking off with Surowiecki himself).

If you do use an algorithm, Derek advised testing thoroughly before release and then tweaking as necessary.

Community Managers

As Heather said in her opening to this section of the workshop: being a community manager is like being a piñata: people will beat you with sticks and you still need to give them candy.

Finding and supporting community managers

Community manager roles should be separate from customer care. These are the people who set the tone for the site, not the people who help solve problems with uploading or browsers.

Community managers need to have good judgement, to be diplomatic, to have a sense of humour and thick skin. They also need to be active users of your site, meaning that recruiting community managers from existing users is often a sensible move.

When you have more than one community manager, there needs to be consistency in their behaviour. Flickr describes the tone of its community managers as "human, friendly, inclusive, authoritative, transparent, honest, witty, funny and clear". Give your community managers guidelines for behaviour in situations which often crop up (e.g. outages, or changes to the site) so they have something to fall back on.

Taking it offline

Private communication is one of the most valuable (and often forgotten) tools at a community manager's disposal. Think about it: if one of your friends was being a dick at a party, you probably wouldn't get him thrown out - you're more likely to have a quiet word with him. Use back-channels to thank people who are helping you out, or to check in with people who are being jerks.

Offer members tools to self-moderate

Tools for flagging inappropriate or suspect content are common on community sites; tools for reporting problems can help site administrators notice when something is broken (without everyone running to tell the community manager about it).

Tools for managing your interactions with others on a site can also help people define their own experience, and add finer levels of control so that your community guidelines can stay at a more general level. Tools for members to block other members are a good example of this: they allow people to stop other members who they don't like from interacting with them, whilst relieving you from having to negotiate on an individual-by-individual level. You don't need to notify people when they've been blocked (and admittedly, this would be like sending someone an email reading "Hey, guess what? Bob thinks you're a freak and has banned you from contacting him") but do make an explanation available (couched in "it's not you, it's them" terms) if people want to follow up on what's happened.

Reporting problems

Make it easy to report problems and things such as copyright violation. Have consistent footer links throughout the site, and let non-members use these as well.

Heather recommended controlled lists of options for complaints/problems forms, and that you ensure forms are robots nofollow. Include a service level agreement so people have an idea of when their complaint is likely to be acted on.

Bubble up the good

Example content sets a tone far more effectively than terms and conditions statements or community guidelines ever could. Bring good and interesting content to the fore; shine the spotlight on community members (if they're willing!) through mechanisms like the member interviews on the Flickr blog.

Help forums

Member forums can reduce the workload of answering individual requests, but can also run the risk of getting out of hand as members use these forums as a place to express anger with the site. Make sure it's also the job of someone on the team to get into the forums regularly and help things along (without being all pushy about it).

You can also consider outsourcing this using a site like Get Satisfaction.

Owning up to your mistakes

There is an expectation of openness and transparency on community sites today. Things will occasionally go wrong on your site, and when they do, it's best to own up, explain, and apologise. If it's a problem that's hanging around for a while, post clear and timely updates.

Managing change

If you need to make a significant change:

  • Announce
  • Be clear
  • Allow 6-8 weeks for people to get prepared
  • Offer an out-option if possible
  • Make the change.

Have a strategy for the change: get together and plan out worst-case scenarios and how you'll deal with them (being cognisant that no-one can always second-guess humanity's great inventiveness).

Immediate feedback on changes is likely to be knee-jerk and negative (or perhaps just par for the course); over the next few weeks more considered responses are likely to filter in. You can give yourself a hand here by identifying influential community members and getting them to trial soon-to-be-released features: this can generate good-will and excellent feedback.

Wrapping up


Phew. Writing all this up reminds me how packed the session was with great advice, insight, and awesome contributions from the floor (we were an unusually talky bunch). I haven't even covered off the last two chunks of the day: managing trolls, and the fascinating, difficult-to-describe phenomenon that is the merging of the digital and the physical world (think papercamps, Arduino, Flickr meet-ups, online mags that are moving into print-on-demand, bridges and toasters that Twitter ...)

So my apologies that I'm not going to get this all down. I really recommend these other reviews of the workshop, which contain material I'm likely to have missed or glossed over, and different points of view.

Dean Stringer, Waikato University Centre for eLearning


Julie Starr, Evolving Newsroom

Sarah Jones, Lunchbox: software and digital media for learning

Friday, March 6, 2009

The Source: news about digital libraries and library innovations from around the web

Introducing The Source

The Future of the Book (Note: Podcast)

From the OCLC website

Print books or e-books? Uplift or download? Writers and readers or interactive interchange? What is your view on the way that changing technologies and lifestyles are affecting books, publishing, information and the way that we read? In November 2008, the network of Edinburgh-based libraries (ELISA) organised a panel discussion and open debate on this very subject as part of Edinburgh’s Festival of Libraries. A panel of five very well-informed people working at the cutting edge of their respective professions presented and discussed the issues at stake from a wide range of perspectives.


Digital repository development (Note: Videos)

From the Scholarly Publishing & Academic Resources Coalition (SPARC) video channel

Experts and advocates examine the state of the art in digital repositories. The video series was taped in November 2008 and underscores the central role of repositories across library services. Particular emphasis is placed on the added value they contribute to the institution and on the importance of funding repository development even in lean economic times. The clips feature three full-length plenary addresses plus seven short interviews with leading-edge repository implementers.


Information Seeking Behaviour and User Satisfaction of University Instructors: A Case Study (Note: PDF)

From the Library Philosophy and Practice (LPP) electronic journal

Information-seeking behaviour remains an important research area. Libraries and other information providers strive to understand users’ information needs and how they try to fulfil these needs. This understanding helps design and offer appropriate user-centred information systems/services. In the digital era, research on information-seeking behaviour has taken on even more importance worldwide.


Top Web 2.0 Security Threats (Note: PDF)

From the Secure Enterprise 2.0 Forum

This document outlines web application security threats unique or typical to Web 2.0 and should serve as a guideline for assessing risk in Web 2.0 applications.


Guidance on the Management of Controversial Material in Public Libraries

From the Museums, Libraries and Archives Council (MLA) website

This publication provides support for public libraries in making difficult choices when managing books, information and internet content that may be deemed to be controversial. The publication addresses issues that may impact on the public library’s responsibility for the selection and provision of information by outlining current legislation on terrorism, local government, freedom of expression and human rights, race relations and equality.


What Today’s College Students Say about Conducting Research in the Digital Age (Note: PDF)

From the Project Information Literacy website

A report of preliminary findings and analysis from student discussion groups held on 7 U.S. campuses in Fall 2008, as part of Project Information Literacy. Qualitative data from discussions with higher education students across the country suggest that conducting research is particularly challenging. Students’ greatest challenges are related to their perceived inability to find desired materials. Students seek “contexts” as part of the research process. Our findings suggest that students create effective methods for conducting research by using traditional methods, such as libraries, and self-taught, creative workarounds, such as “presearch” and Wikipedia, in different ways.


Making a Library Catalogue Part of the Semantic Web (Note: PDF)

From the National Library of Sweden

Library catalogues contain an enormous amount of structured, high-quality data; however, this data is generally not made available to semantic web applications. In this paper we describe the tools and techniques used to make the Swedish Union Catalogue (LIBRIS) part of the Semantic Web and Linked Data. The focus is on links to and between resources and the mechanisms used to make data available, rather than perfect description of the individual resources. We also present a method of creating links between records of the same work.


Archiving the Web: Does Whole-of-Domain Archiving = Information Overload?

From the ALIA Information Online 2009 website

This paper, presented at Information Online 2009, reports a study comparing the results of searching the whole-of-domain harvest of the Australian web, undertaken by the National Library of Australia in 2007, with the results of searching selectively archived material in the PANDORA web archive. The authors explore the question of the value of whole domain harvests compared to selective archiving.

Monday, March 2, 2009

Subversive git

I’m finishing up at the library in just over a week, so this will be my last post to LibraryTechNZ, and I intend it to be mercifully brief.

But I want to touch base with you all, to thank you for being such a great audience, and to say how much I've enjoyed this foray into blogging on digital libraries.

To recap, my "primary responsibility" over the past couple of years has been the digitisation of the Donald McLean Papers and the development of a website to host them. The project, our approach, and the results are described in some detail in the presentation that David Colquhoun and I gave at LIANZA last year. Check it out if you’re into that kind of thing.

But what I want to talk about today is version control, and how it can bring you peace and serenity.

Subversion and git are two free open source version control systems. I’m going to describe subversion, but feel free to use git or mercurial if it pleases you.

First, I guess I should admit that not everyone needs version control in their day-to-day life. In fact, if you never work with data or metadata, you can sign off right here.

The rest of you either already know about the magic of version control, or I’m doing you one huge favour by cluing you in, right now. Either way, read on.

With version control, you never need to fear questions like:


  • Where are the latest versions of all the files? (A: "They're in the subversion repository.")

  • Can I edit them? (A: "Sure! Just check them out, make your changes and commit them back in, with a note describing the change.")

  • Where are the versions of all the files signed off by the steering group? (A: "Just check out the steering-group-approved-march-09 tag.")

  • What changes have been made since then, and why, and by whom? (A: "View the log and do a diff.")

  • Can we make a slightly different version of the files for xyz? (A: "Of course! Just create a branch.")


Version control is just so elegant and right that when you say "the files are all in the subversion repository" it's like saying "the water is in the tap", except it's better because you decided to put the water there and it's obviously where it belongs.

So how does it work?

  1. To start with, you'll need a "subversion repository". If there isn't one available to you in your organisation, you can ask your technical people to set one up, find a hosted solution or even go ahead and install one yourself on your PC.

  2. Then you import your files into the repository. From that moment on, you can breathe a sigh of relief and say "All the files are under version control. ftw."

  3. You then check out your files to a local working directory. I’m a Mac user, so I tend to check out files to a folder on my cluttered desktop, but you can put them wherever you like.

  4. Now that you have a local working copy of the files, you can edit them and work with them just like you always have. The only thing is that every time you make a significant change to a file, you should commit the change back to the repository. There’s no law about how often you do this but it’s Good Practice to commit your changes frequently.
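
If you'd like to see what those four steps look like on the command line, here's a rough sketch. It only builds and prints the commands, it doesn't run them. The svnadmin and svn subcommands are the standard ones; the repository path, folder names and commit message are hypothetical placeholders, and a file:// repository on your own PC is just one of the options from step 1.

```python
import shlex


def svn_workflow(repo_path: str, source_dir: str, working_dir: str, message: str):
    """Return the command lines for steps 1-4 as argument lists (not executed)."""
    repo_url = "file://" + repo_path + "/trunk"  # assumes a local repository
    return [
        ["svnadmin", "create", repo_path],                               # 1. create a repository
        ["svn", "import", source_dir, repo_url, "-m", "Initial import"],  # 2. import your files
        ["svn", "checkout", repo_url, working_dir],                       # 3. check out a working copy
        ["svn", "commit", working_dir, "-m", message],                    # 4. commit each change
    ]


# Hypothetical paths and message, for illustration only.
for argv in svn_workflow("/tmp/demo-repo", "my-files", "working-copy", "Fix typo in letter 12"):
    print(shlex.join(argv))
```

Each list could be handed to subprocess.run if you wanted a script to do the work for you, assuming Subversion is installed.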



With this small investment of effort, you can achieve magic. Because unlike in space, under version control nothing is ever lost™. Provided you back up your repository, of course.

What kinds of files belong in version control?
Both text files (.txt, .xml, .conf, .rtf, etc) and binary files (.doc, .odt, etc) can be kept under version control. With text files you can see the exact changes that were made to which lines of the file, whereas with binary files you only know that the file changed.
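
To show what "the exact changes that were made to which lines" means, here is a small sketch using Python's standard difflib module, which produces the same style of line-by-line report as a version control diff. The two versions of the file are made-up sample lines, not real project data.

```python
import difflib

# Two hypothetical revisions of a small text file.
before = ["<p>Dear Sir,</p>", "<p>I have recieved your letter.</p>"]
after = ["<p>Dear Sir,</p>", "<p>I have received your letter.</p>"]

# Lines starting with "-" were removed, lines starting with "+" were added.
diff = list(difflib.unified_diff(before, after, fromfile="r1", tofile="r2", lineterm=""))
print("\n".join(diff))

# For binary content, all you can usefully report is whether it changed.
binary_changed = b"\x00\x01\x02" != b"\x00\x01\x03"
print("binary file changed:", binary_changed)
```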

In the McLean Papers Digitisation Project we used version control for:

  • all the TEI (Text Encoding Initiative) xml full text transcriptions and translations

  • the prototype delivery system scripts in php

  • a snapshot of the mysql database as a mysqldump .sql file

  • the database schemas

  • the solr configuration files and schema

  • the java tomcat delivery system and configuration

  • the apache reverse proxy config httpd.conf

  • various xsl files to do a variety of unholy things

  • and much, much more!


Revisions
Every time you commit a file, it gets a new revision number. The previous version can still be retrieved if you ask for it by revision number, but by default you get the latest version.

Deleting
You can delete files, but you can also retrieve the previous, undeleted version, if you ask for it by revision number. This means nothing is ever lost, but things don't get cluttered either.
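
Here's a toy model of how revisions and deletes behave. It is a sketch of the idea only, not how Subversion stores things internally: the ToyRepository class and its method names are made up, and it keeps a full snapshot per revision purely to stay short.

```python
from typing import Optional


class ToyRepository:
    """In-memory model: every commit or delete creates a new revision."""

    def __init__(self):
        self._revisions = [{}]  # revision 0 is the empty repository

    @property
    def head(self) -> int:
        """Number of the latest revision."""
        return len(self._revisions) - 1

    def commit(self, path: str, content: str) -> int:
        snapshot = dict(self._revisions[-1])
        snapshot[path] = content
        self._revisions.append(snapshot)
        return self.head

    def delete(self, path: str) -> int:
        snapshot = dict(self._revisions[-1])
        del snapshot[path]
        self._revisions.append(snapshot)
        return self.head

    def cat(self, path: str, revision: Optional[int] = None) -> str:
        """Return the file at a given revision, or the latest one by default."""
        snapshot = self._revisions[self.head if revision is None else revision]
        return snapshot[path]


repo = ToyRepository()
repo.commit("letter.xml", "first draft")    # becomes revision 1
repo.commit("letter.xml", "second draft")   # becomes revision 2
repo.delete("letter.xml")                   # becomes revision 3
print(repo.cat("letter.xml", revision=1))   # the old version is still there
```

Asking for letter.xml at the latest revision would fail, because it has been deleted there, but every earlier revision is still retrievable.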

Repository layout
It's common to structure your repository as:

  • trunk

  • branches

  • tags


The trunk is where the latest main version is kept. If you just want the latest config files or whatever you're using the repository for, trunk is the natural place to go.

The branches are where you or others can take a version of your files and develop a new version for some purpose, without affecting the trunk.

The tags are where you record a particular set of your files as being a "release" or an approved version. In the example above, the steering group approved a set of the configuration files in March. We need to still be able to retrieve the exact files they approved, as well as being able to work on the files to fix all the issues they missed, and keep track of these changes. When we tag a set of files as steering-group-approved-march-09, we capture a moment in time forever, allowing anyone to download those exact files even while we continue to develop and change them.
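
The trunk/branches/tags idea can be sketched with plain dictionaries. This is a deliberately simplified model with made-up file names and contents: a tag is a copy that nobody touches again, and a branch is a copy that goes its own way.

```python
import copy

# Hypothetical files on trunk, for illustration only.
trunk = {"solr/schema.xml": "v1", "httpd.conf": "v1"}
tags = {}
branches = {}

# Tagging captures the approved files exactly as they are right now.
tags["steering-group-approved-march-09"] = copy.deepcopy(trunk)

# Branching starts a separate line of work from the same starting point.
branches["xyz"] = copy.deepcopy(trunk)

# Work continues on trunk and on the branch without affecting the tag.
trunk["httpd.conf"] = "v2"
branches["xyz"]["solr/schema.xml"] = "v1-xyz"

print(tags["steering-group-approved-march-09"])
```

However much trunk and the branch change afterwards, the tag still hands back the files the steering group signed off.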

Interested? Go read the excellent and free online book and discover for yourself the peace of knowing where all your files are, what's been changed, when and by whom, and of being able to work on your files at the same time as other people without messing things up for each other.

Subversion, for a better world. Bye for now.