We’ve been working on a new project called ‘What’s In The Library?’, installed at the Wellcome Trust in the Library web team, and today we’d like to try something new. Even though we’re only halfway through the project, we’re sharing it with you as is. You can also read the project blog, which we’ve maintained since the beginning of the work.
We’re set up to explore four main themes over four weeks:
- Week 1: Scope of the Collection
- Week 2: Show The Thing
- Week 3: Content and Context (we are here)
- Week 4: Scalability.
As every good project should, we began with a short scoping phase on site at Wellcome, where we dug into everything we possibly could. It’s so important to hit the ground running at the beginning of prototyping work, and one of the ways to do that is to make sure you have your working data in hand, in this case, a static MaRCXML dataset of some 962,701 records. Thanks to Joao for making that happen! We also took an escorted wander through the stores and stacks. It’s always a thrill to go behind the scenes and see how much effort and skill goes into preserving our heritage.
The Wellcome team knew early on that they wanted to explore those four main themes, so ended up using that brief to structure the whole project. It was exciting that we’d have an opportunity to extend some of the ideas and themes we’d already been working on in previous R&D projects too. The V&A Spelunker was very much about showing the scope of the catalogue, and Two Way Street showed the things in the British Museum in a completely new light. Our emerging practice around “no search box, and what happens when you don’t allow yourself one” is represented here too. We’re trying to show the shape of the thing, in a variety of dimensions.
MaRC is weird. I’m sorry, but it is.
Week 1: Scope of the Catalogue
Even though I’d been thoroughly doused in MaRC in my work at Open Library, the rest of the G,F&S team — Frankie Roberto and Tom Stuart — were new to it. So, we used that naivety to the full, and approached the big blob of MaRC with neutrality and fresh eyes. I must admit to encouraging the guys to not worry too much yet about the tendrils and foibles of it, and to try to keep the utter uncontrolledness at a distance, since therein madness lies! Ahem. So, what do we start with? Showing counts of things, ordered lists of things, draw everything, show whatever’s there to help you get a feel for the shape and grain of it.
We quickly spotted that there is also more than one subject classification scheme at work: at least the widely-used Library of Congress Subject Headings (LCSH), Medical Subject Headings (MeSH), and the Barnard Classification System, specifically for the history of medicine collection. Then there are the internal knobbly classification bits, resulting from individual cataloguers, historians and archivists, and their own personal styles.
(We also ended up writing a MaRCXML to PostgreSQL ingester available on Github, if that’s your bag.)
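For a feel of what an ingester like that has to do, here is a minimal sketch. It is not the code on Github: it assumes the standard MaRCXML namespace, uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere, and the table name `marc_fields`, the function names and the sample record are all made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Standard MaRCXML namespace; everything else below is illustrative.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, not taken from the Wellcome dataset.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">b1000001</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Robinson Crusoe</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Shipwrecks</subfield>
    </datafield>
  </record>
</collection>"""


def flatten(xml_text):
    """Yield (record_id, tag, subfield_code, value) rows from MaRCXML text."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind("marc:record", NS):
        # Field 001 is the record's control number; use it as the row key.
        id_field = record.find("marc:controlfield[@tag='001']", NS)
        record_id = id_field.text if id_field is not None else None
        # Control fields have no subfields, so the code column is NULL.
        for cf in record.iterfind("marc:controlfield", NS):
            yield (record_id, cf.get("tag"), None, cf.text)
        # Data fields contribute one row per subfield.
        for df in record.iterfind("marc:datafield", NS):
            for sf in df.iterfind("marc:subfield", NS):
                yield (record_id, df.get("tag"), sf.get("code"), sf.text)


def ingest(xml_text, conn):
    """Load flattened rows into a single (hypothetical) marc_fields table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS marc_fields "
        "(record_id TEXT, tag TEXT, code TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO marc_fields VALUES (?, ?, ?, ?)", flatten(xml_text)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ingest(SAMPLE, conn)
```

One row per subfield is a blunt shape, but it makes "count of things" queries easy. For a file approaching a million records you would probably want to stream it with `ET.iterparse` rather than load it in one go.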
Then we started drawing pictures of the data, exploring overall MaRC field usage. We were thinking about a theory that the number of characters in any one field could indicate some blunt level of quality. This turned out not to be useful (arguably because of the somewhat unruly data).
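The character-count idea can be sketched roughly as follows. This is an illustration under assumptions rather than the measurement we actually ran: it treats a field's "length" as the total characters across its subfields, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
import xml.etree.ElementTree as ET

# Standard MaRCXML namespace in ElementTree's {uri}tag form.
MARC = "{http://www.loc.gov/MARC21/slim}"


def field_lengths(xml_text):
    """Map each datafield tag to a list of character counts, one per occurrence."""
    lengths = defaultdict(list)
    root = ET.fromstring(xml_text)
    for df in root.iter(MARC + "datafield"):
        # Sum the text length of every subfield inside this field.
        chars = sum(len(sf.text or "") for sf in df.iter(MARC + "subfield"))
        lengths[df.get("tag")].append(chars)
    return lengths


def mean_lengths(xml_text):
    """Average character count per tag, as a blunt 'how much is in here' signal."""
    return {tag: mean(values) for tag, values in field_lengths(xml_text).items()}
```

Calling `mean_lengths` on a chunk of MaRCXML gives one number per tag, which is easy to plot, even if it says little about quality.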
Here’s a map of all the MaRC fields used across the dataset (184 in total). The black cells indicate the fields that are used most, which, at Wellcome Library, turn out to be mostly ID-related.
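The numbers behind a map like that boil down to counting, for each MaRC tag, how many records use it at least once. A minimal sketch, assuming the standard MaRCXML namespace (the function name is made up, and this is not necessarily how the map itself was generated):

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Standard MaRCXML namespace in ElementTree's {uri}tag form.
MARC = "{http://www.loc.gov/MARC21/slim}"


def field_usage(xml_text):
    """Count how many records use each MaRC tag at least once."""
    usage = Counter()
    root = ET.fromstring(xml_text)
    for record in root.iter(MARC + "record"):
        # A set, so a tag repeated within one record is only counted once.
        # The leader element has no tag attribute and is skipped.
        tags = {el.get("tag") for el in record if el.get("tag") is not None}
        usage.update(tags)
    return usage
```

`len(field_usage(...))` is the number of distinct fields in play, and `.most_common()` puts the heaviest-used ones first.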
Tom wrote a great post on all the other visualisations we were making in Week 1. There’s a lot to look at, like this page that summarises the metadata about Daniel Defoe, and we hope you can’t find any dead ends, but instead find yourself tumbling around the MaRC.
I’ll never forget Open Library advisor Karen Coyle’s observation that library metadata is diabolically rational. It’s a constant quest to put things in boxes, and agree on the boxes, make new ones when you need them, or appropriate other people’s boxes for your own purposes. In my experience, when you make humans do that, you get mess, not order. (Incidentally and possibly in contradiction to that, I was interested to see that Julian Assange is now talking and thinking about cultural diversity and digital colonialism, but that’s another story, or blog post at least.)
Breaking Open a Cultural Dataset
Week 2: Show The Thing
This week was about just what it said on the tin. We tried to show as many things as possible, using two basic dimensions, subjects and time.
You can see some attributes of this specific catalogue, like the giant peak in 1800 (the result of a specific collection of 18th century books), and dips in the publishing history in general during the two world wars last century.
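A time view like this needs a year for each record. One way to get one, sketched here as an assumption rather than a description of what the prototype does, is to read "Date 1" from character positions 07 to 10 of the 008 control field and skip anything that isn't four plain digits (MaRC uses values like `18uu` for uncertain dates). The function name is hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Standard MaRCXML namespace in ElementTree's {uri}tag form.
MARC = "{http://www.loc.gov/MARC21/slim}"


def records_per_year(xml_text):
    """Count records by the 'Date 1' year in field 008 (positions 07-10)."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for record in root.iter(MARC + "record"):
        for cf in record.iter(MARC + "controlfield"):
            if cf.get("tag") != "008":
                continue
            date1 = (cf.text or "")[7:11]
            # Keep only clean four-digit years; partial dates are ignored.
            if len(date1) == 4 and date1.isascii() and date1.isdigit():
                counts[int(date1)] += 1
            break  # only the first 008 in a record is considered
    return counts
```

Sorting the resulting counts by year gives the raw series for a chart, and spikes like the one at 1800 show up straight away.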
There are more lists of subjects and orders and views, now linking through to digitised materials where we can, instead of surfing the data structure itself. Here’s a path to a book about animals and medicine:
Search Logs are Fascinating (and anonymous)
I think there’s a bit of a myth surrounding the power of search in cultural collections, and in particular, this odd idea that people always know what it is they’re looking for, even if they’re looking at a collection for the first time. It’s clear as a bell that this isn’t really the case when you look at the search logs for the Library’s search box. About 98% of the search terms are broad, like images, alchemy, shell shock, medicine, art, anatomy and the like. Lots of the search terms are medical in nature, which is good. That tells us that most people know Wellcome is a medical collection. There are also some fun ones like cafe, jobs, and OCLC, which might show that when people see a search box, they don’t care where it’s pointing.
Richness, Context and MaRC
Week 3: Content and Context
We’re on the Tuesday of Week 3, so there’s not much to show yet. We’re taking a step back from the computers and metadata to a certain extent, and coming back to humans and what they know about the things in their collection. We’re working on hand-crafting an interesting and deep webpage about James Gillray, the famous London satirist and caricaturist who lived from 1756 to 1815.
The aim is something much more than a MaRC record: a page that helps people connect with the broader themes he represents, dive around in the rest of the collection, and even pop out into the broader web.
The hope is that this work might inform a content development approach for Week 4, when we try to scale some of what we learned up to work across the whole catalogue. We’re not sure yet whether that’s going to be possible.
That’s it then. There’s a ton of stuff to read on the project blog, and of course you can click around in the project site — please be gentle and have low expectations of performance! — it’s prototypical code, and not tested by more than a handful of people.
Thanks to Jenn Phillips-Bacher, Alex Green and the team at Wellcome for encouraging us to go public with this work in progress. Jenn has also blogged today over at the Wellcome Library blog.
Now we’re all curious to hear what you think!