In part one of this two-part series on how we built an Alpha website for the Wellcome Library, we looked at how we turned ‘subject headings’ into webpages.
This post looks at the second major type of aggregation page we settled on: people.
At first we were tempted to refer to these as ‘authors’, using the language of books, but of course the library isn’t just books, and so sometimes the people might be editors, collaborators, artists (the library has an art collection too), scientists credited on academic papers, and so on.
Within the MARC metadata we were given, people are referenced mostly in the 100 field (‘Main Entry–Personal Name’), but also in the 700 field (‘Added Entry–Personal Name’). As far as we could make out, there’s only ever one person in the 100 field (with a couple of exceptions), but there can be many in the 700. It wasn’t clear to us what the semantic difference was, so we decided to merge them all together.
Each person field contains a bunch of sub-fields for the person’s name, title (Mr, Mrs, Sir, etc) and dates (normally just birth and/or death), as well as some other lesser-used sub-fields like ‘numeration’ (e.g. the ‘II’ in Pope John Paul II) and ‘attribution qualifier’ (used for describing someone as the ‘pupil of’ an artist, when the actual artist is unknown).
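To make that merging concrete, here is a minimal sketch in Python. It doesn’t use a real MARC library (in practice something like pymarc would do the parsing); instead it assumes a simplified record shape, a list of tag/subfield tuples, and the function and key names are hypothetical. It pulls the subfields described above out of both the 100 and 700 fields into a single list of people.

```python
# A minimal sketch, assuming a simplified record shape:
# a list of (tag, [(subfield_code, value), ...]) tuples.
PERSON_TAGS = {"100", "700"}  # main entry and added entries, merged together

# Personal-name subfield codes shared by the 100 and 700 fields.
SUBFIELD_NAMES = {
    "a": "name",         # e.g. "Hawking, Stephen,"
    "b": "numeration",   # e.g. the "II" in Pope John Paul II
    "c": "titles",       # e.g. "Sir"
    "d": "dates",        # usually birth and/or death
    "j": "attribution",  # e.g. "Pupil of"
}


def extract_people(record):
    """Return one dict per person found in the record's 100 and 700 fields."""
    people = []
    for tag, subfields in record:
        if tag not in PERSON_TAGS:
            continue
        person = {}
        for code, value in subfields:
            key = SUBFIELD_NAMES.get(code)
            if key is None:
                continue  # skip subfields this sketch doesn't use
            # Some subfields (such as $c) can repeat, so keep a list per key.
            person.setdefault(key, []).append(value.strip())
        if "name" in person:
            people.append(person)
    return people


record = [
    ("245", [("a", "An example title")]),
    ("100", [("a", "Hawking, Stephen,"), ("d", "1942-2018")]),
    ("700", [("a", "John Paul"), ("b", "II,"), ("c", "Pope,"), ("d", "1920-2005")]),
]
people = extract_people(record)
```

Because both tags feed the same list, a page can treat main and added entries identically, which matches the decision to merge them.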
One awkward stumbling block was that names followed the library tradition of ‘surname-comma-forenames’ format. This convention makes it easy for computer systems to sort by surname, which has historically probably been the most useful order for readers. But we felt strongly that it is the least user-friendly way of actually reading people’s names, as it inverts the natural order in which we say them (no-one talks about ‘Hawking, Stephen’, but ‘Stephen Hawking’ is a household name). Switching the order back sounds like a simple task (split the string at the last comma, then reverse the two parts), and mostly is, but there are always exceptions: when we encounter strings like “Peter, of Celle, Bishop of Chartres,ca”, it’s harder to turn them back into readable names.
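Here is a rough sketch of that reordering. It splits at the last comma as described, but, as an assumption of ours rather than a rule from the catalogue, it only reorders names that contain exactly one comma once trailing commas are stripped, and leaves anything more complicated untouched rather than guessing.

```python
def display_name(inverted: str) -> str:
    """Turn 'Surname, Forenames' into 'Forenames Surname' where it looks safe."""
    # Headings often carry a trailing comma before the dates subfield.
    # Only commas and spaces are stripped, so initials like "S. W." survive.
    name = inverted.strip().rstrip(", ")
    if name.count(",") != 1:
        # No comma (e.g. "Aristotle") or several commas
        # (e.g. "Peter, of Celle, Bishop of Chartres,ca"): leave it alone.
        return name
    surname, forenames = name.rsplit(",", 1)
    return f"{forenames.strip()} {surname.strip()}"


print(display_name("Hawking, Stephen,"))  # Stephen Hawking
```

Falling back to the original string means the awkward cases still read correctly, just in catalogue order.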
With our goal being to make the library catalogue browsable (rather than just searchable), our next task was to find ways to enrich the information about the people in the database, helping readers to find out more about them (which may in turn shed some light on what the content of the book is likely to be).
As with subjects, many of the 100 and 700 person fields contain an ID linking the person to an external authority file. Unlike subjects, though, we only encountered a single authority file in use: the Library of Congress Name Authority File.
Where they existed, we could use these IDs to make sure that multiple books by the same person would appear on the same single person page, even if their name was spelt out or punctuated differently on the different records.
It would have been tempting to use these Library of Congress IDs within the URL structure of the Alpha site. But because they weren’t always present (either because that person isn’t in the LOC authority file, or just because the record hasn’t been matched up), we couldn’t do that, and so minted our own IDs instead. For simplicity’s sake, these are plain numbers prefixed with the letter ‘P’ (for person).
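A minimal sketch of the ID minting, under some assumptions: `PersonRegistry` and `normalise` are hypothetical names, and keying people without an authority ID on a normalised form of their name is our guess, not something described above. What it illustrates is that a shared Library of Congress ID maps every spelling variant to the same P-number.

```python
import re
from itertools import count


def normalise(name):
    """Lower-case, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


class PersonRegistry:
    """Hands out stable P-numbers, merging records that share an LC ID."""

    def __init__(self):
        self._counter = count(1)
        self._ids = {}  # lookup key -> minted ID such as "P1"

    def person_id(self, name, lc_id=None):
        # Prefer the Library of Congress ID so that spelling and punctuation
        # variants collapse onto one page; otherwise fall back to the name
        # (a hypothetical fallback for people without an authority ID).
        key = ("lc", lc_id) if lc_id else ("name", normalise(name))
        if key not in self._ids:
            self._ids[key] = f"P{next(self._counter)}"
        return self._ids[key]


registry = PersonRegistry()
a = registry.person_id("Hawking, Stephen,", lc_id="n12345678")  # made-up LC ID
b = registry.person_id("Hawking, Stephen W.", lc_id="n12345678")
c = registry.person_id("Smith, Jane")
```

Here `a` and `b` both come out as `P1` despite the different punctuation, while `c` gets `P2`.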
We discovered an existing project called VIAF (the Virtual International Authority File), which aims to link together name authority files from many different institutions across the globe. By querying this database with the Library of Congress IDs, we collected all the other IDs that were available. This means we can construct links from the people pages on the Wellcome Library website to the equivalent pages on other catalogues, such as the national libraries of France, Germany, Spain, Canada, and many more.
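A rough sketch of the VIAF step, with several assumptions stated up front. The lookup URL follows VIAF’s source-ID pattern as we understand it (`LC`, a pipe, then the ID), and the response shape, a mapping from source codes such as `LC`, `BNF` or `DNB` to lists of IDs as in VIAF’s ‘justlinks’ JSON, is assumed too. VIAF has changed its interfaces over time, so check its current documentation; the sketch makes no network call and only shows the flattening.

```python
from urllib.parse import quote

VIAF_BASE = "https://viaf.org/viaf"  # assumed base URL; check VIAF's current docs


def viaf_lookup_url(lc_id):
    """URL that (as we understand VIAF's scheme) resolves an LC ID to a cluster."""
    return f"{VIAF_BASE}/sourceID/{quote('LC|' + lc_id, safe='')}"


def collect_ids(links):
    """Flatten an assumed {source_code: [id, ...]} mapping to one ID per source.

    Non-list values (such as a bare cluster ID) are skipped.
    """
    return {
        source: ids[0]
        for source, ids in links.items()
        if isinstance(ids, list) and ids
    }


# Made-up response in the assumed shape.
sample = {
    "viafID": "000000",
    "LC": ["n12345678"],
    "BNF": ["http://catalogue.bnf.fr/ark:/00000/cb000000000"],
    "DNB": ["http://d-nb.info/gnd/000000000"],
    "WKP": ["Q0"],
}
ids = collect_ids(sample)
```

Each entry in `ids` can then be turned into a link to the corresponding national catalogue.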
Pleasingly, VIAF has also collected IDs referencing Wikipedia pages. As Wikipedia allows others to use its content under a Creative Commons licence, we could query the site (using its API) and display the content on our person pages. We decided to display the first two sentences (with a link to Wikipedia to read the full biography), on the basis that this is usually enough to get a sense of what the person is mostly known for. We also removed any text from Wikipedia in parentheses, as it is normally dates (which we show elsewhere), a pronunciation guide to the name, or other minor details that weren’t needed for a quick read.
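A minimal sketch of the trimming, assuming we already have the plain-text introduction (for example from the MediaWiki TextExtracts API, `prop=extracts` with `exintro` and `explaintext`). The function name is hypothetical, and the sentence splitting is deliberately naive, so abbreviations such as “St. Paul” will trip it up.

```python
import re


def short_intro(extract: str, sentences: int = 2) -> str:
    """Drop parenthesised asides, then keep the first few sentences."""
    text = extract
    # Remove innermost (...) groups repeatedly so nested parentheses also go,
    # taking any whitespace just before them with them.
    while True:
        stripped = re.sub(r"\s*\([^()]*\)", "", text)
        if stripped == text:
            break
        text = stripped
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(parts[:sentences])
```

Stripping the parentheses before splitting matters: dates like “(c. 1850)” contain full stops that would otherwise be mistaken for sentence ends.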
As well as text, we also collected the images from the Wikipedia page, and used the first one (if there are any), inside a circle, to illustrate the person on both their person page and aggregation pages. This mostly works (where it’s a photo or drawing of the person, or even a scan of one of their works) but does sometimes show a slightly misleading image.
There was a small amount of concern over using Wikipedia as a source of content (although most people were positive). One issue is what might happen if we pull the content from Wikipedia at a moment when that page has been vandalised. We could mitigate that to some extent by regularly updating our content on a rolling schedule (relying on the Wikipedia community to fix vandalism in the meantime), but to allow major issues to be resolved more quickly than that, we added an admin feature to refresh the content from Wikipedia immediately. So if someone at Wellcome spots a page where the Wikipedia introduction is inaccurate or has been vandalised, they can fix it on Wikipedia itself and then have the change reflected on the Wellcome Library page.
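The combination of a rolling refresh and an admin override can be sketched as a small cache. This is an illustration under assumptions, not the site’s actual code: `IntroCache`, the injected `fetch` function and the `max_age` setting are all hypothetical.

```python
import time
from typing import Callable


class IntroCache:
    """Caches Wikipedia intros, refreshing on a schedule or on demand."""

    def __init__(self, fetch: Callable[[str], str], max_age: float, clock=time.time):
        self._fetch = fetch      # hypothetical: pulls an intro from Wikipedia
        self._max_age = max_age  # seconds before an entry counts as stale
        self._clock = clock      # injectable so the schedule can be tested
        self._entries = {}       # title -> (fetched_at, text)

    def get(self, title: str, force: bool = False) -> str:
        entry = self._entries.get(title)
        stale = entry is None or self._clock() - entry[0] > self._max_age
        if force or stale:
            entry = (self._clock(), self._fetch(title))
            self._entries[title] = entry
        return entry[1]

    def admin_refresh(self, title: str) -> str:
        # What an admin button might call once the article has been fixed.
        return self.get(title, force=True)
```

Readers normally get the cached copy; the forced path simply skips the staleness check, so a fix made on Wikipedia shows up without waiting for the next scheduled refresh.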
As well as the Wikipedia intro, we added a feature allowing Wellcome staff to write a separate intro to be displayed alongside it. Our rule of thumb was that this intro should be specific to Wellcome, rather than repeating the sort of general information found in a Wikipedia biography: for example, the person’s relationship to Wellcome (as with Henry Wellcome himself), or what sort of material from that person is available at the Wellcome Library (which could be quite a lot for people whose personal archives are held there).
After these context-setting introductions and the photo, we display some data about the person drawn from the catalogue itself: the subjects their works are mostly about, a timeline of when their works were created or published, and the formats their works are mostly in. More experimentally, we tried displaying links to other people who are “contemporaries” of that person. This query changed a few times as we refined it, and ended up being something along the lines of “people who have produced works about some of the same subjects and who were born within 10 years of them”. Sometimes it works well, sometimes it doesn’t.
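A minimal sketch of the “contemporaries” rule as described, plus a simple count of a person’s most common subjects. The data shapes and names are hypothetical, and ranking by the number of shared subjects is our assumption; the description above only gives the filter.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    pid: str
    born: Optional[int]
    subjects: set = field(default_factory=set)


def top_subjects(works, n=3):
    """Most common subjects across a person's works (each work: a list of subjects)."""
    counts = Counter(subject for subjects in works for subject in subjects)
    return [subject for subject, _ in counts.most_common(n)]


def contemporaries(person, everyone, max_gap=10, limit=5):
    """People born within max_gap years who share at least one subject."""
    if person.born is None:
        return []
    scored = []
    for other in everyone:
        if other.pid == person.pid or other.born is None:
            continue
        if abs(other.born - person.born) > max_gap:
            continue
        shared = len(person.subjects & other.subjects)
        if shared:
            scored.append((shared, other.pid))
    # Most shared subjects first (our assumption), ties broken by ID.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:limit]]
```

Both filters are cheap, which makes it easy to tweak `max_gap` or the subject overlap while experimenting, much as the query evolved over time.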
Finally, we added the ability to feature ‘interesting’ people on the homepage.
Our most recent step was to go back and use a type of metadata we had originally missed: field 600 (‘Subject Added Entry–Personal Name’), which contains people who are the subject of a work rather than its creator. Pleasingly, for these ‘person-as-subject’ pages we could reuse the simple URL structure of the subject pages (/subjects/S1234), replacing the S-number with the person’s P-number. That is one key benefit of giving different types of things differently prefixed IDs.
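Because the prefix carries the type, a single route can tell the two kinds of subject page apart. A minimal sketch, assuming a regex-based router; `resolve_subject_url` is a hypothetical name, and only the /subjects/ path comes from the site itself.

```python
import re

# IDs carry their type in the prefix: S for subjects, P for people.
SUBJECT_URL = re.compile(r"/subjects/([SP])(\d+)")


def resolve_subject_url(path):
    """Return ('subject' or 'person', number) for /subjects/... URLs, else None."""
    match = SUBJECT_URL.fullmatch(path)
    if match is None:
        return None
    kind = "subject" if match.group(1) == "S" else "person"
    return kind, int(match.group(2))
```

Using `fullmatch` rather than `match` with anchors avoids accepting a stray trailing newline or extra path segments.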