
Monthly Archives: January 2015

This is probably a stupid political move, but…

This morning, I tweeted somewhat off-handedly that “It’s almost like we need funding to take the chance to win the @nesta_uk Heritage and Culture prize.” I thought perhaps it might bear a little more explanation and expansion, and hopefully discussion.

Good, Form & Spectacle is tiny and new. There’s about £4k in the bank. Nonetheless, we’ve bashed out three (internal) projects now, and collaborated with six different people. I worked on a short client contract towards the end of last year, the proceeds of which have gone towards rent and paying my collaborators in cash or lunch or coffee. Did I mention it’s tiny, and new? Funny coming from San Francisco, where a startup can mean a company with 300 staff and millions in the bank. Still, I’m having a blast, and am so pleased that there are people around me interested in working with me.

I was really attracted to the NESTA prize, simply because it feels and sounds much more active than some of the other grants and funding I’ve researched in the four months I’ve been in London. I’m an active maker type, so it seemed like a good fit.

I’ve assembled a provisional team of six, including me, who’ve already given me their time and minds to think about an idea we could submit together. But here’s where it gets tricky for me. As one of my collaborators commented in our discussions, the grant seems to encourage a sort of “you know that thing you were doing anyway? submit it for this!” rather than “invent the right answer,” which, for a new, exploring group like us, is a much better fit.

The prize is structured like this:

9th February – Applications Close

Applications should be submitted by midday 9th February 2015 using our Heritage & Culture group on Collabfinder. All completed projects on our Collabfinder group at this time will be considered.

After considering all applications at this time we will then notify teams via Collabfinder to let them know whether their idea has been selected to come along to the Creation Weekend.

17th February – Pre-Weekend Meetup – Venue TBC, London

This meetup is mainly for those teams who have been invited to attend the Creation Weekend, as we will brief them on how to prepare for the weekend and set them further deadlines.

In addition, for those who haven’t been invited to the Creation Weekend, it is an opportunity to meet successful teams and see if there are opportunities for collaboration. We will also announce our Judges for the Challenge.

28th February + 1st March – Creation Weekend – British Museum, London

At the Creation Weekend teams will further build and test their product and service. Teams will also be assessed against the judging criteria and given coaching on pitching. On the Sunday afternoon, teams will then pitch to a panel of judges, who will choose three teams to progress further. These finalists will subsequently receive a £5,000 prize as well as a tailored package of support.

March – May – Incubation

The three finalists will further develop their product during this period. They will also receive a tailored package of support provided by Nesta and ODI including help to think through their business model, as well as further service design skills training.

May – Winner Announced

The three finalists will present one final time to the panel of judges who will then decide which team will be awarded the grand prize of up to £50,000.

– See more at: http://www.nesta.org.uk/heritage-culture-open-data-challenge

What are the expectations about level of effort from NESTA?

  • What is a “tailored package of support”?
  • Who are the types of people you have in mind as your target?
  • Do you think I should be treating this like a side project?
  • Do you expect that most of the applicants will already have built what they submit?
  • Are you hoping to attract experienced, commercial practitioners?
    • What is a “tailored package of support” for experienced practitioners?
  • Do people who go for these prizes do this in the course of their work day, paid by an employer?
  • What kinds of organizations can bear this kind of financial risk?

In my head, I’d done rough calculations about the level of effort a crack team of experienced practitioners would need to make something impressive. Let’s just say a team of six experienced folks would cost me about £10k a week, and that’s a very reasonable rate; against the £50k grand prize, that’d give us about a month’s worth of time to make something we could stand behind. Maybe I could ask the group to spend way less time on it, working in our spare time or on weekends, say 4 days instead of 30, to lessen the huge risk to all of us. But, knowing what I know about how wily software development can be, and how difficult it is to build something good in 4 days, I just don’t think that would be satisfying.

There are certainly some good reasons to keep going through with the whole shebang. It would be a great prize to win, since it would get the name out there, and the idea we’ve come up with is a good continuation of the first two internal R&D projects we’ve already put out. But I’m torn, because to me it’s a £50k bet, and that’s something a tiny, new company can’t afford.

I don’t know what to do! I hope this post opens a discussion.

Have you worked on one of the previous NESTA open data prizes? How did you absorb the risk?


New office for Good, Form & Spectacle!

This week we moved into our first office space, in Shoreditch, London, arguably one of the gravitational centres of the “Silicon Roundabout”. This office space has had some hallowed tenants, including BERG, the Really Interesting Group and Makie.

It’s exciting to have a home again after several months working from home(s), coffee shops, museums and libraries around the city. It’s also great to be in the same place as a new digital product team starting up out of one of the granddaddies of the tech scene in London, MOO Print.

Nice start to 2015!

This is a guest post from Tom Armitage, our collaborator on the V&A Spelunker. The Spelunker is our second internal R&D project, and we released it last week.


Early on in the process of making the V&A Spelunker – only a few hours in – I said to George something along the lines of “I’m really trying to focus on sketching and not engineering right now”. We ended up discussing that comment at some length, and it’s sat with me throughout the project. And it’s what I wanted to think about a little now that the Spelunker is live.

For me, the first phase of any data-related project is material exploration: exploring the dataset, finding out what’s inside it, what it affords, and what it hints at. That exploration isn’t just analytical, though: we also explore the material by sketching with it, and seeing what it can do.

The V&A Spelunker is an exploration of a dataset, but it’s also very much a sketch – or a set of sketches – to see what playing with it feels like: not just an analytical understanding of the data, but also a playful exploration of what interacting with it might be like.

Sketching is about flexibility and a lack of friction. The goal is to get thoughts into the world, to explore them, to see what ideas your hand throws up autonomously. Everything that impedes that makes the sketching less effective. Similarly, everything that makes it hard to change your mind also makes it less effective. It’s why, on paper, we so often sketch with a pencil: it’s easy to rub out and change our mind with, and it also (ideally) glides easily, giving us a range of expression and tone. On paper, we move towards ink or computer-based design as our ideas become more permanent, more locked. Those techniques are often slower to change our minds about, but they’re more robust – they can be reproduced, tweaked, published.

Code is a little different: with code, we sketch in the final medium. The sketch is code, and what we eventually produce – a final iteration, or a production product – will also be code.

As such, it’s hard to balance two ways of working with the same material. Working on the Spelunker, I had to fight hard against premature optimisation. Donald Knuth famously described premature optimisation as ‘the root of all evil’. I’m not sure I’d go that far, but it’s definitely an easy pit to fall into when sketching in code.

The question I end up having to answer a lot is: “when is the right time to optimise?” Some days, even in a sketch, optimisation is the right way to go; other days it really isn’t worth it. If we want to find out how many jumpers there are in the collection – well, that’s just a single COUNT query; it doesn’t matter if it’s a little slow.
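To make that concrete, here’s the kind of query I mean, as a minimal sketch. The table and column names (objects, object_type) and the connection details are assumptions for illustration, not the real Spelunker schema:

    # A sketch, not the real Spelunker code: assumes a table `objects` with an
    # `object_type` column; connection details are placeholders.
    import pymysql

    conn = pymysql.connect(host="localhost", user="vam", password="secret",
                           database="vam_catalogue")
    with conn.cursor() as cur:
        # A one-off COUNT is fine even if the column isn't indexed and the query is slow.
        cur.execute("SELECT COUNT(*) FROM objects WHERE object_type = %s", ("Jumper",))
        (jumper_count,) = cur.fetchone()
    print(jumper_count)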

I have to be doubly careful of premature optimisation when collaborating, and particularly when sketching, and remember that not every question or comment is a feature request. My brain often runs off of its own accord, wondering whether I should write a large chunk of code, when really the designer in me should just be thinking about answering that question. The software-developer part of my brain ought to kick in later, when the same question has come up a few times, or when it turns out the page to answer that question is going to see regular use.

The Date Graph, for instance, is where the performance trade-offs of the Spelunker are most obvious. By which I mean: it’s quite slow.

Why is it slow?

I’d ingested the database we’d been supplied as-is, and just built code on top of it. I stored it in a MySQL database simply because we’d been given a MySQL dump. I made absolutely no decisions: I just wanted to get to data we could explore as fast as possible.

All the V&A’s catalogue data – the exact dataset we had in the MySQL dump – is also available through their excellent public API. The API returns nicely structured JSON, too, making an object’s relationships to attributes like what it’s made of really clear. A lot of this information wasn’t readily available in the MySQL database. The materials relations, for instance, had been reduced to a single comma-separated field – rather than the one-to-many relationship to another table that would have perhaps made more sense.
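To illustrate the kind of flattening I mean, here’s a sketch of un-flattening one of those comma-separated fields by hand; the field name and value are illustrative guesses, not the actual dump:

    # Sketch only: the record shape and value are illustrative, not real V&A data.
    record = {"materials": "earthenware, lead glaze"}

    # Reconstruct the one-to-many relationship that the flat dump collapsed.
    materials = [m.strip() for m in record["materials"].split(",") if m.strip()]
    print(materials)   # -> ['earthenware', 'lead glaze']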

I could have queried the API to get the shape of the relationships – and if we were building a product focused around looking up a limited number of objects at a time, the API would have been a great way to build on it. But to begin with, we were interested in the shape of the entire catalogue, the bird’s-eye view. The bottleneck in using the API for this would be the 1.1 million HTTP requests – one for each item; we’d be limited by the speed of our network connection, and perhaps even get throttled by the API endpoint. Having a list of the items already, in a single place – even if it was a little less rich – was going to be the easiest way to explore the whole dataset.
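As a back-of-envelope check on that bottleneck (the request rate here is an assumption, not a measured figure):

    # Rough arithmetic only; the rate is assumed and ignores any throttling.
    items = 1_100_000
    requests_per_second = 10
    hours = items / requests_per_second / 3600
    print(f"~{hours:.0f} hours just to fetch the catalogue once")   # ~31 hours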

The MySQL database would be fine to start sketching with, even if it wasn’t as rich as the structured JSON. It was also a little slow due to the size of some of the fields: because the materials and other facets were serialized into single fields, they often ended up in quite large column types such as LONGTEXT, which were slow to query against. Fine for sketching, but not necessarily very good for production in the long term – and were I to work further on this dataset, I think I’d buckle and either use the API data, or request a richer dump from the original source.

I ended up doing just enough denormalizing to speed up some of the facets, but that was about it in terms of performance optimisation. It didn’t seem worthwhile to optimise the database further until I knew the sort of questions we wanted answered.
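One plausible shape for that kind of just-enough work – and it is only a sketch, not the actual Spelunker schema – is to precompute an indexed table of facet counts once, so the list pages never have to scan the big serialized columns:

    # Sketch of "just enough" denormalisation: a precomputed facet-count table.
    # Table and column names (objects, place, facet_counts) are assumptions.
    import pymysql

    conn = pymysql.connect(host="localhost", user="vam", password="secret",
                           database="vam_catalogue")
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS facet_counts (
                facet VARCHAR(64)  NOT NULL,
                value VARCHAR(255) NOT NULL,
                total INT          NOT NULL,
                PRIMARY KEY (facet, value)
            )
        """)
        cur.execute("""
            REPLACE INTO facet_counts (facet, value, total)
            SELECT 'place', place, COUNT(*) FROM objects
            WHERE place IS NOT NULL AND place <> ''
            GROUP BY place
        """)
    conn.commit()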

Not yet knowing which questions we wanted answered is, really, a better answer to the question of why it is slow.

Yes, technically, it’s because the database schema isn’t quite right yet, or because there’s a better storage platform for that shape of data.

But really, the Spelunker’s mainly slow because it began as a tool to think with, a tool to generate questions. Speed wasn’t our focus on day one of this tiny project. I focused on getting to something that’d lead to more interesting questions rather than something that was quick. We had to speed it up both for our own sanity, and so that it wouldn’t croak when we showed anybody else – both of which are good reasons to optimise.

The point the Spelunker is at right now turns out to be where those two things are in fine balance. We’ve got a great tool for thinking and exploring the catalogue, and it’s thrown up exactly the sort of questions we hoped it would. We’ve also begun to hit the limits of what the sketch can do without a bit more groundwork: a bit more of the engineering mindset, moving to code that resembles ink rather than pencil.

“Spelunker” suggests a caving metaphor: exploring naturally occurring holes. Perhaps mining is a better metaphor, with the balance that needs to be struck when digging your own hole in the ground. The exploration, the digging, is exciting, and for a while you can get away without supporting the hole. And then, not too early, and ideally not too late, you need to swap into another mode: propping up the hole you’ve dug, doing the engineering necessary to make the hole robust – and to enable future exploration. It’s a challenge to do both, but by the end, I think we struck a reasonable balance in the process of making the V&A Spelunker.

If you’re an institution thinking about making your catalogue available publicly:

  • API access and data dumps are both useful to developers, depending on the type of work they’re doing. Data dumps are great for getting a big picture, and they can vastly reduce traffic against your API. But a rich API is useful for integrating into existing platforms, especially ones that make relatively few queries per page against your API (and if you have a suitable caching strategy in place). For instance, an API is the ideal interface for supplying data about a few objects to a single page somewhere else on the internet (such as a newspaper article, or an encyclopedia page).
  • If you are going to supply flat dumps, do make sure those files are as rich as the API. Try not to flatten structure or relationships that are contained in the catalogue. That’s not just to help developers write performant software faster; it’s also to help them come to an understanding of the catalogue’s shape.
  • Also, do use the formats of your flat dump files appropriately. Make sure JSON objects are of the right type, rather than just lots of strings (see the sketch after this list); use XML attributes as well as element text. If you’re going to supply raw data dumps from, say, an SQL database, make sure that table relations are preserved and suitable indexes are already supplied – this might not be what your cataloguing tool automatically generates!
  • Make sure to use as many non-proprietary formats as possible. A particular database’s form of SQL is nice for developers who use that software, but most developers will be at least as happy slurping JSON/CSV/XML into their own data store of choice. You might not be saving them time by supplying a more complex format, and you’ll reach a wider potential audience with more generic formats.
  • Don’t assume that CSV is irrelevant. Although it’s not as rich or immediately useful as structured data, it’s easily manipulable by non-technical members of a team in tools such as Excel or OpenRefine. It’s also a really good first port of call for just seeing what’s supplied. If you are going to supply CSV, splitting your catalogue into many smaller files is much preferable to a single, hundreds-of-megabytes file.
  • “Explorer” type interfaces are also a helpful way for a developer to learn more about the dataset before downloading it and spinning up their own code. The V&A Query Builder, for instance, gives a developer a feel for the shape of the data and what building queries looks like, and lets them click through to the full data for a single object.
  • Documentation is always welcome, however good your data and API! In particular, explaining domain-specific terms – be they specific to your own institution, or to your cataloguing platform – is incredibly helpful; not all developers have expert knowledge of the cultural sector.
  • Have a way for developers to contact somebody who knows about the public catalogue data. This isn’t just for troubleshooting; it’s also so they can show you what they’re up to. Making your catalogue available should be a net benefit to you, and making sure you have ways to capitalize on that is important!
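On the “right type” point above, here’s the sort of difference I mean, sketched with made-up field names rather than anything from a real catalogue export:

    import json

    # Everything-as-a-string: the developer has to guess types and re-parse.
    flattened = {"year_start": "1750", "on_display": "true",
                 "materials": "earthenware, lead glaze"}

    # Typed and structured: numbers are numbers, booleans are booleans, lists are lists.
    structured = {"year_start": 1750, "on_display": True,
                  "materials": ["earthenware", "lead glaze"]}

    print(json.dumps(structured, indent=2))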

This is a copy of a blog post I wrote for the Victoria & Albert Museum Digital Media blog. I thought it would be nice to pop a copy here for posterity.

As part of our ongoing research practice, we’ve made a new toy to help you explore the wondrous Victoria and Albert Museum’s catalogue, the V&A Spelunker.

Spelunking is an American word for exploring natural, wild caves. You might also say caving, or potholing here in the UK. I hope using this thing we’ve made feels a bit like exploring a dark cave with a strong torch and proper boots. It’s an interface to let you wander around a big dataset, and it’s designed to show everything all at once, and importantly, to show relationships between things. Your journey is undetermined by design, defined by use.

The V&A Spelunker’s Skeleton

In some ways, the spelunker isn’t particularly about the objects in the collection — although they’re lovely and interesting — it now seems much more about the shape of the catalogue itself. You eventually end up looking at individual things, but, the experience is mainly about tumbling across connections and fossicking about in dark corners.

The bones of the spelunker are pretty straightforward. It’s trying to help you see what’s connected to what, who is connected to where, and what is most connected to where, at a very simple level. You have the home page, which shows you a small random selection of things of the same type, like these wallpapers:

You can also look around a list of a few selected facets:

And at some point, you’ll find yourself at a giant list of any objects that match whatever filter you’ve chosen, like hand-sewn things, or all the things from Istanbul:

Just yesterday, we added another view for these lists to show you any/all images a little larger, and with no metadata. It’s a lovely way to look at all the robes or Liberty & Co. Ltd. fabrics or things in storage.

If you see something of interest, you can pull up a dumb-as-a-bag-of-hammers catalogue record view which is just that. Except that it also links through to the V&A API’s .JSON response for that object, which shows you some of the juicy interconnection metadata. Here’s a favourite I stumbled on:

(Incidentally, I was thrilled but slightly frightened to see this “Water cistern with detatchable drinking cup, modelled as a chained bear clasping a cub to its breast” from Nottingham in person, in the absolutely stunning ceramics gallery.)

The Beauty of an Ordered List

If you choose one of the main facets like Artist, or Place, you’ll get to a simple ordered list of results for that facet. It’s nice because you can see a lot of information about the catalogue at a glance.

You can see that the top four artists in the catalogue are Unknown (roughly 10%), Worth (as in the House of Worth, famous French couturiers), Hammond, Harry (‘founding father of music photography‘) and the lesser-known unknown.

I was curious to learn, at a glance, that most of the collection appears to come from the United Kingdom. (I might be showing my ignorance here, but this was a surprise to me.)

Here are the 20 most common places, with UK places marked with an asterisk:

London* 59,661
England* 42,178
Paris 36,890
Britain* 32,388
Great Britain* 27,486
France 23,540
Italy 11,562
Staffordshire* (hello, Wedgwood?) 11,007
Germany 6,666
China 5,275
Europe 5,260
Japan 4,005
Royal Leamington Spa* (3,857 hand-coloured fashion plates, from the ‘Pictorial History of Female Costume from the year 1798 to 1900’) 3,859
Iran 3,411
India 3,302
Jingdezhen (known for porcelain) 3,261
United Kingdom* 3,098
United States 3,045
Rome 2,961
Netherlands 2,943

Catalogue Topology

Those simple sorts of views and lists start to help you make suppositions about the collection as a whole. Perhaps you can start to poke at the stories hidden in the catalogue about the institution itself. I found myself wanting to try to illustrate some other aspects of the catalogue than just its contents, and that’s when this happened…

The Date Graph

The Date Graph has three simple inputs, all date related. The V&A sets two creation dates for each object: year_start and year_end. Each record also gets a museum_number, which, as in the case of our weird bear, looks like this:

1180-1864

Those last four digits normally represent the year the object was acquired. So, we snipped out that date and drew all three dates together in a big graph.
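The snipping itself is nothing fancy; here’s a rough sketch of the idea (the real code, and the many exception cases in real museum numbers, will be messier):

    import re

    def acquisition_year(museum_number):
        """Guess the acquisition year from the trailing four digits of a museum number.

        A sketch of the idea only: real V&A museum numbers come in many formats,
        and not all of them end in an acquisition year.
        """
        match = re.search(r"(\d{4})$", museum_number.strip())
        if match:
            year = int(match.group(1))
            if 1800 <= year <= 2015:   # crude sanity check against obviously wrong values
                return year
        return None

    print(acquisition_year("1180-1864"))   # -> 1864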

The more you look around the date graph, the more you start to see what might be patterns, like this big set of stuff all collected in the same year. Often these blocks of objects are related, like prints by the same artist, or fragments of the same sculptural feature:

Some objects in the collection have very accurate creation date ranges, while some are very broad, even hundreds of years wide. The very accurate ones are often objects that have a date on them, like coins:

It’s also interesting to see how drawing a picture like this date graph can show you small glitches in the catalogue metadata itself. Now, I don’t know enough about the collections, but perhaps this sort of tool could help staff find small errors to be corrected, errors that are practically impossible to spot when you’re looking at big spreadsheets, or records represented in text. Here’s an example, from the graph that shows objects in the 2000-2014 range… see those outliers that look as if they were acquired before they were created?

Asking Different Questions

I kept finding myself wondering if the Date Graph style of view could show us answers to some questions that are specific to the internal workings of the V&A. Could we answer different sorts of questions about the institution itself?

  • When do cataloguers come and go as staff? Do they have individual styles?
  • Can we see the collecting habits of individual curators?
  • Does this idea of “completeness” of records reveal how software could change the data entry habits used to make the catalogue?
  • Do required fields in a software application affect the accuracy of “tombstone” records?

A new feature I’d like to build would be a way to add extra filters on the date graph, like “show me all the jumpers acquired by the museum that were made between 1900 and 1980”.
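In query terms, that kind of combined filter might look something like this; the column names are assumptions carried over from the earlier sketches, not the real schema:

    # Hypothetical combined filter: object type plus creation date range,
    # returning acquisition years to plot on the date graph.
    import pymysql

    conn = pymysql.connect(host="localhost", user="vam", password="secret",
                           database="vam_catalogue")
    with conn.cursor() as cur:
        cur.execute(
            """SELECT object_number, acquisition_year
               FROM objects
               WHERE object_type = %s AND year_start >= %s AND year_end <= %s""",
            ("Jumper", 1900, 1980),
        )
        rows = cur.fetchall()   # (object_number, acquisition_year) pairs for the graph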

It’s a Sketch

No Title by Jamini, Roy

Even though we put in a good effort to make this, it’s still a rough sketch. Now that it’s built and hanging together, there are all sorts of things we’d like to do to improve it. If anything, it’s teased out more questions than answers for us, and that’s exactly what this sort of thing should do. My collaborator, Tom Armitage, is also going to write a post over on the Good, Form & Spectacle Work Diary about “Sketching and Engineering” in a little while, so stay tuned for that.

We hope you enjoy poking around, and we’d love to hear of any interesting discoveries you make. Please tweet your finds to us @goodformand.

Go spelunking!

theresa-going-in by Theresa – CC BY-NC-ND 2.0