As humans, we find it easy to read a wide variety of date formats and understand almost instantly what they represent. We can glance at dates written as “23rd Jan 1894”, “Jan-Mar 1856”, “c. 1960” or “1945-1949” and know how big the date range is and how accurate it might be. These are just a few examples of the date formats we found in the Postal Museum collections.
To work with those dates effectively in our software, we needed a way to parse them and represent them in a single format. There are many software libraries for translating between standardised date formats (e.g. the ones used in different countries), but parsing the formats commonly used in archives is a less popular problem to solve. Archives tend not to enforce one fixed way of writing dates, so there can be a surprising variety of notations. This isn’t a bad thing: it gives archivists the flexibility they need to represent what they know about the objects in their care. Each collection might have its own quirks.
Some smaller software libraries do take on the challenge of parsing dates written in more natural, human-readable formats, but we decided to devise our own approach. We had a very specific set of formats and couldn’t find an existing solution that handled all of them easily.
I wrote DateRanger, a Ruby library that takes those formats and translates each one into a data structure representing the start and end of its date range. This makes it easy to gauge the accuracy of a date: the wider the range, the less precise the original date was to begin with. We’d love to see contributions from anyone interested in expanding the range of formats DateRanger can handle.
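To illustrate the idea, here is a minimal Ruby sketch that parses the example formats from the start of this post into a range of dates. It is not DateRanger’s actual code or API: the method names (`parse_archive_date`, `span_in_days`) are hypothetical, and reading “c.” as five years either side of the stated year is an assumption made purely for the example.

```ruby
require "date"

# Hypothetical sketch, NOT DateRanger's real API: converts a few
# archive-style date strings into a Range of Date objects (first..last day).
MONTHS = Date::ABBR_MONTHNAMES.compact.map(&:downcase) # ["jan", "feb", ...]
CIRCA_YEARS = 5 # assumption: how far "c." widens a year in each direction

def month_number(name)
  index = MONTHS.index(name[0, 3].downcase)
  raise ArgumentError, "unknown month: #{name}" unless index
  index + 1
end

def parse_archive_date(text)
  case text.strip
  when /\A(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)\s+(\d{4})\z/
    # "23rd Jan 1894": a single day, so the range starts and ends on it
    day = Date.new($3.to_i, month_number($2), $1.to_i)
    day..day
  when /\A([A-Za-z]+)-([A-Za-z]+)\s+(\d{4})\z/
    # "Jan-Mar 1856": first day of the first month to last day of the second
    year = $3.to_i
    first = month_number($1)
    last = month_number($2)
    Date.new(year, first, 1)..Date.new(year, last, -1) # -1 = last day of month
  when /\Ac\.\s*(\d{4})\z/
    # "c. 1960": widened by the assumed CIRCA_YEARS margin
    year = $1.to_i
    Date.new(year - CIRCA_YEARS, 1, 1)..Date.new(year + CIRCA_YEARS, 12, 31)
  when /\A(\d{4})\s*-\s*(\d{4})\z/
    # "1945-1949": start of the first year to end of the last
    Date.new($1.to_i, 1, 1)..Date.new($2.to_i, 12, 31)
  when /\A(\d{4})\z/
    # "1901": the whole year
    Date.new($1.to_i, 1, 1)..Date.new($1.to_i, 12, 31)
  else
    raise ArgumentError, "unrecognised date format: #{text}"
  end
end

# Number of days covered, inclusive: a rough measure of how precise a date is.
def span_in_days(range)
  (range.end - range.begin).to_i + 1
end

["23rd Jan 1894", "Jan-Mar 1856", "c. 1960", "1945-1949"].each do |text|
  range = parse_archive_date(text)
  puts "#{text}: #{range.begin} to #{range.end} (#{span_in_days(range)}-day span)"
end
```

In this sketch “23rd Jan 1894” becomes a one-day range, while “c. 1960” covers more than a decade, so comparing spans gives a simple, uniform way to see which records are dated precisely.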
I used automated tests to build up the code in stages, starting with really simple dates and culminating in tests of obscure format combinations that didn’t actually appear in our data sample. Writing the tests first meant I caught some pretty confusing bugs early on.
We used DateRanger on the Postal Museum touch table to help us determine where on the timeline to place each collection record. However, we displayed the dates to visitors in their original archive formats. After all, those are already perfectly human-readable.
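A start and end date suggest a natural way to position a record on a timeline. The Ruby sketch below maps a date range linearly onto a horizontal axis; the timeline bounds, the pixel width and the method names (`x_for`, `timeline_span`) are illustrative assumptions, not a description of how the touch table was actually built.

```ruby
require "date"

# Hypothetical timeline settings, chosen only for illustration.
TIMELINE_START = Date.new(1840, 1, 1)
TIMELINE_END   = Date.new(2000, 12, 31)
TIMELINE_WIDTH = 1600 # pixels

# Linearly maps a date onto the timeline's x axis (0..TIMELINE_WIDTH).
# Date subtraction returns a Rational number of days, so the division is exact.
def x_for(date)
  fraction = (date - TIMELINE_START) / (TIMELINE_END - TIMELINE_START)
  (fraction * TIMELINE_WIDTH).round
end

# Returns the pixel positions of a range's start and end, so a wide,
# uncertain range can be drawn as a longer bar than a precise one.
def timeline_span(range)
  [x_for(range.begin), x_for(range.end)]
end

circa_1960 = Date.new(1955, 1, 1)..Date.new(1965, 12, 31)
p timeline_span(circa_1960)
```

Drawing the whole range rather than a single point keeps an uncertain date visibly uncertain, while the original archive text can still be shown as the label.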