Last Thursday I went to Utrecht for the Haren Hackathon, a gathering of data and social media nerds to investigate a dataset of tweets concerning the rioting in Haren last month. Lots of fun stuff got made, but I only just now finished my contribution and put it online.

Basically I built an hour-by-hour “trending topics” visualisation from the dataset of 550,000 tweets that Harro Ranter brought to the Hackathon. The intention was to get a sense of the emotional tone through the course of the day. I must admit that this doesn’t come through as clearly as I had hoped (although the sudden emergence of “afgebroken” and “wereldoorlog” at 8pm is clear). The technique does clearly show the appearance at 10pm of the rumour that a 19-year-old girl had been trampled to death (thankfully this turned out to be untrue), and the cleanup discussion starting at 7 the next morning.

There is one improvement that I would like to make (although I probably won’t get around to it). The analysis simply went hour by hour, but the volume of tweets per hour varies wildly, from fewer than 1,000 in the early hours of 21/9 to 110,000 in the hour before midnight of the same day. The statistical technique I used for the trending topics needs a reasonably large volume to give good results, but there is certainly room to divide the hours from 9pm until midnight into smaller time periods, which would give a more fine-grained picture.
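For the curious, here is a rough sketch of how that splitting could work. It is not the code behind the visualisation: the function name `adaptive_bins`, the `min_per_bin` threshold and the allowed `splits` are all made up for illustration, and the right threshold would depend on the statistical technique used. Each hour is cut into the largest number of equal sub-periods that still leaves, on average, at least `min_per_bin` tweets in each.

```python
from collections import Counter
from datetime import datetime, timedelta


def adaptive_bins(timestamps, min_per_bin=5000, splits=(1, 2, 4, 6, 12)):
    """Return (start, end) periods, splitting busy hours into finer periods.

    min_per_bin and splits are illustrative assumptions, not values
    taken from the original analysis.
    """
    # Count tweets per clock hour.
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    bins = []
    for hour in sorted(per_hour):
        count = per_hour[hour]
        # Finest split whose average period still meets the threshold;
        # quiet hours fall back to a single one-hour period.
        parts = max((s for s in splits if count / s >= min_per_bin), default=1)
        width = timedelta(hours=1) / parts
        for i in range(parts):
            bins.append((hour + i * width, hour + (i + 1) * width))
    return bins
```

With these defaults a quiet early-morning hour stays whole, while an hour with 110,000 tweets would be cut into twelve five-minute periods. Note that the check uses the average per period, so a very uneven hour could still produce some thin periods.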

Another feature that the visualisation really needs, although I don’t see an easy way to provide it, is a back-link from each term to some indication of the tweets that it represents. Sadly, Twitter search doesn’t give old results, and the code I used for the calculations doesn’t keep track of which documents the terms came from. [1]
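One possible way to keep that link, sketched here under assumptions rather than taken from the actual code, would be to build a small inverted index alongside the term counts. The names `build_term_index` and `sample_ids`, the `(tweet_id, text)` input shape and the very naive tokenisation are all hypothetical. It stores only tweet IDs and hands back a small sample per term, not the full set of tweets.

```python
import re
from collections import defaultdict


def build_term_index(tweets):
    """Map each lowercased term to the set of tweet IDs that contain it.

    tweets is assumed to be an iterable of (tweet_id, text) pairs.
    """
    index = defaultdict(set)
    for tweet_id, text in tweets:
        # Naive tokenisation: runs of word characters, lowercased.
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(tweet_id)
    return index


def sample_ids(index, term, k=3):
    """Return up to k tweet IDs for a term, lowest IDs first."""
    return sorted(index.get(term.lower(), ()))[:k]
```

The visualisation could then show a handful of IDs, or links built from them, next to each term.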

If anyone has related ideas, better design chops (not hard), or just wants to play, check out the JavaScript of the page: the data is in an included file. [2]


  1. Even if it did, I’m pretty sure displaying all of them would violate Twitter’s Terms & Conditions.
  2. Hackathon members: there are CSV text files in the Dropbox if that’s your preferred format, under sentiment_data/terms/.