Look before you leap – avoiding irrelevant Tweets in your analysis

April 30 2010

The recent UK election debates have provided a great platform for monitoring social media networks to get ‘instant’ reaction to both the personalities of the prime ministerial candidates and the issues they debated.

This analysis adds a new perspective to the tried and trusted methods of opinion poll analysis. Social media monitoring by no means competes with traditional opinion polls, which are backed by rigorous methodology and a proven record. However, to ignore what people are saying in a free and unconstrained environment is to miss an important dimension.

Twitter provides this sort of environment and, as you can see below, we have been doing a lot of work to find out what Twitterers were saying during the recent election debates.

Having done the debate analysis over the past few weeks, on top of months of earlier research on other subjects (Haiti earthquake, Swine flu vaccine), we’ve learnt a lot about the nature of tweets and how to extract meaning from them.

One of the challenges of Twitter text analysis is removing irrelevant noise. Take a look at the graph below, which plots positive sentiment towards each of the leaders during the BBC debate on 29th April 2010.


Figure 1: positive sentiment towards each of the leaders during the BBC debate on 29th April 2010

It seems there was a huge surge for David Cameron at about 8.45pm. What did he say or do that caused such an upswell of public opinion? Did he make a cutting comment? Did he reveal a revolutionary new policy? If only…

Twitter, due to its very nature, is prone to ironic and witty remarks which can easily be misinterpreted by some systems (and humans). This spike was caused by someone tweeting “@mrchrisaddison: Sky poll just in! David Cameron won the debate! …”.

It was a sarcastic remark that sparked a huge amount of re-tweeting. Most people were re-tweeting it as a joke, though some must have thought it was genuine, groundbreaking news. Chris Addison is a comedian who has around 24,000 followers.

Here is what the graph looks like with the irrelevant noise filtered out:

Figure 2: with noise filtered out
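
For readers who want to try something similar, here is a minimal sketch of one way to filter out this kind of noise. It is not the tool we used for the graphs above, only an illustration. It assumes tweets arrive as plain strings and that re-tweets carry an "RT @user:" or "@user:" prefix. The names normalise and drop_viral_copies, and the max_copies threshold, are made up for this example. The idea is to strip the re-tweet prefixes, count how often each message body appears, and drop any message that has been copied more often than the threshold before sentiment is scored.

```python
import re
from collections import Counter

# Assumed prefix style: one or more "RT @user:" / "@user:" markers at the start.
RT_PREFIX = re.compile(r"^(?:(?:RT\s+)?@\w+:?\s+)+", re.IGNORECASE)


def normalise(text):
    """Strip re-tweet prefixes, collapse whitespace and lowercase the text."""
    body = RT_PREFIX.sub("", text.strip())
    return re.sub(r"\s+", " ", body).strip().lower()


def drop_viral_copies(tweets, max_copies=3):
    """Keep only tweets whose normalised body occurs at most max_copies times.

    Every copy of a heavily repeated message is dropped, including the
    original, because the original would skew the sentiment count as well.
    """
    counts = Counter(normalise(t) for t in tweets)
    return [t for t in tweets if counts[normalise(t)] <= max_copies]


if __name__ == "__main__":
    # Illustrative sample only; the last tweet is invented for the demo.
    sample = [
        "@mrchrisaddison: Sky poll just in! David Cameron won the debate! …",
        "RT @mrchrisaddison: Sky poll just in! David Cameron won the debate! …",
        "RT @someone: RT @mrchrisaddison: Sky poll just in! David Cameron won the debate! …",
        "Clegg came across well on the economy",
    ]
    for tweet in drop_viral_copies(sample, max_copies=2):
        print(tweet)
```

A threshold like this is blunt: it will also throw away a sincere comment that happens to be widely re-tweeted, so the dropped messages still deserve a quick look by a human before they are discarded for good.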

The moral of the story? Use the right tools to filter out rubbish. Make sure you have a good understanding of the data.

Finally, never forget that human analysis is always needed as a final step. Good-quality tools will reduce the human effort but will not take it away completely.