Blog

This year's Spring Users Conference is being held in Cambridge, UK, from 17th to 19th May.

It promises to be another great event, with many of the top names in the sciences industry attending. As usual, it provides a great forum to share ideas and learn about developments in text mining, and a great opportunity to mingle with professionals from the industry.

Find out more.


Linguamatics has released I2E 3.2 – a faster, more powerful version of its award-winning text mining software.

The company continues its record of innovation and growth in life sciences with the latest release of its market-leading text mining solution.

Find out what's new in I2E 3.2


By looking at the popularity of the leaders during each of the televised debates, is it possible to predict who would be the eventual winner of the general election?

This is a difficult question to answer, but the statistics do appear to support some conclusions when the mined data is extrapolated.

 

Percentage of leader popularity during televised debates

These graphs show the percentage popularity of each of the leaders during the three televised debates, as well as the final election result in terms of percentage of overall votes cast.

The most striking thing to note is that if the Twitter data is extrapolated linearly, it corresponds very closely to the actual election results. The extrapolated result for Cameron was 37%; the actual election result was 36% (the Conservatives’ share of total votes cast). The extrapolated result for Clegg was 25%; the actual election result was 23% (the Liberal Democrats’ share of total votes cast).

Clegg’s declining popularity and Cameron’s correspondingly increasing popularity stand out quite clearly from the number of positive tweets about each potential leader. Clegg’s TV popularity did not translate into actual votes, but the trend analysis, if extrapolated, would have predicted that. Cameron’s increasing popularity was conclusive, as history now shows.
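
For readers who want to try a linear extrapolation of this kind themselves, here is a minimal sketch in Python. It is not the code behind the graphs above. The function name `linear_extrapolate`, the choice to number the debates 1 to 3 and place the election at position 4, and the sample percentages are all assumptions made purely for illustration.

```python
def linear_extrapolate(xs, ys, x_new):
    """Fit a least-squares straight line through (xs, ys) and evaluate it at x_new."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points, with xs and ys of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x_new


# Made-up popularity percentages for debates 1, 2 and 3 (not the real data).
debates = [1, 2, 3]
popularity = [10.0, 14.0, 15.0]

# Assumption: the election is treated as the next point on the same axis.
predicted = linear_extrapolate(debates, popularity, 4)
print(f"Extrapolated popularity: {predicted:.1f}%")
```

With only three points per leader, a fit like this is very sensitive to each individual value, so the result is best read as a trend indicator rather than a forecast.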


It was nice to see our Twitter analysis of the final UK election debate make it onto the blog of the BBC’s Rory Cellan-Jones.

RCJ’s post is an interesting one that reports several analyses from different sources, raising the question of what to publish from the Twitter feeds.

In his blog, RCJ was surprised that we found “Nick Clegg had 37% of positive tweets, followed by Gordon Brown with 32% and David Cameron in 31%,” finishing with “My hunch is that the volume of tweets may have been higher for Gordon Brown than David Cameron – and for all those positive ones, there were actually more that were negative.”

Our issues-based analysis found nearly *4 times as many positive tweets* as negative ones.

Rory was correct that Gordon Brown had a higher overall volume of tweets than David Cameron, but not about the balance of positive and negative tweets.
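
As a rough illustration of how figures like these are derived from raw tweet counts, the sketch below computes each leader's share of positive tweets and an overall positive-to-negative ratio. The function names and all of the counts are hypothetical placeholders, not our actual data or code.

```python
def positive_share(positive_counts):
    """Return each leader's percentage share of all positive tweets."""
    total = sum(positive_counts.values())
    if total == 0:
        raise ValueError("no positive tweets to share out")
    return {name: 100.0 * count / total for name, count in positive_counts.items()}


def positive_to_negative_ratio(positive_total, negative_total):
    """Return how many positive tweets there were per negative tweet."""
    if negative_total == 0:
        raise ValueError("ratio is undefined with no negative tweets")
    return positive_total / negative_total


# Hypothetical counts for three unnamed leaders (placeholders only).
counts = {"Leader A": 50, "Leader B": 30, "Leader C": 20}
shares = positive_share(counts)
ratio = positive_to_negative_ratio(400, 100)

for name, share in shares.items():
    print(f"{name}: {share:.0f}% of positive tweets")
print(f"Positive tweets per negative tweet: {ratio:.1f}")
```

Note that a leader can have the highest overall tweet volume and still have a smaller share of the positive tweets, which is why the two measures are reported separately.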

 

Figure 1: volume of positive and negative tweets

Here is a quick breakdown of what we found:


The recent UK election debates have provided a great platform for monitoring social media networks to get ‘instant’ reaction to both the personalities and the issues of the future prime ministerial candidates.

This analysis adds a new perspective to the tried and trusted methods of opinion poll analysis. Social media monitoring by no means competes with traditional opinion polls, which are backed by rigorous methodology and a proven record. However, to ignore what people are saying in an unconstrained and free environment is to miss an important dimension.

Twitter provides this sort of environment and, as you can see below, we have been doing a lot of work to find out what Twitterers were saying during the recent election debates.

Having done the debate analysis over the past few weeks, on top of months of earlier research on other subjects (the Haiti earthquake, the swine flu vaccine), we’ve learnt a lot about the nature of tweets and how to extract meaning from them.

One of the challenges of Twitter text analysis is removing irrelevant noise. Take a look at the graph below, which plots positive sentiment towards each of the leaders during the BBC debate on 29th April 2010.


Figure 1: positive sentiment towards each of the leaders during the BBC debate on 29th April 2010