Text mining in online news & social media

Our high-performance text mining software is ideal for analyzing news feeds, RSS and social media, including blogs and Twitter.

With the explosion of online content, it has become increasingly challenging to monitor sentiment and reactions to a new product, service or event.

With Linguamatics I2E text mining software, however, you can automate the extraction of whatever information you seek from online media: reviewer comments, customer complaints or praise, even competitor claims. The technology can be applied to any domain.

Social Media Mining

I2E text mining software can also be used for real-time social media analysis. It can mine large volumes of unstructured information in social media to extract facts, relationships, trends, opinions and networks of influence.

For instance, I2E has been applied to mining information from the Twitter feed. Its sophisticated and highly scalable engine, based on natural language processing (NLP), is capable of overcoming the common barriers to mining Twitter effectively, such as:

  • Large volume of tweets (80 million tweets a day, 29 billion tweets a year)
  • The large amount of irrelevant 'noise' found in tweets
  • The corruption of topic hashtags by spammers

Keyword searches across this type of feed often return too much irrelevant information, which takes a long time to analyze. The more sophisticated NLP approach used by I2E enables you to:

  • Extract what people are actually saying about a subject, as opposed to just mentioning it, e.g. I2E can tell you what people are saying about 'Product X': do they like it or not?
  • Cluster together the different ways people can say the same thing, e.g. “I like this product” can be written in hundreds of ways that all express the same sentiment about the product
  • Categorize populations of tweeters according to their behaviours and opinions
  • Filter out irrelevant or unwanted tweets
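
To make the contrast with plain keyword search concrete, here is a minimal Python sketch. It is not I2E's engine or API; the sample tweets, the phrase lists and the function names are all hypothetical, and simple regular expressions stand in for real linguistic analysis. It shows how a keyword search treats every tweet naming 'Product X' alike, whereas even a toy phrase-based pass can separate opinions from bare mentions, group different wordings of the same sentiment, and set spam aside.

```python
import re

# Hypothetical sample tweets; "Product X" is the placeholder subject used above.
TWEETS = [
    "I really like Product X",
    "Product X is great, love it",
    "not a fan of Product X",
    "Win a free Product X!!! click here #ProductX",
    "Saw an ad for Product X today",
]

# Toy phrase lists (an assumption for illustration, not a real lexicon).
POSITIVE = [r"\blike\b", r"\blove\b", r"\bgreat\b"]
NEGATIVE = [r"\bnot a fan\b", r"\bhate\b", r"\bdon't like\b"]
SPAM = [r"\bclick here\b", r"\bwin a free\b"]


def matches_any(patterns, text):
    """True if any of the regex patterns occurs in the text (case-insensitive)."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def classify(tweet, subject="Product X"):
    """Return 'spam', 'negative', 'positive', 'mention', or None if off-topic."""
    if subject.lower() not in tweet.lower():
        return None  # a keyword search would not return this tweet either
    if matches_any(SPAM, tweet):
        return "spam"
    # Negative phrases are checked first so "don't like" is not read as "like".
    if matches_any(NEGATIVE, tweet):
        return "negative"
    if matches_any(POSITIVE, tweet):
        return "positive"
    return "mention"  # names the subject but expresses no opinion


def summarize(tweets, subject="Product X"):
    """Count tweets per label, skipping off-topic ones."""
    counts = {}
    for tweet in tweets:
        label = classify(tweet, subject)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    for tweet in TWEETS:
        print(classify(tweet), "|", tweet)
    print(summarize(TWEETS))
```

A keyword search for 'Product X' would return all five sample tweets; the sketch labels two as positive despite their different wording, one as negative, one as spam and one as a bare mention. A production system would need far richer language handling than these hand-written patterns.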