There is growing interest within the biopharmaceutical community in using machine learning to solve challenges across the drug-discovery pipeline. The availability of high-quality data for training algorithms is vital to machine learning success, but much of this information is tied up in unstructured or semi-structured text sources. Natural language processing (NLP) is the key to extracting the wealth of data hidden in unstructured text, and Linguamatics’ customers have been finding out first-hand what this approach can do for them.

Using Linguamatics I2E NLP text mining:

  • Eli Lilly researchers mine adverse event data to identify potential new uses for existing drugs.
  • A top-10 pharma company processes and interprets unstructured “voice of the customer” call feeds, categorizing them to help build predictive models.
  • Roche and Humboldt University of Berlin identified MEDLINE abstracts containing both the protein target and specific disease indication of a known set of cancer therapeutics, and applied machine learning to predict the success or failure of drugs in Phase II or III with high accuracy.

Read the full “Data-driven NLP plus machine learning” application note to find out more about how NLP can support effective machine learning projects.


 “Sometimes I think we’re becoming more of a data analytics company than anything else”

Humana Chief Medical Officer Roy Beveridge, M.D.

Why are top payers investing in analytics?

In a recent Fierce Healthcare interview, Humana highlighted its long-term commitment to analytics. Humana’s focus on Medicare Advantage means that it is increasingly entering data partnerships with hospitals to provide insights and support through population health tools. What fascinates me is that this type of partnership would never have been possible in a fee-for-service world, and it reflects the move toward value-based care.

Population Health analytics

Population health tools in this space need to work with the heterogeneous data sets that payers receive and use to characterize and support their members. Many groups I have spoken to stratify high-risk individuals in these data sets; for example, smokers with heart disease are identified, and payers give them guidance on quitting and incentives for exercise programs. I especially admire the way value-based providers and payers are working together so that advice on high-risk individuals can be given directly to clinicians.


I always like the Fall, the "season of mists and mellow fruitfulness". It was the Autumn Equinox recently, and here in the UK, we are enjoying that lovely balance of weather at the turning of the seasons, the slow change from summer to fall, autumn fruits ripening, nights starting to draw in.

And of course this means our thoughts turn to… the Linguamatics Text Mining Summit! Held on the East Coast, this is an opportunity for our text mining community, across both healthcare and pharma industries, to come together. Attendees share best practice, get some hands-on training, and listen to talks on how others are finding real value from their textual data, using the power of NLP-based text mining.

This year, we will be in New Castle, New Hampshire, and the main talks will be on Tuesday 2nd October and Wednesday 3rd October. As always we have a balance of talks from our life science and healthcare customers, and from Linguamatics presenters, providing updates on current and future developments and plans. Our customer speakers encompass a wide range of use cases, spanning drug discovery and development, and into clinical delivery of therapeutics and better patient care.


Linguamatics I2E 5.1 focuses on further increasing the power and scale of querying, while optimizing the users’ experiences of query building.

I2E 5.1 enriches and expands on the capabilities introduced in I2E 5.0, which made a big splash in NLP text mining technology.  

I2E 5.1 addresses the increasing variety of representations of the same concept in big data by finding more matches for terms in a document: variations in accented characters, spelling errors, and OCR artefacts are taken into consideration when matching. This ‘fuzzy matching’ returns more search results and increases recall and accuracy.

One customer commented: ‘I am really looking forward to I2E 5.1’s spelling correction…you don’t realize how much you can miss in your search results because of typos and spelling mistakes.’
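To give a feel for the general idea behind fuzzy matching, here is a minimal Python sketch. It is not how I2E implements the feature; it simply assumes that accent folding plus a string-similarity threshold is enough to catch accented variants, typos and simple OCR slips. The function names (`fold`, `fuzzy_match`, `find_matches`) and the 0.85 threshold are hypothetical choices made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def fold(text: str) -> str:
    """Lowercase the text and strip accents (e.g. 'Sjögren' -> 'sjogren')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def fuzzy_match(term: str, token: str, threshold: float = 0.85) -> bool:
    """Return True when token is similar enough to term after folding.

    The threshold is an illustrative assumption, not an I2E setting.
    """
    ratio = SequenceMatcher(None, fold(term), fold(token)).ratio()
    return ratio >= threshold


def find_matches(term: str, text: str, threshold: float = 0.85) -> list[str]:
    """Return the whitespace-separated tokens in text that fuzzily match term."""
    tokens = (tok.strip(".,;:()") for tok in text.split())
    return [tok for tok in tokens if tok and fuzzy_match(term, tok, threshold)]


if __name__ == "__main__":
    note = "Patient took ibuprofin and paracetam0l."
    print(find_matches("ibuprofen", note))
    print(find_matches("paracetamol", note))
```

Lowering the threshold finds more variants but also lets in more false matches, so any real system has to balance recall against precision.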

Data normalization in I2E, a key feature for tackling big data’s increasing variety, is now easier to use. Regardless of how the original document is written, you can define your numeric ranges in a different unit; for example, you can filter in pounds (as an upper or lower threshold or as a range) and display the results in kilograms.

I2E 5.1 introduces an integrated view of your query and a way of dragging queries around the editor, making it easier to design, tune and maintain your searches.


The 2017 Text Mining Summit (New Castle, New Hampshire, October 2-4) will be your first opportunity to take part in our new I2E Certificate Program. The Level 1 Query User Certificate will be open to those who have just taken the “Introduction to I2E” hands-on workshops provided at the TMS, as well as to more established users who have taken the “Introduction to I2E” training on previous occasions. See the TMS Workshop Selection Guide for more details. It’s free to join in as part of your TMS registration.

Completing the different levels of the Certificate Program will allow you to validate, extend and improve your I2E skills. The Query User Certificate will focus on using and editing basic queries and Resource queries to:

  • Create simple queries with different constraints, morphological variants, preferred terms and alternative lists

  • Use classes to improve recall and precision of queries with linguistic classes, ontologies, and pattern ontologies

  • Work with results by using limits, output formats and displays

  • Use Resource queries to answer common questions

Those taking the Query User Certificate at the TMS will have access to:

  • In-class instruction

  • Practical, hands-on experience with I2E

  • Open question sessions with I2E Experts

  • A set of learning objectives

  • Learning materials, including

    • Tutorial booklets