There’s a lot of buzz in the healthcare community at the moment surrounding the use of artificial intelligence with machine learning for pattern identification, decision-making, and outcome prediction. The availability of high-quality data for training algorithms is vital to machine learning’s success - but a lot of this information is tied up in unstructured clinical notes. Natural language processing (NLP) is the key to extracting the “good stuff” from this vast trove of unstructured text. Combining that “good stuff” with already structured data helps healthcare providers to understand the patterns and trends in data via machine learning - and thereby enhance care, reduce costs, and improve population health.

Which type of NLP software is best?

The first question that healthcare users must ask themselves is “Which type of NLP software best suits my needs?”

Statistical NLP systems require example data from which to learn the patterns they then identify in new data. The examples may come from dictionaries or ontologies, or they may need to be manually annotated by a clinician, which can be an extremely laborious and costly task for the institution.

Meanwhile, most rule-based NLP systems require a specialist to define the language rules or patterns that represent certain healthcare concepts. This approach can make them more accurate, but they will be limited to the patterns that the specialist has thought of.
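To make that trade-off concrete, here is a minimal sketch of a rule-based extractor in Python. The rule, the function name `smoking_status`, and the example notes are all hypothetical and written for illustration; they are not taken from any particular NLP product.

```python
import re

# Hypothetical rule a specialist might write: "<status> smoker",
# allowing an optional hyphen or space (e.g. "non-smoker", "nonsmoker").
SMOKER_RULE = re.compile(r"\b(current|former|non|never)[- ]?smoker\b", re.IGNORECASE)


def smoking_status(note):
    """Return the smoking status named in the note, or None if the rule does not fire."""
    match = SMOKER_RULE.search(note)
    return match.group(1).lower() if match else None


# Invented example notes, for illustration only.
notes = [
    "Patient is a current smoker with a chronic cough.",
    "Former smoker, quit in 2010.",
    "Pt smokes one pack per day.",  # phrasing the rule author did not anticipate
]

for note in notes:
    print(smoking_status(note), "<-", note)
```

The first two notes are matched precisely, while the third is missed altogether because it uses wording the rule does not cover. That is the limitation described above.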


The combined value of NLP and Machine Learning – a concrete example

With the rising costs of de novo drug discovery, and an increasing focus on rare diseases, there is continuous innovation in methods and solutions for finding new uses for existing drugs. I was interested to hear of a novel approach to this, published recently by Eric Su and Todd Sanger at Eli Lilly. In this paper, “Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov”, the authors describe the combined use of Natural Language Processing (NLP) and Machine Learning (ML) to extract potential new uses of existing drugs.

It’s quite astonishing how often in the last weeks and months I’ve been asked about the interplay between NLP, Artificial Intelligence (AI), and ML. It seems that everyone wants to understand more about the real potential of these tools (rather than the hype that is being shouted from the rooftops) to impact healthcare, research, and many other areas of our lives in the next decade.

So, let’s delve further into this concrete example of the combined value of NLP and ML. The innovative step here was to exclude trials for a specific indication, such as cancer, and then find trials with Serious Adverse Events (SAEs) classified as cancerous. The researchers then looked to see if the placebo arm had more cancerous SAEs. If the placebo arm had more cancer-related SAEs than the treatment arm, they hypothesized that the treatment has a positive anti-cancer effect.
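The screening logic described above can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' implementation. The `Trial` record, its field names, and the example numbers are all invented. Comparing rates rather than raw counts is an assumption made here so that unequal arm sizes do not distort the comparison. A real analysis would also need to test whether the difference is statistically significant.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    trial_id: str
    indication: str      # condition the trial was designed to treat
    treatment_sae: int   # SAEs of interest (e.g. cancerous) in the treatment arm
    treatment_n: int     # participants in the treatment arm
    placebo_sae: int     # SAEs of interest in the placebo arm
    placebo_n: int       # participants in the placebo arm


def repositioning_candidates(trials, excluded_indication="cancer"):
    """Return IDs of trials whose placebo arm has a higher SAE rate than the treatment arm."""
    candidates = []
    for trial in trials:
        # Step 1: exclude trials that already target the indication of interest.
        if trial.indication.lower() == excluded_indication:
            continue
        # Skip trials where a rate cannot be computed.
        if trial.treatment_n == 0 or trial.placebo_n == 0:
            continue
        # Step 2: compare the SAE rate in the placebo arm with the treatment arm.
        placebo_rate = trial.placebo_sae / trial.placebo_n
        treatment_rate = trial.treatment_sae / trial.treatment_n
        if placebo_rate > treatment_rate:
            candidates.append(trial.trial_id)
    return candidates


# Invented example data, for illustration only.
trials = [
    Trial("T1", "diabetes", treatment_sae=2, treatment_n=100, placebo_sae=9, placebo_n=100),
    Trial("T2", "cancer", treatment_sae=1, treatment_n=50, placebo_sae=8, placebo_n=50),
    Trial("T3", "hypertension", treatment_sae=5, treatment_n=200, placebo_sae=4, placebo_n=200),
]

print(repositioning_candidates(trials))
```

Here only T1 is flagged. T2 is excluded because it is already a cancer trial, and T3 shows no excess of events in the placebo arm. In this workflow, the NLP step is what would classify each free-text adverse event as cancerous or not before any counting takes place.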


It seems like there’s not much you can’t do, if you are as ingenious as our customers!  Want to understand your patients better? Use NLP. Want to visualise the world of chemical safety? Use NLP. Want to look at real world data for adverse events? Use NLP!

These were some of the topics presented by healthcare and life science customers at our Spring Text Mining Conference in Cambridge, UK (#LMSpring17). Attendees from the pharmaceutical industry, biotech, healthcare, academia, and partner vendor companies came together for hands-on workshops, round table discussions, and of course, excellent presentations and talks.

Huntsman Cancer Institute: Speeding Patient Care Alerts

Samir Courdy, Chief Research Information Officer and Director of Research Informatics at the Huntsman Cancer Institute, kicked off the day with a talk on “Navigating the Quagmire of Clinical Data in Free Text Reports”. The team at Huntsman are using NLP to structure data that is currently only available in free text and capture it in their Clinical Cancer Research (CCR) database. This framework reduces the cost and workload of manual curation, and enables more effective identification of disease group patients. Queries developed at HCI have been shared with the City of Hope cancer treatment and research center. Samir highlighted the importance and advantages of such community sharing of NLP resources for healthcare practitioners and for patients.

Figure: The workflow used at Huntsman Cancer Institute to create structured data attributes from clinical notes and pathology & radiology reports, and map the results into the HCI Clinical Cancer Registry. This workflow speeds patient care alerts.


What are the challenges facing life sciences and healthcare organisations, where text analytics can play a part?  This is one of the key questions that I ask myself and others regularly. There is so much buzz at the minute around big data, real world data, healthcare informatics, wearables; but what is really working, and what is just hype?

One of the ways we get input on this question is, of course, meeting our customers and hearing about their successes. Linguamatics hosts two user group meetings every year, and our European Spring Text Mining Conference is coming up rapidly. Held over 3 days in April, the conference gives scientists and clinicians interested in text mining the opportunity to come for hands-on training workshops, round table discussions, and a day of talks from both Linguamatics staff and our customers.

This year, our customer speakers encompass a wide range of use cases, spanning the pipeline of discovery, development, and delivery of therapeutics:


Pharmaceutical companies can now extract competitive intelligence from Dow Jones Factiva content using Linguamatics advanced NLP-based text analytics

Cambridge, UK & Boston, USA – March 28, 2017 – Market-leading NLP text analytics provider Linguamatics today announced a partnership with premium news content provider Dow Jones. The agreement allows pharmaceutical companies to extract key insights from Dow Jones Factiva utilizing Linguamatics I2E text mining technology.

The Linguamatics I2E platform is currently used by 18 of the top 20 global pharmaceutical companies. The Linguamatics-Dow Jones partnership helps users to derive key insights from Factiva content by leveraging advanced NLP to identify and extract critical concepts in a structured format for review and quick analysis. I2E eliminates the need for users to manually read through large quantities of documents to search for critical information. Instead, I2E rapidly connects relevant facts and relationships in a way that synthesizes knowledge and creates actionable insights.