Posts from May 2017

Text Mining Platform I2E features in Best Practices Final and as a Best of Show Award Contender; Linguamatics CTO David Milward a Featured Speaker

Cambridge, UK & Boston, USA – May 22, 2017 – Leading Natural Language Processing (NLP) text analytics provider Linguamatics today announced plans to highlight the latest version of its text mining platform at this week’s Bio-IT World Conference & Expo in Boston. Bio-IT World has named Linguamatics I2E 5.0 a contender for the Best of Show Award, and Linguamatics’ customer Pentavere Research Group a Best Practices finalist.

The Best of Show Awards showcase exceptional innovation in technologies used by life science professionals. As a Best of Show Award contender, Linguamatics is also eligible for the Bio-IT World People’s Choice Award, chosen by votes from the Bio-IT World Community. Voting for the People’s Choice Award is open from 5 pm ET Tuesday May 23 through 1 pm ET on Wednesday May 24.

Bio-IT World also chose Linguamatics' customer Pentavere Research Group as a Best Practices finalist, based on their work using I2E to mine unstructured data for real-world evidence to improve health outcomes. Best Practices finalists are recognized for their outstanding examples of technology innovation, from basic R&D to translational medicine. Pentavere deployed I2E to effectively mine unstructured EHR data, expediting delivery of their product daRWEn™ to the Real World Evidence market.


There’s a lot of buzz in the healthcare community at the moment surrounding the use of artificial intelligence with machine learning for pattern identification, decision-making, and outcome prediction. The availability of high-quality data for training algorithms is vital to machine learning’s success, but much of this information is tied up in unstructured clinical notes. Natural language processing (NLP) is the key to extracting the “good stuff” from this vast trove of unstructured text. Combining that “good stuff” with already structured data helps healthcare providers to understand the patterns and trends in data via machine learning, and thereby enhance care, reduce costs, and improve population health.

Which type of NLP software is best?

The first question that healthcare users must ask themselves is “Which type of NLP software best suits my needs?”

Statistical NLP systems require example data to identify patterns in new data. The examples may come from dictionaries or ontologies, or they might need to be manually annotated by a clinician, which can be an extremely laborious and costly task for the institution.

Meanwhile, most rule-based NLP systems require a specialist to define the types of language rules or patterns that represent certain healthcare concepts. This approach can make them more accurate, but they will be limited to the patterns that the specialist has thought of.
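To make the rule-based trade-off concrete, here is a minimal sketch in Python. It is not how I2E or any particular product works; the pattern, the function name and the two example sentences are all invented for illustration. A single hand-written rule finds the phrasing its author anticipated and silently misses a phrasing they did not.

```python
import re

# Hypothetical hand-written rule: matches phrasings such as "current smoker",
# "former smoker" or "ex-smoker". Invented for illustration only.
SMOKER_RULE = re.compile(r"\b(current|former|ex)[- ]smoker\b", re.IGNORECASE)


def find_smoking_mentions(note: str) -> list[str]:
    """Return every span of the note that matches the hand-written rule."""
    return [match.group(0) for match in SMOKER_RULE.finditer(note)]


# A phrasing the rule author anticipated is found...
covered = find_smoking_mentions("Patient is a former smoker with COPD.")

# ...but an equivalent phrasing outside the rule is missed.
missed = find_smoking_mentions("Patient quit cigarettes ten years ago.")

print(covered)
print(missed)
```

Widening coverage means a specialist has to add more patterns, which is the limitation described above.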


The combined value of NLP and Machine Learning – a concrete example

With the rising costs of de novo drug discovery, and an increasing focus on rare diseases, there is continuous innovation in methods and solutions to find new uses for existing drugs. I was interested to hear of a novel approach for this, published recently by Eric Su and Todd Sanger at Eli Lilly. In this paper, “Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov”, the authors describe the combined use of Natural Language Processing (NLP) and Machine Learning (ML) to extract potential new uses of existing drugs.

It’s quite astonishing how often in the last weeks and months I’ve been asked about the interplay between NLP, Artificial Intelligence (AI), and ML. It seems that everyone wants to understand more about the real potential (rather than the hype that is being shouted from the rooftops) that these tools will provide to impact healthcare, research, and many other areas of our lives, in the next decade.

So, let’s delve further into this concrete example of the combined value of NLP and ML. The innovative step here was to exclude trials for a specific indication, such as cancer, and then find trials with Serious Adverse Events (SAEs) classified as cancerous. The researchers then looked to see if the placebo arm had more cancerous SAEs. If the placebo arm had more cancer-related SAEs than the treatment arm, they hypothesized that the treatment has a positive anti-cancer effect.
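The screening logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the record layout, the function names, the trial identifiers and every count are invented, and comparing per-arm rates rather than raw counts is an assumption made here to allow for arms of different sizes. The NLP step that classifies SAEs as cancer-related is taken as already done.

```python
from dataclasses import dataclass


@dataclass
class TrialArmCounts:
    """Hypothetical per-trial summary; all field names are illustrative."""
    trial_id: str               # made-up identifier, not a real NCT number
    indication: str             # condition the trial was designed to study
    drug: str
    treatment_n: int            # participants in the treatment arm
    treatment_cancer_saes: int  # cancer-related SAEs in the treatment arm
    placebo_n: int              # participants in the placebo arm
    placebo_cancer_saes: int    # cancer-related SAEs in the placebo arm


def sae_rate(events: int, participants: int) -> float:
    """Fraction of participants with a cancer-related SAE (0.0 for an empty arm)."""
    return events / participants if participants else 0.0


def repositioning_candidates(trials, excluded_indication="cancer"):
    """Keep trials outside the excluded indication where the placebo arm shows
    a higher cancer-related SAE rate than the treatment arm."""
    candidates = []
    for trial in trials:
        if trial.indication.lower() == excluded_indication:
            continue  # step 1: drop trials that already target the indication
        placebo_rate = sae_rate(trial.placebo_cancer_saes, trial.placebo_n)
        treatment_rate = sae_rate(trial.treatment_cancer_saes, trial.treatment_n)
        if placebo_rate > treatment_rate:  # step 2: placebo arm fares worse
            candidates.append((trial.drug, trial.trial_id, placebo_rate - treatment_rate))
    # Largest rate difference first.
    return sorted(candidates, key=lambda item: item[2], reverse=True)


# Toy records with invented numbers, purely to exercise the function.
trials = [
    TrialArmCounts("TRIAL-A", "diabetes", "drug_x", 500, 2, 500, 9),
    TrialArmCounts("TRIAL-B", "cancer", "drug_y", 300, 10, 300, 20),
    TrialArmCounts("TRIAL-C", "hypertension", "drug_z", 400, 6, 400, 5),
]
candidates = repositioning_candidates(trials)
print(candidates)
```

In this toy data only drug_x is flagged: TRIAL-B is excluded because it already targets cancer, and TRIAL-C shows no excess of cancer-related SAEs in its placebo arm. A real analysis would also need a statistical test before treating any difference as a hypothesis worth pursuing.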


It seems like there’s not much you can’t do, if you are as ingenious as our customers! Want to understand your patients better? Use NLP. Want to visualise the world of chemical safety? Use NLP. Want to look at real world data for adverse events? Use NLP!

These were some of the topics presented by healthcare and life science customers at our Spring Text Mining Conference in Cambridge, UK (#LMSpring17). Attendees from the pharmaceutical industry, biotech, healthcare, academia, and partner vendor companies came together for hands-on workshops, round table discussions, and of course, excellent presentations and talks.

Huntsman Cancer Institute: Speeding Patient Care Alerts

Samir Courdy, Chief Research Information Officer and Director of Research Informatics at the Huntsman Cancer Institute (HCI), kicked off the day with a talk on “Navigating the Quagmire of Clinical Data in Free Text Reports”. The team at Huntsman are using NLP to structure data that is currently only available in free text and capture it in their Clinical Cancer Research (CCR) database. This framework reduces the cost and workload of manual curation, and enables more effective identification of disease group patients. Queries developed at HCI have been shared with the City of Hope cancer treatment and research center. Samir highlighted the importance and advantages of such community sharing of NLP resources for healthcare practitioners and for patients.

Figure: The workflow used at Huntsman Cancer Institute to create structured data attributes from clinical notes and pathology & radiology reports, and to map the results into the HCI Clinical Cancer Registry. This workflow speeds patient care alerts.