Posts from September 2015

It was great to see our paper on last year's i2b2 NLP challenge published recently. The challenge looked at extraction of Coronary Artery Disease (CAD) risk factors from unstructured patient data provided by the Research Patient Data Repository of Partners Healthcare. We had worked through previous i2b2 challenges, such as smoking cessation, only after those competitions had closed, so this time we wanted to participate actively in the 2014 NLP challenge and see how we compared against other NLP groups in the competition. Linguamatics work with many academic medical centers and cancer centers and view collaboration as a key component of our customer relationships. As such, we wanted to share our success or failure with our peers and show how a commercial system can tackle these areas.

The i2b2 training set consisted of 790 annotated documents relating to 178 patients, which we decided to divide into training (70%) and development (30%) sets. The test set contained 514 documents from 118 patients. Contestants were set this task: extract CAD risk factors such as specific diseases (e.g. diabetes), medications, family history of CAD and lab results; also take into account when tests were carried out or whether a disease diagnosis was in the past or current.
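As a rough illustration of a 70/30 split like the one described above, here is a minimal Python sketch. It assumes the split is made per patient, so that all documents for one patient land in the same set; the post does not say how the split was actually done. The function name `split_by_patient`, the document and patient IDs, and the fixed random seed are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(doc_to_patient, dev_fraction=0.3, seed=0):
    """Split document IDs into (training, development) lists.

    Assumption: documents are grouped by patient so that no patient
    appears in both sets. The original post does not confirm this.
    """
    by_patient = defaultdict(list)
    for doc_id, patient_id in doc_to_patient.items():
        by_patient[patient_id].append(doc_id)

    # Shuffle patients reproducibly, then take the first share for development.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_dev = round(len(patients) * dev_fraction)
    dev_patients = set(patients[:n_dev])

    train_docs, dev_docs = [], []
    for patient_id in patients:
        target = dev_docs if patient_id in dev_patients else train_docs
        target.extend(by_patient[patient_id])
    return train_docs, dev_docs


# Synthetic data for illustration only: 10 patients with 3 documents each.
docs = {f"doc{p}_{i}": f"patient{p}" for p in range(10) for i in range(3)}
train_docs, dev_docs = split_by_patient(docs)
```

Splitting on patients rather than on individual documents avoids the same patient's records appearing in both sets, which would otherwise make the development score look better than it should.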

Our team’s results were excellent and, at a micro F-score of 91.7%, were competitive with the best system in this challenge. I2E, being a rule-based system, was well suited to the challenge compared with machine learning systems because:


There seems to be a certain buzz around rare and orphan diseases. Following the Findacure meeting I attended last month, there are two recent events I’d like to mention.

Firstly, I attended the first Cambridge Rare Disease Network summit, held in Cambridge UK, with a fantastic line-up of speakers from a range of professions to discuss current and new initiatives in rare disease. The debates ranged from the use of next generation sequencing for diagnostics, to crowd-sourcing both for science and funding, to drug repurposing, to the views of payers and the issues around pricing.

For me it was also a reminder, particularly from some of the parent speakers, of the impact that rare disease has on individuals and families. All too often we are so busy with the day-to-day of research and business that it's easy to lose sight of the ideal end-goal - treatments for all adults, all children, affected by these disparate and often devastating diseases.

Secondly, this month the FDA released new draft guidance “to navigate the difficult road to approval of drugs for rare diseases”.


I’m thrilled to see that Linguamatics I2E 4.3 is named as a KMWorld 2015 Trend-Setting Product. Linguamatics I2E has a proven track record in delivering best-of-breed text mining capabilities across a broad range of application areas. Its agile nature allows tuning of query strategies to deliver the precision and recall needed for specific tasks, but at an enterprise scale.

According to customers, I2E gets to actionable results at least 10 times faster than a traditional keyword search. In many cases, I2E will produce successful results for projects that would otherwise be impossible or intractable.

Actionable information extracted using I2E can be presented in a variety of ways depending on your needs. NLP-based text mining provides the capability to look through unstructured text (typically in large sets of documents, such as scientific reports, patents, electronic healthcare records, and pathology and radiology reports) and to use sophisticated queries to automatically identify and extract structured data (concepts and associations), enabling the system to interpret the meaning of the text.

 


Linguamatics I2E natural language processing technology to automatically extract clinical attributes from pathology reports across eight hospital groups in the Stratified Medicine Programme.

LONDON and CAMBRIDGE, UK, September 1st, 2015 – Cancer Research UK and Linguamatics announced today they will work on a joint project to apply Linguamatics’ natural language processing (NLP) text analytics platform, I2E, to automatically extract clinical attributes from cancer pathology reports and improve annotation of clinical samples relating to Cancer Research UK’s Stratified Medicine Programme (SMP). This project will allow the analysis of detailed patient characteristics alongside large volumes of genetic data, enabling more effective research into the causes and personalised treatment of cancer.

Dr Ian Walker, Director of Clinical Research and Strategic Partnerships at Cancer Research UK, said: “Pathology reports tell us a range of important information about a patient’s cancer, but the way this data is recorded can vary widely, which makes it harder to spot trends or other significant information that could have a bearing on treatment decisions or prognosis. This collaboration should help translate these reports into more meaningful data, which should help our researchers better understand the disease and accelerate advances in personalised medicine.”