Press release: Cancer Research UK and Linguamatics Collaborate to Improve Characterisation of Cancer Patient Data for Precision Medicine

Linguamatics I2E natural language processing technology to automatically extract clinical attributes from pathology reports across eight hospital groups in Stratified Medicine Programme.

LONDON and CAMBRIDGE, UK, September 1st, 2015 – Cancer Research UK and Linguamatics announced today they will work on a joint project to apply Linguamatics’ natural language processing (NLP) text analytics platform, I2E, to automatically extract clinical attributes from cancer pathology reports and improve annotation of clinical samples relating to Cancer Research UK’s Stratified Medicine Programme (SMP). This project will allow the analysis of detailed patient characteristics alongside large volumes of genetic data, enabling more effective research into the causes and personalised treatment of cancer.

Dr Ian Walker, Director of Clinical Research and Strategic Partnerships at Cancer Research UK, said: “Pathology reports tell us a range of important information about a patient’s cancer, but the way this data is recorded can vary widely, which makes it harder to spot trends or other significant information that could have a bearing on treatment decisions or prognosis. This collaboration should help translate these reports into more meaningful data, which should help our researchers better understand the disease and accelerate advances in personalised medicine.”

SMP was initiated to examine the use of genetic profiles in making cancer treatment decisions, with a view to how personalised medicine would be implemented in the NHS, and is a forerunner of the Genomics England 100,000 Genomes Project. The first project (SMP1) looked at breast, colorectal, lung, prostate, melanoma and ovarian cancers across eight hospital groups and 9,000 patients. The second project (SMP2) is focussed on lung cancer.

Due to the complexity and variability of pathology reports, capturing key cancer characteristics (clinical attributes) as discrete data is currently a challenging and time-consuming manual task. The collaboration will use NLP to automatically extract key clinical attributes from pathology reports, such as tumour size, TNM stage (Classification of Malignant Tumours), topography, histology grade and category, excision margin and use of biomarkers.

“As the healthcare industry moves towards precision medicine, rapid transformation of unstructured patient data, such as pathology reports, into structured insights is vital,” said Simon Beaulah, Director, Healthcare Strategy, Linguamatics. “This project will also demonstrate how to address the challenges from variable report structure and use of language across hospitals. We are delighted to be working with Cancer Research UK on such an innovative project as the Stratified Medicine Programme. Using NLP in this way will yield huge benefits to the cancer community by improving understanding of patient populations and ultimately cancer care.”