Innovations in Text Mining - the view from Cambridge

May 4 2016

Linguamatics hosted our Spring Text Mining Conference in Cambridge last week (#LMSpring16). Attendees from the pharmaceutical industry, biotech, healthcare, personal consumer care, crop science, academia, and partner vendor companies came together for hands-on workshops, round table discussions, and of course, some excellent presentations and talks. 

The talks kicked off with a presentation by Thierry Breyette, Novo Nordisk, who described three different projects where text mining provided significant value from real-world data. Thierry used the RAND Corporation definition: "Real-world data (RWD) is an umbrella term for different types of data that are not collected in conventional randomised controlled trials. RWD comes from various sources and includes patient data, data from clinicians, hospital data, data from payers and social data."

At Novo Nordisk they have gained business impact by text mining a variety of sources, including: social media to find digital opinion leaders; conversation transcripts between medical liaisons and healthcare professionals to spot trends in clinical insights; and patient and caregiver ethnographic data to see patterns in patient sentiment and compliance.

Real-world data from PatientsLikeMe (PLM) was one of the sources discussed by the next speaker, James Loudon-Griffiths from AstraZeneca. James presented a project to better understand the incidence of nausea as an adverse drug reaction, comparing data extracted from real-world sources with data from clinical trials. While nausea may seem like a mild adverse reaction to drugs, it is actually one of the most common causes of non-compliance. Understanding the incidence of nausea for marketed drugs, both from clinical trial data (from FDA Drug Labels) and from patient-reported outcomes (from PLM), is important for clinicians and pharma alike.

Getting even closer to the patient experience, we had a talk by Jonathan Hartmann, informationist/librarian at Dahlgren Memorial Library, the health sciences library at Georgetown University. Jonathan is a regular speaker at Linguamatics' conferences, talking about the use of NLP to glean insights from the scientific literature for immediate patient care. He uses an iPad to search Medline and full text journals whilst on ward rounds with clinicians. As well as an overview of the technological challenges, Jonathan talked about several specific patient cases, where his rapid searches have been able to immediately influence the clinician's decision and enhance patient care.

Use of real-world data was a recurring theme through the conference. Eleanor Yelland, working at University College London, is using I2E to extract information from transcripts of online cognitive behaviour therapy sessions. She is researching how an NLP approach compares with other text analytics metrics such as word count or dictionary methods. The aim is to build predictive models of therapy progress for mild to moderate anxiety and depression. This is an ongoing research project, and so far Eleanor has found a significant relationship between selected language variables and outcome; however, individual variability is a huge factor.

The last customer talk took a different tack. All text mining tools need good vocabularies and ontologies, and there are many public domain ontology sources. However, these often have their limitations, so tools to hunt for and structure synonyms for internal use are important. Ralf Jaeger, Roche, described an approach to discovering synonyms and building dictionaries to assist in better text mining. He reviewed approaches such as simple brute force (manual extraction of all words), using hints from context, and rule-based approaches. To finish his talk, Ralf posed an interesting question, "Is writing scientific papers in a human readable format the only reasonable option to publish?" It seems that until the scientific publishing industry changes, we will continue to need text analytics tools to gain full benefit from the ever-growing volume of research.

As always, the conference provided a couple of days filled with food for thought on the value and power of text mining, for a variety of applications within pharma, healthcare, and the broader life sciences. Thanks to everyone who contributed, and we hope to see you all at our other events across the year!