Innovative uses of I2E for life science: the view from Linguamatics Text Mining Summit

November 5 2015

At the October Text Mining Summit, we had speakers from pharma, biotech and academia presenting on an amazing range of different applications of text analytics to provide value within the drug discovery-development pipeline. Over a day and a half we heard from a dozen external speakers from healthcare and pharma, all sharing their enthusiasm for the value that text analytics can bring to the drug discovery, development and delivery environments.

Work presented by UNCC researchers using I2E to understand the potential health effects of plant phytochemicals: network map of text-mined associations linking plants to phytochemicals, phytochemicals to human genes, human genes to biological pathways, and pathways to human health phenotypes.

The life science applications covered safety, target discovery and alerting, genotype-phenotype annotations, clinical trial analytics, phytochemicals as potential nutraceuticals, and patent landscaping for antibody-drug conjugates.

Back by popular demand, Wendy Cornell (ex-Merck) presented on gaining value from internal preclinical safety reports using I2E, which we’ve discussed in blog posts here before.

Madhusudan Natarajan (Shire Pharmaceuticals) last presented at the TMS in Fall 2013, and this time he gave an update on that talk, covering the progress from a proof-of-concept project to a “well-oiled machine” that uses text mining for systematic examination of gene-disease associations. Shire develops and provides healthcare in the areas of behavioural health, gastrointestinal conditions, rare diseases, and regenerative medicine. Madhu described work on text analytics for disease severity and genotype-phenotype associations in Hunter Syndrome (also known as Mucopolysaccharidosis II), a rare disease caused by an X-linked deficiency in iduronate-2-sulfatase. It was a great talk illustrating some of the challenges of R&D for orphan diseases, particularly around text mining for mutation and variant patterns, which can be reported in so many different ways in the literature.


Text analytics for rare disease genotype-phenotype annotations: Mucopolysaccharidosis II or Hunter syndrome is an X-linked deficiency in iduronate-2-sulfatase. The severe form usually presents at 2 – 4 years of age, with symptoms including bone deformities, hearing loss, frequent respiratory infections, cardiomyopathy, hepatosplenomegaly, and often some level of neurocognitive impairment.
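
To make the variant-reporting challenge from Madhu's talk a little more concrete, here is a minimal Python sketch that maps a few common ways of writing the same protein substitution onto one normalized form. This is not I2E and not the approach presented at the Summit; the pattern, the function name `normalize_substitution`, and the example variant `A123T` are all made up for illustration, and real literature uses many more notations than this handles.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

_AA3 = "|".join(THREE_TO_ONE)      # e.g. "Ala|Arg|..."
_AA1 = "ACDEFGHIKLMNPQRSTVWY"      # one-letter codes

# Illustrative pattern (an assumption, not a standard): optional "p." prefix,
# reference residue, position, optional "->" or ">" separator, new residue.
SUBSTITUTION = re.compile(
    rf"^(?:p\.)?\(?(?P<ref>{_AA3}|[{_AA1}])(?P<pos>\d+)"
    rf"\s*(?:->|>)?\s*(?P<alt>{_AA3}|[{_AA1}])\)?$"
)


def normalize_substitution(mention):
    """Return a 'p.A123T'-style string, or None if the mention is not recognized."""
    match = SUBSTITUTION.match(mention.strip())
    if match is None:
        return None
    # Three-letter codes are converted; one-letter codes pass through unchanged.
    ref = THREE_TO_ONE.get(match.group("ref"), match.group("ref"))
    alt = THREE_TO_ONE.get(match.group("alt"), match.group("alt"))
    return f"p.{ref}{match.group('pos')}{alt}"


# Made-up mentions of one hypothetical substitution, written four ways.
for mention in ["A123T", "Ala123Thr", "p.Ala123Thr", "A123->T"]:
    print(mention, "=>", normalize_substitution(mention))
```

Even this toy version needs a lookup table and a fairly dense pattern for a single variant type, which gives a feel for why mutation mentions are hard to capture systematically.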


We also had two very different talks on using I2E over patent corpora. Matt Crawford (Pfizer) posed a question for us all: “Full-Text Patent Mining: Can it beat manually curated database subscriptions?” He discussed the pros and cons of using curated databases to find novel target or disease intelligence that can be buried in the mountains of patent text, and how they have built a text-mining pipeline that “recovers a hefty number of patents that would have been missed using the combined curated databases”.

Julia Heinrich (BMS) also reprised her Spring User conference talk, and gave us an update on using I2E to address the question: “Can the infoglut of biotech patent publications be quickly reviewed to enable timely business decision?” Julia’s use case was to use a text mining approach to extract “analysis ready” data from the intellectual property publications around a particular technology – Antibody Drug Conjugates (ADCs) or immunoconjugates.

Richard Linchangco (University of North Carolina at Charlotte) told us why broccoli really is good for us. Richard described how he and others at UNCC are using I2E to research the health benefits (or otherwise) of phytochemicals such as carotenoids, flavonoids, and organosulfur compounds, by developing disease-diet networks to elucidate the molecular mechanisms involved.

Jon Hill (Boehringer Ingelheim) presented on an alerting workflow “Read Alert” that BI have built, to use I2E for automated and centralized alerting of new targets. This workflow deals with the challenges that any researcher has if they want to keep on top of scientific literature – access to different key data sources, suitable search query syntax for the different sources (literature, conference abstracts, and patents), and the appropriate clean-up and deduplication needed to give actionable information.
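
As a rough illustration of the clean-up and deduplication step, the Python sketch below merges hits that share a title once case and punctuation are ignored, and records which sources each hit came from. It is not the Read Alert implementation, which was not described at this level of detail; the record fields `title` and `source`, the function names, and the sample records are assumptions made for the example.

```python
import re
import unicodedata


def title_key(title):
    """Build a comparison key: lowercase, with punctuation and spacing collapsed."""
    text = unicodedata.normalize("NFKD", title).lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def deduplicate(records):
    """Keep the first record per title key and collect every source that reported it."""
    seen = {}
    for rec in records:
        key = title_key(rec["title"])
        if key not in seen:
            seen[key] = dict(rec, sources=[rec["source"]])
        elif rec["source"] not in seen[key]["sources"]:
            seen[key]["sources"].append(rec["source"])
    return list(seen.values())


# Invented hits from the three source types mentioned above.
records = [
    {"title": "Target X in Disease Y: a review", "source": "literature"},
    {"title": "Target X in disease Y - A Review.", "source": "conference abstracts"},
    {"title": "Antibodies against target X", "source": "patents"},
]

for hit in deduplicate(records):
    print(hit["title"], hit["sources"])
```

Matching on a normalized title is the simplest possible rule; a production workflow would more likely compare identifiers such as DOIs or patent numbers where they are available.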

To wrap up the life science customer presentations, Eric Su (Lilly) gave a fascinating talk on using I2E to extract summary statistics from clinical trial databases, including TrialTrove, around two therapeutic areas (oncology and diabetes). Published clinical trial records can provide insights to help design new clinical trials, and enable meta-analysis by combining data from many trials. Done manually, this is a resource-intensive, repetitive and error-prone task. Using I2E, Eric showed query development to create structured tables for key oncology outcomes such as median overall survival and median progression-free survival, and for metabolic indicators such as BMI and body weight change.
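
The idea of turning free-text trial results into structured tables can be sketched in a few lines of Python. This uses a plain regular expression rather than an I2E query, and it is not Eric's method; the pattern, the function name `extract_outcomes`, the trial identifier, and the example sentence are all invented for illustration.

```python
import re

# Hypothetical pattern for statements such as
# "median overall survival was 12.3 months".
OUTCOME = re.compile(
    r"median\s+(?P<outcome>overall survival|progression-free survival)"
    r"\s+(?:was|of)\s+(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>months|weeks)",
    re.IGNORECASE,
)


def extract_outcomes(trial_id, text):
    """Return one table row (as a dict) per outcome statement found in the text."""
    rows = []
    for match in OUTCOME.finditer(text):
        rows.append({
            "trial_id": trial_id,
            "outcome": match.group("outcome").lower(),
            "value": float(match.group("value")),
            "unit": match.group("unit").lower(),
        })
    return rows


# Invented result text for a made-up trial record.
text = (
    "Median overall survival was 12.3 months; "
    "median progression-free survival was 5 months."
)

for row in extract_outcomes("TRIAL-001", text):
    print(row)
```

A single pattern like this misses most real phrasings (confidence intervals, per-arm results, hazard ratios), which is where purpose-built query development earns its keep.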

All in all, a wide-ranging set of talks, demonstrating both the versatility of I2E and the inventiveness of our customers!