Text Mining from Bench to Bedside: Perspectives from Industry

Spring Text Mining Conference 2018 text mining presentations from bench to bedside

Pharma and healthcare companies are data-rich organizations; however, a large proportion of this data isn't as accessible as desired, because the information is locked in unstructured text. Linguamatics I2E is used by our customers across drug discovery, development, and delivery of therapeutics to tackle this problem, and our biannual user conferences are always a great way to hear updates on applications of text mining and best practice to help solve these data access challenges.

Last month, we had speakers presenting on where they are getting value from I2E, from bench to bedside. Attendees from the pharmaceutical industry, biotech, healthcare, academia, and partner vendor companies came to our Spring Text Mining Conference, for training sessions, networking, discussions, and of course, excellent presentations and talks.

Structure Activity Relationships from Patents, for Medicinal Chemistry

Starting in early discovery, Ortrud Steinfuehr (Information Manager, Bayer AG) presented on “An Approach to provide SAR Data from Patents”. Structure Activity Relationships (SAR) provide information about how the 3D structure of a chemical compound impacts biological activity, such as effective dose or inhibitory concentration values for specific targets. SAR is very valuable for medicinal chemistry research around optimisation and modification of lead compounds. These data are often found in patents, which are typically written in a way that makes automated extraction of SAR very tricky.

Ortrud discussed the workflow that Bayer has developed with I2E, from chemical entity recognition to extracting the specific biological context. This workflow uses some important capabilities of I2E for patents, including extraction of information from complex tables, and the ChemAxon name-to-structure integration, and has significantly improved Bayer’s access to novel and up-to-date SAR information.

Rare Adverse Drug Events - the Proverbial Needle in the Haystack!

Safety is of course a critical aspect of pharma drug discovery and development. Pia Rosmanitz (Senior Information Professional at Merck EMD) works in scientific information services, and serves a broad user base across the business. Pia talked about one very specific question from a safety team, around skin cancer adverse events of PD-1/PD-L1 inhibitors. These are new immuno-oncology drugs in development, so there is very little information published in the scientific literature, and the skin cancers of interest are also very rare adverse events.

To make the search more complex, these inhibitors are also used as therapeutics for some of the rare skin cancers. Pia uses a cloud-hosted instance provided by Linguamatics, and was able to use I2E to find reliable information in the “serious event” field from 500 trials, a field that is completely unsearchable via the public website. This enabled Pia to provide a comprehensive set of adverse event relations to answer the team’s safety questions, rapidly and effectively.

Text Mining at Mundipharma - IDMP Regulatory Compliance and Beyond

Moving along the drug development pipeline, many pharma customers are using I2E for a variety of applications within regulatory affairs. Jon Sanford (Head of Regulatory Information Management and Operations at Mundipharma Research) gave us an update on the production deployment of I2E AMP for extracting key data elements from 1800 Summary of Product Characteristics (SmPC) regulatory documents.

This was a joint presentation, with John Haldoupis (Linguamatics) covering the technical aspects of using AMP for ETL, and Barry Hammond (Terminologeze) presenting the quality review. This workflow provides output from unstructured English, French, German and Danish SmPCs in a structured format to aid data review, with a significant reduction in the time needed compared to manual processes (at least an 80% reduction in effort). Jon also talked about using these data in MDM (master data management) for reuse across IT platforms in clinical, regulatory, and elsewhere.

Exploring Text Mining in the World of Business News

The Novo Nordisk Global Information and Analysis (GLIA) team supply information to answer business questions for R&D and commercial groups. GLIA have been powering their text searches with I2E for years, and Solmaz Gabery Adams (Senior Information Scientist) and Jes Hansen (Business Information Scientist) presented on using I2E with Dow Jones’ recent development for news feeds, DNA. News data are noisy, so text mining with ontologies helps to extract the relevant data, provide structure, and surface valuable insights. Jes and Solmaz showed some of the dashboards that visualise the I2E output, giving up-to-date views on trends to their business end-users.

Text and Data Mining Strategies at AbbVie

Matthias Negri (Knowledge and Data Scientist, AbbVie) is also used to providing information to a diverse range of internal customers, not just in R&D but across the company. Matthias talked about a variety of strategies they have employed at AbbVie, to get the best out of I2E for specific questions. This can include using bespoke vocabularies, building stop lists to help with poor or ambiguous synonyms (a common issue for genes and proteins), creating sub-corpora, and providing dashboards to empower end-users.
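To illustrate the stop list idea in general terms, here is a minimal Python sketch. It is not I2E syntax and not AbbVie's implementation; the vocabulary, the stop list entries, and the function name `tag_genes` are all hypothetical and chosen only to show how excluding ambiguous synonyms reduces false hits.

```python
import re

# Hypothetical bespoke vocabulary: lower-cased synonym -> preferred gene name.
GENE_SYNONYMS = {
    "tp53": "TP53",
    "p53": "TP53",
    "brca1": "BRCA1",
    "cat": "CAT",        # gene symbol that collides with an everyday word
    "impact": "IMPACT",  # gene symbol that collides with an everyday word
}

# Hypothetical stop list: synonyms judged too ambiguous to match on their own.
STOP_LIST = {"cat", "impact"}


def tag_genes(text, synonyms=GENE_SYNONYMS, stop_list=STOP_LIST):
    """Return the sorted preferred names of vocabulary terms found in text,
    ignoring any synonym that appears in the stop list."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return sorted({synonyms[t] for t in tokens
                   if t in synonyms and t not in stop_list})


sentence = "The impact of p53 loss on BRCA1 carriers"
print(tag_genes(sentence))                   # with the stop list applied
print(tag_genes(sentence, stop_list=set()))  # without it, "impact" is tagged as a gene
```

In this toy case the stop list prevents the ordinary word "impact" from being reported as a gene, while the unambiguous synonym "p53" still maps to its preferred name.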

As always, alongside the customer presentations, the conference also provided the opportunity for everyone to hear about the latest updates to the Linguamatics product portfolio, including new interfaces, I2E AMP workflows and automation, and how I2E can be used to feed machine learning models, in both healthcare and pharma.

The conference was held at the Moller Centre, Churchill College, Cambridge, and the attendees enjoyed the excellent facilities, wonderful food, beautiful environment, and lively discussions. Thank you to everyone who contributed, and we hope to see you all at other events across the year. As always, if you would like to hear more about any of these use cases or updates, please do contact us.
