I attended a Big Data in Pharma conference recently, and very much liked a quote from Sir Muir Gray, cited by one of the speakers: "In the nineteenth century health was transformed by clean, clear water. In the twenty-first century, health will be transformed by clean, clear knowledge."

This was part of a series of discussions and round tables on how we, within the Pharma industry, can best use big data, both current and legacy, to inform decisions for the discovery, development and delivery of new healthcare therapeutics. Data integration, breaking down data silos to create data assets, data interoperability, and the use of ontologies and NLP were all themes presented, with the aim of enabling researchers and scientists to have a clean, clear view of all the appropriate knowledge for actionable decisions across the drug development pipeline.

A new publication describes how text analytics can provide one of the tools for that data interoperability ecosystem, creating a clean, clear view. McEntire et al. describe a system that combines Pipeline Pilot workflow tools, Linguamatics I2E natural language processing (linguistics and semantics), and visualization dashboards to integrate information from key public domain sources, such as MEDLINE, OMIM, ClinicalTrials.gov, NIH grants, patents and news feeds, as well as internal content sources.


What if physicians could offer patients access to a potentially life-preserving test, but could not easily identify which of their patients were eligible?

That is the exact situation many providers have found themselves in since Medicare announced it would begin covering lung cancer screening for patients meeting a certain set of criteria.

In a decision memo published in February 2015, CMS agreed to make Medicare coverage available for low dose computed tomography (LDCT) lung cancer screening for eligible patients. Patients who are between ages 55 and 77, are asymptomatic, are current smokers or have quit within the last 15 years, and have a tobacco smoking history of at least 30 pack-years can now qualify for an annual preventive screening.
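
For readers who need to flag eligible patients in their own systems, these criteria reduce to a simple rule check. The sketch below encodes them in Python; the Patient fields and function name are illustrative only and not drawn from any particular EHR or claims system.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    """Minimal, hypothetical patient record for illustration only."""
    age: int
    asymptomatic: bool
    current_smoker: bool
    years_since_quit: float  # ignored when current_smoker is True
    pack_years: float        # packs per day multiplied by years smoked

def ldct_screening_eligible(p: Patient) -> bool:
    """Return True if the patient meets the CMS LDCT criteria described above:
    age 55-77, asymptomatic, a current smoker or quit within the last 15 years,
    and a smoking history of at least 30 pack-years."""
    smoking_status_ok = p.current_smoker or p.years_since_quit <= 15
    return (55 <= p.age <= 77
            and p.asymptomatic
            and smoking_status_ok
            and p.pack_years >= 30)

# Example: a 62-year-old former smoker who quit 8 years ago with 40 pack-years
print(ldct_screening_eligible(Patient(62, True, False, 8, 40)))  # True
```

In practice, of course, the hard part is that smoking history and symptom status are often buried in free-text clinical notes rather than structured fields, which is exactly where text mining comes in.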

CMS added the coverage after determining there was sufficient evidence that LDCT screening is cost-effective for high-risk populations. A study based on the National Lung Screening Trial, for example, found that 12,000 deaths a year could be avoided if high-risk patients underwent LDCT scans. Lung cancer is currently the leading cause of cancer-related death among both men and women in the US.


Linguamatics hosted our Spring Text Mining Conference in Cambridge last week (#LMSpring16). Attendees from the pharmaceutical industry, biotech, healthcare, personal consumer care, crop science, academia, and partner vendor companies came together for hands-on workshops, round table discussions, and of course, some excellent presentations and talks. 

The talks kicked off with a presentation by Thierry Breyette of Novo Nordisk, who described three different projects where text mining provided significant value from real-world data. Thierry used the RAND Corporation definition: "Real-world data (RWD) is an umbrella term for different types of data that are not collected in conventional randomised controlled trials. RWD comes from various sources and includes patient data, data from clinicians, hospital data, data from payers and social data."

At Novo Nordisk they have gained business impact by text mining a variety of sources, including: social media, to find digital opinion leaders; conversation transcripts between medical liaisons and healthcare professionals, for trends around clinical insights; and patient and caregiver ethnographic data, to see patterns in patient sentiment and compliance.


There's been a lot of excitement around recent studies in immuno-oncology, in which the body's immune defenses are harnessed to fight cancer. Experts consider it the most exciting advance since the development of chemotherapy half a century ago.

Many of our customers are involved in anti-cancer approaches based on modulation of immunosuppressive properties of immune cells; and are using I2E to help generate insight around immuno-oncology and the tumor microenvironment (TME). Cancers can be viewed as complex ‘rogue’ organs, with malignant cells surrounded by blood vessels and a variety of other cells, including immune cells, fibroblasts, lymphocytes, and more. The tumor cells and the surrounding non-transformed cells interact constantly, and developing a better understanding of these TME interactions is a valuable approach in immuno-oncology drug development.

Knowledge in this field is growing very rapidly, which makes it very difficult for scientists to capture it manually, because of both the volume of publications and the variety and complexity of the information.

Challenges include ensuring a thorough search to capture relationships between genes/proteins and their effects on, or correlations with, a variety of cellular actors. These cellular actors include many of the immune system cells currently under investigation for immunotherapeutic approaches to oncology.

I2E provides the capability to find and extract these interactions from textual data, including capture of negation where needed. I2E allows efficient and effective searches over millions of text documents, and can harmonize the output to enable computational post-processing and visualization of these complex data.
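
As an illustration of the kind of downstream analysis such harmonized output makes possible, here is a minimal Python sketch that loads a hypothetical table of extracted gene/protein-to-cell-type relationships, drops statements flagged as negated, and builds a weighted network for visualization. This is not I2E's own API; the file name and column names are invented for the example.

```python
# Post-process harmonized extraction output (a hypothetical CSV of
# gene/protein-to-cell-type relationships) into a weighted network.
import csv
from collections import Counter
import networkx as nx

graph = nx.Graph()
edge_counts = Counter()

with open("tme_interactions.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Skip relationships the extraction flagged as negated,
        # e.g. "IL-10 did not suppress T-cell proliferation".
        if row.get("negated", "").lower() == "true":
            continue
        edge_counts[(row["gene_or_protein"], row["cell_type"])] += 1

# Weight each edge by how often the relationship was asserted in the literature.
for (gene, cell), count in edge_counts.items():
    graph.add_edge(gene, cell, weight=count)

# Rank the most frequently reported gene/cell-type pairs.
for (gene, cell), count in edge_counts.most_common(10):
    print(f"{gene} -- {cell}: {count} supporting statements")
```

The resulting graph can then be handed to a visualization dashboard or network analysis tool to highlight which genes and immune cell types dominate the tumor microenvironment literature.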


The Linguamatics Booth #345 at this year’s Bio-IT Conference (April 5-7 in Boston) offers the ideal opportunity to catch up with the latest developments in text mining.

Here are 3 reasons to meet the market leader in text analytics for life science and healthcare: