There’s a Clay Shirky quote I like that focuses on the importance of communication: “When we change the way we communicate, we change society.” There are so many different ways of communicating these days: web meetings, calls, email, Skype, Twitter, and more. Indeed, global communication has never been easier, and connecting with colleagues, collaborators, and customers across the world can be done from wherever we find ourselves, whether at our desks, in coffee shops, airport lounges, or even remote resorts. While “thinking global” is almost taken for granted these days, it’s important to keep in mind the power of local networks and face-to-face human interaction.

At Linguamatics, we are fortunate to be based within one of Europe’s key technology clusters. The Cambridge technology cluster includes many different networks engaged in research and development in vital areas including genomics, personalized medicine, rare diseases, big data technologies, and more.  

Linguamatics has recently become affiliated with the Milner Therapeutics Consortium, a new network dedicated to the conversion of basic science into therapies. Its mission is to accelerate academic research towards medical advancement by forging close collaborative interactions with industry. 

Shifting payment models based on quality and value are fueling the demand for insights into the health of populations. This demand requires the analysis of vast amounts of patient data. For example, before healthcare organizations can implement pre-emptive care programs, they must first identify the relative risk of their patient population. This is based on a variety of clinical, financial, and lifestyle factors, including:

  • Patient problem lists, especially chronic conditions
  • Procedures, medications, and other hospital data
  • Claims information
  • Risk factors such as tobacco, alcohol, and drug use
  • Availability and accessibility of health services and social support
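As a rough illustration of how these factors might feed a risk-stratification step, here is a minimal sketch in Python. The field names, weights, and threshold are all hypothetical, chosen only to show the shape of such a calculation, not any actual program's scoring model:

```python
# Hypothetical risk-stratification sketch: combine clinical, financial,
# and lifestyle factors into a coarse additive score. All weights are
# illustrative assumptions, not validated clinical values.

def risk_score(patient):
    """Return a simple additive risk score for one patient record (a dict)."""
    score = 0
    score += 2 * len(patient.get("chronic_conditions", []))   # problem list
    score += len(patient.get("recent_procedures", []))        # hospital data
    score += patient.get("claims_last_year", 0) // 5          # claims volume
    if patient.get("tobacco_use"):                            # lifestyle risk
        score += 3
    if patient.get("alcohol_or_drug_use"):
        score += 3
    if not patient.get("has_social_support", True):           # access/support
        score += 2
    return score

def stratify(patients, high_risk_threshold=8):
    """Split a population into high-risk and lower-risk segments."""
    high = [p for p in patients if risk_score(p) >= high_risk_threshold]
    low = [p for p in patients if risk_score(p) < high_risk_threshold]
    return high, low
```

In practice the high-risk segment returned by a step like `stratify` is small relative to the whole population, which is exactly the pattern discussed next.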

As illustrated in Figure 1, a healthcare population typically includes a relatively small percentage of the highest-risk patients, though these least healthy patients usually account for the biggest percentage of overall healthcare costs.


Figure 1: Level of patient risk associated with population segments and
their cost implications; a relatively small segment of the population
accounts for a disproportionate percentage of healthcare costs

With new and exciting technologies, one particular application or use case often leads the way initially; then, when the euphoria turns into commercial reality, people start looking at other applications where the new technology can also bring value. Text mining is no exception. Pharma companies have now been using NLP text-mining technologies for many years, in areas such as target validation, gene-disease associations, clinical trial optimization, and patent analytics. As they have become comfortable, and indeed expert, in these areas, attention has turned to areas where the core technology needs to be adapted or tweaked to meet a specific requirement.

For example, when looking to apply NLP to the time-consuming and costly business of discovering novel compounds, users hit a significant issue: trying to understand every single component part of some very long chemical names. Not an insurmountable problem, but one that needed time, expertise, and determination.

I attended a Big Data in Pharma conference recently, and very much liked a quote from Sir Muir Gray, cited by one of the speakers: "In the nineteenth century health was transformed by clean, clear water. In the twenty-first century, health will be transformed by clean clear knowledge."  

This was part of a series of discussions and round tables on how we, within the pharma industry, can best use big data, both current and legacy, to inform decisions for the discovery, development, and delivery of new healthcare therapeutics. Data integration, breaking down data silos to create data assets, data interoperability, and the use of ontologies and NLP were all themes presented, with the aim of giving researchers and scientists a clean, clear view of all the appropriate knowledge for actionable decisions across the drug development pipeline.

A new publication describes how text analytics can provide one of the tools for that data interoperability ecosystem, to create a clean, clear view. McEntire et al. describe a system that combines Pipeline Pilot workflow tools, the linguistic and semantic capabilities of Linguamatics I2E NLP, and visualization dashboards to integrate information from key public-domain sources, such as MEDLINE, OMIM, NIH grants, patents, and news feeds, as well as internal content sources.

What if physicians could offer patients access to a potentially life-preserving test, but could not easily identify which of their patients were eligible?

That is the exact situation many providers have found themselves in since Medicare announced it would begin covering lung cancer screening for patients meeting a certain set of criteria.

In a decision memo published in February 2015, CMS agreed to make Medicare coverage available for low-dose computed tomography (LDCT) lung cancer screening for eligible patients. Patients who are between ages 55 and 77, are asymptomatic, are current smokers or have quit within the last 15 years, and have a tobacco smoking history of at least 30 pack-years can now qualify for an annual preventive screening.
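The criteria above are concrete enough to express as a simple eligibility check. The sketch below is only an illustration of the stated rules, with hypothetical parameter names; it should not be taken as an authoritative encoding of the CMS coverage determination:

```python
def ldct_eligible(age, asymptomatic, pack_years,
                  current_smoker, years_since_quit=None):
    """Check the CMS LDCT screening criteria described above.

    pack_years: packs smoked per day multiplied by years of smoking.
    years_since_quit: None for current smokers.
    """
    if not (55 <= age <= 77):          # age 55-77
        return False
    if not asymptomatic:               # must be asymptomatic
        return False
    if pack_years < 30:                # at least 30 pack-years
        return False
    # Must be a current smoker or have quit within the last 15 years.
    if current_smoker:
        return True
    return years_since_quit is not None and years_since_quit <= 15
```

A check like this is trivial when the smoking history is already structured; the practical difficulty, as the next paragraphs suggest, is that much of that history sits in free-text clinical notes.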

CMS added the coverage after determining there was sufficient evidence that LDCT procedures were cost-effective for high-risk populations. The National Lung Screening Trial, for example, found that 12,000 deaths a year could be avoided if high-risk patients underwent an LDCT scan. Lung cancer is currently the leading cause of cancer-related death among both men and women in the US.