In healthcare, the excitement about the potential for big data and machine learning is palpable, and there is more accessible electronic information than ever before.

The challenge for the healthcare community is that approximately 80% of the data in a typical electronic health record (EHR) is trapped within unstructured notes, which requires expensive human annotation to make it accessible to machine learning systems.

So what’s the solution? Natural language processing (NLP), another artificial intelligence (AI) technique, can turn this unstructured text into a set of features for machine learning to use. Data-driven, rule-based NLP techniques can extract information from text with high precision and recall by using linguistic patterns and terminologies, which avoids the need to manually annotate training data for the machine learning model.
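To make the idea concrete, here is a minimal Python sketch of rule-based feature extraction. It assumes a tiny hand-written terminology and a simple negation rule. The `TERMINOLOGY` and `NEGATION_CUES` contents and the `extract_features` function are hypothetical illustrations, not the Linguamatics I2E approach. Each note becomes a row of 0/1 features that a machine learning model could consume without any manually annotated training data.

```python
import re

# Hypothetical mini-terminology: each concept maps to synonyms seen in notes.
TERMINOLOGY = {
    "diabetes": ["diabetes", "diabetes mellitus", "t2dm"],
    "smoker": ["smoker", "smokes", "tobacco use"],
    "heart_failure": ["heart failure", "chf"],
}

# Hypothetical negation cues; a real system would use a much richer rule set.
NEGATION_CUES = {"no", "denies", "without", "negative"}


def extract_features(note: str, window: int = 3) -> dict[str, int]:
    """Return one 0/1 feature per concept: 1 if asserted, 0 if absent or negated."""
    text = note.lower()
    features = {concept: 0 for concept in TERMINOLOGY}
    for concept, synonyms in TERMINOLOGY.items():
        for synonym in synonyms:
            pattern = r"\b" + re.escape(synonym) + r"\b"
            for match in re.finditer(pattern, text):
                # Only look back within the current sentence (after the last '.').
                sentence_start = text.rfind(".", 0, match.start()) + 1
                preceding = re.findall(r"[a-z0-9]+", text[sentence_start:match.start()])
                # The mention counts only if no negation cue sits just before it.
                if not NEGATION_CUES.intersection(preceding[-window:]):
                    features[concept] = 1
    return features


print(extract_features("Patient has T2DM. Denies tobacco use. History of CHF."))
```

The negation check stops at the sentence boundary so that a cue such as "denies" does not leak into the next finding. Production clinical NLP would also need to handle uncertainty, family history and temporality.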

Read the full PM360 article to find out more about how the combination of NLP and machine learning can be a powerful tool for developing predictive models in healthcare and life science.


About David Milward:

David Milward is Chief Technology Officer at Linguamatics. He is a pioneer of interactive text mining, and a founder of Linguamatics. He has over 20 years of experience in natural language processing (NLP) product development, consultancy, and research.


With a background such as mine (medicine, information technology, government, military), you need to know your audience and make sure your acronyms are appropriate.

In healthcare alone, DOA can mean several things: degenerative osteoarthritis, date of arrival, drug of abuse, dead on arrival, and more. Most of these I REALLY don’t want to see in a healthcare analytics report for rheumatology.

ETL is no exception. Although in healthcare it is now widely used to mean “Extract, Transform and Load”, unless you are speaking to someone who works with pulmonary and respiratory diseases, it will seldom be confused with “expiratory threshold load”, which helps determine respiratory muscle efficiency. Then there is AMP, which in medicine is most commonly known as adenosine monophosphate, a vital component of all living cells. But for Linguamatics Health users, AMP is an acronym that is vital in its own right: it stands for Asynchronous Messaging Pipeline.
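The point about context can be illustrated with a minimal Python sketch of acronym disambiguation. It assumes a hand-built sense inventory in which each expansion has a few context words, and the expansion whose cues overlap most with the surrounding sentence wins. `SENSES` and `expand_acronym` are hypothetical names, not part of any Linguamatics product, and the cue words are illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical sense inventory: each expansion lists context words that tend to
# appear near it. Real systems would derive these from terminologies or corpora.
SENSES = {
    "DOA": {
        "degenerative osteoarthritis": {"joint", "knee", "hip", "cartilage", "rheumatology"},
        "date of arrival": {"admitted", "arrival", "date", "transfer"},
        "drug of abuse": {"screen", "urine", "toxicology", "opioid"},
        "dead on arrival": {"ems", "pronounced", "cardiac", "arrest"},
    },
    "ETL": {
        "extract, transform and load": {"data", "warehouse", "pipeline", "load"},
        "expiratory threshold load": {"respiratory", "muscle", "breathing", "pulmonary"},
    },
}


def expand_acronym(acronym: str, sentence: str) -> Optional[str]:
    """Pick the expansion whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best, best_score = None, 0
    for expansion, cues in SENSES.get(acronym, {}).items():
        score = len(cues & words)
        if score > best_score:  # strict '>' keeps the first expansion on ties
            best, best_score = expansion, score
    return best  # None when no context word matched


print(expand_acronym("DOA", "Rheumatology consult for DOA of the left knee"))
```

Returning `None` when nothing matches is deliberate. Leaving an ambiguous acronym unexpanded is safer than guessing and putting "dead on arrival" into a rheumatology report.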

Here at Linguamatics we are grateful to have some very talented folks who can explain our technological world in a way that is (sometimes) less technical. Alex Richard-Hoyling (Senior Solutions Developer) has explained, via the Linguamatics Community, how he helps ensure reliable data extraction in large healthcare systems. Below, I take the subject a step further to cross the chasm where tech meets med.


I2E Natural Language Processing advances research and care delivery by mining clinical insights from unstructured patient data

Cambridge, UK & Boston, USA – June 22nd, 2017 – Market-leading Natural Language Processing (NLP) text analytics provider Linguamatics today announced the implementation of the Linguamatics Health enterprise NLP platform, powered by I2E, at the University of Pennsylvania Health System for the extraction of actionable insights from unstructured patient data.

“We look forward to working with Penn Medicine to help them unlock valuable insights from clinical notes in order to advance research initiatives and enhance the delivery of care,” said Simon Beaulah, senior director of healthcare at Linguamatics. “Our growing community of academic medical centers across the country have deployed the Linguamatics Health platform, and are taking advantage of its ease of use, powerful NLP capabilities, rapid query development and successful integration with enterprise systems. Our platform is particularly well-suited for this environment because it empowers organizations to work independently, and get the data they want without requiring extensive services.”


How do you ensure your healthcare company outshines the competition when there are so many choices out there? There’s an app for that! Well, no. Not yet, at least: there wasn’t one when I wrote this blog (I double-checked). There is, however, the National Committee for Quality Assurance (NCQA), which may not have an app but does have a very informative Twitter account.

The committee’s mission is to help ensure, on an ongoing basis, quality in health care from all parties involved. To assess insurance companies, it uses the Healthcare Effectiveness Data and Information Set (HEDIS), which is “one of the most widely used sets of health care performance measures in the United States” [1]. So rather than comparing two things that merely sound similar, like pineapples and apples, people now have a true method of comparing payers.

To learn more about big data analytics for population health, download our case study.


HEDIS consists of a set of measures of patient care and service. Measures range from the simple documentation of an adult’s body mass index (BMI), a calculation involving only height and weight, to the more complicated documentation of comprehensive diabetes care.
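For reference, adult BMI is weight in kilograms divided by the square of height in metres. The minimal Python sketch below shows just that arithmetic. The `bmi` function name is only for illustration, and the HEDIS measure itself is concerned with whether BMI is documented, not with how it is computed.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# 70 kg at 1.75 m: 70 / 3.0625
print(round(bmi(70, 1.75), 1))  # prints 22.9
```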


Pfizer improves Patent Search 10-fold with Linguamatics I2E

Intellectual property (IP) is critical in the drug discovery process. Before initiating any new project, it is important to understand the patent landscape around the disease area in question, check freedom to operate, and assess patentability. The business case for a project’s commercial viability must cover not just the biology, such as “Is there unmet medical need?”, but also “What is the IP position?”

Streamlining patent research with natural language processing (NLP) text mining

So scientists and researchers need to be able to access the information on genes and diseases contained in patents. But patents can be hundreds of pages long and contain complex constructions and interconnected facts. Manual patent research is a time-consuming and costly process, so more and more pharma companies, such as Pfizer, are turning to NLP text mining to keep up to date with the patent literature.

Pfizer researchers use Linguamatics Life Science Platform powered by I2E to find patents relating to specific diseases. The results feed a database to visualize gene targets, invention type, competitor organizations and overall patent “relevancy”.
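As a rough illustration of the kind of output that could feed such a database, here is a minimal Python sketch. It counts genes and diseases that co-occur in the same sentence of a patent, assuming small hypothetical term lists. Sentence-level co-occurrence is a simple stand-in technique chosen for brevity. It is not how I2E or Pfizer’s workflow operates, and `GENES`, `DISEASES`, `gene_disease_pairs` and `relevancy` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical vocabularies; a real pipeline would use curated gene and disease ontologies.
GENES = {"EGFR", "KRAS", "BRAF"}
DISEASES = {"lung cancer", "melanoma", "colorectal cancer"}


def gene_disease_pairs(patent_text: str) -> Counter:
    """Count gene-disease pairs that co-occur within the same sentence."""
    pairs = Counter()
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", patent_text):
        genes = {g for g in GENES if re.search(rf"\b{re.escape(g)}\b", sentence)}
        diseases = {
            d for d in DISEASES
            if re.search(rf"\b{re.escape(d)}\b", sentence, re.IGNORECASE)
        }
        for gene in genes:
            for disease in diseases:
                pairs[(gene, disease)] += 1
    return pairs


def relevancy(pairs: Counter, disease: str) -> int:
    """Crude relevancy score: total gene co-mentions with the given disease."""
    return sum(n for (_, d), n in pairs.items() if d == disease)


text = ("Inhibitors of EGFR are claimed for treating lung cancer. "
        "BRAF mutations occur in melanoma. "
        "KRAS and EGFR variants in lung cancer are also disclosed.")
print(relevancy(gene_disease_pairs(text), "lung cancer"))  # prints 3
```

Co-occurrence overcounts, because two terms in one sentence need not be related. Linguistic patterns that capture the actual relationship tend to give higher precision.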