How do you ensure your healthcare company outshines the competition with so many choices out there? There’s an app for that! Well, no, not yet; at least there wasn’t when I wrote this blog (I double-checked). There is, however, the National Committee for Quality Assurance, which has no app but does have a very informative Twitter account.

The committee’s mission is to help ensure continual quality in healthcare across all parties involved. For insurance companies, it uses the Healthcare Effectiveness Data and Information Set (HEDIS), “one of the most widely used sets of health care performance measures in the United States” [1]. So rather than trying to compare two things that merely sound similar, such as ‘pineapples to apples’, people now have a true method for comparing payers.

To learn more about big data analytics for population health, download our case study.


HEDIS consists of a set of measures around patient care and service. Measures range from simple documentation of an adult body mass index (BMI), a calculation involving only height and weight, to the more complicated documentation of comprehensive diabetes care.
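To show just how simple the BMI end of that range is, here is a minimal Python sketch of the standard formula (weight in kilograms divided by height in metres squared). The function name `body_mass_index` is made up for illustration, and this is only the arithmetic, not the HEDIS documentation rule itself.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


# Example: an adult weighing 70 kg at 1.75 m tall.
example_bmi = body_mass_index(70.0, 1.75)
print(round(example_bmi, 1))
```

Comprehensive diabetes care, by contrast, pulls together several tests and exams per patient, so it cannot be reduced to a one-line calculation like this.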


Pentavere Research Group of Toronto, Canada, was developing a platform to provide health insights from Real-World Evidence (RWE). Pentavere’s aim is to improve healthcare efficiency by allowing life science companies and healthcare providers to understand the impact of clinical decisions made in the primary care setting.

The company’s proprietary platform, daRWEn™, uses digitized, de-identified, and aggregated health information, but much of the valuable data that it wanted to include was locked inside free-form text, making it difficult to extract. Pentavere soon realized that it needed to incorporate natural language processing (NLP) capabilities into its platform in order to access these RWE insights. To achieve this in a timely and efficient manner, it chose to integrate the Linguamatics I2E NLP solution into daRWEn™.

Why Linguamatics? There were several important factors, including:


A project with Drexel University highlighted the value of applying natural language processing (NLP) to cohort selection in support of medical research, clinical trial recruitment, and outcomes analysis.

Drexel was setting up a study into patients with HIV and Hepatitis C and needed to identify potential subjects from their AllScripts EHR. As many organizations do, they had five medical students spend four months (working part-time) trawling through patient records to identify 700 potential study candidates.

The process was particularly painful because simply looking for the ICD codes for HIV and Hepatitis C in structured fields missed significant numbers of potential subjects. This was caused by variation in where the data was recorded: sometimes it was coded in structured fields, sometimes the patient narrative stated that he or she was positive for HIV or Hepatitis C, and sometimes both.
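Here is a small Python sketch of why a structured-only search undercounts. Everything in it is hypothetical: the `PatientRecord` shape, the sample records, and the ICD-10 codes (B20 for HIV disease, B18.2 for chronic hepatitis C) are illustrative assumptions, since nothing above says which code set or record layout Drexel used. The narrative check is a deliberately naive keyword match that only flags a record for closer review.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Illustrative ICD-10 codes (assumption: the study's actual code set is not stated).
HIV_CODES = {"B20"}
HCV_CODES = {"B18.2"}

# Naive narrative terms; a hit only flags the record, it does not confirm a diagnosis.
HIV_TERMS = ("hiv",)
HCV_TERMS = ("hepatitis c", "hcv")


@dataclass
class PatientRecord:  # hypothetical record shape
    patient_id: str
    icd_codes: set[str] = field(default_factory=set)
    narrative: str = ""


def evidence_sources(record: PatientRecord, codes: set[str], terms: tuple[str, ...]) -> set[str]:
    """Return where evidence was found: 'structured', 'narrative', both, or neither."""
    sources = set()
    if record.icd_codes & codes:
        sources.add("structured")
    text = record.narrative.lower()
    if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
        sources.add("narrative")
    return sources


def candidate_ids(records: list[PatientRecord], structured_only: bool = False) -> list[str]:
    """IDs of patients with some evidence of both conditions."""
    ids = []
    for record in records:
        hiv = evidence_sources(record, HIV_CODES, HIV_TERMS)
        hcv = evidence_sources(record, HCV_CODES, HCV_TERMS)
        if structured_only:
            hiv &= {"structured"}
            hcv &= {"structured"}
        if hiv and hcv:
            ids.append(record.patient_id)
    return ids


records = [
    PatientRecord("p1", {"B20", "B18.2"}),
    PatientRecord("p2", {"B20"}, "Patient is positive for hepatitis C."),
    PatientRecord("p3", set(), "Known HIV and HCV co-infection."),
    PatientRecord("p4", set(), "Routine visit."),
]

print(candidate_ids(records, structured_only=True))
print(candidate_ids(records))
```

In this toy data the structured-only search finds one patient, while adding the narrative finds three. A keyword hit is still not a diagnosis, though, which is where the careful reading comes in.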

Assessing the narrative is always a problem: patient history must be distinguished from family history, and phrases such as “tested for HIV, negative result” and “positive for HIV” require careful reading.
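To make that distinction concrete, here is a toy rule-based sketch in Python that sorts a sentence mentioning HIV into affirmed, negated, or family history. The cue lists and the `classify_mention` function are invented for this example; this is not how I2E works internally, and real clinical text needs far richer handling of scope and context.

```python
import re

# Hypothetical cue lists, chosen only to cover the example phrases.
TARGET = re.compile(r"\bHIV\b", re.IGNORECASE)
FAMILY_CUES = re.compile(
    r"\b(family history|mother|father|brother|sister)\b", re.IGNORECASE
)
NEGATION_CUES = re.compile(
    r"\b(negative|denies|no evidence of|ruled out|not)\b", re.IGNORECASE
)


def classify_mention(sentence: str) -> str:
    """Classify how a single sentence refers to HIV."""
    if not TARGET.search(sentence):
        return "no_mention"
    # Family-history cues take priority: the finding is about a relative.
    if FAMILY_CUES.search(sentence):
        return "family_history"
    if NEGATION_CUES.search(sentence):
        return "negated"
    return "affirmed"


for text in (
    "Tested for HIV, negative result.",
    "Patient is positive for HIV.",
    "Family history: father positive for HIV.",
):
    print(text, "->", classify_mention(text))
```

Even this tiny version shows why a plain keyword search for “HIV” is not enough: two of the three example sentences mention the term without describing a patient who has the condition.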

Drexel used our I2E NLP platform and had indexed a large collection of patient records by extracting documents from AllScripts via its analytical data warehouse.

The data sets were indexed with the usual domain ontologies covering diseases, medications, procedures etc. to support rapid searching in I2E.


There’s a lot of buzz in the healthcare community at the moment surrounding the use of artificial intelligence with machine learning for pattern identification, decision-making, and outcome prediction. The availability of high-quality data for training algorithms is vital to machine learning’s success - but a lot of this information is tied up in unstructured clinical notes. Natural language processing (NLP) is the key to extracting the “good stuff” from this vast trove of unstructured text. Combining that “good stuff” with already structured data helps healthcare providers to understand the patterns and trends in data via machine learning - and thereby enhance care, reduce costs, and improve population health.

Which type of NLP software is best?

The first question that healthcare users must ask themselves is “Which type of NLP software best suits my needs?”

Statistical NLP systems require example data to identify patterns in new data. The examples may come from dictionaries or ontologies - or they might need to be manually annotated by a clinician - which can be an extremely laborious and institutionally costly task.

Meanwhile, most rule-based NLP systems require a specialist to define the types of language rules or patterns that represent certain healthcare concepts. This approach can make them more accurate, but they will be limited to the patterns that the specialist has thought of.


HIMSS 17

Information Technology AND Healthcare? Why on Earth would you combine such incompatible career fields?

I can’t tell you how many times I was asked this in the past. Early in my career, no one ever told me that combining my Computer Operations training in the Air Force with my decision to pursue medicine was actually a good idea. In fact, it was quite the opposite. And yet this year I can give about 45,000 more reasons (the number of attendees at HIMSS 2017 [1]) why that path led to a promising merged career field after all.

The “missing link” career - people divided by a common career field.