A project with Drexel University highlighted the importance of applying Natural Language Processing (NLP) to cohort selection in support of medical research, clinical trial recruitment and outcomes analysis.

Drexel was setting up a study into patients with HIV and Hepatitis C and needed to identify potential subjects from their AllScripts EHR. As many organizations do, they had five medical students spend four months (working part-time) trawling through patient records to identify 700 potential study candidates.

The process was particularly painful because simply looking for the ICD codes for HIV and Hepatitis C in structured fields missed significant numbers of potential subjects. The cause was variation in where the data was recorded: sometimes it was coded in structured fields, sometimes the patient narrative stated that he or she was positive for HIV or Hepatitis C, and sometimes both.

Assessing the narrative is always difficult and requires careful reading: patient history must be distinguished from family history, and “tested for HIV, negative result” from “positive for HIV”.
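To make the problem concrete, here is a minimal sketch of combining structured codes with narrative mentions while skipping negated and family-history statements. It is not how I2E works; the ICD-10 codes, regular expressions, and names such as `Patient` and `candidate_conditions` are illustrative assumptions, and real clinical text needs far more robust handling than a few keyword patterns.

```python
import re
from dataclasses import dataclass, field

# Illustrative ICD-10 codes (assumption); a real study would curate its own list.
TARGET_CODES = {"HIV": {"B20"}, "HepC": {"B18.2"}}

# Hypothetical keyword patterns for narrative mentions of each condition.
MENTION = {
    "HIV": re.compile(r"\bHIV\b", re.IGNORECASE),
    "HepC": re.compile(r"\bhepatitis C\b|\bHCV\b", re.IGNORECASE),
}
# Crude cues that a mention is negated or refers to a relative, not the patient.
NEGATION = re.compile(r"\bnegative\b|\bno evidence of\b|\bdenies\b|\bruled out\b", re.IGNORECASE)
FAMILY = re.compile(r"\bfamily history\b|\b(mother|father|brother|sister)\b", re.IGNORECASE)


@dataclass
class Patient:
    patient_id: str
    icd_codes: set = field(default_factory=set)
    notes: list = field(default_factory=list)


def narrative_positive(condition, note):
    """True if any sentence mentions the condition without a negation or family cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if not MENTION[condition].search(sentence):
            continue
        if NEGATION.search(sentence) or FAMILY.search(sentence):
            continue
        return True
    return False


def candidate_conditions(patient):
    """Conditions supported by either a structured code or an affirmative narrative mention."""
    found = set()
    for condition, codes in TARGET_CODES.items():
        coded = bool(patient.icd_codes & codes)
        narrated = any(narrative_positive(condition, n) for n in patient.notes)
        if coded or narrated:
            found.add(condition)
    return found
```

A patient with a coded HIV diagnosis and a note reading "Patient is positive for hepatitis C." would be picked up for both conditions, while a note reading "Tested for HIV, negative result." would not count. Sentence-level cue matching like this is deliberately simplistic and will misjudge sentences that mix a negated finding with an affirmed one.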

Drexel was using our I2E NLP platform and had already indexed a large collection of patient records, extracting the documents from AllScripts via their analytical data warehouse.

The data sets were indexed with the usual domain ontologies covering diseases, medications, procedures etc. to support rapid searching in I2E.


There’s a lot of buzz in the healthcare community at the moment surrounding the use of artificial intelligence with machine learning for pattern identification, decision-making, and outcome prediction. The availability of high-quality data for training algorithms is vital to machine learning’s success - but a lot of this information is tied up in unstructured clinical notes. Natural language processing (NLP) is the key to extracting the “good stuff” from this vast trove of unstructured text. Combining that “good stuff” with already structured data helps healthcare providers to understand the patterns and trends in data via machine learning - and thereby enhance care, reduce costs, and improve population health.

Which type of NLP software is best?

The first question that healthcare users must ask themselves is “Which type of NLP software best suits my needs?”

Statistical NLP systems require example data to identify patterns in new data. The examples may come from dictionaries or ontologies - or they might need to be manually annotated by a clinician - which can be an extremely laborious and institutionally costly task.

Meanwhile, most rule-based NLP systems require a specialist to define the language rules or patterns that represent certain healthcare concepts. This approach can make them more accurate, but they are limited to the patterns that the specialist has thought of.
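The limitation of hand-written rules can be seen in a toy sketch. The patterns and example notes below are invented for illustration and do not come from any particular NLP product.

```python
import re

# Hypothetical rules a specialist might write for the concept "hepatitis C".
RULES = [
    re.compile(r"\bhepatitis C\b", re.IGNORECASE),
    re.compile(r"\bHCV\b", re.IGNORECASE),
]


def matches_concept(text, rules):
    """True if any rule pattern is found in the text."""
    return any(rule.search(text) for rule in rules)


notes = [
    "History of hepatitis C.",
    "HCV antibody reactive.",
    "Known hep C, treated 2015.",  # abbreviation the rule author did not anticipate
]
hits = [matches_concept(n, RULES) for n in notes]

# Adding a rule for the missed abbreviation recovers the third note.
EXTENDED_RULES = RULES + [re.compile(r"\bhep\.?\s*C\b", re.IGNORECASE)]
extended_hits = [matches_concept(n, EXTENDED_RULES) for n in notes]
```

With the original two rules the third note is missed; it is only found once someone notices the abbreviation and adds a pattern for it.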


HIMSS 17

Information Technology AND Healthcare? Why on Earth would you combine such incompatible career fields?

I can’t tell you how many times I was asked this in the past. Early in my career, no one ever told me that combining my Computer Operations training in the Air Force with my decision to pursue medicine was actually a good idea. In fact, it was quite the opposite. And yet this year I can give about 45,000 more reasons (the number of attendees at HIMSS 2017 [1]) why that path led to a promising merged career field after all.

The “missing link” career - people divided by a common career field.


Risk stratification has, so far, been biased toward structured data due to accessibility issues. As long-term member wellness grows in importance, the insights trapped in unstructured data will become the differentiator in a changing and competitive market. The payers who are able to characterize member groups at a fundamentally more detailed level will have the advantage of population insight over those who struggle to do so.

Data sources that are increasing in scale and availability include electronic health record (EHR) data in Continuity of Care Document (CCD) format from providers, OCR notes about members, and nurses’ notes.

How can payers make effective use of unstructured data to stratify populations more effectively when much of their infrastructure is tied to structured data? Sources of unstructured data contain significantly more detail about members but are much more varied.

Here at Linguamatics Health, our Clinical NLP specialists understand the urgency and complexity of bringing together data sources, both structured and unstructured, in a workflow that quickly gets you to the insights you need.


Ever find an acute problem such as a fracture that still shows in a Problem List but healed months ago? Or perhaps the problem list states a case of bronchitis that may have been transient or may actually be Chronic Obstructive Pulmonary Disease (COPD)? After all, a diagnosis of COPD rests on a combination of symptoms and test results. How many clinicians find the spare time to go back through the EHR and work out whether a patient has been “coughing with excessive sputum nearly everyday for at least 3 months of the year, for 2 years in a row” [1]?
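As a rough illustration of what that retrospective check involves, the sketch below takes the dates on which a productive cough was documented and tests a simplified proxy for the quoted criterion. The function name and thresholds are assumptions, and the proxy is crude: a month counts if the cough was documented at least once in it, which cannot establish "nearly every day" from sparse clinical notes.

```python
from collections import defaultdict
from datetime import date


def meets_cough_criterion(cough_dates, min_months=3, min_years=2):
    """Proxy check: at least `min_months` distinct months with a documented
    productive cough in each of `min_years` consecutive calendar years."""
    months_by_year = defaultdict(set)
    for d in cough_dates:
        months_by_year[d.year].add(d.month)

    # Years with enough distinct months, in chronological order.
    qualifying = sorted(
        year for year, months in months_by_year.items() if len(months) >= min_months
    )

    run = 0
    previous = None
    for year in qualifying:
        # Extend the run only when this year directly follows the previous one.
        run = run + 1 if previous is not None and year == previous + 1 else 1
        if run >= min_years:
            return True
        previous = year
    return False


consecutive = [
    date(2015, 1, 10), date(2015, 2, 12), date(2015, 3, 3),
    date(2016, 10, 5), date(2016, 11, 20), date(2016, 12, 1),
]
with_gap = [
    date(2015, 1, 10), date(2015, 2, 12), date(2015, 3, 3),
    date(2017, 10, 5), date(2017, 11, 20), date(2017, 12, 1),
]
```

Here `consecutive` satisfies the proxy and `with_gap` does not. A result like this would only flag a record for clinical review; it is not a diagnosis.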

But fixing the problem list can be time-consuming and complicated. Isn’t there an alternative (better) way?

Many organizations believe that, for deriving an accurate picture of their population’s health, medication lists can be just as good as the problem list. What if you find a patient taking an atypical antipsychotic medication who has no corresponding diagnosis on their Problem List? Can we just assume a mental health diagnosis? After all, this conclusion seems logical. Or is it? Is it an oversight on their Problem List, or are they prescribed it for an off-label reason? A 2011 report from the Agency for Healthcare Research and Quality (AHRQ) described off-label uses of atypical antipsychotic medications, including anxiety, ADHD, behavioral disturbances of dementia and severe geriatric agitation, MDD, eating disorders, insomnia, OCD, PTSD, personality disorders, substance abuse, and Tourette's syndrome [2].

Therefore, can we really make assumptions?
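One cautious way to act on this is to flag the mismatch for review rather than infer a diagnosis. The sketch below assumes simple lists of medication and problem names; the drug and diagnosis sets are small illustrative samples, and `review_status` is a hypothetical helper, not part of any product.

```python
# Illustrative samples only (assumption); real lists would come from a drug
# terminology and a curated diagnosis value set.
ATYPICAL_ANTIPSYCHOTICS = {"quetiapine", "olanzapine", "risperidone", "aripiprazole"}
SUPPORTING_DIAGNOSES = {"schizophrenia", "bipolar disorder"}


def review_status(medications, problem_list):
    """Classify a record without assuming a diagnosis from the medication alone."""
    meds = {m.strip().lower() for m in medications}
    problems = {p.strip().lower() for p in problem_list}

    if not meds & ATYPICAL_ANTIPSYCHOTICS:
        return "not applicable"
    if problems & SUPPORTING_DIAGNOSES:
        return "consistent"
    # On an atypical antipsychotic with no supporting diagnosis listed: this may
    # be an incomplete problem list or an off-label prescription, so a person
    # should look at the record.
    return "needs review"
```

For example, `review_status(["Quetiapine"], ["Insomnia"])` returns `"needs review"`, which leaves the question of off-label use versus a missing diagnosis to a clinician or to evidence in the clinical notes.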