Big data, real world data - where does text analytics fit in?

February 26 2015

Big data? Real world data? What do we really mean?

I was at a conference a couple of weeks ago: an interesting two days spent discussing what big data means in the life science domain, and what value we can expect to gain from better access to it and better use of it.

The keynote speaker kicked off the first day with a great quote from Atul Butte: “Hiding within those mounds of data is knowledge that could change the life of a patient or change the world”.

This is a really great ambition for data analytics. But one interesting topic was what we actually mean by big data. One common definition from some of the Pharma folk was that it is any sort of patient-related data that originated outside their organization.

To me, this definition seems to refer more to real world data: adverse event reports, electronic health records, voice of the customer (VoC) feeds, social media data, claims data, patient group blogs. In other words, any data that hasn’t been influenced by the drug provider and can give an external view, whether from the patient, payer, or healthcare provider.

Many of these real world sources have free text fields, and this is where text analytics, and natural language processing (NLP), can fit in. We have customers who are using text analytics to get actionable insight from real world data – and finding valuable intelligence that can inform commercial business strategies.

Valuable information could be found in electronic health records, but these are notoriously hard for Pharma to access, given the regulations and restrictions around data use and data privacy.

So, what real world data are accessible?

Our customers are inventive, and have used data types such as clinical trial reports, clinical investigator brochures, National Comprehensive Cancer Network (NCCN) guidelines, and VoC call transcripts.

VoC call transcripts are a rich seam of potential patient reported outcomes, side effects, drug interactions, and more.

The medical information group at Pfizer have used the Linguamatics I2E text analytics solution to access insights that can have a huge impact on commercial business decisions. Their strategic goal has been to analyze unstructured data efficiently and alert decision makers to the signals that come from users of Pfizer products.


Workflow for text analytics over unstructured VoC feeds

Researchers in the predictive analytics group built a workflow that takes the call transcripts, processes them using advanced text analytics to make sense of the unstructured feeds, and visualizes the output to reveal trends. The output is also used to build predictive models around the different products and the real-world data coming back from patients, consumers, medical assistants, pharmacists, or sales representatives.

The calls could be categorized and tagged for key metadata such as caller demographics, and reason for calling (e.g. complaint, formulation information, side effect, drug-drug interactions).
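As a rough illustration of this kind of tagging, here is a minimal Python sketch. It is not the I2E workflow described above: it swaps in plain keyword matching for real NLP, and the patterns, the `tag_reasons` function, and the sample transcripts are all hypothetical, invented purely for this example.

```python
import re
from collections import Counter

# Hypothetical keyword patterns, one per call reason. A production system
# would rely on curated vocabularies and linguistic rules, not a short list.
REASON_PATTERNS = {
    "complaint": re.compile(r"\b(complain\w*|broken|damaged|refund)\b", re.I),
    "formulation information": re.compile(
        r"\b(ingredient\w*|gluten|lactose|formulation)\b", re.I
    ),
    "side effect": re.compile(
        r"\b(side effect\w*|nausea|rash|dizz\w+|headache\w*)\b", re.I
    ),
    "drug-drug interaction": re.compile(
        r"\b(interact\w*|together with|while taking)\b", re.I
    ),
}


def tag_reasons(transcript):
    """Return the sorted list of call reasons whose pattern matches the text."""
    return sorted(
        reason
        for reason, pattern in REASON_PATTERNS.items()
        if pattern.search(transcript)
    )


# Made-up transcripts, for illustration only.
calls = [
    "Caller reports nausea and a rash after the first dose.",
    "Can I take it together with ibuprofen? Does it interact?",
    "Is there lactose in the tablet?",
]

# Count how often each reason appears across all calls, to see trends.
reason_counts = Counter(reason for call in calls for reason in tag_reasons(call))

for call in calls:
    print(tag_reasons(call), "<-", call)
print(reason_counts)
```

A single call can carry several tags, which is why the function returns a list rather than one label.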

Key product questions were posed by Medical Informatics, to examine unexpected side effects, off-label use, lack of efficacy, and dose-related questions, and to separate side effects from pre-existing conditions.

Text analytics enabled the medical affairs researchers to explore drug-disease associations in more depth, by looking within the call logs for information on pre-existing conditions, and relating these to the potential side effects reported in the same call log.

These associations allowed over 70% of the reported side effects to be attributed to underlying pre-existing conditions rather than to an adverse drug reaction (ADR).
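The idea of relating reported side effects to pre-existing conditions can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the condition-to-symptom lookup, the `split_side_effects` function, and the call records are hypothetical, and the resulting share has nothing to do with the figure reported above.

```python
# Hypothetical lookup: symptoms that each pre-existing condition could explain.
CONDITION_SYMPTOMS = {
    "diabetes": {"fatigue", "thirst"},
    "migraine": {"headache", "nausea"},
    "hypertension": {"dizziness", "headache"},
}


def split_side_effects(side_effects, conditions):
    """Split reported side effects into those that a known pre-existing
    condition could explain and those that remain potential ADRs."""
    explainable = set()
    for condition in conditions:
        explainable |= CONDITION_SYMPTOMS.get(condition, set())
    explained = sorted(s for s in side_effects if s in explainable)
    unexplained = sorted(s for s in side_effects if s not in explainable)
    return explained, unexplained


# Made-up call records, standing in for the output of the text analytics step.
calls = [
    {"side_effects": ["headache", "rash"], "conditions": ["migraine"]},
    {"side_effects": ["fatigue"], "conditions": ["diabetes"]},
    {"side_effects": ["dizziness"], "conditions": []},
]

explained_total = 0
reported_total = 0
for call in calls:
    explained, unexplained = split_side_effects(
        call["side_effects"], call["conditions"]
    )
    explained_total += len(explained)
    reported_total += len(explained) + len(unexplained)

share_explained = explained_total / reported_total
print(f"{share_explained:.0%} of reported side effects match a pre-existing condition")
```

A match here only flags a side effect as possibly explained by an underlying condition; deciding whether it is truly not an ADR still needs expert review.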

So does this count as big data? Of course, it all depends on your definition. But if you think of the classic 3 Vs (velocity, variety, and volume), then maybe there is a fit: these feeds are unstructured, complex text, and Pfizer receive about 1 million messages per year on their 1-800 number. So, not huge velocity, but reasonable volume, and definitely variety.

And, if analyzed well, these feeds have huge potential value. At least, that’s our view, and we’d love to hear what you think.

The 3Vs that define Big Data