Using NLP-based Text Mining to Gather Patient Insights from Social Media at Roche

Pharmaceutical companies are increasingly recognizing that patients’ social media posts are a valuable source of insight into patient-reported outcomes, views, symptoms, use of competitive products and more. Analyzing social media posts has the potential to provide insights from a broad population of patients, healthcare professionals, and key opinion leaders.

The standard methodology to gather these insights is primary market research. Pharma companies work hard to establish and maintain focus groups for live interviews, questionnaires and other ways to gather insights from patients and providers. However, the research process is time-consuming and expensive, requiring pharma companies to invest a significant amount of resources; and the number of patients that can be targeted is small.

Analysis of social media has the potential to be more efficient, more cost-effective, and address much larger patient populations than primary market research. The problem in analyzing social media content, however, is that social platforms represent a constantly flowing firehose of noisy information, so separating the signal from the noise poses a significant challenge.

Text mining with Natural Language Processing (NLP) technology represents an alternative approach to unlocking the value of social media content. By enabling artificial intelligence-based NLP tools to gather and analyze social data, pharmaceutical companies can more swiftly and efficiently glean insights that will influence the drug development process.

Deeper insight into Parkinson’s symptoms

Like many pharmaceutical companies in recent years, Roche has endeavored to better capture the voice of the patient across drug discovery and development. Obtaining patient feedback helps pharmaceutical companies ensure that the drugs they’re developing are truly helping those they are intended to assist. For example, patient feedback can provide actionable information on how certain pharmaceuticals are effecting a patients’ quality of life or whether a clinical trial’s clinical endpoints are relevant to patients.

For a developmental drug to treat Parkinson’s Disease, Roche sought to broaden its understanding of the Parkinson’s conceptual disease model, the representation of the symptoms and impacts of the disease. A robust conceptual model provides a basis for the evaluation of any treatment in terms of its benefit to patients.

Roche’s researchers established a series of NLP-based text mining queries to analyze patient discussions around Parkinson’s across several social platforms, blogs, patient forums and relevant websites. The analysis scope of the project was observational research only, with data downloaded from open health social networking sites and communities. NLP enabled the researchers to use open linguistic patterns to find and extract unknown symptoms and impacts, by looking for any words following phrases such as, “I was feeling xyz”, “my legs started aching and xyz”.

As a result of the text-mining analysis, Roche researchers gained two major insights. They added two impacts to the conceptual disease model, which had been identified by literature yet their relevance to patients required further evidence; and they also identified several additional symptoms. A journal publication describing the process by which Roche developed the final clinical model, including patient interviews, is in preparation.

By performing an NLP-enabled social media listening projects, Roche has enhanced its understanding of symptoms that impact Parkinson’s patients, and has gained awareness that can improve the design of future clinical trials.


Visit our RWD page for more info on NLP across a wide range of real world data, including social media.

Read a related article where NLP was used to clarify information found on social media:

READ ARTICLE