
Precision Medicine

Precision medicine refers to the ability to tailor treatment to specific patient groups. NLP supports this initiative by rapidly extracting actionable insights from textual sources such as PubMed Central® (PMC) and MEDLINE®.

  1. Overview
  2. Precision Medicine Research in Healthcare and Pharma
  3. Challenges for Precision Medicine within Pharma and Healthcare
  4. Role of Natural Language Processing in Precision Medicine
  5. NLP Case Studies in Precision Medicine Research
  6. Platform Demo

The terms precision medicine, personalized medicine, individualized or genomic medicine all refer to the ability to tailor treatment to the most appropriate group of patients, either at the clinical level, or within drug discovery and development.


Within the clinical arena, understanding the best treatment pathway for a particular patient or group of patients requires an ability to access and analyze information about many different aspects of patients’ lives beyond just their medical history.

Within the pharmaceutical industry, the annotation of high-throughput biological screens, such as next generation sequencing (NGS), can provide information for pharmacogenomic-related drug development: from biomarker discovery and target evaluation to patient stratification and clinical profiling. Pharmaceutical organizations are also interested in using real world data (RWD), such as electronic medical records (EMRs), to understand effectiveness of therapies in patient sub-populations.

However, many of the sources needed for understanding detailed clinical phenotypes or genotype-phenotype relationships are unstructured text. Linguamatics NLP platform can unlock the value from unstructured text sources including EMRs, scientific literature, conference abstracts or internal reports.

Transformation of Traditional R&D in Pharma

Over the past decades, the pharmaceutical industry as a whole has witnessed declining research and development productivity (Scannell et al. 2012), increased pricing pressures and deeply eroded operating margins. We are also seeing a failing blockbuster medicine model: one-size-fits-all no longer works, and there is a need for a precision medicine approach.

The cost curve has also changed, leading to a shift in focus and investment from acute care to preventive healthcare, which is closely aligned with precision medicine. A better understanding of the molecular basis of disease is key to the development of personalized medicine across pharmaceutical R&D, as discussed a few years ago by Janet Woodcock, Director of the FDA’s Center for Drug Evaluation and Research (CDER) in her presentation the “Coming of Age of Personalized Medicine”.

Value-Based Care and Population Health

Value-based care and population health management increasingly demand personalized and precision care delivery. Healthcare consumerism and the explosion in patient data are enabling individual patient stratification, which is critical for the precision health practice.

We’re also seeing a shift from treatment to overall prevention, diagnostics and monitoring. This trend is well aligned to the goal of precision medicine: enhancing the quality of life. It is moving towards patient engagement and disease management thereby promoting early diagnosis and reducing the overall cost of treatment. We are now at the point where the focus is on prevention and wellness. This approach requires healthcare professionals to manage and mitigate risk, with the result that the importance of data-driven precision medicine is going to increase significantly over the next 5 years.

Transitioning from one-size-fits-all to precision medicine with multi-level patient stratification.

Source: Forbes

Precision Medicine Research in Healthcare and Pharma

Origins of Precision Medicine

Effective drug development and the health management of individual patients require a precision medicine approach. During his 2015 State of the Union address, President Barack Obama announced details of his administration’s Precision Medicine Initiative, which promised to accelerate the development of tools and therapies that are customized to individual patients. There are similar initiatives in many countries, with goals to use genomics and clinical data to more deeply understand the interplay of genotype and phenotype in health and disease.

Precision medicine takes into account healthcare’s relatively minor role in impacting a patient’s overall health and well-being, compared to the larger roles of genetics, health behaviors and social and environmental factors. It focuses on disease treatment and prevention, and takes into account the variability in genes, environment and lifestyle between individual patients. As such, it should be fuelled by evidence-based insights derived from population-level analysis.

To achieve these goals, the whole patient population must be continuously monitored to identify movement between risk categories. This requires looking at individual patients and uncovering changes that might impact wellness, such as lifestyle behaviors, living arrangements or ambulatory status. However, these changes are typically only recorded in unstructured patient notes, and thus are not easily monitored. To complicate matters further, providers and payers often need these insights in real time to support clinical decisions.

For example, consider the treatment of patients with chronic obstructive pulmonary disease (COPD). A provider could identify COPD patients based on a broad disease code, but a full evaluation of an individual patient requires an understanding of that patient’s risk factors and long-term prognosis. Ideally, therapies should be developed and chosen based on genetic, environmental and lifestyle factors.

Clinicians need to understand multiple details that are not easily captured in a structured format, such as the patient’s ongoing exposure to second-hand smoke, previous exposure to environmental pollutants or workplace chemicals, and known genetic predispositions.

Typically, a clinician would need to read multiple pages of notes to glean clinically relevant details. Alternatively, by utilizing an advanced technology such as NLP, clinicians can uncover these nuances quickly and on a much wider scale.

If I manually sit at a computer, I could find 25 phenotypes but after training Linguamatics, I can find 130.

Benjamin Darbro, MD, Associate Professor of Pediatrics at the Stead Family Department of Pediatrics (quoted in Health Data Management)

Pharmacogenomics and Targeted Therapies

Since the human genome was published in 2001, there has been growing debate about the potential application of this genomic and genetic knowledge to personalized medicine and, in recent years, the pharmaceutical industry appears to be approaching this goal.

Pharmacogenomics is the study of the role an individual’s genome plays in drug response, which can vary from adverse drug reactions to lack of therapeutic efficacy.

The FDA’s CDER has been urging adoption of pharmacogenomics strategies and the pursuit of targeted therapies for a variety of reasons: the potential for decreasing response variability, improved safety and increasing the size of treatment effect by stratifying patient populations. With the recent explosion in data from next generation sequencing (NGS) technologies, one of the remaining bottlenecks in the application of genomic variation data is access to the appropriate annotation.

From NGS workflows, scientists can quickly identify lists of candidate genes that differ between two conditions (e.g. case-control or family hierarchies). Gene annotations are essential to interpreting these gene lists, and to discovering fundamental properties like gene function and disease relevance.


Key sources for these annotations include the ever-growing biomedical literature, either curated in structured databases such as the Catalogue of Somatic Mutations in Cancer (COSMIC), the Genetic Association Database (GAD) and the Disease and Gene Annotations (DGA) database, or available as text in sources such as PubMed Central® (PMC), MEDLINE® and Online Mendelian Inheritance in Man (OMIM).

Challenges for Precision Medicine within Pharma and Healthcare

Precision medicine demands a thorough understanding of the natural history of a disease, particularly the genes and gene variants involved and their relationship to patient phenotypes. This is especially important in the treatment of cancer, and is driven by the wider availability of genomic analysis and more effective access to electronic medical records.

With the advent of high-throughput genomics technologies, patients can be rapidly screened via gene panels, RNA sequencing, whole genome analysis or other methods. However, a significant bottleneck is now the biological interpretation of results. Existing databases of genetic variant data are incomplete and not always up to date, so researchers resort to manual searching of literature resources (abstracts, full-text papers, conference reports, etc.), which can be very time-consuming.

Another critical area is the thorough understanding of patient phenotypes, both at the individual and population level. The structured data in EMRs provides some information on phenotypes, but there is much more detail buried in the unstructured text of the genetic testing referral forms or medical charts.

Role of Natural Language Processing in Precision Medicine

Natural language processing (NLP) provides an alternative to traditional search methods, allowing the rapid extraction of actionable insights from textual sources such as PubMed Central® (PMC), MEDLINE®, Online Mendelian Inheritance in Man (OMIM) and electronic health records (EHR).

Unstructured data is the fundamental driving force for precision medicine analytics, and NLP transforms the data buried in unstructured text into structured information that can be used in analytics. NLP in a healthcare environment enables clinicians and researchers to gain a complete picture of the patient, add more detail to the basic information from the structured data and deepen their understanding of the individual.
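As a rough illustration of what turning unstructured text into structured information means, the sketch below pulls a few lifestyle and social fields out of a free-text note. It is a toy, rule-based example and not the Linguamatics NLP platform: the field names, the patterns and the sample note are all assumptions made for illustration, and a production system would rely on curated ontologies and linguistic analysis rather than a handful of regular expressions.

```python
import re

# Hypothetical toy patterns, one per structured field we want to fill.
# These are illustrative assumptions, not a real clinical vocabulary.
PATTERNS = {
    "smoking_exposure": re.compile(
        r"\b(second-hand smoke|smoker|smoking)\b", re.IGNORECASE
    ),
    "living_arrangement": re.compile(
        r"\blives (alone|with [a-z ]+?)(?=[.,;]|$)", re.IGNORECASE
    ),
    "ambulatory_status": re.compile(
        r"\b(walks unaided|uses a (?:walker|wheelchair|cane))\b", re.IGNORECASE
    ),
}


def extract_record(note: str) -> dict:
    """Return one structured record (field -> value or None) for a note."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note)
        record[field] = match.group(1).lower() if match else None
    return record


# Invented sample note, loosely based on the COPD example in the text.
note = (
    "Patient with COPD. Reports ongoing exposure to second-hand smoke at home. "
    "Lives with her daughter; uses a walker."
)

if __name__ == "__main__":
    print(extract_record(note))
```

Each note becomes a row with the same fields, which is what makes the output usable alongside the structured data already held in the record.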

Tailoring drug development and delivery for disease sub-groups can mean searching for sparse data (e.g. for rare diseases), so an NLP platform such as Linguamatics can help find the “needles in a haystack”.

Keyword search is not sufficient for analyzing data on this scale, but NLP-based text mining, powered by ontologies, can bring real benefits.

The advantages of using Linguamatics NLP platform for precision medicine research include:

  • Extract phenotypic, lifestyle and social determinants of health information from electronic health records: Linguamatics NLP platform reduces the manual effort required to extract phenotypic information from electronic health records (EHR), as well as important predictive characteristics such as social determinants of health and lifestyle factors that can impact treatment decisions.
  • Identify disease associations between phenotype and variant: To characterize rare diseases and variants of unclear significance, researchers can run Linguamatics NLP platform queries using broad vocabularies for genes, diseases, variants & mutations to identify disease associations between phenotype and variant.
  • Remove manual search: Linguamatics NLP platform removes the need to manually search and review variants associated with rare disease or poorly annotated mutations.
  • Extract all possible gene variants and mutation patterns: Keyword search is not sufficient for healthcare research: users need broad vocabularies (including coverage of common synonyms) for genes, diseases and patterns to extract all possible gene variants and mutation patterns.
  • Search multiple scientific sources at once: Linguamatics NLP platform enables users to search literature, conference reports and other sources for genotype-phenotype associations.
  • Visualize structured output for rapid analysis and understanding, or integrate into databases and business dashboards.
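The point about broad vocabularies and synonym coverage can be made concrete with a small sketch. The example below contrasts a plain keyword check with a lookup that maps any known synonym to one preferred term. The mini-vocabulary, function names and sample sentence are assumptions made for illustration; this is not how any particular platform is implemented, and real vocabularies are far larger and backed by ontologies.

```python
import re

# Illustrative mini-vocabulary: preferred term -> synonyms seen in text.
VOCABULARY = {
    "Hunter syndrome": ["hunter syndrome", "mucopolysaccharidosis ii", "mps ii"],
    "ERBB2": ["erbb2", "her2", "her-2"],
}


def build_matcher(vocabulary):
    """Build a synonym -> preferred-term lookup and one combined regex."""
    lookup = {
        synonym.lower(): preferred
        for preferred, synonyms in vocabulary.items()
        for synonym in synonyms
    }
    # Longest synonyms first so longer phrases win over shorter overlaps.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return lookup, pattern


def find_concepts(text, vocabulary=VOCABULARY):
    """Return the sorted preferred terms for every synonym found in text."""
    lookup, pattern = build_matcher(vocabulary)
    return sorted({lookup[m.group(1).lower()] for m in pattern.finditer(text)})


def keyword_hit(text, keyword):
    """Plain keyword search: true only if the exact phrase occurs."""
    return keyword.lower() in text.lower()


sentence = (
    "Mucopolysaccharidosis II is caused by a deficiency in "
    "iduronate-2-sulfatase."
)

if __name__ == "__main__":
    print(keyword_hit(sentence, "Hunter syndrome"))
    print(find_concepts(sentence))
```

A search for the exact phrase "Hunter syndrome" misses this sentence, while the vocabulary lookup still resolves it to the same concept.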

NLP Case Studies in Precision Medicine Research

Genotype-Phenotype Associations in Multiple Sclerosis at Sanofi

Discover how Sanofi used Linguamatics NLP platform for literature mining, and to annotate the association of human leukocyte antigen (HLA) alleles with diseases and drug hypersensitivity as part of a multiple sclerosis (MS) biomarker discovery project.

Shire’s Systematic Examination of Gene-Disease Associations

Shire employs Linguamatics NLP platform for the systematic examination of gene-disease associations. In this webinar, Madhusudan Natarajan discusses the value of text analytics for disease severity and genotype-phenotype association, focused on Hunter Syndrome. Hunter Syndrome, also known as Mucopolysaccharidosis II, is a rare disease caused by an X-linked deficiency in iduronate-2-sulfatase. Deriving systematic annotation around patient genotypes for disease-specific mutations, and correlating these to efficacy scores, immunogenicity responses, etc. can offer tremendous insight into patient genotype-phenotype relationships, as well as patient genotype-outcome relationships.

A meta-analysis of immunogenicity responses to administered drugs, based on patient genotype, was recently presented to, and accepted by, the European Medicines Agency. Using text analytics, these relationships have been extended to patient registries to fulfil reporting requirements to regulatory bodies.

University of Iowa uses NLP to Improve Phenotype Extraction for Precision Medicine

Identifying the best treatment pathway for a patient or group of patients requires the ability to analyze detailed information from each patient's medical record, together with broader aspects beyond their immediate medical history.

At the University of Iowa, scientists at the Stead Family Children’s Hospital are working on a precision medicine research project using Linguamatics natural language processing (NLP) platform to extract phenotype details from electronic medical records of patients with suspected genetic disorders. Over 700 patients undergo chromosomal microarray (CMA) testing at the university each year. Classification of CMA results into Normal, Abnormal or VUS (Variant of Unclear clinical Significance) depends heavily on manual chart review and subjective determination of the relevance of the genetic variant found to the clinical phenotype.

Using manual methods, the time taken to extract phenotypes for 100 patients is just over 34 hours, compared to 10 minutes using natural language processing. Extrapolating this to the 700 CMAs run at University of Iowa Stead Family Children’s Hospital translates to nearly 240 hours manually vs. just 1.2 hours using Linguamatics NLP platform.
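The extrapolation can be checked with a few lines of arithmetic. The sketch below assumes that effort scales linearly with the number of patients and rounds "just over 34 hours" down to 34; the variable and function names are illustrative only.

```python
# Figures taken from the text; "just over 34 hours" is rounded down to 34.
MANUAL_HOURS_PER_100 = 34
NLP_MINUTES_PER_100 = 10
PATIENTS_PER_YEAR = 700


def extrapolate(effort_per_100, patients):
    """Scale an effort measured on 100 patients linearly to `patients`."""
    return effort_per_100 * patients / 100


manual_hours = extrapolate(MANUAL_HOURS_PER_100, PATIENTS_PER_YEAR)
nlp_hours = extrapolate(NLP_MINUTES_PER_100, PATIENTS_PER_YEAR) / 60
speedup = manual_hours / nlp_hours

if __name__ == "__main__":
    print(f"Manual review: {manual_hours:.0f} hours")   # 238 hours
    print(f"NLP: {nlp_hours:.1f} hours")                # 1.2 hours
    print(f"Ratio: about {speedup:.0f}x")               # about 204x
```

Under these assumptions the manual figure comes to 238 hours and the NLP figure to 70 minutes, which match the "nearly 240 hours" and "1.2 hours" quoted above.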

Webinar: NLP in Precision Medicine: Real-World Clinical and Research Applications

This webinar discusses how NLP is being used to transform unstructured source data into clinical and research decision support insights, and covers some of the latest precision medicine application areas, including:

  • Computational phenotyping with the Human Phenotype Ontology
  • NLP-based assessment of variants of unknown significance in medical literature
  • Genotype-phenotype data mining for rare disease patient stratification

NGS Annotation within Pharmacogenomics and Personalized Medicine

Next-generation sequencing (NGS) is increasingly being applied across the drug discovery and development pathway, for example, in target evaluation, patient stratification and clinical profiling. However, biological interpretation of the output of NGS is very time-consuming, being a mostly manual process of literature searching and annotation of the gene results.

This webinar shows how Linguamatics NLP platform can be used to collate a comprehensive gene profile, with key biological annotation from a combination of sources like MEDLINE®, OMIM and NIH Grants.

Platform Demo


