Case Study: Text Mining at Sanofi: Genotype-Phenotype Associations

The human leukocyte antigen (HLA) genotype is an important risk factor for multiple sclerosis (MS). As part of a project to discover potential MS biomarkers, Sanofi set out to annotate the associations of HLA alleles and haplotypes with diseases and drug hypersensitivity. Public resources link HLA alleles to more than 40 autoimmune diseases, as well as some cancers, infectious diseases, and drug hypersensitivities, but none provides systematic annotation of these associations. Sanofi established a workflow for whole exome sequencing-based HLA typing and analysis, which identified more than 400 HLA alleles. The team then used the Linguamatics I2E platform to search and analyze the literature and annotate the association of the HLA alleles with diseases and drug hypersensitivity. The project more than doubled the number of previously annotated disease associations, and the curated annotations were fed into a searchable knowledge base for broad use within the Sanofi team.

