Text Mining at Sanofi for Genotype-Phenotype Associations in Multiple Sclerosis

June 5, 2018
Venue: Online webinar

When: Tues, June 5, 2018
Time: 8:00 - 9:00am (PDT) | 11:00am - 12:00 noon (EDT) | 4:00 - 5:00pm (BST) | 5:00 - 6:00pm (CEST)
Length: 1 hour

This webinar will discuss how Sanofi used literature mining to annotate the association of human leukocyte antigen (HLA) alleles with diseases and drug hypersensitivity as part of a multiple sclerosis (MS) biomarker discovery project.

For any drug development project, it is important to have a comprehensive understanding of the genetic associations for the disease of interest. While public databases of genomic variants provide valuable information, they can leave many gaps in the biological knowledge. For its internal MS biomarker project, Sanofi needed a comprehensive catalogue of HLA allele annotations and turned to Linguamatics I2E to text mine the scientific literature.

The HLA region is the most polymorphic region of the human genome. HLA alleles have been associated with more than 40 different autoimmune diseases, various types of cancer, infectious diseases, and adverse drug events. However, there are no known resources that systematically annotate the association of HLA alleles with diseases.

For the Sanofi MS project, a workflow was established for whole-exome sequencing-based HLA typing and analysis, which identified more than 400 HLA alleles. The Linguamatics I2E platform was then used to search the literature and annotate the association of these HLA alleles with diseases and drug hypersensitivity. The project more than doubled the number of previously annotated disease associations, and the curated annotations were fed into a knowledge base for broad use within the Sanofi team.

What will you learn?

  • How text mining based on natural language processing (NLP) can extract structured data from unstructured text in scientific papers

  • How text mining is used at Sanofi to extract the most up-to-date published knowledge for a gene or group of genes, including information on diseases and specific allele variations

Speakers & Bios:

Dongyu Liu, Associate Director, Translational Sciences at Sanofi

Dongyu Liu is associate director in the science computing group of the Translational Sciences department at Sanofi. Dongyu’s research interests include bioinformatics, data mining and text mining. He plays a key role in introducing and applying text mining technology to support ongoing research projects at Sanofi. He received a Ph.D. from the University of Rochester and did postdoctoral research at the Whitehead Institute.

Jane Reed, Head of Life Science Strategy at Linguamatics

Jane Reed is the head of life science strategy at Linguamatics. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science domain. Jane has extensive experience in life sciences informatics. She worked for more than 20 years in vendor companies supplying data products, data integration and analysis, and consultancy to pharma and biotech, with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia, holding post-doctoral positions in genetics and genomics research.