
Text analytics for genotype-phenotype association in Hunter Syndrome

Bone deformities, hearing loss, frequent respiratory infections, cognitive impairment, and chronic heart and liver disorders are among the symptoms suffered by infants with Hunter syndrome (also known as Mucopolysaccharidosis II). This post follows our previous post on Shire's work on associations between genotype and phenotype in very rare diseases.

Shire, now part of Takeda, provides an enzyme replacement therapy for Hunter syndrome. However, to ameliorate the neurocognitive effects, the enzyme replacement molecule needs to be delivered to the central nervous system (CNS) via an innovative implant device, and implanting it is an invasive procedure.

Shire develops and provides healthcare in the areas of behavioural health, gastrointestinal conditions, rare diseases, and regenerative medicine. The company was keen to find a reliable way to identify the young patients with the greatest potential to benefit from the treatment, and wanted to survey the literature to better understand which genetic markers could be used for patient stratification. The objective of this research was to use text mining to identify genotypes and phenotypes associated with Hunter syndrome, establish meaningful correlations, and enable smarter decisions about who to include in the trial. Linguamatics NLP technology had already been used to identify gene-disease associations for rare diseases, so the team knew that it was possible. Working with the Shire team, led by engineer and biotech research specialist Madhu Natarajan, Linguamatics found key biological hotspots in the mutation patterns related to phenotype severity.

Text mining was remarkably successful. Results were significantly better than any genetic database of reported genotypes available.

- Madhu Natarajan, Director, Systems Pharmacology, Shire

Patient stratification

Linguamatics NLP capabilities enabled Shire's team to identify every patient with Hunter syndrome or related symptoms whose data had been published in PubMed; capture and extract mutations and variants in the iduronate-2-sulfatase gene; and relate these to the specific phenotypes described, particularly around cognitive impairment. The study showed that Linguamatics text mining outperformed the available genetic databases of reported genotypes, providing specific, previously unavailable associations between phenotype severity and particular genotypes. It also enabled data-led clinical decision making for patient treatment. Madhu reported:

We picked up insights that the clinicians who were seeing patients couldn’t. We were pleasantly surprised by the amount of data we found. We matched or bettered everything that was out there; it was shockingly better than any genetic database of reported genotypes available.
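To give a flavour of what relating extracted mutations to described phenotypes involves, here is a minimal sketch in Python. It is not how the Linguamatics platform works; it is a simplified illustration built on assumptions of our own: a regular expression that covers only simple protein-level substitutions written in one-letter or three-letter amino acid codes, a short hypothetical phenotype term list, and invented sample sentences that are not clinical findings. The names `normalize_mutations` and `cooccurrences` are made up for this example.

```python
import re
from collections import Counter

# Standard three-letter to one-letter amino acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA1 = "".join(sorted(AA3_TO_1.values()))
AA3 = "|".join(AA3_TO_1)

# Assumption: only simple substitutions are handled, written as
# "p.Ala100Val", "Ala100Val", "p.A100V" or "A100V".
MUTATION_RE = re.compile(
    rf"\b(?:p\.)?(?:(?P<ref3>{AA3})(?P<pos3>\d+)(?P<alt3>{AA3})"
    rf"|(?P<ref1>[{AA1}])(?P<pos1>\d+)(?P<alt1>[{AA1}]))\b"
)

# Hypothetical term list; a real project would use a curated ontology.
PHENOTYPE_TERMS = ("cognitive impairment", "hearing loss", "bone deformities")


def normalize_mutations(text):
    """Return every substitution found in text in one canonical form."""
    found = []
    for m in MUTATION_RE.finditer(text):
        if m.group("ref3"):
            ref = AA3_TO_1[m.group("ref3")]
            pos = m.group("pos3")
            alt = AA3_TO_1[m.group("alt3")]
        else:
            ref, pos, alt = m.group("ref1"), m.group("pos1"), m.group("alt1")
        found.append(f"p.{ref}{pos}{alt}")
    return found


def cooccurrences(text):
    """Count (mutation, phenotype term) pairs that share a sentence."""
    counts = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for mutation in set(normalize_mutations(sentence)):
            for term in PHENOTYPE_TERMS:
                if term in lowered:
                    counts[(mutation, term)] += 1
    return counts


# Invented sentences for illustration only; not real patient data.
SAMPLE_TEXT = (
    "Patient 1 carried p.Ala100Val and showed cognitive impairment. "
    "Patient 2 had A100V with hearing loss and cognitive impairment. "
    "Patient 3 carried Gly200Ser and had no reported symptoms."
)

if __name__ == "__main__":
    for pair, n in sorted(cooccurrences(SAMPLE_TEXT).items()):
        print(pair, n)
```

Even this toy version shows why normalization matters: the first two sentences name the same variant in different notations, and only after mapping both to one canonical form do they count as two observations of one genotype. Sentence-level co-occurrence is a crude stand-in for real linguistic analysis; it cannot tell "showed cognitive impairment" from "no cognitive impairment", and the one-letter pattern can produce false positives on unrelated alphanumeric codes.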

Webinar: A systematic examination of gene-disease associations through text mining approaches

We hosted a webinar with Madhu in which he illustrates some of the challenges of R&D for orphan diseases, particularly text mining for mutation and variant patterns, which can be reported in many different ways in the literature. The webinar is based on Shire's presentation at a previous Linguamatics Text Mining Summit held in Newport, RI.

Watch webinar: A systematic examination of gene-disease associations through text mining approaches

