Linguamatics Spring Text Mining Conference 2017

Conference
April 24, 2017 to April 26, 2017
Venue: Cambridge, Cambridgeshire, United Kingdom

#LMSpring17

The Linguamatics Spring Text Mining Conference 2017 was held on Mon, 24 April – Wed, 26 April 2017 at the Møller Centre, a Centre for Excellence in the heart of Cambridge, UK.

Agenda

Monday 24th April

11:00am onward    REGISTRATION in the Study Centre Foyer, next to the Business Centre

12:00pm - 1:15pm    BUFFET LUNCH available in Study Centre Foyer

Afternoon: Training workshops with the text mining experts for all experience levels

EVENING SOCIAL EVENT at The Møller Centre Tower Lounge Bar

Tuesday 25th April

8:30am - 9:00am   REGISTRATION & REFRESHMENTS in Study Centre Foyer/ Refreshment Area

9am start   Main day presentations in Study Centre Suite 2 (Rooms 10 and 11)

7pm onward    DRINKS AND DINNER at Queens' College. Transport by bus from the Møller Centre will be provided.

Wednesday 26th April

9:00am    Linguamatics Community half-hour workshop, followed by morning training workshops.

The conference will end with lunch on Wednesday.

Speakers and Bios

Samir Courdy

Huntsman Cancer Institute, USA

Samir is the Chief Research Information Officer (CRIO), and Director of Research Informatics at the Huntsman Cancer Institute.

Samir joined the Institute in 1999. Samir is responsible for all aspects of software design and development for clinical, basic science, and population research at HCI. He coordinates the efforts of the Research Informatics Shared Resource to meet the needs and requirements of the Cancer Center, provides guidance and planning effort for each project, runs coordination meetings, manages staff, and provides strategic direction for architecture, projects, and technology.

He has been at HCI for 15 years, during which his team was responsible for the design and implementation of several systems, including an enhanced Research Subject Registry; a Biospecimen Tracking System (itBioPath), a translational research application for the Biorepository and Molecular Pathology Core (BMP); and a Comprehensive Clinical Cancer Research System (CCR) with a robust and extensible meta model that enables all cancer groups to co-exist in the same database, managed by a complex security model that is compliant with IRB and HIPAA regulations. The team also released GNomEx, a high-throughput next-generation sequencing LIMS for use by the HCI and University of Utah community.

The team completed work on an analysis suite built upon the Integrated Genome Browser (IGB) and the Distributed Annotation System (DAS/2), along with an internally developed publishing suite, GenoPub. GNomEx and GenoPub were both released as open source projects and are available for download on SourceForge. GNomEx was recently integrated with our BioSpecimen System (itBioPath) to provide linkages between genomic and clinical data. The development of a Chemical Screens Annotation and Management System (CSAM) was also completed within the last six months. Mr. Courdy is developing a natural language processing workflow for identifying and extracting clinical and diagnostic information from surgical pathology reports, physicians’ notes and radiology documents. This project uses a third-party software package from Linguamatics.

Will Hayes & Jon Sanford

Mundipharma

Daniel Stoffler & Raul Rodriguez-Esteban

Roche

Daniel leads and manages cheminformatics activities in the newly formed Data Science organization.

Raul is a member of the Data Science group at Roche Pharma in Basel, Switzerland, coordinating text mining initiatives in R&D.

John Brimacombe

Linguamatics

John Brimacombe is a serial entrepreneur. After graduating in Law and Computer Science from Trinity College, Cambridge, he founded Jobstream Group plc, which provides specialist ERP software to the international financial services sector. Brimacombe subsequently co-founded the pioneering mobile entertainment start-up nGame Ltd., which was acquired by Hands-On Mobile Inc. He served as President/COO of Hands-On Mobile for over 2 years, leading the company through 7 major M&A transactions and massive global expansion. In addition to an ongoing commitment to Jobstream, Brimacombe currently chairs the enterprise natural language search tools provider Linguamatics; is a Partner at Sussex Place Ventures, the venture-capital arm of the London Business School; is a seed investor in multiple US and UK start-ups; and has extensive experience as a non-executive director, from start-up to public markets.

Phil Hastings

Linguamatics

Phil Hastings is based at Linguamatics’ Cambridge headquarters and has over 25 years of experience in the scientific information and software industries. He worked for a number of years in commercial and product management roles for scientific, technical and medical content providers John Wiley & Sons and the Thomson Corporation. Prior to joining Linguamatics, Phil spent over five years at Accelrys, a leading provider of molecular modeling and informatics software, in product marketing and business development roles. Phil holds a PhD from the University of Nottingham.

David Milward

Linguamatics

David Milward is chief technology officer (CTO) at Linguamatics. He is a pioneer of interactive text mining and a founder of Linguamatics. He has over 20 years' experience in product development, consultancy and research in natural language processing (NLP). After receiving a PhD from the University of Cambridge, he was a researcher and lecturer at the University of Edinburgh. He has published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Jane Reed

Linguamatics

Jane Reed is the head of life science strategy. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science domain. Jane has extensive experience in life sciences informatics. She worked for more than 15 years in vendor companies supplying data products, data integration and analysis and consultancy to pharma and biotech - with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia with post-docs in genetics and genomics.

Paul Milligan

Linguamatics

Paul Milligan is Senior Product Manager at Linguamatics, having started at the company as an Application Specialist in 2005. In his role, he is responsible for product management, marketing, and strategy guidance of I2E, an award-winning agile, high-performance enterprise text mining software. Paul also has responsibility for the I2E Web Services API, which allows integration of text mining capabilities into large-scale workflows. He has worked on many customer projects, including analysis of historical pre-clinical records, large-scale extraction of information from patents and semantic annotation for enterprise search. Paul has a PhD from the University of Cambridge in computational drug design, where his work involved developing algorithms to optimize combinatorial libraries for protein-ligand interactions. Before joining Linguamatics, Paul worked in the Cambridge biotech sector creating bioinformatics and chemoinformatics tools for processing diverse large databases, including PDB, compound registries and PubMed.

James Loudon-Griffiths

AstraZeneca

James is a Clinical Information Science Associate. He graduated from The University of Edinburgh with a Master’s degree in Biological and Medicinal Chemistry. James is presenting on A Comparison of Nausea Incidence from PatientsLikeMe vs Drug Product Labels.

Helen Pitman

Cancer Research UK

Helen is now working at Cancer Research UK in London as a project analyst on the Stratified Medicine Programme, a novel, fast-paced and exciting project to be involved in.

Presentations and Abstracts

Welcome Address

John Brimacombe, Linguamatics

Navigating the Quagmire of Clinical Data in Free Text Reports

Samir Courdy, Huntsman Cancer Institute

The amount of structured and unstructured clinical data found in surgical pathology reports, radiology reports and physicians’ notes, including diagnosis and treatment information, is daunting. The effort required to abstract this information manually can be overwhelming. We propose to build an automated workflow for identifying such reports, using I2E to tag all relevant clinical information, and to develop an extraction methodology that associates this unstructured diagnostic information with discrete data elements for research and longitudinal follow-up of patients and research subjects. The aim is to reduce the manual effort required for abstraction and to improve the quality, consistency and efficiency of data collection for improved outcomes and research.

Sifting through the deluge of information present in surgical pathology reports, physicians’ notes and radiology reports is daunting. To make sense of all this information, we as informaticists and data scientists have to develop better approaches and tools that use robust data mining techniques and methodology to automatically abstract and annotate patient data for diagnoses and research cohort identification, with the goals of improved outcomes, higher data quality and reduced costs of manual abstraction. Here, we present a methodology utilizing I2E from Linguamatics as a natural language processing tool for implementing such a solution for prostate cancer and chronic myelogenous leukemia.

Stories from Clinical Informatics at AstraZeneca

James Loudon-Griffiths, AstraZeneca

Comparison of rates of nausea side effects for prescription medications from an online patient community versus medication labels:

This exploratory analysis seeks to understand the trends and values of patient-generated health data (PGHD) in a side effect reporting capacity by comparing the PGHD source PatientsLikeMe (PLM), an online real world patient self-reporting platform, with the FDA Drug Product Labels (DPLs), the authoritative resource for drug related information captured through clinical trials.

Nausea was selected in this comparison due to its ubiquity as a side effect in medications. Text mining offered an elegant solution to extracting the unstructured nausea side effect data located in the XML tables and free-text of the DPLs – supporting the creation of a structured dataset. This subsequently allowed an effective comparison to be made to the equivalent PLM data.

Understanding the relationship between clinical trial- and real-world-reported side effects is critical to better characterising the safety profile and tolerability of medications.

Text Mining to Support Clinical Literature Searches:

I2E OnDemand has helped exploit the different external information sources available at AstraZeneca. Literature searches can be performed in a systematic way and provide a valuable record of the areas in the external literature landscape that have been covered.

In addition, specific information requests often require searching further than the available information alluded to in the title and abstract; text mining offers an efficient and effective way of bulk searching articles’ full-text to reach this level of detail.

I2E Product Roadmap

David Milward, Linguamatics

The main part of this talk will focus on developments that will be provided within the I2E 5.1 release later this year. A major enhancement in I2E 5.1 is the ability to deal with spelling variants as well as spelling errors and OCR errors. I2E 5.1 will also provide improved usability through a much more convenient output editor for both individual queries and multi queries, and it will encourage re-use through the ability to embed one query within another.

Linguamatics aims to make it as easy to access information from unstructured data as from structured data. The I2E platform provides the power to find relevant information from unstructured data, and to format results for efficient review. The introduction of the EASL query language and the web services API has made the I2E platform much more open, allowing for other interfaces which adopt a different balance between power and usability. The talk will end with a demonstration of a prototype which aims to make text mining accessible to a wider audience.

User Experience Journey

Roger Attrill, Linguamatics

IDMP and Compliance - Using Text Mining to Support Regulatory Workflows

Will Hayes & Jon Sanford, Mundipharma; Paul Milligan, Linguamatics

IDMP (IDentification of Medicinal Products) is a set of international standards developed by ISO that will become mandatory in Europe in a phased approach, effective from 2018, and is expected to be adopted by the FDA and globally over the next few years.

Capturing the hundreds of data attributes required per product, 70% of which lie in a variety of unstructured text sources, demands time, resource, and investment. Mundipharma Research realised that text mining could offer a solution to this challenge.

In this talk, we will present a pilot project using I2E to extract data attributes for IDMP Iteration 1 and beyond from Summary of Product Characteristics (SmPC) documents, and show how this workflow can be implemented on an enterprise scale to integrate with master data management for regulatory information.

Linguamatics Health: Recent Developments and Use Cases

Phil Hastings, Linguamatics

Electronic health records (EHRs) provide a rich source of valuable data, with the potential to yield insights about patients and their response to treatment. However, with up to 80% of the information held in unstructured text, hospitals and medical researchers need better ways to use this vital information to improve patient care. This talk will highlight key customer applications where I2E is supporting precision medicine, population health and clinical research. It will also cover new product developments to support real-time and large-scale mining of patient data.

Linguamatics Life Science: Recent Developments and Use Cases

Jane Reed, Linguamatics

In this era of big data, life science organizations face the challenge of filtering ever-increasing volumes of text information to gain actionable insights for key decision-making. I2E’s flexibility means it can be beneficial in many applications and use cases. This talk will provide an overview of some customer use cases from a range of different disciplines, and highlight some of the solution areas where significant benefit has been found.

Extracting Attributes from Pathology Records in the CRUK Stratified Medicine Programme 1

Helen Pitman, CRUK; Paul Milligan, Linguamatics

ARTEMIS - A Text Mining Tool for Chemists

Daniel Stoffler & Raul Rodriguez-Esteban, Roche

Artemis is an easy-to-use web-based front-end to I2E developed at Roche. It enables our drug project teams (a.k.a. non-expert I2E users) to execute pre-defined text-mining queries and enriches the results with further data. It stems from the need to frequently mine our licensed external content for chemicals affecting targets or diseases, in order to identify and extract previously unknown relationships. We present a look under the hood of the tool and delve briefly into cheminformatics and how we capitalize on the I2E Chemistry module. We show how the compound-target and compound-disease relationship extraction is implemented, enabling us to query for relationships in which only one partner of the relationship is known. Finally, we present how the data can be further enriched and interactively analyzed by our project teams.

Enhancing Big Data with I2E

Paul Milligan, Linguamatics

Text mining has been used to turn unstructured text into structured information for a wide variety of projects in many application areas. However, there is a growing need within organizations to address this at the enterprise level across very large data sets. Although there are numerous ETL (Extract, Transform and Load) solutions available, they are designed to work on structured data, whereas the majority of the data is in unstructured form.

This talk looks at how I2E can be used in these big data environments to enhance the data by adding information and transforming it into a structured form, with target systems such as data warehouses and enterprise search platforms.

Workshops

Hands-on training workshops, run by our text mining experts, are available for complete beginners, intermediate users and advanced users.

See this year's Hands-On Training Workshop selection guide, schedule and descriptions.

5 reasons to attend

  1. Gain first-hand knowledge and experience on how structured and unstructured content can be mined to uncover valuable information
  2. Gain hands-on experience of NLP text mining through workshops and training
  3. Understand the challenges other pharmaceutical and healthcare professionals are facing and explore solutions to these challenges
  4. Gain a better understanding of NLP text mining and where it can fit into your organization
  5. Network and exchange ideas with peers and text mining experts

Past comments

One of the best summits and training sessions I have ever attended. The pioneering NLP efforts, and users responsiveness is unmatched by any other text mining (NLP) vendor. The Linguamatics approach & I2E system is relatively intuitive, easy to manage, powerful and useful.

Both workshops were very useful. I enjoyed the interactive format.

This is such a great meeting, it's so good to hear from other people all the different ways in which they're using I2E.

I have been going to conferences for 15 years and this one is the best one.

Congratulations on a great user meeting - the talks were quite excellent.

Very useful conference!

Venue

Access the delegate pack for information on getting to The Møller Centre and more.

Accommodation:

Linguamatics has negotiated a preferred rate of £113.00. To book, call the Møller Centre and speak with the reception team on 01223 465 500 (quote KX28633; this will help the team find the booking). Parking is free on site at the Møller Centre.

https://www.mollercentre.co.uk/venue/accommodation/

About and Registration

The conference provides new, experienced and potential users of Linguamatics I2E software an excellent opportunity to explore the latest trends in natural language processing-based text mining.

Delegates will discover how I2E is delivering valuable intelligence from text in a range of applications, as well as have an opportunity to network with the Linguamatics community.

Our evening social events will be held at beautiful and historic Cambridge venues. This year we will be dining at Queens' College on Tuesday, April 25.

Who should attend?

New and experienced users of Linguamatics I2E and other text mining software, alongside any professionals interested in the mining and analysis of textual information.

Travel and Accommodation

To book a hotel stay at the Møller Centre, please call 01223 465 500 to speak with the reservations team (quote KX28633; this will help the team find the booking). Parking is free on site at the Møller Centre.

Cost per night is £113.00.

https://www.mollercentre.co.uk/venue/accommodation/