Linguamatics Spring Text Mining Conference 2016

Cambridge, United Kingdom

Read about the 2016 conference on the blog.


The Linguamatics Spring Text Mining Conference 2016 was held April 25-27, 2016 at the Møller Centre, a Centre for Excellence in the heart of Cambridge, UK.

Confirmed speakers

Thierry Breyette

Novo Nordisk

Thierry is the Senior Manager of Information Analytics at Novo Nordisk Inc., located in Princeton, NJ. His work currently focuses on real world, market, and clinical trial data. Thierry is influenced by design thinking principles and enjoys exploring novel solutions to information problems. In his work, Thierry uses a combination of tools and techniques, including natural language processing, information visualization for discovery and presentation, and descriptive and predictive analyses.

Some of Thierry’s past projects include social media analysis and digital opinion leader identification, identifying macro and micro healthcare market trends in the US, and detecting patterns in clinical trial protocol deviations.

When Thierry is not playing around with data, he enjoys spending time with his wife and two daughters, admiring vintage cars, collecting transit maps, and sipping the occasional whisky. 


Presenting on generating actionable insights from real world data.

Jonathan Hartmann

Georgetown University Medical Center

Jonathan Hartmann, MLS is a hospital informationist at the Georgetown University Medical Center. In this position, Jonathan attends rounds with clinical teams in the MedStar Georgetown University Hospital and provides evidence-based information at the bedside using a variety of resources and tools including Linguamatics I2E and diagnostic software.

Previously, Jonathan was AHEC program librarian at the University of South Florida; reference & instruction librarian and assistant professor of instructional resources at the Massachusetts College of Pharmacy and Health Sciences; and outreach coordinator, AHEC librarian and instructor in the department of medicine at the Medical College of Ohio.

He holds a Master of Library Science degree from Kent State University.

Jonathan will be presenting on the evolution of I2E to improve patient care.

James Loudon-Griffiths


James is on a 2-year graduate placement with AstraZeneca as a Clinical Information Scientist within Biometrics & Information Sciences.

He graduated from The University of Edinburgh with a Master’s degree in Biological and Medicinal Chemistry.

James is presenting on A Comparison of Nausea Incidence from PatientsLikeMe vs Drug Product Labels.

Ralf Josef Jaeger


Ralf J. Jaeger is a senior scientist in the Data Science group of the Pharma Research and Early Development Informatics department at F. Hoffmann-La Roche, Basel, Switzerland. He is a biologist and is trained in scientific information management (DGI) and project work. Early on, he began to introduce and integrate text mining, including Linguamatics I2E, as a complementary technology to support the company’s research groups. He entered the field of information science as an organizer and instructor of training courses offered by a former consultancy company in Tübingen, Germany, before joining Roche in 2000. Ralf received his diploma in neurogenetics and his PhD in human genetics in the department of biology at Albert-Ludwigs-University, Freiburg i.Br., Germany, and completed postdoctoral fellowship training at the HKI Leibniz Institute in Jena, Germany, with a focus on lab process automation.

He will be presenting on Hunting synonyms: Synonym identification for ontology development and use with I2E.

Eleanor Yelland

University College London

Eleanor Yelland is a PhD Student in the Division of Psychiatry at University College London. Her PhD is a partnership with Linguamatics and Ieso Digital Health, who provide text-based online cognitive behavioural therapy. The project focuses on the language within the treatment sessions and how text-mining methods can be applied to best use this to learn about and improve treatment provision. The work primarily involves identifying potentially relevant linguistic characteristics, measuring these and building statistical models of their relationship with therapy outcome scores.  

Eleanor will be presenting on I2E in mental health: Analysis of online transcripts used in cognitive behavioural therapy.

John Brimacombe


John Brimacombe is a serial entrepreneur. After graduating in Law and Computer Science from Trinity College, Cambridge, he founded Jobstream Group plc, which provides specialist ERP software to the international financial services sector. Brimacombe subsequently co-founded pioneering mobile entertainment start-up nGame Ltd., which was acquired by Hands-On Mobile Inc. He served as President/COO of Hands-On Mobile for over 2 years, leading the company through 7 major M&A transactions and massive global expansion. In addition to an ongoing commitment to Jobstream, Brimacombe currently chairs the enterprise natural language search tools provider Linguamatics, is a Partner at Sussex Place Ventures, the venture-capital arm of the London Business School, is a seed investor in multiple US and UK start-ups, and has extensive experience as a non-executive director from start-up to public markets.

Phil Hastings


Phil Hastings is based at Linguamatics’ Cambridge headquarters and has over 25 years of experience in the scientific information and software industries. He worked for a number of years in commercial and product management roles for scientific, technical and medical content providers John Wiley & Sons and the Thomson Corporation. Prior to joining Linguamatics, Phil spent over five years at Accelrys, a leading provider of molecular modeling and informatics software, in product marketing and business development roles. Phil holds a PhD from the University of Nottingham.

David Milward


David Milward is chief technology officer (CTO) at Linguamatics. He is a pioneer of interactive text mining, and a founder of Linguamatics. He has over 20 years of experience in product development, consultancy and research in natural language processing (NLP). After receiving a PhD from the University of Cambridge, he was a researcher and lecturer at the University of Edinburgh. He has published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Jane Reed


Jane Reed is the head of life science strategy at Linguamatics. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science domain. Jane has extensive experience in life sciences informatics. She worked for more than 15 years in vendor companies supplying data products, data integration and analysis, and consultancy to pharma and biotech, with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia with post-docs in genetics and genomics.

Guy Singh


Guy Singh is senior manager, product and strategic alliances at Linguamatics. He has a joint role that spans managing both the I2E product and the partner ecosystem around it. In his product role, he is responsible for product management, marketing and strategy guidance of I2E, an award-winning, agile, high-performance enterprise text mining software product. In his strategic alliances role, he is responsible for recruiting and managing partners across technology, content and services for text mining. Guy joined Linguamatics from information intelligence organization I2 (now part of IBM), where he was responsible for the launch of their first-ever search-based product. He has held posts in R&D, Engineering, Product Marketing and Management at Vodafone, Baltimore Technologies, Oracle and IXI. With over 20 years of experience in the IT industry working in a wide range of environments, his achievements include building a new multi-million-dollar search product line from scratch and managing product businesses with turnovers in excess of $40 million.


The following presentations will be included at the conference:

Generating Actionable Insights from Real World Data

By Thierry Breyette

The density and variability of the information landscape is making it increasingly difficult to identify meaningful trends in data. Traditional data sources such as clinical trial data and publication data are one piece of an increasingly complex information puzzle. As data capture and publishing platforms explode, newer and highly varied data sources are available for analysis, including internally generated data, social data, patient data, clinician data, market data, hospital data, etc. Building a forward-looking analytics framework to tackle these new data challenges requires both extensible and flexible tools, and creative thinking.

This talk will cover some of the real world data projects we have worked on: social media analysis and digital opinion leader identification, identifying macro and micro healthcare market trends in the US, detecting patterns in clinical trial protocol deviations, gleaning clinical insights based on discussions between US-based medical liaisons and health care providers, and analyzing patient and caregiver ethnographic data to discern patterns in patient sentiment, compliance, routines, behaviors, and overall treatment satisfaction and outcomes. The discussion will focus on our approach to each of these projects, the outcome and impact.

What's new in I2E 2016

By Guy Singh

This presentation provides an overview of 2 major product releases, I2E 4.4 and I2E 5.0. As well as providing a summary of all the features within these releases, the presentation will focus on some significant new capabilities introduced to the user community: EASL (Extraction And Search Language - new query language for I2E), Range Search and Charting enhancements. The talk will also include a summary of changes to content and infrastructure to the I2E OnDemand platform.

A Comparison of Nausea Incidence from PatientsLikeMe vs Drug Product Labels

By James Loudon-Griffiths

Nausea is a common Adverse Reaction (AR) associated with many medicines that can be debilitating to the individual. Currently, drug product labels are the main source of AR data that clinicians rely on for prescribing; this information comes from clinical trial data. For subjective ARs, such as nausea, there is a particular risk of cases being under-reported. This gives the potential for differences to exist between clinical results and a real world evidence (RWE) source like PatientsLikeMe (PLM), an online patient self-reporting platform. AstraZeneca has a collaboration with PLM [1], in which RWE data, specifically for nausea AR frequencies, will be compared with the drug product label data. Initially, a systematic approach using I2E OnDemand will be followed to extract information on nausea AR frequencies from DailyMed, a source of FDA drug product labels. The results of this analysis will help to determine whether the RWE data from PLM could supplement the drug product labels as a source of AR frequencies.


1. AstraZeneca. 2015. AstraZeneca And PatientsLikeMe Announce Global Research Collaboration. [Press release]. [Accessed 13 Apr. 2016]. Available from:

The Evolution of I2E to improve patient care

By Jonathan Hartmann and Guy Singh

At previous Linguamatics conferences, a novel use case of using I2E during and after patient rounds in a hospital was presented.

I2E is being used to extract information to aid physicians’ decision making on their daily visits to their patients’ bedsides. Using a tablet to text mine information from MEDLINE and other sources enables real time operational use during hospital rounds.

This presentation will look back over the past few years and describe the evolution of the solution from its inception to how it is being used today. It will detail the technical architecture, challenges and current set up. 

Real life case studies will be used to describe how the solution has been applied to assist physicians in their diagnosis and treatment of patients.

Future Innovations: I2E Product Roadmap

By David Milward

Over the last few years the I2E core platform has become both more powerful and more configurable, with particular recent highlights including multi-lingual processing and normalized values.

The shift from pre-processing and post-processing into I2E itself typically provides faster and easier development. However, as more complexity is embedded within queries, it becomes important to be able to share and re-use as much as possible. A major effort in the next year will be to make this easier.

This talk will demonstrate two new features in development for I2E Pro: a more convenient output editor for both queries and multi queries, and the ability to embed one query inside another. It will also cover other upcoming developments, such as spelling correction and incremental charts, and research to improve integration between I2E and the distributed big data framework, Hadoop.

Finally, we will show ideas for a simpler interface that could make smart queries easier to exploit by occasional users.

Linguamatics I2E in Life Sciences: recent developments and use cases

By Jane Reed

In this era of big data, life science organizations face the challenge of filtering ever-increasing volumes of text information to gain actionable insights for key decision-making. I2E’s flexibility means it can be beneficial in many applications and use cases. This talk will provide an overview of some customer use cases from a range of different disciplines, and highlight some of the solution areas where significant benefit has been found.

I2E in mental health: Analysis of online transcripts used in cognitive behavioural therapy

By Eleanor Yelland

This presentation will cover a portion of a UCL-based PhD project that forms a collaboration between the university, Linguamatics and Ieso Digital Health, who provide online text-based cognitive behavioural therapy to patients with mild to moderate depression and anxiety. The transcripts from these therapy sessions form a rich and underutilized data source. This work aims to explore how text mining with I2E can be applied to make the best use of this data to learn about the therapeutic process and improve therapy provision.

Unstructured patient data and Real World Evidence

By Phil Hastings

The focus on Real World Evidence has grown substantially with changing reimbursement models and the need for more insights into treatment response and patient outcomes. Clinical trials often reflect a population with simple medical histories to better assess the impact of a given drug. However, in the real world, patients have complex disease comorbidities, and it is vital to understand disease progression and drug performance post-launch. Much of the commercially available data on patient outcomes is reliant on structured disease codes that only tell part of the story, whereas up to 80% of EHR data is unstructured and contains the richest information about the patient. Linguamatics has expanded rapidly into healthcare in recent years and this talk will share some use cases where vital patient insights are being extracted.

Hunting synonyms: Synonym identification for ontology development and use with I2E

By Ralf J Jaeger

Human language is both creative and highly dynamic. This also holds true for written language and reaches into all fields of human interest, including science. The very same facts and thoughts can be expressed in many ways. Fortunately, well-maintained taxonomies were introduced early to make it easier for scientists to keep track of concepts touching their area of research in the libraries. Thus, efficient retrieval of articles became manageable. In the light of fast-growing internal and external scientific corpora and less time to find relevant information, it has become more and more crucial to have well-maintained thesauri. Unfortunately, concepts used in library taxonomies are often formalized and come with no or too few of the synonyms used in free text. And although the unspecific, alternative use of grammatical concepts is an indispensable tool in text mining, it is important to have excellent thesauri to find and extract phrases from scientific papers with regard to post-processing, precision, time, etc. In addition, science is beginning to learn from non-scientific sources such as social media, which in turn widens the range of ways in which information is expressed. It seems obvious that a lack of free-text synonyms in taxonomies is detrimental to optimal results. Finally, it becomes evident that abundant thesauri may also have their limitations. Thus, tagging of concepts with more granularity, ideally driven by the user, during indexing of corpora is required. It needs to become part of standard text mining workflows to cope with the semantic diversity.


Monday 25th April

11:00am onward    REGISTRATION in the Study Centre Foyer, next to the Business Centre

12:00pm - 1:15pm    BUFFET LUNCH available in Study Centre Foyer

Training workshops: Session 1

1:15pm - 5:15pm

  • 1A Introduction to I2E: Room 10
  • 1B I2E Intermediate Use and Best Practice: Room 11
  • Linguamatics I2E Query Hackathon: Suite 3 (Rooms 12 and 13)



There will be coffee and snacks available in the foyer all day.

Demo Room: Study Centre Room 14

Demonstrations of I2E 5.0 can be arranged for you to view and try out. Previews of future User Interface concepts will also be taking place.


6:15pm   EVENING SOCIAL EVENT at The Møller Centre Tower Lounge Bar

Tuesday 26th April


8:30am - 9:00am   REGISTRATION & REFRESHMENTS in Study Centre Foyer/ Refreshment Area

9am start Main day presentations in Study Centre Suite 2 (Rooms 10 and 11)

9:00am - 9:15am Welcome address, John Brimacombe, Linguamatics

9:15am - 9:40am Generating Actionable Insights from Real World Data, Thierry Breyette, Novo Nordisk

9:40am - 10:00am What’s new in I2E, Guy Singh, Linguamatics

10:00am - 10:25am A Comparison of Nausea Incidence from PatientsLikeMe vs Drug Product Labels, James Loudon-Griffiths, AstraZeneca

10:25am  - 11:00am   REFRESHMENT BREAK  

11:00am - 11:25am The Evolution of I2E to improve patient care, Jonathan Hartmann, Georgetown University Medical Center, and Guy Singh, Linguamatics

11:25am - 11:50am Future innovations: I2E Product Roadmap, David Milward, Linguamatics

11:50am - 12:00pm Introduction to roundtable discussions, Jane Reed, Linguamatics

12:00pm - 12:05pm Partner lightning rounds: Copyright Clearance Center

12:05pm - 12:10pm Partner lightning rounds: Thomson Reuters

12:10pm - 12:15pm Partner lightning rounds: ChemAxon

12:15pm - 12:20pm Partner lightning rounds: IFI Claims 

12:20pm - 1:20pm    LUNCH in the restaurant

Roundtables: 1:20pm - 2:20pm

ROUNDTABLE 1: Data visualizations & dashboards for text analytics Room 15

ROUNDTABLE 2: Text mining in enterprise workflows: opportunities for integration and embedding. Suite 3 (rooms 12 and 13)

ROUNDTABLE 3: The potential of text analytics for real world data. Suite 2 (rooms 10 and 11)

ROUNDTABLE 4: Text mining for novel and known chemicals. Suite 2 (rooms 10 and 11)

ROUNDTABLE 5: Text mining full-text literature - challenges and opportunities. Suite 3 (rooms 12 and 13)

2:20pm - 2:40pm Linguamatics I2E in Life Sciences: recent developments and use cases, Jane Reed, Linguamatics

2:40pm - 3:05pm I2E in mental health: Analysis of online transcripts used in cognitive behavioural therapy, Eleanor Yelland, University College London

3:05pm - 3:35pm    REFRESHMENT BREAK

3:35pm - 4:10pm Roundtable feedback session

4:10pm - 4:35pm Hunting synonyms: Synonym identification for ontology development and use with I2E, Ralf Josef Jaeger, Roche

4:35pm - 4:55pm Unstructured patient data and Real World Evidence, Phil Hastings, Linguamatics

4:55pm - 5:00pm  Wrap up

6:15pm Meet in the Study Centre Foyer (main conference area). Coach and walking options will be provided to Trinity College. The walk will take around 35 minutes.

7pm onward    DRINKS AND DINNER at Trinity College old kitchen

9:30pm Coach available to return to Møller Centre 

Wednesday 27th April

Training workshops: Session 2

9:00am - 1:00pm

  • 2A I2E Hints and Tips, Ask the Expert: Room 10
  • 2B Exploiting New I2E Features: Room 11
  • 2C Developing Applications for I2E: Suite 3 (Rooms 12 and 13)

10:45am - 11:15am   REFRESHMENT BREAK

1:00pm - 2:00pm    LUNCH Study Centre Room 9

2:00pm End of conference. See you next year!


We have hands-on training workshops run by our text mining experts available for total beginners, intermediate users and also advanced users.

Access the Hands-On Training Workshop selection guide, schedule and descriptions.

Recommended workshop selections

For those who are completely new to querying with I2E:

  • Monday: 1A: Introduction to I2E
  • Wednesday: 2A: I2E Hints and Tips, Ask the Expert

For users who already have experience of querying with I2E:

  • Monday: 1B: I2E Intermediate Use and Best Practice,
    or H: Linguamatics I2E Query Hackathon
  • Wednesday: 2A: I2E Hints and Tips, Ask the Expert,
    or 2B: Exploiting New I2E Features

For those interested in using the Web Services API and/or EASL:

  • Wednesday: 2C: Developing Applications for I2E

Roundtable topics

Linguamatics I2E Query Hackathon

Linguamatics will be hosting an I2E Query Hackathon as part of the Spring Text Mining Conference 2016.

Get hands-on with the Linguamatics I2E platform to solve real-life challenges. Benefit from a greater understanding of query strategies and learn new skills from colleagues, with guidance from Linguamatics NLP text mining experts.

The Challenge

One of the most common challenges in extracting information from free text is making sense of the context within which a disease term, for example, occurs. Does a reference to diabetes refer to a current diagnosis, a family history or a hypothesis that was discounted?

The I2E Query Hackathon will address these challenges and requires participants to extract and classify disease terms (plus medications if time allows) from sample medical transcripts.
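To make the task concrete, here is a minimal Python sketch of context classification for a disease mention. It is not an I2E query and does not reflect how I2E works internally; the cue phrases, label names and the `classify_mention` function are all hypothetical, chosen only to illustrate the three contexts mentioned above (current diagnosis, family history, discounted hypothesis).

```python
from typing import Optional

# Hypothetical cue phrases for illustration only; a real system would need
# far richer rules (or a trained model) to cover clinical language.
FAMILY_PRE = ("family history of", "mother had", "father had")
NEGATED_PRE = ("no evidence of", "negative for", "denies", "ruled out")
NEGATED_POST = ("was ruled out", "was excluded")


def classify_mention(sentence: str, term: str) -> Optional[str]:
    """Return a context label for `term` in `sentence`, or None if absent."""
    text = sentence.lower()
    pos = text.find(term.lower())
    if pos == -1:
        return None
    # Split the sentence into the text before and after the first mention.
    before, after = text[:pos], text[pos + len(term):]
    if any(cue in before for cue in FAMILY_PRE):
        return "family_history"
    if any(cue in before for cue in NEGATED_PRE) or any(
        cue in after for cue in NEGATED_POST
    ):
        return "negated"
    return "current"


print(classify_mention("Patient has type 2 diabetes.", "diabetes"))
print(classify_mention("Family history of diabetes in both parents.", "diabetes"))
print(classify_mention("Diabetes was ruled out after testing.", "diabetes"))
```

A sketch this simple has obvious gaps: it only looks at the first mention in a sentence, and "no family history of diabetes" would be labelled as family history because the family cue is checked first. Handling such cases is exactly the kind of query design problem the hackathon is about.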

A small set of example medical transcripts and annotations will be provided to participants to demonstrate how results should appear. An annotated test data set will be used to assess each team’s efforts, with the results being submitted at the end of the day.
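The page does not say how submissions are scored. One common way to compare extracted annotations with an annotated test set is precision, recall and F1, sketched below in Python. The `(transcript_id, term, label)` tuple format, the sample data and the `score` function are assumptions made for illustration, not the hackathon's actual format or method.

```python
from typing import Dict, Set, Tuple

# Assumed annotation shape: (transcript_id, disease_term, context_label).
Annotation = Tuple[str, str, str]


def score(predicted: Set[Annotation], gold: Set[Annotation]) -> Dict[str, float]:
    """Compare predicted annotations with gold ones by exact match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example data: one of three predictions has the wrong label.
gold = {
    ("t1", "diabetes", "current"),
    ("t1", "asthma", "family_history"),
    ("t2", "hypertension", "negated"),
}
predicted = {
    ("t1", "diabetes", "current"),
    ("t1", "asthma", "current"),
    ("t2", "hypertension", "negated"),
}

result = score(predicted, gold)
print(result)
```

Exact matching on the whole tuple means that a correctly extracted term with the wrong context label counts as both a false positive and a false negative, which is a strict but simple convention.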

Who should participate?

I2E Query Hackathon is for existing I2E users who want to advance their query skills, to learn about dealing with gold standard, test and training data sets, and to share best practice.

Participation in the I2E Query Hackathon is free, but registration is required and spaces are limited, so please sign up early.

Participants will be split into teams, each with a Linguamatics coach to provide guidance and assistance on best practice.

5 reasons to attend

  1. Gain first-hand knowledge and experience on how structured and unstructured content can be mined to uncover valuable information
  2. Gain hands-on experience of NLP text mining through workshops and training
  3. Understand the challenges other pharmaceutical and healthcare professionals are facing and explore solutions to these challenges
  4. Gain a better understanding of NLP text mining and where it can fit into your organization
  5. Network and exchange ideas with peers and text mining experts

Past comments

One of the best summits and training sessions I have ever attended. The pioneering NLP efforts, and users responsiveness is unmatched by any other text mining (NLP) vendor. The Linguamatics approach & I2E system is relatively intuitive, easy to manage, powerful and useful.

Both workshops were very useful. I enjoyed the interactive format

This is such a great meeting, it's so good to hear from other people all the different ways in which they're using I2E

I have been going to conferences for 15 years and this one is the best one

Congratulations on a great user meeting - the talks were quite excellent

Very useful conference


Access the delegate pack for information on getting to The Møller Centre and more.


About and Register

The conference provides new, experienced and potential users of Linguamatics I2E software an excellent opportunity to explore the latest trends in natural language processing-based text mining.

Delegates will discover how I2E is delivering valuable intelligence from text in a range of applications, as well as have an opportunity to network with the Linguamatics community.

Our evening social events will be held at beautiful and historic Cambridge venues. This year we will be dining at Trinity College on Tuesday, April 26. Trinity was founded by Henry VIII in 1546.

Who should attend?

New and experienced users of Linguamatics I2E and other text mining software, alongside any professionals interested in the mining and analysis of textual information.

