Posts from February 2015

Big data? Real world data? What do we really mean?

I was at a conference a couple of weeks ago: an interesting two days spent discussing what big data is in the life science domain, and what value we can expect to gain from better access and use.

The keynote speaker kicked off the first day with a great quote from Atul Butte: “Hiding within those mounds of data is knowledge that could change the life of a patient or change the world”.

This is a really great ambition for data analytics. But one interesting topic was: what do we mean by big data? One common definition from some of the Pharma folk was that it was any sort of data that originated outside their organization and related to patient information.

To me, this definition seems to refer more to real world data: adverse event reports, electronic health records, voice of the customer (VoC) feeds, social media data, claims data, patient group blogs. In other words, any data that hasn’t been influenced by the drug provider and can give an external view, whether from the patient, the payer, or the healthcare provider.

Many of these real world sources have free text fields, and this is where text analytics and natural language processing (NLP) can fit in. We have customers who are using text analytics to get actionable insight from real world data, and who are finding valuable intelligence that can inform commercial business strategies.

Valuable information can be found in electronic health records, but these are notoriously hard for Pharma to access, given the regulations and restrictions around data use, data privacy, etc.

So, what real world data are accessible?

(Cambridge, UK and Boston, USA - 24 February 2015)

Advanced NLP text analytics over a gene-disease database brings powerful search benefits for target identification, NGS annotation, clinical genetics and diagnostics.

Linguamatics announces that it is making the Online Mendelian Inheritance in Man® (OMIM) data available with its market-leading text analytics platform, I2E.

The new service will be offered on the cloud via the Linguamatics I2E OnDemand platform and is part of Linguamatics’ ongoing strategy to expand the range of off-the-shelf content accessible through its text mining and knowledge discovery solutions.

I2E OnDemand provides access to a wide variety of data such as MEDLINE, FDA Drug Labels, Patents, PubMed Central (open access subset) and NIH grants.

The addition of OMIM allows users to accurately identify and extract information that reveals genetic associations for unusual clinical case presentations or phenotypes, or to search for potential targets in a particular therapeutic area during initial target selection.

OMIM is a comprehensive catalogue of all known human diseases with a genetic component. The database includes documented associations to the relevant genes in the human genome, and related information including gene and disease descriptions, clinical synopsis, animal models, inheritance, mapping, history, and more.

Patent information professionals gathered in sunny San Francisco for the 2015 PIUG Biotechnology Conference, held from 16 to 18 February.

The conference, hosted at Genentech, offered a mix of workshops, presentations, vendor exhibitions and networking opportunities that brought together patent searchers from diverse biotechnology organizations.

The central theme of this year’s conference was “Maximizing Value in Biotechnology Searching with New Technologies and Trends”. Delegates were eager to enhance their existing search strategies, which included a mix of content provider search tools, keyword search, in-house programming/machine learning, outsourced manual curation and, for many, Linguamatics I2E.

They all had one thing in common: everyone was interested in finding new trends, techniques and technologies that would help them retrieve more relevant patent information more efficiently.

The conference opened on the first day with a series of workshops. David Milward, our CTO, delivered a workshop on new developments in text mining patents.

The workshop included an overview of updates to our text mining platform, I2E, that allow easier embedding and automation, multilingual processing, improved visualization and simpler extraction of information from tables, all of which resonated well with this year’s theme.

There are a variety of ways to run searches using I2E, but for most purposes the modes can be simplified to:

  • Search using the I2E Java Client, and
  • Everything else

This distinction is important for users, administrators and developers because access to querying is licensed in the same way. Today’s post will explain the differences between the two modes as well as how to make sure that you’re using your existing capabilities in the most efficient way, with reference to license pools, capabilities and user groups.

Querying using the I2E Java Client

If you’re running a search via the I2E Java client, you will have an interactive license pool that has a “Pro Query” capability (for simplicity, I’m ignoring “Express Query” and “Smart Query” capabilities; the description mostly applies to these as well).

In addition to allowing you to run a search, the “Pro Query” capability also gives you uncontested access to the server (until you log out or your session times out) and the ability to load, create and save queries.

Tasks available for I2E Java client

If two people want to run the I2E Java client at the same time, they will both need to belong to a license pool with a “Pro Query” capability, and those license pools must provide at least two seats between them (e.g. two named user license pools, or one concurrent user license pool with two seats).
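To make the counting rule above concrete, here is a minimal Python sketch. It is purely hypothetical: `LicensePool`, its fields and `can_run_java_client` are illustrative names invented here, not part of I2E or any Linguamatics administration API. The model simply encodes the rule as stated in this post (each user must belong to a pool with the capability, and those pools must provide enough seats between them), and it treats a named user pool as a one-seat pool. Real license allocation may involve more than this.

```python
from dataclasses import dataclass, field


@dataclass
class LicensePool:
    """Hypothetical model of a license pool (illustration only, not an I2E API)."""
    name: str
    seats: int  # assumption: 1 for a named user pool, N for a concurrent pool
    capabilities: set = field(default_factory=set)
    members: set = field(default_factory=set)


def can_run_java_client(users, pools, capability="Pro Query"):
    """Apply the rule from the post: every user must belong to a pool with
    the capability, and those pools must provide at least len(users) seats."""
    qualifying = [p for p in pools if capability in p.capabilities]
    # Each user needs membership in at least one qualifying pool.
    for user in users:
        if not any(user in p.members for p in qualifying):
            return False
    # Only count seats from qualifying pools that at least one user belongs to.
    relevant = [p for p in qualifying if p.members & set(users)]
    return sum(p.seats for p in relevant) >= len(users)


if __name__ == "__main__":
    alice_pool = LicensePool("alice-named", 1, {"Pro Query"}, {"alice"})
    bob_pool = LicensePool("bob-named", 1, {"Pro Query"}, {"bob"})
    shared = LicensePool("team-concurrent", 2, {"Pro Query"}, {"alice", "bob"})
    one_seat = LicensePool("one-seat", 1, {"Pro Query"}, {"alice", "bob"})

    print(can_run_java_client(["alice", "bob"], [alice_pool, bob_pool]))  # True
    print(can_run_java_client(["alice", "bob"], [shared]))                # True
    print(can_run_java_client(["alice", "bob"], [one_seat]))              # False
```

The check deliberately sums seats rather than assigning each user to a specific seat, because that is how the rule is phrased above. A stricter model would match users to seats one by one.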