Blog

There are only a few days to go until the Linguamatics Text Mining Summit, which begins on 7th October in Newport, RI.

This is an opportunity for I2E users and other developers to get hands-on access to the new version of I2E — version 4.1 — as well as attend a variety of interesting presentations and a number of training sessions.

This year, there is a training session dedicated to I2E Administration and use of the Web Services API. You can meet individuals from other organizations who will share their experiences with our API, and work through training material that covers the various parts of the API in sufficient detail to start using it yourself.

There will also be a case study on using the API to create workflows that integrate text mining. And, of course, lots of demos!

I look forward to seeing you there!

Paul


I2E 4.1 opens up new opportunities for connecting actionable insights from unstructured data across diverse cloud and enterprise-based content silos

(Cambridge, England and Boston, USA – September 19, 2013) – Linguamatics, the leader in natural language processing-based text mining and analytics, is pleased to announce the latest release of its award-winning software platform, I2E. Version 4.1 provides enhancements in a number of key areas, including more streamlined access to content through linking enterprise servers with content servers in the cloud, and enhanced chemical querying capabilities.

I2E’s new Linked Server functionality facilitates easier access to text mining across different content silos wherever they might be located, whether in-house data or content served from the cloud, including from Linguamatics’ own I2EOnDemand platform. This federated approach will enable faster linking of extracted information from diverse unstructured data sources such as scientific literature, clinical data, patents and in-house information, leading to increased speed to insight.

I2E’s enhanced chemical querying capability introduces faster, more scalable substructure and similarity searching within the context of sophisticated natural language queries, and integrates chemical structure drawing, using chemistry components from ChemAxon.

Amongst other product enhancements, index optimization delivers reductions in index sizes of around 30%, leading to savings in storage costs. This is particularly significant as customers scale up their enterprise text mining operations to deal with the challenges of big data.


Natural Language Processing (NLP), big data and precision medicine are three of the hottest topics in healthcare at the moment and consequently attracted a large audience to the first NLP & Big Data Symposium, focussed on precision medicine.

The event took place on August 27th, hosted at the new UCSF site at Mission Bay in San Francisco and sponsored by Linguamatics and UCSF Helen Diller Family Comprehensive Cancer Center.

Over 75 delegates came to hear the latest insights and projects from some of the West Coast’s leading institutions including Kaiser Permanente, Oracle, Huntsman Cancer Institute and UCSF.

The event was held in the midst of an explosion of new building development to house the latest in medical research and informatics, with big data at its heart.

Linguamatics and UCSF recognized the need for a meeting on NLP in the west and put together an exciting program that clearly caught the imagination of many groups in the area.

 

Over 75 delegates attended the Symposium

Key presentations included:


The internet right now, as Tim Berners-Lee points out in Scientific American, is a web of documents: documents designed to be read, primarily, by humans.

The vision behind the Semantic Web is a web of information, designed to be processed by machines. The vision is being implemented: important parts of the key enabling technologies are already in place.

RDF, or the Resource Description Framework, is one such key technology. RDF is the language for expressing information in the Semantic Web. Every statement in RDF is a simple triple of subject, predicate and object (you can think of it as subject/verb/object), and a set of statements is just a set of triples.

Three example triples might be: Armstrong/visited/moon, Armstrong/isa/human and moon/isa/astronomical body. The power of RDF lies partly in the fact that a set of triples is also a graph, and graphs are perfect for machines to traverse and, increasingly, reason over. After all, when you surf the web, you are just traversing the graph of hyperlinks.

The second powerful feature of RDF is that the individual parts, such as Armstrong and moon, are not just strings of letters but web-addressable Uniform Resource Identifiers (URIs). When I publish my little graph about Armstrong, it becomes part of a vast world-wide graph: the Semantic Web. So machines hunting for information about Armstrong can reach my graph and every other graph about Armstrong. This approach allows the web to become a huge distributed knowledge base.
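The triples above can be sketched in a few lines of Python. This is only an illustration of the idea, not real RDF: an actual Semantic Web application would use full URIs for each part and a dedicated library, whereas here short names and plain tuples keep the sketch readable.

```python
# A set of RDF-style statements, each a (subject, predicate, object) triple.
triples = {
    ("Armstrong", "visited", "moon"),
    ("Armstrong", "isa", "human"),
    ("moon", "isa", "astronomical body"),
}

def statements_about(subject, graph):
    """Return every triple whose subject matches: a one-step graph traversal."""
    return {t for t in graph if t[0] == subject}

# Merging someone else's graph is just a set union; because the parts are
# shared identifiers, the two graphs link up automatically.
other_graph = {("Armstrong", "born_in", "Ohio")}
merged = triples | other_graph

print(statements_about("Armstrong", merged))
```

The union step is the whole point: a machine hunting for information about Armstrong can combine independently published graphs and traverse the result as one.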


The most common use of the I2E Web Services API is likely to be automating query execution to generate results.

Queries themselves are always constructed and refined in the I2E client interface; from there they can be saved onto the I2E server, ready for batch processing. When running a query automatically, you need to provide, as a minimum, two pieces of information: the location of the index and the location of the query.

In this post we won’t worry too much about the index — we’ll assume that the index that the user originally used to create their query is still available — and focus on the query.

As saved by the user, the query contains sufficient information to specify the search itself (keywords, classes, phrases, etc.) and to control the output settings, which include (among other things) the format of the results (HTML, TSV, XML, etc.), the ordering of results, and the selection of columns and highlighting.

When thinking about automating query submission, there are four use cases to consider: (1) submit the query with no modifications; (2) submit the query with modifications to the output settings; (3) submit the query with modifications to the search terms; and (4) submit the query with modifications to both the output settings and the search terms.

In each case, the query is run automatically by POSTing a query template (in JSON format) describing the query to the I2E server; the query template also contains information about the index to be searched.

The I2E server will then return information about the query task, including the status of the search and the location of the results.
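As a rough sketch, submitting a saved query might look like the following. The field names, endpoint URL, and file paths here are all assumptions made for illustration, not the real I2E request format; consult the Web Services API documentation and training material for the actual details.

```python
import json
import urllib.request

def build_query_template(index_location, query_location, overrides=None):
    """Assemble the JSON query template: the two required pieces of
    information, plus any optional overrides of the saved settings."""
    template = {
        "index": index_location,   # field names are illustrative assumptions
        "query": query_location,
    }
    if overrides:                  # e.g. a different output format or ordering
        template.update(overrides)
    return template

def submit_query(server_url, template):
    """POST the template to the I2E server and return its JSON reply,
    which would include the task status and the location of the results."""
    request = urllib.request.Request(
        server_url,
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Use case (2): run the saved query but override one output setting.
template = build_query_template(
    "indexes/medline", "queries/my_query.i2q", {"format": "tsv"}
)
print(json.dumps(template))
```

The four use cases then differ only in which parts of the template carry overrides: none at all, output settings, search terms, or both.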