The past few weeks have been busy: we’re fresh from our Text Mining Summit, which included a dedicated training session for users who wished to develop against the I2E Web Services API.

I also had the opportunity to go on site to a customer to provide some focused API training.

These sessions generated lots of interesting questions about automating processes from an administration perspective as well as a user perspective.

As I was presenting some high-level slides during the Text Mining Summit, I noted that I was mixing up put and post (and sometimes place and push!) in a way that is forgivable when using them as English verbs, but unhelpful when trying to explain a RESTful Web Service.

So after the Summit, I went back to our Developers Guide and back to my notes and started over, to create a helpful explanation of when you POST and when you PUT to the I2E Server.

Both PUT and POST are methods for transferring data to the server, and in some use cases they can be used interchangeably.

One example of that is creating a new file called newfile.txt in the Source Data collection on the I2E server:

PUT url=http://i2eserver:8334/api;type=data/newfile.txt data=filecontent

POST url=http://i2eserver:8334/api;type=data/?base=newfile.txt data=filecontent

The end result will be the same in either case, but as you can see from the URLs, the resource name is set differently: with PUT, the name (newfile.txt) is part of the URL path itself, while with POST it is passed as the base parameter.
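The two calls above can be sketched with Python's standard library. The host, port and filenames are taken straight from the examples; the requests are only constructed here, not actually sent:

```python
import urllib.request

# Hypothetical server and file content, matching the examples above.
BASE = "http://i2eserver:8334/api;type=data"
content = b"filecontent"

# PUT: the resource name (newfile.txt) is part of the URL path.
put_req = urllib.request.Request(
    f"{BASE}/newfile.txt", data=content, method="PUT"
)

# POST: the resource name is passed in the 'base' query parameter.
post_req = urllib.request.Request(
    f"{BASE}/?base=newfile.txt", data=content, method="POST"
)

print(put_req.get_method(), put_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Sending either request (with `urllib.request.urlopen`) against a live I2E server would create the same file.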


The internet right now, as Tim Berners-Lee points out in Scientific American, is a web of documents; documents that are designed to be read, primarily, by humans.

The vision behind the Semantic Web is a web of information, designed to be processed by machines. The vision is being implemented: important parts of the key enabling technologies are already in place.

RDF, the Resource Description Framework, is one such key technology. RDF is the language for expressing information in the Semantic Web. Every statement in RDF is a simple triple, which you can think of as subject/verb/object, and a set of statements is just a set of triples.

Three example triples might be: Armstrong/visited/moon, Armstrong/isa/human and moon/isa/astronomical body. The power of RDF lies partly in the fact that a set of triples is also a graph and graphs are perfect for machines to traverse and, increasingly, reason over. After all, when you surf the web, you’re just traversing the graph of hyperlinks. And that’s the second powerful feature of RDF.

The individual parts, such as Armstrong and moon, are not just strings of letters but web-addressable Uniform Resource Identifiers (URIs). When I publish my little graph about Armstrong it becomes part of a vast world-wide graph: the Semantic Web. So, machines hunting for information about Armstrong can reach my graph and every other graph about Armstrong. This approach allows the web to become a huge distributed knowledge base.
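As a toy illustration of triples-as-a-graph, here is a minimal Python sketch using the Armstrong statements above. Plain strings stand in for the web-addressable URIs that real RDF would use:

```python
# A toy RDF-style graph: each statement is a (subject, predicate, object)
# triple, following the Armstrong examples above.
triples = {
    ("Armstrong", "visited", "moon"),
    ("Armstrong", "isa", "human"),
    ("moon", "isa", "astronomical body"),
}

def objects(subject, predicate):
    """Traverse the graph: everything linked from subject via predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

print(objects("Armstrong", "visited"))   # what did Armstrong visit?
print(objects("moon", "isa"))            # what kind of thing is the moon?
```

A machine "surfing" this graph is doing exactly the traversal that `objects` performs, just at web scale and over URIs rather than strings.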


Your most common usage of the I2E Web Services API is likely to be to automate query execution to generate results.

Queries themselves are always constructed and refined in the I2E client interface; from there they can be saved onto the I2E server ready for batch processing. When running a query automatically you need to provide, as a minimum, two pieces of information: the location of the index and the location of the query.

In this post we won’t worry too much about the index — we’ll assume that the index that the user originally used to create their query is still available — and focus on the query.

As saved by the user, the query contains sufficient information to specify the search itself (keywords, classes, phrases, etc.) as well as to control the output settings, which include (among other things) the format of the results (HTML, TSV, XML, etc.) along with the ordering of results and the selection of columns and highlighting.

When thinking about automating query submission, there are four use cases to consider: submit the query with no modifications; submit the query with modifications to the output settings; submit the query with modifications to the search terms in the query; and submit the query with modifications to both the output settings and the query search terms.

In each case, the query is run automatically by POSTing a query template (in JSON format) describing the query to the I2E server; the template also contains information about the index to be searched.
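A minimal sketch of building such a payload in Python follows. The field names here ("query", "index", "output") are illustrative placeholders, not the actual WSAPI template schema; consult the Developers Guide for the real format:

```python
import json

# Hypothetical query template: points at a saved query and an index,
# and (optionally) overrides some output settings.
template = {
    "query": "queries/my_saved_query",   # location of the saved query
    "index": "indexes/medline",          # location of the index
    "output": {"format": "tsv"},         # optional output-setting override
}

# This JSON string is what would be POSTed to the I2E server.
payload = json.dumps(template)
print(payload)
```

The four use cases then differ only in how much of the saved query's settings the template overrides before it is POSTed.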

The I2E server will then return some information about the query task including the status of the search and the location of the results.
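For illustration, handling that returned task information might look like the following. The JSON field names are hypothetical stand-ins for whatever the server actually returns:

```python
import json

# Hypothetical task-information response from the server; the real
# field names and status values will differ -- see the Developers Guide.
response_body = '{"status": "COMPLETED", "results": "http://i2eserver:8334/api/results/task-42"}'

task = json.loads(response_body)
if task["status"] == "COMPLETED":
    print("results available at:", task["results"])
else:
    print("query still running, status:", task["status"])
```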


When choosing to develop an application that uses I2E, it is important to understand the capabilities of our text mining software as well as details of the API itself.



As the graphic shows, tasks that are performed on the I2E Server are independent of each other and so allow diverse applications to be created: one app might run large-scale queries and present the information in a visual form; another might process documents automatically and publish the resulting index to the I2E Query GUI.

Today’s post is more about the latter: what are the basic details of the API itself, what languages are supported, and what do we provide to get you started?

The WSAPI is RESTful. It communicates over HTTP(S), which is good for a number of reasons, including the ability for users to easily browse resources on the I2E server, and because there are freely-available modules or libraries for creating clients in different languages, including JavaScript running in a web browser (or even the web browser itself).

So the question of “what languages are supported?” generally becomes “what languages can use an HTTP/REST interface?”.
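For instance, Python's standard library alone is enough to act as a WSAPI client, with no third-party modules at all. Here 'i2eserver' is a placeholder host, and no connection is opened until a request is actually issued:

```python
import http.client

# Create a client for the (hypothetical) I2E server. The constructor
# only records the target; nothing is sent until conn.request() is called.
conn = http.client.HTTPConnection("i2eserver", 8334)
print(conn.host, conn.port)   # ready to GET/PUT/POST resources under /api
```

The same few lines exist in essentially every mainstream language, which is the practical payoff of a plain HTTP/REST interface.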

Here is a short list of some libraries/modules that we or our customers have used in our applications to communicate with the I2E server using the WSAPI:


The release of I2E 4.0 at the end of 2012 included a Web Services API (WSAPI) for our software for the first time.

The availability of this interface, along with sample code and a sample GUI, meant that it was possible for developers to include integration with I2E into their code.

We’ve used standard technologies when building our API but there are many software-specific features that need to be understood before you can choose what capabilities of I2E to include in your applications. For this, we are providing additional training materials such as training sessions at our Text Mining Summit, webinars, traditional phone and email support and, well, this blog.

This blog category — I2E WSAPI — will contain posts aimed at two different (but hopefully overlapping) audiences:

Existing I2E users (query builders or administrators) who have ideas about how to extend I2E’s functionality. Posts for this audience will introduce technical concepts like RESTful APIs, programming languages and interface creation.
Developers who wish to integrate world-class text mining into their product. Posts for this audience will introduce text mining/I2E concepts like search strategies, ontologies and results analysis.

In each case, the posts will cross over so that there’s something to be learned by everyone. In addition, there will be posts where no knowledge can be assumed from either side, with topics including session management within I2E, administration tasks and smart query handling.

We will try to keep posts short but frequent, giving you a chance to follow along on a weekly basis. If you have questions or comments, please contact us.