Glossary of terms
Agile Technology
A form of technology that can be easily and quickly adapted.
Example: I2E is an agile technology because it can be used across many different applications and is continually improved based on feedback and requirements from customers.
Application Programming Interface (API)
An application programming interface or API is a set of routines designed to enable software developed by a third party to access the services of a website or computer application. API documentation describes the proper way for the developer to access this functionality, which may be proprietary or based on a standard such as REST (see below).
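As a sketch of how a developer consumes a RESTful API, the snippet below builds a request URL for a hypothetical document-search endpoint and parses a JSON response. The base URL, parameter names, and response layout are all invented for illustration, not a real service.

```python
import json
from urllib.parse import urlencode

# Hypothetical REST endpoint; the URL and parameter names are illustrative.
BASE_URL = "https://api.example.com/v1/documents"

def build_query_url(base, params):
    """Return a GET URL with the query parameters encoded."""
    return f"{base}?{urlencode(params)}"

url = build_query_url(BASE_URL, {"q": "text mining", "limit": 10})

# RESTful services typically reply with JSON; parse a sample response.
sample_response = '{"total": 2, "results": [{"id": 1}, {"id": 2}]}'
data = json.loads(sample_response)
print(url)            # https://api.example.com/v1/documents?q=text+mining&limit=10
print(data["total"])  # 2
```

Real services add authentication and error handling on top of this pattern, but the request-URL-plus-structured-response shape is the core of API access.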
Artificial Intelligence (AI)
The term artificial intelligence or AI was coined by computer scientist John McCarthy at a conference in 1956. Today, it is an umbrella term covering everything from process automation to robotics. AI can be categorized in any number of ways, but a common distinction is Weak vs. Strong AI. Weak AI refers to a system that is designed and trained for a particular task. Virtual assistants, such as Apple's Siri, fall within this category.
By contrast, a Strong AI system has human cognitive abilities so that when presented with an unfamiliar task, it has enough intelligence to find a solution. In the context of big data, AI can perform tasks such as identifying patterns more efficiently than humans, enabling businesses to gain more insight from their data.
Business Intelligence
The information required for key decision makers within an organization to make strategic decisions. This often includes direct information related to products, customers, or competitors and indirect information related to economic trends, political situations, environmental issues, etc. The goal is to gain complete situational awareness and increase an organization’s competitiveness.
Cooperative Patent Classification (CPC)
A patent classification system jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO). CPC has been developed from the previous European classification system and has a greater level of detail than the International Patent Classification (IPC) system.
Data Lake
James Dixon, the founder of Pentaho, is usually credited with defining the term "data lake". He explained it as follows: "If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples".
A more formal definition is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.
Data Warehouse
A data warehouse stores large amounts of data that has been collected and integrated from multiple sources. Because organizations depend on this data for analytics or reporting purposes, the data needs to be consistently formatted and easily accessible – two qualities that differentiate a data warehouse from a data lake.
ETL (Extract, Transform and Load)
Extract, Transform, and Load refers to a trio of processes performed when moving raw data from its source to a data warehouse or relational database. Data must be properly formatted and normalized in order to be loaded into these types of storage systems, and ETL is used as shorthand to describe the three stages of preparing data. ETL also describes the software category that automates the three processes.
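The three stages can be sketched in a few lines. In this toy example the source records, field names, and table layout are all invented; real ETL tools handle the same steps at scale.

```python
import sqlite3

# Extract: raw records as they might arrive from a source system.
raw = [" Alice ,34", "Bob,  41 ", "carol,29"]

# Transform: normalize names and types so rows fit the target schema.
rows = []
for line in raw:
    name, age = line.split(",")
    rows.append((name.strip().title(), int(age)))

# Load: insert the cleaned rows into a relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # 3
```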
Extensible Markup Language (XML)
A markup language that defines a set of rules for encoding documents in a format that is both machine-readable and human-readable.
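A short illustration of the machine-readable side: the XML document below (a made-up patent record) is parsed with Python's standard library, and its elements and attributes are read back programmatically.

```python
import xml.etree.ElementTree as ET

# A small XML document, invented for illustration.
doc = """<patent id="EP123">
  <title>Widget</title>
  <inventor>A. Smith</inventor>
</patent>"""

root = ET.fromstring(doc)
print(root.tag)                # patent
print(root.attrib["id"])       # EP123
print(root.find("title").text) # Widget
```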
Freedom to Operate
The determination that a particular action does not violate the intellectual property of others. This may refer to product commercialization, branding, the manufacturing process, etc.
Hadoop
Hadoop is an open source, Java-based programming framework that supports the processing and storage of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
Intelligence Augmentation (IA)
Intelligence augmentation, also referred to as augmented intelligence or intelligence amplification, is an implementation of Artificial Intelligence that focuses on AI's assistive role, emphasizing the fact that it is designed to enhance human intelligence rather than replace it. It is hoped that this term will help people understand that IA will simply improve products and services, not replace the humans that use them.
Keywords
Words that occur in a text at a higher frequency than we would expect to occur by chance. Formally, the frequency of each word in a text is compared to its expected frequency using appropriate statistical tests. The expected frequency of a word is derived from a large corpus which is considered as a reference for the general language use. The quality or significance of keywords can then be evaluated by the level of their enrichment within the text.
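The idea of comparing observed frequency against a reference corpus can be sketched as below. The two texts are invented, and the plain frequency ratio with add-one smoothing stands in for the proper statistical tests (e.g. log-likelihood) a real system would use.

```python
from collections import Counter

# Toy reference corpus and document, invented for illustration.
reference = "the cat sat on the mat the dog sat on the log".split()
document = "patent claims describe the invention claims define scope".split()

ref_freq = Counter(reference)
doc_freq = Counter(document)
ref_total = len(reference)
doc_total = len(document)

def enrichment(word):
    """Observed relative frequency divided by expected (add-one smoothed)."""
    observed = doc_freq[word] / doc_total
    expected = (ref_freq[word] + 1) / (ref_total + 1)
    return observed / expected

# 'claims' never occurs in the reference, so it scores far higher
# than a common function word like 'the'.
print(enrichment("claims") > enrichment("the"))  # True
```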
MDL Molfile
An MDL Molfile (*.mol) is a format frequently used for files that contain molecular information (e.g. atom type, bond type, atom coordinates, etc.). This information is often used to recreate a given molecule in different software including Mathematica.
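As a minimal illustration, the snippet below reads the counts line of a V2000 molfile, whose first two fixed-width fields give the numbers of atoms and bonds. The embedded molfile (water) is hand-written for this example; a real application would use a chemistry toolkit rather than slicing lines by hand.

```python
# A hand-written V2000 molfile for water (illustrative).
molfile = """water
  example

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0
    0.9572    0.0000    0.0000 H   0  0
   -0.2400    0.9266    0.0000 H   0  0
  1  2  1  0
  1  3  1  0
M  END
"""

lines = molfile.splitlines()
counts = lines[3]           # fourth line is the counts line
n_atoms = int(counts[0:3])  # first 3-character field: atom count
n_bonds = int(counts[3:6])  # second 3-character field: bond count
print(n_atoms, n_bonds)     # 3 2
```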
Natural Language Processing (NLP)
A computer science field that deals with the recognition, analysis, and replication of human (natural) languages on a large scale by computer programs.
Ontology
In the context of information science, an ontology is a formal set of definitions, properties or relationships between entities of a particular knowledge domain. Ontologies are useful in different scientific domains to categorize or classify terms.
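A toy example of how such relationships can be used: the dictionary below encodes invented "is-a" relations, and a small helper walks the hierarchy to find every broader category of a term.

```python
# Invented "is-a" relations forming a tiny ontology.
IS_A = {
    "aspirin": "drug",
    "drug": "chemical",
    "chemical": "entity",
}

def ancestors(term):
    """Return all broader categories of a term, most specific first."""
    out = []
    while term in IS_A:
        term = IS_A[term]
        out.append(term)
    return out

print(ancestors("aspirin"))  # ['drug', 'chemical', 'entity']
```

Production ontologies use richer relation types and standard formats (e.g. OWL), but the classify-by-walking-relations idea is the same.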
Patent Analysis Tools
Patent analysis tools enable companies to extract significant information from patent databases. This information can be influential in decisions related to releasing products to market and exploring new revenue sources. It also gives an up-to-date view of IP-associated activities in the field of interest and helps keep track of competitors.
Patent Landscape Analysis
A multistep process that scans through all available patent information related to a technology and provides a snapshot of the current state. This type of analysis is sometimes accompanied by technological, legal, and business reports, which together provide a valuable resource for corporations, research organizations, and investors.
Patent Search
A process aimed at finding a specific set of patents that fall within a given scope. It can be limited to a specific technology, a given time frame, or specific owner(s).
Querying
A term used when referring to the capabilities of I2E. You can create sophisticated, specific queries which return relevant relationships rather than documents. For example, you can define a query that asks for any person and any project, and I2E will provide a table of data ready to go straight into your database or spreadsheet. This is accomplished using linguistic and domain analysis that most information retrieval engines lack, and fast search algorithms that ensure you get the right results quickly.
RESTful Web Services
REpresentational State Transfer (REST) is a software design approach used to build web-based services that are lightweight, maintainable, and scalable. A service built on the REST architecture is called a RESTful service. REST is a popular choice for building cloud services, and is used by sites such as Amazon, Google and Twitter.
Semantic Enrichment
Semantic enrichment is the process of providing data with additional value by assigning meaning to it, making it more accessible and relating it to other content or assets to develop new services. Value is typically added by annotating content with semantic markup and metadata. By ensuring that it is structured and semantically tagged, content not only becomes more discoverable, but it can be linked to other data-sets to create new content services.
Structured Data
Structured data consists of files containing well-organized information. Structured data is stored in a traditional database or data warehouse where it can be retrieved for analysis. Before the era of big data, and new data sources such as social media, structured data was an organization's primary input for making business decisions. Given its already-organized nature, structured data is largely managed with legacy analytics solutions.
Text Mining
Also known as Text Analytics, text mining refers to deriving desired information from text using statistical methods or algorithms that identify patterns and trends within the data. The input text is usually unstructured information, and the process of mining can involve parsing and/or the addition of linguistic features such as parts of speech, specific search terms and others. The text then becomes more structured and the interpreted data can be put into databases or be further analyzed by machine learning algorithms and other methods.
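A toy version of the first step, illustrating how free text becomes structured counts: the sentences below are invented, and simple tokenization plus frequency counting stands in for the parsing and linguistic analysis real text-mining systems perform.

```python
import re
from collections import Counter

# Unstructured input text, invented for illustration.
text = ("Text mining derives information from text. "
        "Mining text reveals patterns in the text.")

# Tokenize: lowercase and extract word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Count: term frequencies are structured data ready for a database
# or downstream machine learning.
freq = Counter(tokens)
print(freq["text"])               # 4
print(freq.most_common(1)[0][0])  # text
```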
The Simplified Molecular-Input Line Entry System (SMILES)
A standard notation format for describing structures of chemical molecules using character-based strings. This format allows conversion back into two- or three-dimensional representations of molecules.
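A few well-known molecules written as SMILES strings are shown below, along with a minimal structural sanity check: branches in SMILES are written with parentheses, so a trivial balance check catches one class of malformed strings. This is an illustration only, not a full SMILES parser.

```python
# Well-known molecules in SMILES notation.
smiles = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

def parens_balanced(s):
    """SMILES branches use parentheses; verify they pair up."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(all(parens_balanced(s) for s in smiles.values()))  # True
```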
Unstructured Data
Unstructured data continues to grow in influence as organizations try to leverage new and emerging data sources. These new data sources are made up largely of streaming data coming from social media platforms, mobile applications, location services, and Internet technologies. The diversity of unstructured data sources makes it far more of a challenge to manage and retrieve than traditional structured data sets. The development of data lakes and the Hadoop platform are examples of a new breed of technologies designed to address these challenges.
White Space Analysis
The process of uncovering and analyzing current unmet needs or future opportunities, specifically for products or services that don’t exist in the marketplace.