Extract, Transform and Load (ETL) and Extract, Load and Transform (ELT) projects have been around for many years and remain important in many IT projects. However, ETL/ELT still struggles with the vast majority of data. It is widely accepted that 80% of big data is unstructured, and most industry solutions, including ETL and ELT tools, are not equipped to handle it. As a result, these solutions address only a small percentage of the available data and overlook the value buried in unstructured or semi-structured data.
Linguamatics fills this value gap in ETL/ELT projects with solutions specifically designed for large-scale extraction and transformation of unstructured data.
Linguamatics I2E, NLP-based text mining software, extracts concepts, assertions and relationships from unstructured data and transforms them into structured data that can be stored in databases and data warehouses. Linguamatics I2E AMP scales these operations to address the volume, variety, veracity and velocity of big data.
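To make the idea of turning unstructured text into structured, loadable data concrete, here is a minimal sketch in Python. It does not use I2E or any Linguamatics API: the regular expression, the `dosage` table, the function names and the sample notes are all hypothetical stand-ins, and a real NLP engine would rely on linguistic analysis and ontologies rather than a single pattern.

```python
import re
import sqlite3

# Hypothetical pattern (an assumption for illustration): a capitalised
# drug name followed by a dose in mg. This stands in for real NLP extraction.
DOSE_PATTERN = re.compile(r"\b([A-Z][a-z]+)\s+(\d+)\s*mg\b")


def extract_rows(doc_id, text):
    """Extract + transform: free text in, (doc_id, drug, dose_mg) tuples out."""
    return [
        (doc_id, match.group(1), int(match.group(2)))
        for match in DOSE_PATTERN.finditer(text)
    ]


def load_rows(conn, rows):
    """Load: store the structured rows in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dosage "
        "(doc_id TEXT, drug TEXT, dose_mg INTEGER)"
    )
    conn.executemany("INSERT INTO dosage VALUES (?, ?, ?)", rows)
    conn.commit()


# Made-up sample documents, not real data.
documents = {
    "note-1": "Patient was started on Aspirin 75 mg daily.",
    "note-2": "Continue Metformin 500mg twice a day; stop Ibuprofen 200 mg.",
}

conn = sqlite3.connect(":memory:")
for doc_id, text in documents.items():
    load_rows(conn, extract_rows(doc_id, text))

query = "SELECT doc_id, drug, dose_mg FROM dosage ORDER BY doc_id, drug"
for row in conn.execute(query):
    print(row)
```

Once the facts are in a table, they can be queried and joined like any other warehouse data, which is the point of the transformation step.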
Scalable Text Mining for ETL/ELT provides:
- Scalable indexing
  - Parallel indexing processes exploit multiple cores
  - Distributed indexing across machines
- Scalable querying
  - Distribution across cores
  - New I2E OnDemand infrastructure is configured to exploit 150-core machines
  - Distribution across machines
  - Federated architecture
  - Support for load balancing
- Scalable document processing pipelines
  - Distributed processes across machines
  - I2E AMP asynchronous messaging platform provides fault-tolerant and scalable processing
  - Hadoop compatible
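The general pattern behind parallel and distributed indexing can be sketched as: split the documents into shards, index each shard independently, then merge the partial results. The Python below is a generic illustration under that assumption, not a description of how I2E indexes documents. The function names, the whitespace tokeniser and the sample documents are all hypothetical.

```python
from collections import defaultdict


def index_shard(shard):
    """Build a partial inverted index (term -> set of doc ids) for one shard."""
    partial = defaultdict(set)
    for doc_id, text in shard:
        for term in text.lower().split():
            partial[term].add(doc_id)
    return dict(partial)


def merge_indexes(partials):
    """Merge partial indexes by taking the union of doc ids per term."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return dict(merged)


def build_index(docs, n_shards, map_fn=map):
    """Shard the documents, index each shard via map_fn, merge the results."""
    shards = [docs[i::n_shards] for i in range(n_shards)]
    return merge_indexes(map_fn(index_shard, shards))


# Made-up sample documents, not real data.
docs = [
    (1, "protein binds receptor"),
    (2, "receptor inhibits enzyme"),
    (3, "enzyme binds protein"),
]

index = build_index(docs, n_shards=2)
print(sorted(index["binds"]))  # [1, 3]
```

Because each shard is indexed independently, `map_fn` can be swapped for the `map` method of a `concurrent.futures.ProcessPoolExecutor` to spread shards across cores, and the same split-and-merge shape carries over to multiple machines.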