NLP Data Factory Automation at Scale

NLP Data Factory

Our NLP Data Factory offers scalable, automated NLP transformation, either custom or out of the box for key applications, and supports a wide range of data sources. Use it to structure data for integration into ETL workflows and machine learning models.

The case for a wide-lens Natural Language Processing solution

Most organizations have a constant stream of new data added to their siloed systems, but about 80 percent of it is unstructured or semi-structured text that is rarely used despite its importance in driving clinical and commercial outcomes. The reasons vary, but for most organizations it comes down to three factors: there is too much data, search is ineffective at finding the right documents, and employees' time is too valuable to spend on manual abstraction.

Niche NLP solutions can be helpful but are not able to extract and integrate information across multiple business areas within an organization. That has resulted in a disjointed “department-to-department” approach to business intelligence that is frustrating, ineffective, and unsustainable.

Technology has reached a pivot point, and it is time to widen the lens with an automated NLP platform that solves all of these challenges.

Extract, enrich and normalize with NLP automation

The NLP Data Factory rapidly surfaces and normalizes features of interest at scale in an automated, robust, and easily configurable pipeline. Combining NLP with automation delivers value across multiple lines of business. The NLP Data Factory can be deployed as a stand-alone solution or embedded in your existing workflows and technology stacks. It integrates with your internal and external source data and gives you a platform that pairs best-in-class natural language processing with a flexible automation pipeline.

Linguamatics NLP Data Factory for Extract-Transform-Load processing of textual data has three key components:

  1. Data ingestion from a wide range of disparate sources such as EMR extracts, call transcripts, internal reports; with integrated OCR & table processing
  2. Highly scalable, world-class NLP transformation, either custom or out of the box for key applications: clinical and scientific research, bridging and mapping, metadata tagging, categorization
  3. Data Input/Output to common standards such as DB table (SQL), JSON, XML, RDF, FHIR

The result: rich, contextual information that elevates your business and gives you a competitive edge, automatically delivered where and when you need it.
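
As a rough illustration of the three components above (ingestion, NLP transformation, and output to a common format), the following Python sketch wires them together. It is not the product's implementation: the function names, the `Record` type, and the single tumor-size regex standing in for the NLP step are all assumptions made for this example.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Record:
    """One structured row produced from one input document (hypothetical schema)."""
    doc_id: str
    source: str
    features: dict


def ingest(raw_docs):
    """Step 1, ingestion: yield (doc_id, source, text) with whitespace collapsed."""
    for doc_id, source, text in raw_docs:
        yield doc_id, source, " ".join(text.split())


# Hypothetical stand-in for the NLP step: one regex, not a real NLP engine.
TUMOR_SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def transform(doc_id, source, text):
    """Step 2, transformation: extract sizes and normalize them to millimetres."""
    sizes_mm = []
    for value, unit in TUMOR_SIZE.findall(text):
        factor = 10.0 if unit.lower() == "cm" else 1.0
        sizes_mm.append(float(value) * factor)
    return Record(doc_id, source, {"tumor_size_mm": sizes_mm})


def load(records):
    """Step 3, output: serialize the records as JSON lines."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def run_pipeline(raw_docs):
    """Chain the three steps over an iterable of (doc_id, source, text) tuples."""
    return load(transform(*doc) for doc in ingest(raw_docs))
```

Calling `run_pipeline([("d1", "emr", "Mass measures 2.5 cm in the left lobe.")])` returns one JSON line per document. Swapping `load` for a SQL, XML, or FHIR writer would change only the last step.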

Key NLP Data Factory benefits

  • Bring organization-wide value at scale from the key disparate data your organization has generated or invested in
  • Automate NLP workflows to process millions of documents and data fields every hour across a range of business lines
  • Deliver ready-to-use data that close gaps in existing knowledge, surface important context, and support better-informed decision making

Key features of an automated NLP solution

  • Award-winning NLP connects to diverse sources and outputs to a wide range of standard formats
  • Embedded optical character recognition means no scanned text is left behind
  • Seamless integration into existing workflows and ML models
  • Flexible deployment on-premises, in cloud environments, or in a hybrid implementation
  • Effortless recognition and normalization of complex constructs, such as cancer staging
  • Fluent incorporation of trained ML models

NLP automation: 8 million documents processed per hour

Numerous applications to strengthen your organization

The NLP Data Factory is a multi-mission NLP solution that will transform your business across myriad areas to add richness, context, ease, and insight. Here are just a few examples.

NLP Data Factory use cases: metadata enrichment, clinical documentation improvement, SNOMED coding, novel target intelligence, biomarker discovery, safety case processing, clinical trial analytics, medical affairs insights, oncology profile, and social determinants of health.

Social determinants of health

Build a complete picture of patients by automating extraction of important predictive characteristics such as social determinants of health and lifestyle factors. Only by ensuring these features are identified and acknowledged can equitable healthcare be delivered.

Biomarker discovery

Identify previously unknown relationships between biomarkers and disease profiles, and quickly identify sources that provide evidence for multiple biomarkers and phenotypic indicators of interest.

Medical affairs insights

Extract unstructured information from diverse data sources, including Voice of the Customer (VoC) data from patient surveys and call center verbatims, customer complaints databases, focus groups, etc., and regularly monitor for potential product issues, competitive insights and breaking trends.

Oncology profile

Surface clinical attributes such as cancer stage, tumor size, histology, and biomarker values to normalize and standardize high-complexity cancer information – making your data research ready.
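
To make the idea of normalizing cancer stage concrete, here is a minimal Python sketch that finds overall stage mentions in free text and rewrites them to one canonical form. It is a toy pattern match under stated assumptions, not the product's method: the `STAGE` pattern and the `normalize_stage` helper are hypothetical, and real clinical notes would also need handling for TNM notation, negation, and history mentions.

```python
import re

# Matches overall stage mentions such as "Stage IIIB", "stage 2a", "stage iv".
# Longer Roman numerals come first so that "III" is not read as "I".
STAGE = re.compile(r"\bstage\s+(IV|III|II|I|[0-4])([ABC])?\b", re.IGNORECASE)

ARABIC_TO_ROMAN = {"0": "0", "1": "I", "2": "II", "3": "III", "4": "IV"}


def normalize_stage(text):
    """Return normalized stage labels (for example 'IIIB') in order of appearance."""
    results = []
    for numeral, substage in STAGE.findall(text):
        # Arabic digits map to Roman numerals; Roman input is just upper-cased.
        roman = ARABIC_TO_ROMAN.get(numeral, numeral.upper())
        results.append(roman + substage.upper())
    return results
```

With this sketch, `normalize_stage("Diagnosed with stage IIIB NSCLC; previously Stage 2a.")` yields `["IIIB", "IIA"]`, so differently written mentions land in one research-ready vocabulary.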

Other

Clinical documentation improvement 

Reduce time spent manually scouring charts to improve clinical documentation, and surface information to ensure the correct diagnosis is documented.

SNOMED coding

Rapidly process the clinical notes for each patient to normalize text to SNOMED CT codes, and make unstructured data ready for a common data model.

Metadata enrichment

Break down data silos by intelligently tagging unstructured documents with rich metadata to increase their accessibility and value to your organization.

Novel target intelligence

Extract new findings for gene targets from scientific literature or patents, with context for specific diseases, drugs, and competitor organizations.

Safety case processing

Rapidly intake and process textual narratives of individual case safety reports for pharmacovigilance, including social media, post-marketing safety reports, and literature reports. Capture potential issues not explicitly flagged in structured clinical reports but documented in unstructured notes.

Clinical trial analytics

Glean high-value insights from the unstructured text in clinical trial reports for use in future study design and site selection, or to gain actionable information about competitors' worldwide clinical development activities.

Custom application areas 

The NLP Data Factory can be used in any custom application area that requires enriching unstructured and semi-structured data. Create and deploy your own NLP searches in an easy-to-use interface to see your data transformed reliably and repeatably at scale.

Technical overview

  • Industry-proven NLP technology
    The NLP Data Factory uses Linguamatics NLP technologies to power the normalization and standardization of input data. This blend of methods combines rule-based queries, machine learning, terminology matching, pattern extraction and relationship identification to ensure the highest possible accuracy for the task in hand.
     
  • Fast, scalable architecture
    The components required to power the NLP Data Factory have been developed together to optimize the efficiency of the system. An internal orchestration component (AMP) is designed to parallelize incoming data, ensuring that scaling is effective and matches the availability of resources.
     
  • Flexible NLP framework
    The flexible nature of the IQVIA NLP query engines ensures that new modules can be dropped into the NLP Data Factory with no effort. These modules can be used right away or can be tuned further using a powerful browser-based query editing tool.
     
  • Easy deployment via Kubernetes
    Components in the system are containerized for simpler management. Furthermore, the full system is deployed using Kubernetes or equivalent, allowing for simpler installation, easier service monitoring and automated scaling of the system.
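
The parallelization described under "Fast, scalable architecture" can be pictured with a small Python sketch. The internals of AMP are not described here, so this is only an assumed pattern: incoming documents are split into batches, fanned out to a worker pool sized to the available CPUs, and collected in input order. The names `batches`, `process_batch`, and `run_parallel` are hypothetical, and the per-batch step just counts tokens.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch):
    """Hypothetical per-batch NLP step; here it only counts whitespace tokens."""
    return [{"doc_id": doc_id, "tokens": len(text.split())} for doc_id, text in batch]


def run_parallel(docs, batch_size=2, max_workers=None):
    """Fan batches out to a worker pool and return results in input order."""
    # Size the pool to the available CPUs unless the caller overrides it.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input batches.
        results = pool.map(process_batch, batches(docs, batch_size))
        return [row for batch in results for row in batch]
```

A thread pool keeps the example self-contained. In a containerized deployment the same fan-out would more likely run across separate processes or pods, with the orchestrator adjusting the worker count as resources change.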

Seamless integration with our NLP Insights Hub

The NLP Data Factory was designed to complement other Linguamatics NLP offerings, including the NLP Insights Hub. Once you’ve unlocked your textual data at scale, you can easily feed those outputs into the NLP Insights Hub for customized dashboards and visualizations related to topics of interest. Navigate your data more effectively with our full suite of NLP solutions.
