ETL & AMP: Not Just Another TLA (Three-Letter Acronym) - It’s How You Effectively Mine Your Unstructured Big Data to Get Things Done!

With a background like mine (medicine, information technology, government, and military), you learn to know your audience and to make sure your acronyms are appropriate.

In healthcare alone, DOA can mean several things: degenerative osteoarthritis, date of arrival, drug of abuse, dead on arrival, etc. Most of these I REALLY don’t want to see in a healthcare analytical report for Rheumatology.

ETL is no exception. It is now widely used in healthcare to mean “Extract, Transform and Load” and, unless you are speaking to someone who works in pulmonary and respiratory diseases, it will seldom be confused with “expiratory threshold load,” which helps determine respiratory muscle efficiency. Then there is AMP, which in medicine most commonly means adenosine monophosphate, a vital component in all living cells. But for Linguamatics Health users, AMP is an acronym that is vital in its own right and stands for Asynchronous Messaging Pipeline.

Here at Linguamatics we are grateful to have some very talented folks who can explain our technological world in a way that is (sometimes) less technical. Alex Richard-Hoyling (Senior Solutions Developer) explained, via the Linguamatics Community, how he helps ensure reliable data extraction in large healthcare systems. Below, I take the subject a step further to cross the chasm where tech meets med.

What does ETL really mean in the world of NLP (Natural Language Processing) Healthcare Technology?

  1. Extract: Obtaining information from unstructured text. Unstructured text is anything that is typed into an electronic health record (EHR), rather than something that was clicked on or selected from a drop-down menu and stored in a structured database field. With NLP you can extract data from that unstructured text. You can even obtain information from PDF documents, including information in tables.

  2. Transform: Making sure that the structure of the information obtained by one system is usable by the system it is going into. For example, you wouldn’t want to put a bunch of raw lab codes into a patient portal. Instead, you’d first transform them into something meaningful to the destination data warehouse or mart.

  3. Load: Putting data into the correct destination to effectively use the information to ensure better patient care.
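To make the three steps concrete, here is a minimal sketch in Python. It is not how Linguamatics I2E works: a simple regular expression stands in for real NLP, and the lab-code lookup, the `lab_results` table, and all function names are hypothetical, chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical lookup from terse lab codes to readable names (illustration only).
LAB_NAMES = {"HBA1C": "Hemoglobin A1c", "LDL": "LDL cholesterol"}

# A regex stands in for NLP: find a known lab code followed by a number.
PATTERN = re.compile(r"\b(HBA1C|LDL)\s*(?:of|=|:)?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract(note):
    """Extract: pull (code, value) pairs out of free-text clinical notes."""
    return [(m.group(1).upper(), float(m.group(2))) for m in PATTERN.finditer(note)]


def transform(pairs):
    """Transform: reshape raw codes into rows the destination understands."""
    return [{"test_name": LAB_NAMES[code], "value": value} for code, value in pairs]


def load(conn, patient_id, rows):
    """Load: write the rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lab_results "
        "(patient_id TEXT, test_name TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO lab_results VALUES (?, ?, ?)",
        [(patient_id, r["test_name"], r["value"]) for r in rows],
    )
    conn.commit()


def run_etl(conn, patient_id, note):
    """Run the three steps in order for one note."""
    load(conn, patient_id, transform(extract(note)))
```

Calling `run_etl(sqlite3.connect(":memory:"), "p1", "Pt seen today. HbA1c of 7.2, LDL: 130.")` would leave two readable rows in `lab_results`, one per lab value typed into the note.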

How do you ensure that the ETL process goes smoothly? Why, AMP of course!

Alex put it best when he described our solution for balancing the information load properly. The Linguamatics I2E text mining solution uses the Asynchronous Messaging Pipeline (AMP) for this. “Think of AMP as an intelligent traffic light system that only goes green when there is enough space for extra vehicles to circulate, yet allows them to queue up as needed.”
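For the technically curious, the traffic light analogy can be sketched in a few lines of Python. This is only an illustration of the general idea (let work queue up, but admit a limited amount at a time) and makes no claim about how AMP is actually implemented. The names `process`, `run_pipeline`, and `max_in_flight` are hypothetical.

```python
import asyncio


async def process(doc_id, gate, results, stats):
    # Each document waits at the "red light" until the gate has a free slot.
    async with gate:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real text-mining work
        results.append(doc_id)
        stats["active"] -= 1


async def run_pipeline(doc_ids, max_in_flight=2):
    # The semaphore is the traffic light: at most max_in_flight documents
    # are being processed at once, and the rest simply queue up.
    gate = asyncio.Semaphore(max_in_flight)
    results = []
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(process(d, gate, results, stats) for d in doc_ids))
    return results, stats["peak"]
```

Running `asyncio.run(run_pipeline(["a", "b", "c", "d", "e"]))` processes all five documents, but never more than two at the same moment, so a burst of incoming records cannot overwhelm the downstream system.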

Is NLP really necessary to mine patient records when we have people doing this?

From an IT perspective, Alex stated, “It would be unrealistic for the medical staff to manually go through all the records to get this information (from unstructured data).” It is a statement I couldn’t agree with more. Nonetheless, shockingly, we know that in the healthcare industry it is still often done. I believe if healthcare leaders calculated the cost of this repeated manual chart review, their resulting ROI (return on investment) calculations would make them want to run away to ROI (the airport code for Rovaniemi, Finland).

Personally, I’d rather run to ROI (the latter acronym defined)... summertime only please...just for fun.
