Medical Coding Data Extraction Services


Lexalytics doesn’t try to cure cancer. Instead, we try to help the people who are curing cancer work more effectively. Which is why we’re now offering custom medical coding data extraction services.

Here’s the short version: We’ll build you a system to automatically identify and extract procedural and diagnostic billing codes and other data from medical documents and then insert it all into your data warehouse, electronic health records (EHR) system, or other application, all with a minimum of human involvement.

In this article I’ll explain how we’re using AI and natural language processing to do this, and how this helps to solve the expensive, error-prone mess that is medical coding and billing.

Medical Coding and Billing is an Expensive, Error-Prone Mess (But It’s Fixable)

Consider Becker’s Hospital Review’s top five issues in medical documentation and coding:

  1. Human error
  2. Complex documentation
  3. Mismatched fee/service invoices
  4. Denied/delayed claims
  5. Repayment

The first three problems are particularly frustrating and challenging. But they’re also totally solvable. And we’ll get to the solution in a bit.

First, however, we need to understand why medical coding and billing systems are so error-prone.

Medical Coding Systems are Too Damn Complicated

Medical coding and billing is supposed to go like this:

  1. A patient visits a healthcare provider
  2. The provider takes notes on diagnoses made and treatments recommended during the visit, and creates records of facilities used (like MRI scanners)
  3. A professional medical coder translates these notes and documents into alphanumeric or purely numeric codes, and then uploads those codes into an electronic health record system
  4. The healthcare provider uses these codes to submit claims for the appropriate amounts to the insurance company, patient, and other parties involved
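The four steps above can be sketched as a simple data model. This is a minimal illustration under our own assumptions: the `Encounter`, `CodedLine`, and `Claim` classes and the `build_claim` function are hypothetical names, not part of any EHR or billing API, and the charges in the usage example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Encounter:
    """Steps 1-2: a visit plus the provider's free-text notes (hypothetical model)."""
    patient_id: str
    notes: str


@dataclass
class CodedLine:
    """Step 3: one billable component that a coder has translated into a code."""
    code: str
    description: str
    charge: float


@dataclass
class Claim:
    """Step 4: the claim the provider submits to a payer."""
    patient_id: str
    lines: list[CodedLine] = field(default_factory=list)

    def total(self) -> float:
        return sum(line.charge for line in self.lines)


def build_claim(encounter: Encounter, coded_lines: list[CodedLine]) -> Claim:
    # Step 3 (choosing coded_lines from encounter.notes) is the manual, error-prone
    # part; this function only packages whatever codes the coder selected.
    return Claim(patient_id=encounter.patient_id, lines=list(coded_lines))


visit = Encounter(patient_id="P-001", notes="Established patient with cough; chest X-ray, 2 views.")
claim = build_claim(visit, [
    CodedLine("99213", "Office visit, established patient", 120.00),  # made-up charge
    CodedLine("71046", "Chest X-ray, 2 views", 80.00),                # made-up charge
])
print(claim.total())  # 200.0
```

Every later step trusts the `code` field, so a mistake made while translating the notes flows straight into the claim.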

Simple in theory. But as we’ve gained more knowledge of diseases and ailments, so too have our medical coding systems expanded.

Today, the US versions of ICD-10 (ICD-10-CM for diagnoses and ICD-10-PCS for procedures) contain more than 71,000 procedure codes and 70,000 diagnosis codes. Current Procedural Terminology (CPT) and the Healthcare Common Procedure Coding System (HCPCS), the other mainstream systems, both have many thousands more.

What’s more, medical coding and billing systems are presided over by arcane sets of regulations, laws and guidelines. These are constantly changing, but every stakeholder is expected to stay up-to-date.

Most Medical Coding Mistakes are Both Inevitable and Hugely Problematic

The actual coding processes are extremely complicated, too. Many procedures involve half a dozen or more codeable components, and each component then has to be translated from the doctor’s notes into a code in the EHR.

But doctors’ handwriting is notoriously difficult to decipher, and it’s easy to imagine an overworked coder misreading or mistyping an entry.

The result of this system? Some labs are reporting medical coding error rates of up to 90% due to documentation issues.

Faulty claims get denied by the payor and must be reprocessed. This creates delays and leads to lost revenue for the provider. And in cases involving Medicare, the Office of the Inspector General is quick to fight back against over-billing with massive lawsuits.

So what we have now is an extremely complicated system that by its very nature leads to denied claims, big delays, lost revenue, and lawsuits.

But the problems don’t stop there.

Medical Coding Data Extraction Is Really Expensive

Here’s an example of an HCPCS Code Addition memorandum:

Documents like this are distributed regularly throughout the year for all medical coding systems. In fact, AAPC expects more than 350 updates to CPT in 2018 alone.

Now, you could hire a group of people to receive these documents and make updates to your EHR or data warehouse by hand. But this is expensive: Glassdoor reports an average salary of more than $41,500 for professional medical coders, and you’ll need to hire a bunch of them.

What’s more, human error is both inevitable and potentially catastrophic, as we pointed out earlier with the help of Becker’s Hospital Review.

So now you’re looking at $83,000 to $207,500 a year (the salaries of two to five coders) and beyond just to stay on top of medical code additions and updates, with an all-but-guaranteed chance of error.
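The range above follows directly from the Glassdoor figure: it is the average salary multiplied by a team of two to five coders. The team sizes are our reading of those numbers, and this small sketch counts base salary only, so real costs would be higher.

```python
AVERAGE_CODER_SALARY = 41_500  # USD per year, the Glassdoor average quoted above


def annual_staffing_cost(num_coders: int, salary: int = AVERAGE_CODER_SALARY) -> int:
    """Base salaries only; benefits, overhead, and training are not included."""
    return num_coders * salary


for team_size in (2, 5):
    print(team_size, annual_staffing_cost(team_size))
# 2 83000
# 5 207500
```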

One option would be to instruct your coders to focus on accuracy. But this will slow them down, increase delays, and cost more in the long run.

Or, you can engage with Lexalytics.

Using a combination of AI and natural language processing, we accelerate these processes while reducing your vulnerability to human error.

Combining AI and Natural Language Processing for Faster, Better Medical Coding Data Extraction

Without getting into details, our natural language processing technology is really, really good at recognizing people, places, and other named entities within a document.

A large part of this is done with machine learning-based pattern recognition. And what is a medical code, if not a pattern?
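To make the point concrete, here is a minimal sketch that recognizes the surface shape of three code families. The regular expressions are our own simplified approximations of the published formats, not Lexalytics’ models, and the `find_codes` function is hypothetical. They also accept strings that are not valid codes (any five-digit number, such as a ZIP code, looks like a CPT code), which is exactly why understanding the surrounding content matters.

```python
import re

# Simplified, illustrative patterns; each accepts some strings that are not real codes.
CODE_PATTERNS = {
    # ICD-10-CM: letter, digit, alphanumeric, optionally "." plus 1-4 alphanumerics (e.g. J18.9)
    "ICD-10-CM": re.compile(r"\b[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?\b"),
    # CPT: five digits (Category I), or four digits plus F or T (Categories II and III)
    "CPT": re.compile(r"\b\d{4}[0-9FT]\b"),
    # HCPCS Level II: a letter followed by four digits (e.g. J1100)
    "HCPCS": re.compile(r"\b[A-V]\d{4}\b"),
}


def find_codes(text: str) -> list[tuple[str, str]]:
    """Return (system, code) pairs for every pattern match, in order of appearance."""
    hits = []
    for system, pattern in CODE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), system, match.group()))
    return [(system, code) for _, system, code in sorted(hits)]


print(find_codes("Dx J18.9; office visit 99213; injection J1100."))
# [('ICD-10-CM', 'J18.9'), ('CPT', '99213'), ('HCPCS', 'J1100')]
```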

By combining trained machine learning models (that collectively form an “AI”) with our underlying natural language processing, we both extract medical coding data and understand the underlying content.

In short, we:

  1. Identify the implicit underlying structure in an otherwise unstructured document
  2. Extract that data
  3. Structure and prepare it for you to use
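The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the memo text, its layout, and the codes in it are invented, and hand-written regular expressions stand in for the trained models and NLP described in this article. It shows the shape of the pipeline (find the structure, extract it, emit a table), not Lexalytics’ implementation.

```python
import csv
import io
import re

# Hypothetical memo text; real HCPCS addition memos are laid out differently.
MEMO = """\
HCPCS Code Additions (illustrative sample)
Code: Q1234  Effective: 2018-07-01  Descriptor: Hypothetical item A, per unit
Code: Q5678  Effective: 2018-10-01  Descriptor: Hypothetical item B, each
"""

# Step 1: the implicit structure, written as one hand-made pattern per entry:
# a code (letter + four digits), then a date, then a descriptor up to end of line.
ENTRY = re.compile(
    r"(?P<code>\b[A-V]\d{4}\b).*?"
    r"(?P<effective>\d{4}-\d{2}-\d{2}).*?"
    r"Descriptor:\s*(?P<descriptor>.+)"
)


def extract(text: str) -> list[dict]:
    # Step 2: pull every matching entry out as a dict of named fields
    return [m.groupdict() for m in ENTRY.finditer(text)]


def to_csv(rows: list[dict]) -> str:
    # Step 3: structure the rows as CSV, which Excel and most databases can import
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["code", "effective", "descriptor"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(extract(MEMO)))
```

Because this sketch keys on the literal label `Descriptor:`, it is closer to a fixed template than to a learning system; it is meant only to show what "identify, extract, structure" produces.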

For example, here’s what our medical coding data extraction code looks like for the HCPCS Code Additions document above:

This code identifies and extracts the important data from the PDF and then structures it into an Excel file, like so:



Voilà! Suddenly that complex medical coding data is no longer trapped in the original PDF. Now it’s extracted, indexed and categorized, ready for insertion into your EHR, database or data warehouse.
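Once the data is in rows, loading it is ordinary database work. Here is a minimal sketch using Python’s built-in `sqlite3` module as a stand-in for your EHR, database, or warehouse; the `hcpcs_codes` table, its columns, the `load_codes` function, and the sample rows are all hypothetical.

```python
import sqlite3


def load_codes(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert extracted rows into a hypothetical hcpcs_codes table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hcpcs_codes (
               code       TEXT PRIMARY KEY,
               effective  TEXT NOT NULL,
               descriptor TEXT NOT NULL
           )"""
    )
    # INSERT OR REPLACE: re-loading an updated memo overwrites the stale row for that code
    conn.executemany(
        "INSERT OR REPLACE INTO hcpcs_codes (code, effective, descriptor) "
        "VALUES (:code, :effective, :descriptor)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_codes(conn, [{"code": "Q1234", "effective": "2018-07-01", "descriptor": "Hypothetical item A"}])
load_codes(conn, [{"code": "Q1234", "effective": "2018-10-01", "descriptor": "Hypothetical item A, revised"}])
print(conn.execute("SELECT code, effective, descriptor FROM hcpcs_codes").fetchall())
# [('Q1234', '2018-10-01', 'Hypothetical item A, revised')]
```

Keying the table on the code means each new update memo revises existing rows instead of piling up duplicates.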

But why stop there?

Our understanding of the underlying text enables us to do some really interesting things with that newly-structured data, such as…

Building an Intelligent Medical Billing Recommendation System with AI and NLP

Let’s say your goal is to create an intelligent recommendation system to help your physicians prescribe the right treatments and assist your invoicing team in billing the correct amounts.

This system must:

  1. Contain an up-to-date database of healthcare billing codes
  2. Reliably connect the right codes to the right treatments
  3. Make intelligent recommendations
  4. Not fall apart in a light breeze

And while we’re at it, wouldn’t it be great if this same recommendation system could give your staff important, context-rich details about where a particular treatment code is applicable, when it’s reimbursable, and how to bill for it?

Maybe it could even look something like this:

(Yes, we actually built this system for a client. Want us to build something like it for you? Let’s talk.)

Because we actually understand the underlying context of the text data we extract, we can configure a system to make detail-rich recommendations that minimize the chance of human error. This is a great illustration of the power of combining AI and natural language processing.

Resiliency is Important for Medical Coding Data Extraction

One last note before we wrap up.

There are lots of data extraction tools for you to choose from. But most of them use fixed templates to grab content from documents. Which means that if the format changes, the system breaks.

Unlike those tools, Lexalytics uses machine learning to build a system that learns as it goes, based on hints about how to identify where things are (such as what a medical code, a date, or a company entity looks like).

This means that our systems:

  • Can learn and get better as they go
  • Understand the underlying data they extract
  • Can use that to do useful stuff (such as make recommendations)
  • Are resilient to underlying format changes in the documents
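The last point is easy to illustrate. The sketch below is hypothetical and deliberately simple: it contrasts a fixed-position template with a single pattern-based hint on two made-up layouts of the same entry. It uses a hand-written regular expression rather than the learned models described above, so it only shows why position-based templates are brittle, not how a learning system works.

```python
import re
from typing import Optional

# Hypothetical lines: the same entry before and after a layout change
OLD_LAYOUT = "Q1234 2018-07-01 Hypothetical item A"
NEW_LAYOUT = "Effective 2018-07-01 | Code Q1234 | Hypothetical item A"


def template_extract(line: str) -> str:
    # Fixed template: assumes the code always occupies the first five characters
    return line[:5]


# Hint: "a letter followed by four digits" (an HCPCS Level II-style code)
CODE_HINT = re.compile(r"\b[A-V]\d{4}\b")


def hint_extract(line: str) -> Optional[str]:
    # Looks for anything shaped like a code, wherever it appears on the line
    match = CODE_HINT.search(line)
    return match.group() if match else None


for line in (OLD_LAYOUT, NEW_LAYOUT):
    print(template_extract(line), hint_extract(line))
# Q1234 Q1234
# Effec Q1234
```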

Our systems don’t break, they learn.

Want to engage our medical coding data extraction services? Drop us a line.
