Should You Build or Buy Text Mining Software? Depends.

Should you DIY?

A while back, I decided to build my own computer.

It was a fun but ultimately time-consuming experience. I had to research online, go shopping for parts, and then figure out how to put them together.

Am I writing from that computer right now? No.

I’m on my store-bought MacBook – because when I need to get work done, I can’t wait until I’ve figured out how to troubleshoot the blue screen of death or whatever other issue is currently plaguing my cobbled-together machine (to be fair, I think I’m just not very good at building computers).

Now if I were only interested in how computers were built, or if I wanted to become a computer repair technician, spending time fixing the computer I built by myself would be a valuable use of my time.

But my goal when I get to work is not to learn how to become a computer repairman – it’s to open up a Word doc and get to work.

For that, I don’t need a computer I built; I just need a computer that works straight out of the box.

The same basic logic applies to the question of whether you should buy software for text mining or build your own. Only, in the case of text and sentiment analysis, there are many more variables to consider.

A Whole Lot of Moving Parts

  • Language Detection
  • Part of Speech Tagging
  • Named Entity Recognition
  • Sentiment Analysis
  • Syntax Analysis
  • Entity Sentiment

Each of these components is going to require consideration when building any text mining system from scratch.

Language Detection, Part of Speech Tagging, and Named Entity Recognition are typically machine learning tasks. Each one requires its own annotated corpus of tens of thousands of documents per language. Each language requires its own training, and the models must constantly be kept up to date as language and usage evolve.

And unless you want your sentiment analysis to be highly limited, you’re going to need more than machine learning. For more useful results, it’s much better to use Natural Language Processing techniques to build out sentiment and then use syntax analysis to associate the sentiment back to the entities. Of course, that will require heavy investment in the core NLP technology for each language supported.

You could do all this, but it’s going to be expensive and time-consuming.

For the most basic case (document sentiment for a specific language in a particular vertical, where we ignore Part of Speech Tagging and just use machine learning), you’re still looking at somewhere between 12 and 18 weeks to get things up and running.

And if you need more customization and configuration? No dice. You’d have to get a new training set and go through the whole process over again.
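
To make that "most basic case" concrete, here is a rough sketch of document sentiment using machine learning alone: a tiny Naive Bayes classifier over word counts, written in plain Python. Everything in it is an assumption made up for illustration (the function names, the four-sentence toy training set, the two labels); it is not how Salience or Semantria work, and a real system would need the tens of thousands of annotated documents described above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Prior: how common this label was in the training set.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no signal
            # Add-one smoothing so a word unseen for this label is not fatal.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented purely for this example.
TOY_TRAINING_SET = [
    ("great product and fast friendly support", "positive"),
    ("i love how easy this is to use", "positive"),
    ("terrible experience and slow support", "negative"),
    ("i hate how often this app crashes", "negative"),
]

MODEL = train(TOY_TRAINING_SET)
print(classify("friendly and easy to use", MODEL))
```

Notice what the sketch cannot do: it only knows words it was trained on, it gives one label for the whole document, and it has no idea which entity the sentiment is about. Changing any of that means collecting new training data and starting over.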

Using a Mature, Off-the-Shelf Text Mining System

When you let someone else do the work of paying constant attention to all the moving parts that make up a text mining system, you have a lot more time to take care of customers, to make sales – in short, you have a lot more time to do your job.

Instead of spending months building something rigid and, at best, somewhat useful, you can get immediate results with out-of-the-box usability.

[Figure: What we analyze: 65% Twitter, 10% news articles, 10% surveys, 15% mixed]

Prebuilt software, like Salience and Semantria, has already been heavily optimized for scalability. Our software processes billions of documents a day, and it does it fast.

We’re able to scale to handle the largest of loads because we’re implemented in C for speed, and we’ve learned a lot of tricks for handling language efficiently over the ten years we’ve been shipping commercial software.

Salience and Semantria are also a snap to customize. Because we use a combination of deep learning and NLP techniques, we can offer rich customization through immediately accessible configuration files.

Build or Buy Text Mining Software?

If your goal is to learn as much as you can about natural language processing, or your goal is to research new natural language processing techniques, then there really isn’t any substitute for building something yourself.

However, if your goal is simply to get the best, quickest results from your text so that you can worry about other business problems, then you’re definitely going to be best served by using a pre-existing cloud service or buying off-the-shelf software.

For more on whether to build or buy text mining software, click to download our white paper: “Build Vs. Buy For Text Mining.”

