Categorization in 2 Minutes

Categorization of Text

Categorization is a core function of text mining software. It’s most easily explained with an example: the sentence “I wonder who will win in the California mid-term congressional elections?” would be classified under, or associated with, a topic called “Politics”. If your input contains many documents like this one, our text mining tool will show you that your customers are very interested in politics, without you ever having to read a single document yourself. In fact, if you tune the classification models further, they’ll show exactly which area of politics people are discussing: in this case, the California congressional mid-terms.

Methods for Categorizing

Everyone categorizes content, but few text analytics tools do it as well as Lexalytics. Our categorization processes are effective, reliable, and fully customizable, capable of showing you everything from the broader picture down to the minutiae that drive informed business decisions.

Categories help you sort large volumes of text without actually reading them. Take 10,000 consumer Tweets and categorize them under politics, gaming, religion, food, or whatever else the consumers are discussing; sort through hundreds of academic papers to find the ones relevant to your research; sift through thousands of TripAdvisor reviews to see which areas of your hotel need improving. Analyzing the equivalent number of documents by hand would take thousands of man-hours; automatic categorization saves you time and returns immediately actionable results.

[Figure: categorization diagram]

We offer several ways to categorize content. Two methods are based on the Concept Matrix: Concept Topics (known as User Categories in Semantria) and Auto-Categories. Concept Topics are configured and customized by the user; they work well for very broad categories like “food”, but not for narrow, precisely targeted ones. Auto-Categories are configurable in Salience, but not in Semantria. They are based on the top 5,000 Wikipedia articles, which are rolled up into about 120 categories such as “boats”, “automobiles”, and “math”. The most commonly used categories are our Query Topics: Boolean phrases that are simple to configure.
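To make the idea of a Boolean query topic concrete, here is a minimal sketch in Python. It is not Lexalytics query syntax or the Salience/Semantria API; the rule format ("any" terms combined with OR, "none" terms acting as NOT), the topic names, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical query topics. Each topic maps to a simple Boolean rule over
# lowercase word tokens. This rule format is illustrative only and is not
# Lexalytics query syntax.
QUERY_TOPICS = {
    "Politics": {
        "any": {"election", "elections", "congressional", "senate"},
        "none": set(),
    },
    "Food": {
        "any": {"restaurant", "breakfast", "menu"},
        "none": {"election", "elections"},
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_topics(text, topics=QUERY_TOPICS):
    """Return the sorted names of all topics whose rule the text satisfies.

    A rule matches when at least one "any" term is present (OR) and
    no "none" term is present (NOT).
    """
    tokens = tokenize(text)
    return sorted(
        name
        for name, rule in topics.items()
        if tokens & rule["any"] and not (tokens & rule["none"])
    )


print(match_topics(
    "I wonder who will win in the California mid-term congressional elections?"
))  # → ['Politics']
```

A rule like this is easy to read and adjust by hand, which is one plausible reason simple Boolean topics are popular; the trade-off is that the rule only matches the exact terms listed.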

The Three Methods of Categorization

Lexalytics provides three powerful ways to categorize content: query topics (simple search categories), model-based classifiers (machine-learning systems), and the Lexalytics Concept Matrix, a sophisticated web of relationships and associations between words and phrases.

All three categorization systems provide powerful, reliable content categorization and are fully customizable to the user’s needs. Users can utilize Lexalytics’ pre-defined query topics and pre-trained models, or train their own model-based classifiers to sort content into whatever categories fit their business.
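For readers curious what training a model-based classifier can look like in principle, here is a minimal sketch. It uses a tiny multinomial Naive Bayes model with add-one smoothing, which is a common textbook technique and not necessarily what Lexalytics uses internally. The class name, the four training sentences, and the category labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled training documents (illustrative only).
TRAINING = [
    ("who will win the congressional elections", "Politics"),
    ("the senate vote on the new bill", "Politics"),
    ("the hotel breakfast was cold", "Food"),
    ("great pizza and friendly restaurant staff", "Food"),
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens as a list."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocabulary = set()

    def train(self, labeled_docs):
        """Accumulate word and document counts from (text, category) pairs."""
        for text, category in labeled_docs:
            tokens = tokenize(text)
            self.doc_counts[category] += 1
            self.word_counts[category].update(tokens)
            self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[category].values())
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                count = self.word_counts[category][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


categorizer = NaiveBayesCategorizer()
categorizer.train(TRAINING)
print(categorizer.classify("California mid-term elections"))  # → Politics
```

Unlike a Boolean rule, a model like this learns its word associations from labeled examples, so its categories are defined by whatever training documents you supply.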

Our classification techniques deliver meaningful information on the themes and topics that your consumers are focusing on — so that you can act immediately, safe in the knowledge that you are making an informed decision to further your business.