Concept Topics

Other kinds of text-mining categorizers take a lot of work to set up, are inherently brittle, or both.

A conservative estimate for getting a query-based categorizer with 100 buckets up and running is on the order of 50 hours, and after that you face an ongoing effort to tweak your category definitions whenever a piece of rogue content gets misclassified.

Lexalytics Concept Topics are designed to reduce the burden of this content-analysis configuration through our new Concept Matrix, which has been generated from all of the content in Wikipedia™.

In the Salience Five release package, we ship a number of example Concept Topics. Here are two of them. The words next to "Agriculture" and "Food" are literally all there is to the definition of each Concept Topic:

  • Agriculture    farming, agriculture, farmer
  • Food    food, meals, vegetable, meat, fruit
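To make the "name plus a handful of words" idea concrete, here is a minimal Python sketch that reads definitions written in the layout shown above into a dictionary. The function name `parse_topic_line` and the plain-text layout (a name, two or more spaces, then comma-separated terms) are assumptions for illustration only; they are not the file format or API that Salience actually uses.

```python
import re

def parse_topic_line(line):
    """Split 'Name    term, term, ...' into (name, [terms]).

    Hypothetical layout: the name and the terms are separated by
    two or more spaces; the terms are separated by commas.
    """
    name, rest = re.split(r"\s{2,}", line.strip(), maxsplit=1)
    terms = [t.strip() for t in rest.split(",") if t.strip()]
    return name, terms

# The two example definitions from the text above.
topics = dict(parse_topic_line(line) for line in [
    "Agriculture    farming, agriculture, farmer",
    "Food    food, meals, vegetable, meat, fruit",
])
```

In this sketch the whole definition of a topic is a short list of seed words, with no queries or rules attached.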

Consider the following sentences and how they match against each of these Concept Topics:

  • I like chicken.    No match
  • I like chickens.    No match
  • I like to eat chicken.

Here are a few other examples from the Salience Five release:

  • Aviation    aviation, airplane, flying   
  • Banking    banking, bank, mortgage, checking, savings   
  • Beverages    beverage, alcohol, soda   
  • Biotechnology    biotech, biotechnology, applied_biology, gene_therapy, genetic_engineering  
  • Business    business, management, executive, company, shareholder, mba   
  • Crime    crime, murder, arrested, theft, burglary, criminal, arraignment   
  • Disasters    disaster, tornado, earthquake, volcano, meteor, apocalypse, explosion, devastation   
  • Economics    economics, economist, GDP, game_theory, demand_curve

So the sentence "American Airlines had to announce a gate change." correctly categorizes to Aviation, even though the word "airline" doesn't occur anywhere in the Aviation definition.
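The internals of the Concept Matrix are not described here, so the following is only a toy Python sketch of the general idea of matching through related concepts rather than exact keywords. The relatedness scores in `TOY_RELATEDNESS` are made up by hand, and the names `relatedness`, `score_topic`, `categorize`, and the 0.5 threshold are all hypothetical; none of this is Salience code or data.

```python
import re

# Hand-made relatedness scores between word pairs (0..1).
# Purely illustrative; a real matrix would be learned from a large corpus.
TOY_RELATEDNESS = {
    ("airlines", "aviation"): 0.8,
    ("airlines", "airplane"): 0.7,
    ("gate", "airplane"): 0.3,
    ("airlines", "bank"): 0.05,
}

# Two of the example definitions from the list above.
TOPICS = {
    "Aviation": ["aviation", "airplane", "flying"],
    "Banking": ["banking", "bank", "mortgage", "checking", "savings"],
}

def relatedness(a, b):
    """Look up a symmetric relatedness score; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get((a, b), TOY_RELATEDNESS.get((b, a), 0.0))

def score_topic(sentence, terms):
    """Best relatedness between any word in the sentence and any seed term."""
    words = re.findall(r"[a-z_]+", sentence.lower())
    return max((relatedness(w, t) for w in words for t in terms), default=0.0)

def categorize(sentence, threshold=0.5):
    """Return the topics whose score reaches the (assumed) threshold."""
    return [name for name, terms in TOPICS.items()
            if score_topic(sentence, terms) >= threshold]

matches = categorize("American Airlines had to announce a gate change.")
```

None of the Aviation seed words appear in the sentence, yet in this toy setup `matches` contains Aviation because "airlines" is scored as closely related to "aviation" and "airplane".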

Concept Topics will revolutionize categorization and the semantic analysis process. Read on if you'd like to understand how much easier they are to use than other techniques.