Data Directories

Lexalytics Salience Engine is a highly configurable set of semantic analysis software libraries. Most of the configuration lives in flat text files arranged in a Unix-like directory structure. (Windows users, don't let that scare you.)

Each directory in that structure holds a number of different configuration files. You can get a complete list of what's available for your text analysis configuration here.

For example, some of the files set up named entity normalization, some define relationship patterns, some drive sentiment analysis and opinion mining, and others relate to themes. We ship default lists of famous people, important companies, and sports figures, along with our main sentiment dictionary and several others.

When starting a Salience session, you simply tell it which directory you want it to use. You can have different processors in the same system using completely different data directories.
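As a rough illustration of running several processors against different data directories, here is a minimal Python sketch. It does not use the real Salience API: `Processor`, `build_processors`, and the `open_session` callable are hypothetical names, and `open_session` stands in for whatever call your Salience wrapper uses to start a session on a given directory.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict


@dataclass(frozen=True)
class Processor:
    """A named processor bound to one data directory (hypothetical type)."""
    name: str
    data_dir: Path
    session: object  # whatever the session factory returns


def build_processors(
    data_dirs: Dict[str, Path],
    open_session: Callable[[Path], object],
) -> Dict[str, Processor]:
    """Open one session per entry, each against its own data directory.

    `open_session` is an assumption: a placeholder for the real call
    that starts a Salience session on a directory.
    """
    processors = {}
    for name, data_dir in data_dirs.items():
        data_dir = Path(data_dir)
        # Fail early if the configured directory does not exist.
        if not data_dir.is_dir():
            raise FileNotFoundError(f"data directory not found: {data_dir}")
        processors[name] = Processor(name, data_dir, open_session(data_dir))
    return processors
```

Each processor keeps its own directory, so a "news" processor and a "social" processor in the same system can be configured independently.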

Non-English sentiment analysis is supported through data directories: you just point Salience at the French directory (for example), and off it goes.
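Choosing a directory by language could be sketched as below. The directory names in `LANGUAGE_DIRS` (`data`, `data-french`) and the function `data_dir_for_language` are assumptions made for illustration; they are not the actual names shipped with Salience.

```python
from pathlib import Path

# Hypothetical mapping from language code to data directory name.
LANGUAGE_DIRS = {
    "en": "data",
    "fr": "data-french",
}


def data_dir_for_language(root, language):
    """Return the data directory under `root` for a language code."""
    code = language.strip().lower()
    if code not in LANGUAGE_DIRS:
        raise ValueError(f"no data directory configured for {language!r}")
    data_dir = Path(root) / LANGUAGE_DIRS[code]
    # The mapping may list a directory that was never installed.
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    return data_dir
```

The returned path is what you would hand to the session when starting it, so switching languages amounts to passing a different directory.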

The base Salience installation includes a default data directory (best for longer content), as well as a second data directory optimized for Twitter to better analyze social media. (Read this for more information on the sort of optimizations we've performed for short-form, grammatically weak content like Tweets.)

While this whole "data directory" thing may seem like a bit of pedantic configuration detail, it gives you complete control over the behavior of the engine. You can swap in a new configuration simply by re-instantiating Salience, an operation that takes a few seconds. That flexibility means you can make the engine fit like a glove into your own application.