5 Tips for Making Salience Work For You


There are many ways to tune Salience, from tweaking a few simple sentiment weights to developing custom patterns that explore the relationships between complex grammatical structures. The quickest and easiest way to tune the engine to your own needs is through our options. These are simple choices you can make to tell the engine how you want your results processed.

Options are set via the API, so a few simple lines of code change what the engine does. You can check here to see how options are set in the programming language you're using.

Here are five options you might consider changing, based on your text analysis needs:

1. Tagging Threshold

Some data sources are filled exclusively with high-quality, interesting content. Other feeds...aren't. If you're worried that corrupted articles, filled not with words but with arcane computer symbols, will throw off your counts, consider changing our tagging threshold. This option defines the minimum percentage of actual text characters the engine requires before it does any processing. On the flip side, if your content does contain many odd characters for a legitimate reason, this option might need turning down.
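To get a feel for what a percentage-of-text cutoff does, here is a minimal Python sketch. It is not Salience code: the function names are hypothetical, and treating letters, digits and whitespace as "text characters" is our assumption for illustration, not the engine's actual definition.

```python
def text_character_percentage(document: str) -> float:
    """Return the percentage of characters that look like ordinary text."""
    if not document:
        return 0.0
    # Assumption: letters, digits and whitespace count as "text characters".
    # Punctuation and symbols do not, which is stricter than a real engine
    # would likely be.
    text_chars = sum(1 for ch in document if ch.isalnum() or ch.isspace())
    return 100.0 * text_chars / len(document)


def passes_threshold(document: str, threshold_percent: float) -> bool:
    """True if the document has enough text characters to be worth processing."""
    return text_character_percentage(document) >= threshold_percent


clean = "The quarterly report was well received"
corrupted = "@@##$$%%^^&&**(())text"

print(passes_threshold(clean, 80.0))      # clean prose clears the cutoff
print(passes_threshold(corrupted, 80.0))  # symbol-heavy input does not
```

Lowering the threshold in this sketch lets more symbol-heavy documents through, which mirrors the "turning down" case described above.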

2. Process HTML Content

If you're passing HTML into Salience, this is an option you should definitely set. When it's turned on, we'll strip out all the tags and even do our best to differentiate the content of a page from the sidebar and inserted advertising. An important caveat! If your content is not actually HTML, setting this option will still activate the advertising-stripping algorithm and may cause you to lose some content. So make sure to turn it off when you're done processing HTML.
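If your feed mixes HTML and plain text, you may want to check each document before deciding whether to switch the option on. The sketch below is one rough way to do that using only Python's standard library. The `looks_like_html` helper and its `min_tags` cutoff are hypothetical and are not part of Salience.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts the opening tags seen while parsing a document."""

    def __init__(self) -> None:
        super().__init__()
        self.tag_count = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1


def looks_like_html(content: str, min_tags: int = 2) -> bool:
    """Heuristic: treat content as HTML if it has at least min_tags opening tags."""
    counter = TagCounter()
    counter.feed(content)
    counter.close()
    return counter.tag_count >= min_tags


page = "<html><body><p>Revenue grew this quarter.</p></body></html>"
plain = "Revenue grew this quarter and analysts were pleased."

print(looks_like_html(page))   # three opening tags, so treated as HTML
print(looks_like_html(plain))  # no tags, so leave the HTML option off
```

A check like this is only a heuristic, so it is worth spot-checking a sample of your own content before relying on it.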

3. Anaphora Resolution

What we call anaphora, you might call pronouns, and Anaphora Resolution means counting those pronouns as instances of the mentions they refer to. This is usually what you want, but you can turn this feature off if you prefer. Sometimes you're interested only in how often a company or person was called out explicitly by name.
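Here is a toy Python illustration of how the counts differ with the feature on and off. The mention list is hand-labelled, made-up data, and `count_mentions` is a hypothetical helper. Working out which entity a pronoun refers to is the hard part, and this sketch does not attempt it.

```python
from collections import Counter

# Hypothetical, hand-labelled mentions: (surface text, entity, is_pronoun)
mentions = [
    ("Acme Corp", "Acme Corp", False),
    ("it", "Acme Corp", True),
    ("Jane Smith", "Jane Smith", False),
    ("she", "Jane Smith", True),
    ("her", "Jane Smith", True),
    ("Acme Corp", "Acme Corp", False),
]


def count_mentions(mentions, resolve_anaphora: bool) -> Counter:
    """Count mentions per entity, optionally including resolved pronouns."""
    counts = Counter()
    for _surface, entity, is_pronoun in mentions:
        if is_pronoun and not resolve_anaphora:
            continue  # only explicit, by-name mentions are counted
        counts[entity] += 1
    return counts


with_resolution = count_mentions(mentions, resolve_anaphora=True)
without_resolution = count_mentions(mentions, resolve_anaphora=False)

print(with_resolution)     # pronouns add to each entity's total
print(without_resolution)  # by-name mentions only
```

In this made-up example, Jane Smith goes from three mentions to one when pronouns are ignored, which is the kind of shift to expect when you turn the feature off.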

4. Neutral Upper and Lower Bound

When processing an individual document, Salience returns raw sentiment scores for you to process as you desire. What's the cutoff for a 'positive' mention versus a 'negative' mention? How many gradations are desired? It varies from use case to use case, so we leave that choice to you.

When processing a collection, though, Salience will group your results for you. These options allow you to specify the score cutoffs used to decide what counts as a 'positive' mention and what counts as a 'negative' mention. As with other options, when there's no right answer we leave the final decision in your hands.
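The grouping itself is easy to picture. Below is a minimal Python sketch of bucketing raw scores with a lower and an upper neutral bound. The function name, the example bounds of -0.3 and 0.3, and the choice to treat scores exactly on a bound as neutral are all our assumptions for illustration. They are not Salience defaults.

```python
def classify_sentiment(score: float, neutral_lower: float, neutral_upper: float) -> str:
    """Bucket a raw sentiment score using a neutral band."""
    if neutral_lower > neutral_upper:
        raise ValueError("neutral_lower must not exceed neutral_upper")
    if score > neutral_upper:
        return "positive"
    if score < neutral_lower:
        return "negative"
    # Assumption: scores on or between the bounds are neutral.
    return "neutral"


# Hypothetical raw scores and bounds.
scores = [0.75, 0.1, -0.05, -0.6]
labels = [classify_sentiment(s, neutral_lower=-0.3, neutral_upper=0.3) for s in scores]

print(labels)
```

Widening the band pushes more mentions into the neutral group, while narrowing it makes the grouping more sensitive to small swings in score.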

5. Theme Topics

We've talked about options for managing content quality, options for managing different file formats and options for controlling how your results are calculated. One final class of options can be used to get a little more speed out of Salience. Theme Topics are an example of an interesting feature that Salience calculates, but one whose performance cost you don't have to pay if you aren't using it.

You can write Boolean Queries or Concept Queries to discover which documents discuss which topics. We've also got themes, which are noun phrases that convey ideas discussed in an article. If you want to know which themes go with which topics, Theme Topics tell you just that. If, though, you aren't using this feature, the calculations performed by Salience to discover them aren't of any benefit to you. So turning this option off will give theme calculations a small boost.

This is just a sampling of the options available in Salience. Check out the full list if you want to see what else is available, and keep an eye on our release notes for new options in new releases of Salience.

Categories: Product Information, Text Mining
