Class Word

java.lang.Object
com.lexalytics.salience.Word

public class Word
extends java.lang.Object

A Word object represents a token in the content being processed by Salience. Tokens generally correspond to what humans think of as words, but they also include punctuation and symbols.

Word objects are returned as members of the Sentence objects produced by calls to the Salience.getDocumentDetails() API method.

  • Constructor Summary

    Constructors 
    Constructor Description
    Word(java.lang.String sToken, java.lang.String sPOSTag, java.lang.String sStem, float fSentiment, boolean bInvert, boolean bPostFixed, int nByteOffset, int nByteLength, int nCharOffset, int nCharLength)
    Creates a new Word.
  • Method Summary

    Modifier and Type Method Description
    int GetByteLength()
    Returns the length of the token in bytes (in UTF-8).
    int GetCharLength()
    Returns the character length of the token, counted in actual characters.
    java.lang.String getPOSTag()
    The part-of-speech tag for the token.
    float getSentiment()
    The sentiment score for the token.
    int GetStartByte()
    Returns the byte offset of the token within the document (in UTF-8).
    int GetStartChar()
    Returns the character offset of the token within the document, counted in actual characters.
    java.lang.String getStem()
    The stemmed form of the token.
    java.lang.String getToken()
    The text of the token.
    boolean isInverter()
    A flag indicating whether the token is, or is part of, a sentiment-inverting construction such as a negator.
    boolean isPostFixed()
    A flag indicating that, for display, the following term should not be separated from this token by a space.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Word

      public Word(java.lang.String sToken, java.lang.String sPOSTag, java.lang.String sStem, float fSentiment, boolean bInvert, boolean bPostFixed, int nByteOffset, int nByteLength, int nCharOffset, int nCharLength)
      Creates a new Word. This is not intended for client use. Words are created by Salience Engine when related API methods are called.
      Parameters:
      sToken - A String containing the token.
      sPOSTag - A String containing the part-of-speech tag for the token.
      sStem - A String containing the stemmed form of the token.
      fSentiment - The sentiment score for the individual token.
      bInvert - A flag indicating whether the token inverts sentiment.
      bPostFixed - A flag indicating that, for display, the following term should not be separated from this token by a space.
      nByteOffset - The byte offset of the token (in UTF-8 representation).
      nByteLength - The length of the token in bytes (in UTF-8 representation).
      nCharOffset - The character offset of the token, counted in actual characters. Java, for example, counts some emoji as two characters, so there may be discrepancies.
      nCharLength - The character length of the token, counted in actual characters. Java, for example, counts some emoji as two characters, so there may be discrepancies.
  • Method Details

    • getToken

      public java.lang.String getToken()
      The text of the token.
      Returns:
      A String containing the token.
    • getPOSTag

      public java.lang.String getPOSTag()
      The part-of-speech tag for the token. The tagset used for Salience part-of-speech tags is based on the Penn Treebank tagset. The full tagset can be found on the Lexalytics developer wiki.
      Returns:
      A String containing the part-of-speech tag.
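
As an illustration of working with the tag returned by getPOSTag(), the sketch below keeps only tokens whose tag starts with "NN", which covers the Penn Treebank noun tags NN, NNS, NNP and NNPS. The nested Word class is a hypothetical stand-in with a shortened constructor so the example runs without the Salience library, and PosTagFilterSketch and nounTokens are illustrative names. Because the Salience tagset is only described as based on Penn Treebank, check the full tagset before relying on a particular prefix.

```java
import java.util.ArrayList;
import java.util.List;

final class PosTagFilterSketch {

    // Hypothetical stand-in for com.lexalytics.salience.Word. It exposes only
    // the two accessors used here; real Word objects are created by Salience.
    static final class Word {
        private final String token;
        private final String posTag;

        Word(String token, String posTag) {
            this.token = token;
            this.posTag = posTag;
        }

        String getToken() { return token; }
        String getPOSTag() { return posTag; }
    }

    // Returns the text of every token whose tag begins with "NN"
    // (assumption: noun tags follow the Penn Treebank NN* convention).
    static List<String> nounTokens(List<Word> words) {
        List<String> nouns = new ArrayList<>();
        for (Word w : words) {
            String tag = w.getPOSTag();
            if (tag != null && tag.startsWith("NN")) {
                nouns.add(w.getToken());
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
            new Word("The", "DT"),
            new Word("cats", "NNS"),
            new Word("chased", "VBD"),
            new Word("Alice", "NNP"));
        System.out.println(nounTokens(words)); // prints [cats, Alice]
    }
}
```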
    • getStem

      public java.lang.String getStem()
      The stemmed form of the token.
      Returns:
      A String containing the stemmed token.
    • getSentiment

      public float getSentiment()
      The sentiment score for the token.
      Returns:
      The sentiment score for the token.
    • isInverter

      public boolean isInverter()
      A flag indicating whether the token is, or is part of, a sentiment-inverting construction such as a negator.
      Returns:
      A boolean flag indicating if the token is an inverter.
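
One way getSentiment() and isInverter() might be read together is sketched below: it lists the tokens that are flagged as inverters or that carry a non-zero score. Treating a score of exactly 0 as "no sentiment" is an assumption, since this page does not state the score range. The nested Word class is a hypothetical stand-in with a shortened constructor, and SentimentTokensSketch and annotate are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SentimentTokensSketch {

    // Hypothetical stand-in for com.lexalytics.salience.Word, limited to the
    // accessors used in this example.
    static final class Word {
        private final String token;
        private final float sentiment;
        private final boolean inverter;

        Word(String token, float sentiment, boolean inverter) {
            this.token = token;
            this.sentiment = sentiment;
            this.inverter = inverter;
        }

        String getToken() { return token; }
        float getSentiment() { return sentiment; }
        boolean isInverter() { return inverter; }
    }

    // Labels inverter tokens and tokens with a non-zero sentiment score;
    // all other tokens are skipped.
    static List<String> annotate(List<Word> words) {
        List<String> out = new ArrayList<>();
        for (Word w : words) {
            if (w.isInverter()) {
                out.add(w.getToken() + " [inverter]");
            } else if (w.getSentiment() != 0.0f) {
                // Locale.ROOT keeps the decimal separator a '.' on every machine.
                out.add(String.format(Locale.ROOT, "%s (%+.2f)", w.getToken(), w.getSentiment()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
            new Word("not", 0.0f, true),
            new Word("a", 0.0f, false),
            new Word("good", 0.6f, false),
            new Word("idea", 0.0f, false));
        System.out.println(annotate(words)); // prints [not [inverter], good (+0.60)]
    }
}
```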
    • isPostFixed

      public boolean isPostFixed()
      A flag indicating that, for display, the following term should not be separated from this token by a space.
      Returns:
      true if the following word should follow immediately in display without an intervening space.
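
The isPostFixed() flag can be used to rebuild display text from a sequence of tokens. The sketch below is a minimal illustration: it appends a space after each token unless the token reports isPostFixed() as true or is the last one. The nested Word class is a hypothetical stand-in with a shortened constructor, and WordDisplaySketch and render are illustrative names.

```java
import java.util.List;

final class WordDisplaySketch {

    // Hypothetical stand-in for com.lexalytics.salience.Word, limited to the
    // accessors used in this example.
    static final class Word {
        private final String token;
        private final boolean postFixed;

        Word(String token, boolean postFixed) {
            this.token = token;
            this.postFixed = postFixed;
        }

        String getToken() { return token; }
        boolean isPostFixed() { return postFixed; }
    }

    // Joins tokens for display, omitting the space after any token whose
    // isPostFixed() flag is true and after the final token.
    static String render(List<Word> words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            Word w = words.get(i);
            sb.append(w.getToken());
            boolean last = (i == words.size() - 1);
            if (!last && !w.isPostFixed()) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
            new Word("Hello", true),   // the comma follows immediately
            new Word(",", false),
            new Word("world", true),   // the exclamation mark follows immediately
            new Word("!", false));
        System.out.println(render(words)); // prints Hello, world!
    }
}
```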
    • GetStartByte

      public int GetStartByte()
      Returns the byte offset of the token within the document (in UTF-8).
      Returns:
      The byte offset of the token.
    • GetByteLength

      public int GetByteLength()
      Returns the length of the token in bytes (in UTF-8).
      Returns:
      The length of the token in bytes.
    • GetStartChar

      public int GetStartChar()
      Returns the character offset of the token within the document, counted in actual characters. Java, for example, counts some emoji as two characters, so there may be discrepancies.
      Returns:
      The character offset of the token.
    • GetCharLength

      public int GetCharLength()
      Returns the character length of the token, counted in actual characters. Java, for example, counts some emoji as two characters, so there may be discrepancies.
      Returns:
      The length of the token in characters.
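
The difference between the byte-based and character-based offsets can be seen with plain JDK calls. The sketch below slices the same span from a document twice: once by UTF-8 byte offset and length, and once by character offset and length. It assumes that "actual characters" means Unicode code points, which this page implies but does not state outright. OffsetUnitsSketch, sliceByBytes and sliceByCodePoints are illustrative names, not part of the Salience API.

```java
import java.nio.charset.StandardCharsets;

final class OffsetUnitsSketch {

    // Extracts a span addressed by UTF-8 byte offset and byte length, the
    // units described for GetStartByte() and GetByteLength().
    static String sliceByBytes(String document, int startByte, int byteLength) {
        byte[] utf8 = document.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, startByte, byteLength, StandardCharsets.UTF_8);
    }

    // Extracts a span addressed in code points (assumed meaning of "actual
    // characters"), converting to Java's UTF-16 indices before substring().
    static String sliceByCodePoints(String document, int startChar, int charLength) {
        int begin = document.offsetByCodePoints(0, startChar);
        int end = document.offsetByCodePoints(begin, charLength);
        return document.substring(begin, end);
    }

    public static void main(String[] args) {
        // "I", space, U+1F600 (an emoji outside the BMP), space, "Java"
        String doc = "I \uD83D\uDE00 Java";

        System.out.println(doc.length());                                // prints 9 (UTF-16 code units)
        System.out.println(doc.codePointCount(0, doc.length()));         // prints 8 (code points)
        System.out.println(doc.getBytes(StandardCharsets.UTF_8).length); // prints 11 (UTF-8 bytes)

        // "Java" starts at byte 7 and at code point 4, but at UTF-16 index 5.
        System.out.println(sliceByBytes(doc, 7, 4));      // prints Java
        System.out.println(sliceByCodePoints(doc, 4, 4)); // prints Java
        System.out.println(doc.indexOf("Java"));          // prints 5
    }
}
```

Calling substring() directly with a code-point offset would be off by one for every preceding character outside the Basic Multilingual Plane, which is why the conversion step is included.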