Class OCRCharacterAttribute

java.lang.Object
com.lexalytics.salience.OCRCharacterAttribute

public class OCRCharacterAttribute
extends java.lang.Object

An OCRCharacterAttribute object represents the location of an OCR'ed character on a document page and the confidence that the character was read correctly.

A vector of attributes for suspect characters is passed to Salience.CorrectOCRErrors(java.util.Vector<com.lexalytics.salience.OCRCharacterAttribute>, float) to guide the error correction process. Only those words whose characters appear in this attribute vector will be corrected. If the vector is empty, all words in the document will be checked and corrected if necessary.
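
As a rough illustration of how such a vector might be assembled, the sketch below collects the characters whose confidence falls below a threshold. It assumes the OCR engine supplies one confidence value per character offset. The SuspectCharacters class, its collect method, and the 0.5 threshold are hypothetical, and the small OCRCharacterAttribute class is only a stand-in that mirrors the documented constructor so the example compiles without the Salience library.

```java
import java.util.Vector;

// Stand-in mirroring the documented fields and constructor so this sketch
// compiles on its own. With the Salience library on the classpath, import
// com.lexalytics.salience.OCRCharacterAttribute and delete this class.
class OCRCharacterAttribute {
    int nCharOffset;
    float fConfidence, fHeight, fWidth, fXPosition, fYPosition;
    int nPage;

    OCRCharacterAttribute(int nCharOffset, float fConfidence, float fHeight, float fWidth,
                          float fXPosition, float fYPosition, int nPage) {
        this.nCharOffset = nCharOffset;
        this.fConfidence = fConfidence;
        this.fHeight = fHeight;
        this.fWidth = fWidth;
        this.fXPosition = fXPosition;
        this.fYPosition = fYPosition;
        this.nPage = nPage;
    }
}

class SuspectCharacters {
    // Hypothetical helper: confidences[i] is assumed to be the OCR engine's
    // confidence for the character at offset i in the document text.
    static Vector<OCRCharacterAttribute> collect(float[] confidences, float threshold) {
        Vector<OCRCharacterAttribute> suspects = new Vector<>();
        for (int offset = 0; offset < confidences.length; offset++) {
            if (confidences[offset] < threshold) {
                // Geometry and page are unknown here, so the documented -1
                // sentinels are passed for every unused parameter.
                suspects.add(new OCRCharacterAttribute(
                        offset, confidences[offset], -1.0f, -1.0f, -1.0f, -1.0f, -1));
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        float[] confidences = {0.98f, 0.41f, 0.97f, 0.12f};
        Vector<OCRCharacterAttribute> suspects = collect(confidences, 0.5f);
        System.out.println(suspects.size() + " suspect characters");
        // With the real library, this vector would be the first argument to
        // Salience.CorrectOCRErrors; the float argument is not described on this page.
    }
}
```

Returning an empty vector from such a helper is not a no-op: as noted above, an empty vector causes every word in the document to be checked.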

  • Field Summary

    Fields 
    Modifier and Type Field Description
    float fConfidence  
    float fHeight  
    float fWidth  
    float fXPosition  
    float fYPosition  
    int nCharOffset  
    int nPage  
  • Constructor Summary

    Constructors 
    Constructor Description
    OCRCharacterAttribute(int nCharOffset, float fConfidence, float fHeight, float fWidth, float fXPosition, float fYPosition, int nPage)
    Creates a new OCR character attribute.
  • Method Summary

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • OCRCharacterAttribute

      public OCRCharacterAttribute(int nCharOffset, float fConfidence, float fHeight, float fWidth, float fXPosition, float fYPosition, int nPage)
      Creates a new OCR character attribute. Set any unused parameter to -1.
      Parameters:
      nCharOffset - Character offset of the OCR'ed character in the document text
      fConfidence - Confidence that the character was OCR'ed correctly, in range 0.0-1.0. Optional, set this to -1.0 if not used
      fHeight - Height of the character bounding box, in typographical points. Optional, set this to -1.0 if not used
      fWidth - Width of the character bounding box, in typographical points. Optional, set this to -1.0 if not used
      fXPosition - Distance of the left side of character bounding box from left side of document page, in typographical points. Optional, set this to -1.0 if not used
      fYPosition - Distance of the top of character bounding box from top of document page, in typographical points. Optional, set this to -1.0 if not used
      nPage - Page number on which the character appears, starting from 0. Optional, set this to -1 if not used.
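
The position and size parameters are expressed in typographical points, whereas OCR engines often report bounding boxes in pixels. The sketch below shows one way to convert, assuming the scan resolution in dots per inch is known and using the fact that one point is 1/72 inch. The PointUnits class, its pixelsToPoints method, and the sample pixel values are hypothetical and not part of the Salience API.

```java
class PointUnits {
    // One typographical point is 1/72 inch, so a length of `pixels` scanned at
    // `dpi` pixels per inch equals pixels * 72 / dpi points.
    static float pixelsToPoints(float pixels, float dpi) {
        if (dpi <= 0.0f) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return pixels * 72.0f / dpi;
    }

    public static void main(String[] args) {
        float dpi = 300.0f; // assumed scan resolution

        // Hypothetical pixel bounding box reported by an OCR engine.
        float fXPosition = pixelsToPoints(150.0f, dpi);
        float fYPosition = pixelsToPoints(600.0f, dpi);
        float fWidth = pixelsToPoints(25.0f, dpi);
        float fHeight = pixelsToPoints(50.0f, dpi);

        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f (points)%n",
                fXPosition, fYPosition, fWidth, fHeight);
        // These values could then be supplied to the constructor documented above:
        // new OCRCharacterAttribute(nCharOffset, fConfidence, fHeight, fWidth,
        //                           fXPosition, fYPosition, nPage)
    }
}
```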