Query topics

Queries are useful when looking for specific terms within a document. Using Boolean logic, you can search for any type of speech pattern and extract exact phrases while ignoring everything else. Start with short queries and extend them as you get a feel for how they work.

General Guidelines

  • Default max length: 1,500 characters (but may be changed - contact us)
  • Query cannot be empty
  • Operators must be CAPITALIZED
  • NEAR accepts values from 1 to 99 (e.g. NEAR/3)
  • Operators are always surrounded by words (e.g. coffee AND tea AND decaf)
  • Double-check opening and closing quotes and parentheses
  • Query terms cannot contain special characters
  • Special characters are: ! @ # $ % ^ ( ) _ - = ~ + [ ] { } ( ) | " ' : ; . , < > ? / 1 2 3 4 5 6 7 8 9 0 `
  • Using ? in comments of a query can cause issues with saving the query
  • Spaces are special characters
  • Terms containing special characters or phrases containing two or more words should be enclosed (escaped) in quotes (e.g. "#beautifulflowers", "customer service", "123 Ave Rosemont", "rendez-vous")
  • Queries can contain operators as a query term, but they must be enclosed in quotes
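
Several of the guidelines above can be checked mechanically before a query is submitted. The Python sketch below is only an illustration: the function name check_query and its messages are hypothetical and not part of the product, and it covers only the emptiness, length, quote-balance, and NEAR/NOTNEAR range rules.

```python
import re

MAX_LENGTH = 1500  # default limit from the guidelines; your account may differ


def check_query(query):
    """Return a list of guideline violations (an empty list means none found)."""
    problems = []
    if not query.strip():
        problems.append("query is empty")
    if len(query) > MAX_LENGTH:
        problems.append("query exceeds %d characters" % MAX_LENGTH)
    if query.count('"') % 2 != 0:
        problems.append("unbalanced double quotes")
    # Drop quoted spans so operators used as quoted terms are not checked.
    unquoted = re.sub(r'"[^"]*"', " ", query)
    for match in re.finditer(r"\b(?:NOT)?NEAR/(\d+)", unquoted):
        if not 1 <= int(match.group(1)) <= 99:
            problems.append("distance out of range: " + match.group(0))
    return problems
```

For instance, check_query("onions NEAR/120 cheese") reports the out-of-range distance, while a quoted "NEAR/500" is ignored because it is a query term rather than an operator.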

Operators

Note that operators must be capitalized, otherwise they will be treated as a query term. Query operators must also be preceded and followed by query terms or query phrases.

OR operator

Inside a query, the OR operator may be used to retrieve documents containing either of two terms.

Example:
onions OR cheese will detect "Onions make my eyes water", "My favorite cheese is cheddar", and "I want cheese and onions on my pizza".

AND operator

Inside a query, the AND operator may be used to retrieve documents containing both specified terms.

Example:
onions AND cheese will detect "I want cheese and onions on my pizza" or "I like cheese on my onion rings", but not "Onions make my eyes water" or "My favorite cheese is cheddar."

NEAR operator

A NEAR operator is effectively an AND operator where you can control the distance between the words. onions NEAR cheese means that the term cheese must exist within 10 words of onions. The default distance is 10 words, but you can vary the distance the NEAR operation uses by adding a number suffix such as onions NEAR/50 cheese, which means that onions must exist within 50 words of cheese. This window can be between 1 and 99.

Other examples include:
(onions OR bananas) NEAR/5 (cheese OR dinner) would tag "The banana split was included with dinner" and "The steak dinner with onions was my favorite." This query will not detect sentences like "The cheese platter on the dinner menu was superb" or "Bananas, strawberries, and ice cream are not a balanced dinner."

(onions NEAR/5 cheese) would tag a comment like "Do you want onions on top of your cheese?" but not "Their cheese is my favorite but only on the dish with caramelized onions."
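
The window logic can be illustrated with a short Python sketch. It is not the engine's implementation: the tokenizer (lowercased runs of letters, digits, and apostrophes) and the way distance is counted (difference between word positions) are assumptions chosen so that the two example comments above behave as described.

```python
import re


def tokens(text):
    """Lowercase word tokens; a simplification of real tokenization."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, left, right, window=10):
    """True if `left` and `right` occur within `window` words, in any order."""
    words = tokens(text)
    left_pos = [i for i, w in enumerate(words) if w == left]
    right_pos = [i for i, w in enumerate(words) if w == right]
    return any(abs(i - j) <= window for i in left_pos for j in right_pos)
```

With window=5, the sketch matches "Do you want onions on top of your cheese?" (five positions apart) and rejects the caramelized onions comment (eleven positions apart).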

🚧

Do not use the NEAR operator in the following fashions:

"onions NEAR/10 cheese" – this does nothing
onions "NEAR/10" cheese – this does nothing

ONEAR operator

An ONEAR operator works similarly to the NEAR operator while taking the word order into account. This lets you find instances of two words in a certain proximity, but they must appear in the order in which they're given.
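
As a rough illustration of the ordering requirement, the Python sketch below treats ONEAR as a proximity check that only accepts the second word when it follows the first. The tokenizer, the position-based distance, and the default window of 10 (borrowed from NEAR) are all assumptions; the function name onear is hypothetical.

```python
import re


def onear(text, first, second, window=10):
    """True if `second` occurs after `first`, at most `window` words later."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    return any(0 < j - i <= window for i in first_pos for j in second_pos)
```

Under these assumptions, onions followed by cheese within the window matches, while the same two words in the opposite order do not.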

NOTNEAR operator

A NOTNEAR operator is effectively a NOT operator where you can control the distance between the words. onions NOTNEAR cheese means that the term cheese cannot exist within 10 words of onions. The default distance is 10 words, but you can vary the distance the NOTNEAR operation uses by adding a number suffix such as onions NOTNEAR/50 cheese, which means that cheese cannot exist within 50 words of onions. This window can be between 1 and 99.

WITH operator

A WITH operator requires that the two terms occur within the same sentence. As such, it is the same as a NEAR operator, with the exception that the match window between the two terms is not specified.

onions WITH cheese means that the term cheese must exist within the same sentence as onions.

NOTWITH operator

A NOTWITH operator requires that the two terms do not occur within the same sentence. As such, it is the same as a NOTNEAR operator, with the exception that the match window between the two terms is not specified.

onions NOTWITH cheese means that the term cheese cannot exist within the same sentence as onions.

NOT operator

The NOT operator excludes any documents containing the term which follows it. onions NOT celery will return all documents that contain "onions", excluding those that also contain "celery." A query must contain at least one non-excluded term when using the NOT operator.

Example:
onions NOT celery will detect "I like onions very much" but not "I like onions on my sandwich and celery on the side."

EXCLUDE operator

Two query terms of any type may be joined by an EXCLUDE operator, e.g. York EXCLUDE "New York". The effect is different from that of the NOT operator. The query will return documents with the word "York", excluding those that only contain occurrences of "New York".

Consider the following sample text:

I spent the day in York, visiting the magnificent cathedral. Then it was time to head back to London for my flight home to New York.

This text would generate the following results for the provided queries:
York NOT "New York": FALSE
York EXCLUDE "New York": TRUE
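
The difference between the two results can be reproduced with a small Python sketch. This is an interpretation of the behavior described above, not the engine's code: NOT rejects the document as soon as the excluded phrase appears anywhere, while EXCLUDE only discards the occurrences of the term that fall inside the excluded phrase. The function names are hypothetical.

```python
import re

TEXT = ("I spent the day in York, visiting the magnificent cathedral. "
        "Then it was time to head back to London for my flight home to New York.")


def spans(phrase, text):
    """Character spans of every whole-word, case-insensitive occurrence."""
    pattern = r"\b%s\b" % re.escape(phrase)
    return [m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def matches_not(text, term, excluded):
    """term NOT excluded: term is present and the excluded phrase is absent."""
    return bool(spans(term, text)) and not spans(excluded, text)


def matches_exclude(text, term, excluded):
    """term EXCLUDE excluded: some occurrence of term lies outside excluded."""
    covered = spans(excluded, text)
    return any(
        not any(start <= a and b <= end for start, end in covered)
        for a, b in spans(term, text)
    )
```

On the sample text, matches_not returns False and matches_exclude returns True, in line with the results listed above.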

Terms and Phrases

Terms and phrases are query nodes that match literal words of a document. They can be combined with modifiers like wildcards to make them more flexible.

Terms

Terms are the simplest query node, consisting of a single word.

A query term cannot contain stop words or special characters. Refer to the list of special characters in the general guidelines section.

To use stop words or special characters within a term, enclose it in double quotes. A term enclosed in double quotes differs from a phrase in that it is treated like a single word.

Examples:

dog

""

"1337"

"c@$h"

Phrases

Phrases contain two or more words.

Phrases must be enclosed in double quotes.

Examples:

"big dog"

"over the hill"

"a load of codswallop"

Single query terms are the simplest query element, consisting of a single word.

A query term cannot contain punctuation or other special characters like ! @ # $ % ^ ( ) _ = ~ + [ ] { } | " ' : ; . , < > ? / -

Phrases must be enclosed in double quotes. When a single word is enclosed in quotes, it is not treated as a phrase search: it is treated like a single word.

Parentheses

Queries can use parentheses to control the logic of the query and they may appear in any combination.

Two examples of queries with smart uses of parentheses are:
((onions OR cheese) AND celery) NOT horrible
(onions OR cheese) NEAR (horrible OR disgusting)

Every left parenthesis must have a corresponding right parenthesis. Queries can have nested parentheses up to 10 levels deep.
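
Both rules (matched pairs and a nesting limit of 10) are easy to check with a counter. The Python sketch below is a hypothetical helper, not part of the product; it assumes that parentheses inside double quotes belong to a quoted term and should be ignored.

```python
def max_paren_depth(query):
    """Return the deepest nesting level; raise ValueError if unbalanced."""
    depth = deepest = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # assumption: quoted parentheses are literal text
        elif ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("right parenthesis without a matching left one")
    if depth != 0:
        raise ValueError("left parenthesis without a matching right one")
    return deepest
```

A query passes the nesting rule when max_paren_depth(query) <= 10; the first example query above has a depth of 2.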

Wildcards

A wildcard character (*) may be used at the beginning or end of a query term or phrase. It allows the term/phrase to be tagged when preceded or followed by any characters that are not whitespace.

  • Must be at the beginning or end of a term/phrase
  • Term/phrase must be at least three letters in length

Prefix wildcard

*<term>

"*<phrase>"

Suffix wildcard

<term>*

"<phrase>*"

Combined prefix and suffix wildcard

*<term>*

"*<phrase>*"

For example:
excit* would match "excite", "exciting", "excitement", etc.

"running fast*" would match "running fast", "running faster", etc.

*mission would match "permission", "submission", "transmission", etc.
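
One way to picture the wildcard rules is as a translation into a regular expression. The Python sketch below is an approximation under stated assumptions: the function name wildcard_to_regex is hypothetical, the surrounding quotes of a phrase are assumed to be stripped before the call, and matching is case-insensitive as described for query terms in general.

```python
import re


def wildcard_to_regex(term):
    """Compile a term or phrase with an optional leading/trailing * into a regex."""
    core = term.strip("*")
    if len(core) < 3:
        raise ValueError("term must be at least three letters long")
    if "*" in core:
        raise ValueError("wildcard must be at the beginning or end")
    # A wildcard stands for any run of non-whitespace characters; without
    # one, the term must not be glued to neighbouring word characters.
    head = r"\S*" if term.startswith("*") else r"(?<!\w)"
    tail = r"\S*" if term.endswith("*") else r"(?!\w)"
    return re.compile(head + re.escape(core) + tail, re.IGNORECASE)
```

For example, wildcard_to_regex("excit*") matches "excite", "exciting", and "excitement" but not "exit".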

Nested Queries

Referencing a query is done by placing a caret (^) at the beginning of a query name. It signals to the system to look for a query and use it in another query. For example, consider the following queries:

Dirty: dirty OR filth OR disgust OR nasty
Bathroom: bathroom OR toilet OR restroom OR lavatory
Restaurant_Interior: restaurant OR table OR chair OR carpet OR furniture OR plate OR cup

Two queries can be combined to create a nested query.

For example:
Dirty Bathroom: (^Dirty) AND (^Bathroom)

Query names being nested cannot contain spaces. Only the AND and OR operators function with nested queries.
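
Conceptually, a reference behaves as if the text of the named query were pasted in, wrapped in parentheses. The Python sketch below illustrates that idea only; whether the system actually expands references textually is an assumption, and the expand function, the QUERIES dictionary, and the recursion guard of 10 are hypothetical.

```python
import re

QUERIES = {
    "Dirty": "dirty OR filth OR disgust OR nasty",
    "Bathroom": "bathroom OR toilet OR restroom OR lavatory",
}


def expand(query, saved, depth=0):
    """Replace each ^Name with the parenthesized text of the saved query."""
    if depth > 10:  # arbitrary guard against circular references
        raise ValueError("references nested too deeply (possible cycle)")

    def substitute(match):
        name = match.group(1)  # raises KeyError below if the name is unknown
        return "(" + expand(saved[name], saved, depth + 1) + ")"

    return re.sub(r"\^(\w+)", substitute, query)
```

Expanding (^Dirty) AND (^Bathroom) with these definitions yields a query that requires one term from each list.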

Case Sensitivity

By default, query terms are handled in a case-insensitive manner. Case-sensitivity on a query term can be enforced using the ~ operator. ~Google NEAR/10 Microsoft will hit for the phrase "Both tech giants Microsoft and Google are investing heavily in mobile technologies" but it will not hit on the phrase "let me google that for you".

Stemming

By default, query terms are not stemmed. To stem query keywords, you must use the wildcard character *. Special characters may be used within query phrases if they are in quotations.

Correct Query:

Gepp OR Gunther OR Hasso OR "Hayden-Smith" OR Hirakubo OR Kanai OR Mathis OR Moeller OR "Nijssen_Smith" OR Sherman OR Shimizu OR "U'Ren" OR Daiji

Wrong Query:

Gepp OR Gunther OR Hasso OR Hayden-Smith OR Hirakubo OR Kanai OR Mathis OR Moeller OR Nijssen_Smith OR Sherman OR Shimizu OR U'Ren OR Daiji

Accents

If a query term is written without accents, the term will match text that has accents. For instance, if your query term is gate, you will also match the text gâté.

If you have accents in your query terms, then only the exact form will be matched. For instance, if your query term is gâté, you will not match gate.
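
The asymmetry between accented and unaccented query terms can be sketched in Python with the standard unicodedata module. This illustrates the rule as stated, not the engine's implementation; the function names are hypothetical.

```python
import unicodedata


def strip_accents(word):
    """Remove combining marks, e.g. 'gâté' becomes 'gate'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_matches(query_term, text_word):
    """Unaccented query terms match accented text; accented ones match exactly."""
    q, t = query_term.lower(), text_word.lower()
    if q == strip_accents(q):  # the query term has no accents
        return q == strip_accents(t)
    return unicodedata.normalize("NFC", q) == unicodedata.normalize("NFC", t)
```

Here term_matches("gate", "gâté") is true, while term_matches("gâté", "gate") is false.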

Hashtags and mentions

If a query term is written without a leading hashtag or @ symbol, the query will match text with a leading hashtag or @ symbol. For example, if your query term is "nike" the query will match text of "#nike shoes" or "hey @nike I am just doing it."

Parts of speech tags

To query terms used as a particular part of speech (POS), place an underscore (_) between the term and the POS tag. For example, cook_VB will only match "cook" being used as a verb.

A list of POS tags can be found in the Tokenization and POS Tagging section of the documentation.

Queries referencing NLP features and metadata

Users can design queries that include references to NLP features and metadata. These references adhere to a different syntax and must be enclosed in curly brackets (i.e. {}). A query node referencing an NLP feature or metadata can exist on its own or be linked by Boolean operators to other query nodes.

❗️

Note:

  • Operators that reference NLP features or metadata do not follow the same syntax as our standard query language
    • Operators are lowercase and special characters take on different meanings than in the standard query syntax
  • Document metadata can be included in the metadata parameter in a call to queue documents
  • This syntax is contained within the scope of curly brackets (i.e. {})

Syntax for NLP features

The NLP features that can be referenced include entities and document sentiment.

Operators:

These operators can be used to include or exclude ranges of sentiment for entities and documents.

Comparison   Description
<            Less than
>            Greater than

Entities:

The presence of an entity can be queried along with its assigned sentiment value. Including sentiment criteria is optional but allows you to restrict an entity match to a specific sentiment value range.

{entity <entity type> : sentiment <sentiment criteria>}
You must specify the NLP feature that you are querying, which is 'entity' in this case. Replace '<entity type>' with the entity type you want to match. This may be any of the entity types supported by the entity extraction model, such as company, person, place, or product. User-defined and named entities are both matched. Sentiment is an optional parameter. You may include sentiment by replacing '<sentiment criteria>' with comparison operators and numerical values that you can arrange logically with the 'sentiment' keyword.

Examples of valid queries:

{entity Publication}
This will match on documents where at least one entity of type 'Publication' is present.

"merger announcement" NEAR/5 {entity company}
This will match on documents that contain the phrase "merger announcement" within five words of an entity of type 'company.'

{entity Publication: sentiment > 0.1}
This will match on documents that contain at least one entity of type 'Publication' that has a sentiment value greater than 0.1.

(critic OR rank OR poll) AND {entity Publication}
This will match on documents that contain the terms "critic", "rank", or "poll" along with at least one entity of type 'Publication.'

{entity Publication: -0.5 < sentiment < 0.2}
This will match on documents that contain at least one entity of type 'Publication' with a sentiment value between -0.5 and 0.2.

Document sentiment:

The assigned sentiment of a document can be queried.

{document: <sentiment criteria>}
To query for document sentiment, specify 'document' as the NLP feature and replace '<sentiment criteria>' with a comparison that uses the keyword 'sentiment' against a single numerical value or a range of values.

❗️

NEAR and WITH operator behavior

When combining a document sentiment reference and a NEAR or WITH operator, the logic handles the document sentiment reference as if it were "everywhere" in the document. If the document sentiment exists for a given text, the condition for both operators is satisfied because the distance from a term or phrase, or whether it is in the same sentence, is undefined.

Here is an example.

Document text: We had an amazing time staying at the hotel.
Document sentiment: 0.4 (positive)

Query: hotel NEAR/3 {document: sentiment > 0.2}

This query would match on the document.

Examples of valid queries:

{document: sentiment > -0.2}
This will match documents that have a document sentiment value that is greater than -0.2.

{document: -0.5 < sentiment < 0.2}
This will match on documents that have a document sentiment value that is greater than -0.5 and less than 0.2.

(critic OR rank OR poll) AND {document: sentiment > 0.3}
This will match on documents that contain the terms "critic", "rank", or "poll" along with a document sentiment value greater than 0.3.
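
The sentiment criteria used in the entity and document examples above are either a single comparison or a range with the keyword sentiment in the middle. The Python sketch below evaluates such criteria against a score. It is a reading of the examples, not the product's parser; the function names and the accepted grammar (only < and >, plain decimal numbers) are assumptions.

```python
import re

NUMBER = r"-?\d+(?:\.\d+)?"
CRITERIA = re.compile(
    r"^\s*(?:(%s)\s*([<>])\s*)?sentiment(?:\s*([<>])\s*(%s))?\s*$" % (NUMBER, NUMBER)
)


def compare(left, op, right):
    """Apply one of the two supported comparison operators."""
    return left < right if op == "<" else left > right


def sentiment_matches(criteria, value):
    """Evaluate criteria such as '-0.5 < sentiment < 0.2' against a score."""
    m = CRITERIA.match(criteria)
    if m is None:
        raise ValueError("unsupported criteria: %r" % criteria)
    low, low_op, high_op, high = m.groups()
    ok = True
    if low is not None:
        ok = ok and compare(float(low), low_op, value)
    if high is not None:
        ok = ok and compare(value, high_op, float(high))
    return ok
```

For the hotel example, sentiment_matches("sentiment > 0.2", 0.4) is true, so the document sentiment condition is satisfied.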

Syntax for sections

You can query the metadata of your document. Sections can be used to refer to any metadata fields included with your document.

❗️

Section name

Section name must be in lower case for the query to be valid.

{section <section name> <section criteria>}
To query for a section, use the 'section' keyword and replace '<section name>' with the name of the section. The name of the section must be in lower-case. Replace '<section criteria>' with the criteria you would like to match on.

❗️

NEAR and WITH operator behavior

When combining a section reference and a NEAR or WITH operator, the logic handles the section reference as if it were "everywhere" in a document. If the section reference exists for a given text, the condition for both operators is satisfied because the distance from a term or phrase, or whether it is in the same sentence, is undefined.

Here is an example.

Document text: We had an amazing time staying at the hotel.
Document metadata: satisfaction_score: 9

Query: hotel NEAR/3 {section satisfaction_score > 7}

This query would match on the document.

Operators:

The section field can include any Boolean statement using the following operators and functions:

Simple Arithmetic Operators   Description
+                             Addition
-                             Subtraction
*                             Multiplication
/                             Division
%                             Modulus

Note: If you have not defined variable types for the sections in document_model.dat, you will need to use type casting functions to use arithmetic operators. These are described in a table below.

Example queries:

{section int(NPS) * 2 == 1500}
This will match when a document has a value of 750 for a section NPS.

{section int(num_credits) - int(used_credits) > 0}
This will match on documents that have a difference greater than zero for sections num_credits and used_credits.

Boolean Logic Operators   Description
AND, &&                   Logical AND (either symbol can be used)
OR, ||                    Logical OR (either symbol can be used)
NOT                       Logical NOT
&                         Bitwise AND
|                         Bitwise OR

Example queries:

{section (int(num_credits) - int(used_credits) > 0) || CX_SSI > 500}
This will match on documents that have a difference greater than 0 for sections num_credits and used_credits or a value greater than 500 in section CX_SSI.

{section "West" in Region NOT CX_SSI <= 300}
This will match on documents that have the substring "West" in section Region and a value greater than 300 for CX_SSI.

Comparison Operators   Description
==                     Equal-to
<=                     Less than or equal-to
>=                     Greater than or equal-to
<                      Less than
>                      Greater than
!=                     Not equal-to

Note:

  • With sections you cannot directly query for ranges of values like you can with entity sentiment and document sentiment
    • You must combine two comparisons of a section with logical AND to get the same effect.
  • Comparison operators infer type int or float based on context so using type casting functions is unnecessary

Example queries:

{section NPS >= 50}
This will match when a document has a value greater than or equal-to 50 for a section named NPS.

{section (NPS > 899 AND NPS < 1000)}
{section (NPS > 899 && NPS < 1000)}
These queries are identical and will match on documents that have a value in the range of 900 to 999 for the NPS section.

Type Casting Functions   Description
int(X)                   To integer: casts type of section X to int
float(X)                 To float: casts type of section X to float
string(X)                To string: casts type of section X to string

Example queries:

{section int(current_year) - int(birth_year) > 18}
This will match documents that have a difference greater than 18 for section current_year and birth_year.

{section float(price) * float(tax) <= 249.99}
This will match documents that have a product of less than or equal-to 249.99 for section price and tax.

Other Operators and Functions   Description
X ? Y : Z                       Ternary: evaluates to Z if Y is value of section X
to_lower(X), casefold(X)        Casefolding: folds the case of characters in section X to lowercase (either function can be used)
Y in X                          Substring: evaluates to true if Y is a substring of section X
default(X, Y)                   Default value: evaluates to value Y if section X is null

Example queries:

{section int(num_credits) ? 9999 : int(used_credits) < 500}
This query will match documents that have a value of less than 500 for used_credits on the condition that num_credits has a value of 9999.

{section "west" in casefold(region)}
This will match documents that have "west" as a substring with any variation of capitalization, such as West, WEST, or WeSt.

{section "west" in region}
For this query, the section name is 'region' and "west" is the substring we are looking for in that section. This will match on documents that contain the substring "west" in section region, such as "Northwest."

{section default(next_service_check_in, 1) < 2}
This will match documents that have a value that is less than 2 or null for next_service_check_in section.
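
For readers who think in code, several of the section functions above have close Python analogues. The sketch below mirrors four of the example queries against a hypothetical metadata dictionary. It is an analogy rather than the engine's evaluator: the sample values are invented, untyped sections are modeled as strings, and the plain in test is assumed to be case-sensitive because casefold is documented as the way to ignore capitalization.

```python
# Hypothetical document metadata; untyped section values arrive as strings.
metadata = {
    "region": "NorthWest",
    "num_credits": "10",
    "used_credits": "4",
    "next_service_check_in": None,
}


def default(value, fallback):
    """default(X, Y): fall back to Y when section X is null."""
    return fallback if value is None else value


# {section "west" in casefold(region)}
west_any_case = "west" in metadata["region"].casefold()

# {section "west" in region}  (assumed case-sensitive substring test)
west_exact_case = "west" in metadata["region"]

# {section int(num_credits) - int(used_credits) > 0}
credits_left = int(metadata["num_credits"]) - int(metadata["used_credits"]) > 0

# {section default(next_service_check_in, 1) < 2}
check_in_soon = default(metadata["next_service_check_in"], 1) < 2
```

With these sample values, every condition except the case-sensitive substring test evaluates to true.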

Built-in variables

Variable          Description
document_length   Length of document in tokens

Querying sections

Queries can be restricted to matching within a specified section. This requires the use of the operator IN and keyword channel.

The operator IN will attempt to match the preceding query node to the contents of the section specified after channel.

The name given after the keyword channel can be either a section name or an alias of that section.

Syntax:

<query> IN {channel <channel name>}

Examples:

hello IN {channel message}
This will match on documents if the term "hello" exists in the section "message."

(hello AND "how are you") IN {channel message}
This will match on documents if the term "hello" and the phrase "how are you" exist in the section "message."

("cancel service" IN {channel caller}) AND ("offer discount service" IN {channel agent})
This will match on documents if the phrase "cancel service" exists in the section "caller" and the phrase "offer discount service" exists in the section "agent."

(^RequestHelpMacro IN {channel caller}) ONEAR/99 (^ProvideHelp IN {channel agent})
A document must meet several conditions to match on this query. First, the sub-query ^RequestHelpMacro must match on the contents of a section named "caller" or one that uses "caller" as an alias. Then, the sub-query ^ProvideHelp must match on the contents of a section named "agent" or one that uses "agent" as an alias. Finally, the ONEAR operator requires that the match from the left query node occur before the match from the right query node and that the two matches be within 99 words of each other.

An alias is a way to represent one or more sections so that you can address them in queries.

Note: When a section name is used as an alias for other sections a query will attempt to match on the section with that name and all sections with that alias.

This is useful for something like a call log where a conversation between two or more people takes place. Each person's side of the conversation can be queried independently.

Consider the example below, which represents a transcribed call log. Sections where the agent speaks use the alias "agent", and sections where the caller speaks use the alias "caller." This allows the sections to be queried independently of one another.

If we had the query:

order IN {channel caller}

This would match on section 2, but not section 3.

{
    "sections": [
        {
            "name": "agent-01",
            "aliases": ["part-one", "agent"],
            "value": "Hello, thank you for calling",
            "section": 1,
            "process_as_text": true,
            "metadata": {
                "start-secs": 0.0,
                "end-secs": 4.5
            }
        },
        {
            "name": "caller-01",
            "aliases": ["part-one", "caller"],
            "value": "Hi, I'm calling about my order",
            "section": 2,
            "process_as_text": true,
            "metadata": {
                "start-secs": 4.6,
                "end-secs": 10.2
            }
        },
        {
            "name": "agent-02",
            "aliases": ["part-one", "agent"],
            "value": "What's your order number",
            "section": 3,
            "process_as_text": true,
            "metadata": {
                "start-secs": 11.8,
                "end-secs": 13.7
            }
        }
    ]
}
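
The channel lookup for this call log can be approximated in Python as follows. The sketch is an illustration under assumptions: the function term_in_channel is hypothetical, only the fields needed for matching are copied from the example, and a channel is taken to select every section whose name or alias equals the given channel name.

```python
import re

sections = [
    {"name": "agent-01", "aliases": ["part-one", "agent"],
     "value": "Hello, thank you for calling", "section": 1},
    {"name": "caller-01", "aliases": ["part-one", "caller"],
     "value": "Hi, I'm calling about my order", "section": 2},
    {"name": "agent-02", "aliases": ["part-one", "agent"],
     "value": "What's your order number", "section": 3},
]


def term_in_channel(term, channel, sections):
    """Return the numbers of sections in `channel` whose text contains `term`."""
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    hits = []
    for sec in sections:
        if channel == sec["name"] or channel in sec["aliases"]:
            if pattern.search(sec["value"]):
                hits.append(sec["section"])
    return hits
```

term_in_channel("order", "caller", sections) returns [2], while the shared alias "part-one" would select both section 2 and section 3.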

Scores

Query results will be accompanied by two scores: query relevancy and query sentiment.

Query relevancy

Query relevancy is a count of the query terms found within a document. It can be particularly useful for judging how effective your queries are on your text.

Consider the following text:

I have one cat and I used to have a dog too.

The query relevancy score for the query cat OR dog OR bird will be 2 because the query detects two of the query terms.
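
The count in this example can be reproduced with a few lines of Python. Note the assumption: the description does not say whether repeated occurrences of the same term are counted more than once, so this sketch counts each distinct term at most once, which is enough to reproduce the score of 2. The function name relevancy is hypothetical.

```python
import re


def relevancy(text, terms):
    """Count how many of the OR-ed terms occur at least once in the text."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(1 for term in terms if term.lower() in words)


score = relevancy("I have one cat and I used to have a dog too.",
                  ["cat", "dog", "bird"])
```

Here score is 2, because "cat" and "dog" are found and "bird" is not.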

Query sentiment

Query sentiment is the sentiment for each query. It is calculated by finding the query hits, finding sentiment terms near the hits, and averaging the scores of all found terms.

Examples

The most important thing to keep in mind when creating queries is to keep them simple and organized. Here are some examples of queries that vary in complexity:

Germ

anti* OR bact* OR germ* OR "anti-bacterial"

This uses simple "OR" logic while incorporating the wildcard (*) to account for plural versions and typos/misspellings.

Internet Banking – Mobile Access

((internet OR online OR paperless) AND (bank*)) AND (mobile OR cell* OR phone* OR access*)

This uses "OR" logic and wildcards in the same way as the previous example. The AND operator requires the use of parentheses to keep the desired logic.

Price (Negative)

(pric* OR cost* OR fee* OR item*) AND (high OR expensive OR premium OR "so much" OR disappoint* OR spendy OR ("too" AND (high OR "much" OR expensive)) OR ("not" AND (good OR competitive* OR worth OR fair))) OR ("too expensive" OR "a little expensive")

Sometimes, customers have used two separate queries for a single term (i.e. instead of one query for price, there is one for Price (Positive) and one for Price (Negative)). A downside of this system is that false positives/negatives can occur. For example, the comment "it has high quality and reasonable prices" would attach to both the Price (Positive) query and the Price (Negative) query, when it belongs with only the Price (Positive) query.

Price (Negative)

(pric* OR cost* OR fee* OR item*) AND (expensive OR premium OR "so much" OR disappoint* OR ("too" AND ("much" OR expensive)) OR ("not" AND (good OR competitive* OR worth OR fair))) OR (("too expensive" OR "a little expensive") AND (price* OR cost* OR fee* OR item*)) NEAR/8 (high OR courses)

To fix the problem above, we added an operator at the end of the query, removed "high", and added parentheses at the beginning and end of the original query. The "AND" and "NEAR/8" operators act to eliminate the false positive by adding the qualification that "high" must occur within 8 words of "price, cost, fee, or item".