Search Features
Non-Quoted Search Strings
When a non-quoted search string is entered, the search algorithm applies both the Stop Words and Stemming functions.
First, it determines whether any of the terms are Stop Words for the appropriate language. If so, it ignores those terms.
Second, it applies the Stemming function to the remaining terms.
Example:
- Search language: English
- Search string: by tomorrow night
- Matches:
  - All English emails containing both (tomorrow or tomorrows or tomorrow's) and (night or nightly or nights or night's), ignoring the stop word by
  - AND all language-unidentified emails containing the specific words by, tomorrow and night
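The two steps above can be sketched in Python. This is a minimal sketch, not the product's implementation: the stop-word set is a hypothetical subset (the English words listed under Stop Words plus by and the, which the examples treat as stop words), the suffix-stripping stem() is a toy stand-in for the real English stemmer, and the names tokenize, stem and matches_non_quoted are illustrative. An email_language of None stands for a language-unidentified email.

```python
import re

# Hypothetical English stop-word subset (the full list is in Appendix A).
STOP_WORDS = {"a", "are", "be", "by", "for", "it", "or", "the"}

# Toy suffix list: an assumption, not the product's real English stemmer.
SUFFIXES = ("'s", "ing", "ed", "er", "ly", "es", "s")


def tokenize(text):
    """Lowercase, treat curly apostrophes as straight ones, split into words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower().replace("\u2019", "'"))


def stem(word):
    """Strip at most one regular suffix, then a trailing silent 'e'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def matches_non_quoted(query, email_text, email_language, search_language="English"):
    """Return True if an email matches a non-quoted search string."""
    terms = tokenize(query)
    words = tokenize(email_text)
    if email_language == search_language:
        # Step 1: ignore stop words. Step 2: compare stems instead of exact words.
        email_stems = {stem(w) for w in words}
        return all(stem(t) in email_stems for t in terms if t not in STOP_WORDS)
    if email_language is None:
        # Language-unidentified: every term, stop words included, must appear as typed.
        return set(terms) <= set(words)
    # Emails identified as some other language are not returned.
    return False
```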
Quoted Search Strings In Search Personal Mail
When a quoted search string is entered, Stop Words and Stemming are again invoked, but stop words behave slightly differently.
If the quoted search string includes a stop word, the stop word is replaced with a wildcard that matches any stop word.
Example:
- Search language: English
- Search string: "the financial report"
- Matches:
  - All English emails containing "(any stop word) financial (stemming applied) report (stemming applied)", such as "a financially reported" or "the financial reports"
  - AND all language-unidentified emails containing the exact phrase "the financial report"
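The wildcard behavior can be sketched in Python as a sliding-window phrase match. As before, this is a hypothetical sketch: the stop-word set is a subset, stem() is a toy suffix stripper rather than the product's stemmer, and the function names are illustrative. A stop word in the phrase matches any stop word at that position, the other terms are compared by stem, and language-unidentified emails (None) need the literal phrase.

```python
import re

# Hypothetical stop-word subset (the full list is in Appendix A).
STOP_WORDS = {"a", "are", "be", "by", "for", "it", "or", "the"}
# Toy stemmer suffixes: an assumption, not the product's stemmer.
SUFFIXES = ("'s", "ing", "ed", "er", "ly", "es", "s")


def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower().replace("\u2019", "'"))


def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def term_matches(term, word, exact):
    if exact:
        return term == word           # unidentified language: literal match
    if term in STOP_WORDS:
        return word in STOP_WORDS     # wildcard: any stop word fits here
    return stem(term) == stem(word)   # other terms are compared by stem


def matches_quoted_personal(phrase, email_text, email_language, search_language="English"):
    """Quoted-phrase match for Search Personal Mail."""
    if email_language == search_language:
        exact = False
    elif email_language is None:
        exact = True
    else:
        return False
    terms, words = tokenize(phrase), tokenize(email_text)
    n = len(terms)
    # Slide a window of n consecutive words over the email, comparing term by term.
    return any(
        all(term_matches(t, w, exact) for t, w in zip(terms, words[i : i + n]))
        for i in range(len(words) - n + 1)
    )
```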
Quoted Search Strings In Search Company Archives
When a quoted search string is entered, neither the Stop Words nor the Stemming function is applied to it.
Example:
- Search string: "by tomorrow night"
- Matches:
  - All emails in the specified search language, plus all language-unidentified emails, containing the exact phrase "by tomorrow night"
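For contrast, a minimal Python sketch of the Company Archives behavior, where the phrase must appear word for word with no stop-word or stemming handling. The function name and the None convention for language-unidentified emails are assumptions for illustration.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower().replace("\u2019", "'"))


def matches_quoted_archive(phrase, email_text, email_language, search_language="English"):
    """Exact-phrase match: no stop-word handling and no stemming.

    Emails in the search language and language-unidentified emails (None) qualify.
    """
    if email_language not in (search_language, None):
        return False
    terms, words = tokenize(phrase), tokenize(email_text)
    n = len(terms)
    # The phrase must occur as n consecutive, identical words.
    return any(words[i : i + n] == terms for i in range(len(words) - n + 1))
```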
Stemming
To make searching more intuitive, stemming has been implemented. With stemming, the search mechanism matches not only the exact term entered, but also words that share the same ‘stem’, or root word, as the search term(s).
Example:
- Search string: finance mouse
- Matches: "finance mouse", "finance’s moused" and "financing mouser"
- But not: "financial mice"
Stemming typically matches the regular verb forms, regular plurals and regular possessives of a stem.
Irregular forms are handled… well… irregularly, so some experimentation may be needed to produce the desired search results. For example, searching for mice returns mouse, but searching for mouse does not return mice.
Words are stemmed differently depending on the language being searched. In English, stemming is typically based on suffixes. In other languages, stemming may be based on prefixes, infixes and/or suffixes, or not performed at all.
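A toy suffix stripper in Python makes the behavior described above concrete. It is an assumption, not the product's stemmer: the suffix list was chosen so that the finance/mouse example comes out as described. Because it only strips suffixes, it cannot relate mice to mouse; reproducing the one-way mice → mouse behavior would need an irregular-form table, which is not shown.

```python
# Toy suffix list for English: an assumption, not the product's stemmer.
SUFFIXES = ("'s", "ing", "ed", "er", "ly", "es", "s")


def stem(word):
    """Reduce a word to a crude stem by stripping one regular suffix."""
    word = word.lower().replace("\u2019", "'")  # treat ’ like '
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing silent "e" so "finance" and "financ(ing)" line up.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def stem_matches(query_term, word):
    """Two words match when they reduce to the same stem."""
    return stem(query_term) == stem(word)
```

Under this sketch, stem_matches("finance", "financial") is False because financial keeps its -al ending, which mirrors the "But not" case in the example.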
NOTE
If Chinese or Japanese is chosen from the Search Languages menu, the Simple Query search performs a literal search on the entered terms.
Stop Words
To make searching more efficient, stop words have been implemented. Stop words are words that are not indexed in the search database, just as the instances of the word 'the' would not appear in the index of a book. Leaving them out of the index greatly improves search response times.
Below is the list of currently implemented stop words for English:
a |
are |
be |
for |
it |
or |
Stop words are not implemented for the Chinese, Japanese or Hebrew languages. Stop words for all implemented languages are listed in Appendix A: Stop Words.
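The index-time effect of stop words can be sketched in Python as an inverted index that skips them. This sketch is hypothetical: the stop-word set is a subset, and the rule of skipping stop words only for English emails while indexing them for language-unidentified emails is an assumption made to agree with the examples, where unidentified emails must contain the stop word itself.

```python
import re
from collections import defaultdict

# Hypothetical English stop-word subset (the full list is in Appendix A).
STOP_WORDS = {"a", "are", "be", "by", "for", "it", "or", "the"}


def build_index(emails):
    """Build an inverted index: word -> ids of emails containing it.

    emails maps an email id to (text, language); language is None when unidentified.
    """
    index = defaultdict(set)
    for email_id, (text, language) in emails.items():
        for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            # Assumption: stop words are skipped for English emails but kept for
            # language-unidentified ones, which the examples require to match them.
            if language == "English" and word in STOP_WORDS:
                continue
            index[word].add(email_id)
    return index
```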
How stop words are handled depends on the Search Language selection. If a specific language is selected, the stop words for that language are ignored when matching emails in that language. In addition, results include all emails whose language cannot be determined and that contain all search terms, stop words included.
Example:
- Search language: English
- Search string: by tomorrow
- Matches:
  - All English emails containing tomorrow, ignoring the stop word by
  - AND all language-unidentified emails containing both the words by and tomorrow
If the search language is set to Any, the search algorithm determines whether any of the terms are stop words in any language. If so, it ignores the term only for emails tagged with that specific language.
Example:
- Search language: Any
- Search string: by tomorrow
- Matches:
  - All English emails containing tomorrow, ignoring the stop word by
  - AND all emails in all other languages, including language-unidentified emails, containing both the words by and tomorrow
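A minimal Python sketch of the Any behavior: each term is dropped only when it is a stop word in the email's own language. Only an English stop-word subset is included, stemming is left out to keep the focus on stop words, and the names are illustrative.

```python
import re

# Per-language stop words. Only a hypothetical English subset is shown; the other
# languages' lists are in Appendix A and are left out of this sketch.
STOP_WORDS_BY_LANGUAGE = {
    "English": {"a", "are", "be", "by", "for", "it", "or", "the"},
}


def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower().replace("\u2019", "'"))


def matches_any_language(query, email_text, email_language):
    """Search language "Any": ignore a term only if it is a stop word in the
    email's own language. email_language is None when unidentified."""
    stop_words = STOP_WORDS_BY_LANGUAGE.get(email_language, set())
    required = [t for t in tokenize(query) if t not in stop_words]
    return set(required) <= set(tokenize(email_text))
```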
All queries are performed within documents
To be considered a match, a multi-term search requires all search terms to appear within a single document.
A document is either:
- An email body and its metadata
- An individual attachment (whether an email message or a file) and its metadata
Metadata includes: Subject, From, To, Date, Attachment name, etc.
For example, suppose:
- Two search terms are given
- One term appears only in the body of an email
- The second term appears only in an attachment of that email
That message is not returned as a match, because the message body and the attachment are considered separate documents.
Another example:
- The search term is the To recipient’s email address (user_name@company.com), and one of the matched messages has three attachments.
- Because the To recipient’s email address appears in the metadata of the message body and of all three attachments, that single message returns four matches: the message body and each attachment are considered separate documents.
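The document rule can be sketched in Python by splitting a message into its documents and counting the documents that contain every term. The Email class, its fields and the function names are hypothetical, and the sketch assumes that each attachment's metadata includes the message's metadata plus the attachment name, as the second example implies. With it, the first example yields no match and the second yields four.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Email:
    body: str
    metadata: dict                                   # e.g. {"Subject": ..., "To": ...}
    attachments: list = field(default_factory=list)  # (attachment name, text) pairs


def tokenize(text):
    # Keep addresses such as user_name@company.com as single tokens.
    return set(re.findall(r"[\w@.']*\w", text.lower()))


def documents(email):
    """Yield the searchable text of each document in one message."""
    meta = " ".join(email.metadata.values())
    yield f"{email.body} {meta}"  # the body plus its metadata
    for name, text in email.attachments:
        # Assumption: each attachment carries the message metadata plus its own name.
        yield f"{text} {meta} {name}"


def count_matches(query, email):
    """Count the documents that contain every query term on their own."""
    terms = tokenize(query)
    return sum(1 for doc in documents(email) if terms <= tokenize(doc))
```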