Words and Phrases

Here's how the PRS Document Indexer manages search text:



The Document Indexer does not distinguish between uppercase and lowercase letters. A search for HoLiDay returns every document that contains holiday, Holiday, or any other capitalization of the word.
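The case rule can be pictured with a short Python sketch. This is an illustration only, not the Indexer's actual code; the function name matches_word is hypothetical, and the sketch assumes that comparing case-folded strings is equivalent to what the Indexer does.

```python
def matches_word(query: str, document_words: list[str]) -> bool:
    """Return True if the query equals any document word, ignoring case."""
    folded = query.casefold()
    return any(word.casefold() == folded for word in document_words)


# Hypothetical sample data, for illustration only.
print(matches_word("HoLiDay", ["Our", "Holiday", "schedule"]))  # True
print(matches_word("HoLiDay", ["Weekend", "plans"]))            # False
```

Folding both the query and the document words to one case before comparing is what makes HoLiDay, holiday and Holiday interchangeable.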


Words and Punctuation

The Indexer treats every document as a sequence of terms. A term in this context is any string of letters and digits delimited by punctuation, other non-alphanumeric characters, or white space (spaces, tabs, ends of lines).


To be a word, a string does not have to be spelled correctly or be included in any dictionary. All that is required is that someone typed it as a single word in a document. Thus, the following are words if they appear delimited in a document: 300ZX, 602e21, WWW, HTTP.
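The basic splitting rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Indexer's implementation: the function name split_terms is hypothetical, only ASCII letters and digits are treated as term characters, and the special constructs that keep punctuation inside a term are not modeled.

```python
import re


def split_terms(text: str) -> list[str]:
    """Split text into lowercase terms made of ASCII letters and digits.

    Assumption: anything that is not a letter or a digit acts as a delimiter.
    """
    return [term.lower() for term in re.findall(r"[A-Za-z0-9]+", text)]


# Hypothetical sample sentence, for illustration only.
print(split_terms("My 300ZX (see WWW, HTTP)\tcost 602e21!"))
# ['my', '300zx', 'see', 'www', 'http', 'cost', '602e21']
```

Note that no dictionary lookup takes place: 300ZX and 602e21 become terms simply because they appear between delimiters.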


In some common constructs, non-alphanumeric characters are included in the term. The following examples are treated as single terms:





Leading and trailing punctuation is always stripped, so C++ and .NET are stored as c and net.
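The stripping rule can be illustrated with a minimal Python sketch. The function name strip_outer_punctuation is hypothetical, and the sketch assumes that "punctuation" means the ASCII punctuation characters in Python's string.punctuation; the Indexer's own character set may differ.

```python
import string


def strip_outer_punctuation(token: str) -> str:
    """Remove punctuation from both ends of a token and lowercase the rest.

    Assumption: punctuation means the characters in string.punctuation.
    """
    return token.strip(string.punctuation).lower()


print(strip_outer_punctuation("C++"))   # c
print(strip_outer_punctuation(".NET"))  # net
```

A practical consequence is that a search for C++ cannot be told apart from a search for C, since both are stored as the same term.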



A phrase is a string of words that are contiguous in a document, although they may be separated by any amount of white space or punctuation. A phrase does not have to make sense grammatically; it only has to occur in a document as a contiguous sequence of words. For example:


President of the U.S.A. (4-word phrase) (2-word phrase)
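Phrase matching as described above can be sketched in Python. This is an illustration under stated assumptions, not the Indexer's implementation: the function names terms and contains_phrase are hypothetical, and the simplified splitter used here treats every non-alphanumeric character as a delimiter, so it would break U.S.A. into three terms, whereas the example above counts it as one word.

```python
import re


def terms(text: str) -> list[str]:
    """Simplified splitter: lowercase runs of ASCII letters and digits."""
    return [term.lower() for term in re.findall(r"[A-Za-z0-9]+", text)]


def contains_phrase(document: str, phrase: str) -> bool:
    """Return True if the phrase's terms occur contiguously in the document."""
    doc_terms = terms(document)
    wanted = terms(phrase)
    if not wanted:
        return False
    width = len(wanted)
    # Slide a window of the phrase's length across the document's terms.
    return any(
        doc_terms[start:start + width] == wanted
        for start in range(len(doc_terms) - width + 1)
    )


# Hypothetical sample text, for illustration only.
print(contains_phrase("The quick,   brown -- fox", "quick brown"))  # True
print(contains_phrase("The quick brown fox", "quick fox"))          # False
```

The first call matches even though a comma and extra spaces separate the two words, because only the order and adjacency of the terms matter. The second call fails because brown sits between quick and fox.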