Version (2011-07-20)

Bug fixes
  • FIXED: Updated the Windows version to Python Markdown 2.0.3, which resolves abbreviated HTTPS links (<https:...>) not being recognized.

Version (2009-03-11)

Bug fixes
  • FIXED: No files were indexed when using the Win32 COM library if the index directory was set to a drive root (top level) directory.

Version (2009-02-12)

Bug fixes
  • Fixed error that occurs when using a format string that contains MP3 fields that were not present in the ID3 tags.

  • Fixed the "Hit object has no attribute pathname" error when searching using the Windows COM library.

Version (2008-12-20)

Bug fixes
  • FIXED: Blank Microsoft Office 2007 .docx files generated errors.

Version (2008-12-08)

Additions and changes
  • Added parser for Microsoft Office 2007 .docx files.

  • Wrote simpler odt parser eliminating dependency.

Version (2007-12-28)

Additions and changes
  • Added Utils.Markdown() method to COM server — this method converts Markdown formatted text to XHTML.

  • Added parser to index Open Office ODT files using the Markdown in Python project’s utility.

Version (2007-11-15)

Bug fixes
  • Fixed an ascii encoding error that occurs when directing docsearch output to file.

Version (2007-11-04)

Additions and changes
  • Docindexer now handles unicode (the previous release was only comfortable with ascii).

  • The docsearch command’s searchdir argument is now optional (defaults to the current directory if not specified).

  • Added -a,--and option to the docsearch command — it sets the default conjunction operator to AND.

  • The indexes directory has been renamed to .docindexer. If a directory with the deprecated name _docindex_ exists then it will be used instead.

Bug fixes
  • Resolved problems with non-ascii file names.

  • Fixed error parsing PDF and Word files with apostrophes in the file name on POSIX platforms.
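
The index directory fallback noted under this release's additions can be pictured with a minimal sketch. The function name resolve_index_dir and the assumption that the index directory sits directly inside the indexed directory are illustrative only, not taken from the DocIndexer source.

```python
from pathlib import Path

INDEX_DIR = ".docindexer"            # current index directory name
DEPRECATED_INDEX_DIR = "_docindex_"  # deprecated name, still honoured if present


def resolve_index_dir(search_dir: Path) -> Path:
    """Return the index directory to use for search_dir (hypothetical helper)."""
    legacy = search_dir / DEPRECATED_INDEX_DIR
    # An existing directory with the deprecated name takes precedence.
    if legacy.is_dir():
        return legacy
    return search_dir / INDEX_DIR
```

Checking for the deprecated name first means indexes built by earlier releases keep working without being rebuilt.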

Version (2007-09-06)

  • Ported DocIndexer to Python 2.5; PythonWin32 build 210; py2exe 0.6.6; PyLucene 2.1.0; antiword 0.36; pdftotext 3.02; Inno Setup 5.1.12.

  • Rewrote docindex and docsearch utilities — lots of new options.

  • Switching to PyLucene has made performance much faster, and the query language is richer than in the previous Lupy-based version.

  • Changes and additions to COM API:

    • Renamed docindexer OpenIndex() files argument to includes.

    • Added excludes argument to docindexer OpenIndex() method.

    • Optional optimize argument for indexer CloseIndex() method has been moved to the OpenIndex() method.

    • New BuildIndex() method.

    • Renamed searcher BoolSearch() method to QuerySearch() (BoolSearch() deprecated but retained for backward compatibility).

    • New indexer BytesIndexed read-only attribute.

    • New searcher ParsedQuery and TotalHits read-only attributes.

    • The indexer LogFile attribute has been deprecated.

    • Logging to a file is now disabled by default.

  • Changed license from GPL to MIT.

Version (2006-09-15)

  • Phrases can now be included in multi-term search queries (previously only words could be included in multi-term queries). Likewise the BoolSearch COM method now accepts phrase terms.

  • Non-space word separators are handled correctly. For example "" is internally translated to the phrase "www python org"

  • The query syntax has been relaxed: spaces following + and - prefixes are automatically removed and spaces that should precede + and - prefixes are automatically added.

  • Added the QueryString method to docindexer.searcher — it returns the most recently parsed document search query.
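
The relaxed query syntax and separator handling in this release can be illustrated with a rough sketch. The helper names, the regular expressions, and the sample inputs are assumptions made for illustration; they are not DocIndexer's actual parser.

```python
import re


def relax_prefix_spacing(query: str) -> str:
    """Tidy the spacing around + and - term prefixes (hypothetical helper)."""
    # Drop spaces after a prefix that starts a term: "+ python" -> "+python".
    query = re.sub(r"(?<!\S)([+-])\s+", r"\1", query)
    # Add the missing space before a + glued to the preceding term:
    # "python+lucene" -> "python +lucene". An embedded "-" is left alone here,
    # on the assumption that it is an ordinary word separator.
    return re.sub(r"(?<=\S)\+(?=\S)", " +", query)


def to_phrase(term: str) -> str:
    """Rewrite a term with non-space separators as space-separated words."""
    return " ".join(re.findall(r"\w+", term))


print(relax_prefix_spacing("+ python - java"))  # +python -java
print(to_phrase("www.python.org"))              # www python org
```

A real parser would also have to leave quoted phrases untouched while normalizing prefixes; this sketch ignores quoting entirely.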

Version (2005-07-27)

  • FIXED: Error introduced in version caused by converting Windows temporary directory to short file name when saving and executing antiword.bat batch file.

Version (2005-04-13)

  • Converted Windows temporary directory to short file name when saving and executing antiword.bat batch file.

  • FIXED: Indexing would fail when encountering a directory without read access permissions. DocIndexer now silently skips directories without read access.

Version (2004-12-03)

  • Added docindexer.utils COM class containing the TextContent() function for retrieving the text from a document file.

  • Added optional logfile parameter to OpenIndex() to allow caller to override default log file path.

Version (2004-08-04)

  • Microsoft Word documents are now indexed using Antiword, which is over twice as fast as the previous MS Word-based parser. The old MS Word-based parser is still included but is not enabled by default.

  • Added -c, --config option to docindex which prints configuration information.

Version (2003-10-27)

  • Fixed the "LookupError: no codec search functions registered: can’t find encoding" error when running the compiled docsearch.exe under Windows.

Version (2003-10-22)

  • no longer throws an error when listing file names containing non-ASCII characters.

Version (2003-10-10)

  • Fixed PDF parser failure on Win9x platforms (included w9xpopen.exe in binary distribution).

  • Log file finish time is now correct (was previously printing the start time).

Version (2003-10-09)

  • Added PDF parser.

  • Word documents which open dirty no longer prompt to be saved after they are indexed.

Version (2003-10-05)

  • Added --analyze option to docindex utility which analyzes a file’s contents and prints analyzed text terms. Usage: --analyze filename

  • Added --logfile=FILE option to docindex utility to log console messages.

  • Parsers are now more sensible — parser instances are used instead of parser classes.

  • Documentation updates and additions.

  • main.aap script now updates CHANGLOG.txt release date correctly.

  • Added icons to docindex.exe and docsearch.exe utilities.

  • Added setup.cfg to add version information to py2exe generated exe’s.

  • Unexpected parser errors are now handled (the file is skipped and processing continues).

Version (2003-09-30)

  • Added LogFile property to the COM server Indexer which returns the path name of the log file.

  • Found file names are now delivered by the COM server ranked by search score.

  • docsearch now lists found file names ranked by search score.

  • docsearch displays found file search score and index date.

Version (2003-09-25)

  • Added modular parser architecture so new file types can be easily added.

  • Added cross-platform native Python HTML and text file parsers (they no longer depend on the Windows-only MS Word and are much faster).

  • The DocIndexer COM server now writes a log file called log.txt in the server’s install directory. The log contains a transcript of the most recent indexing run.

  • Exposed IndexedCount and SkippedCount properties from COM server.

  • Added dryrun argument to COM OpenIndex method.

  • By default the indexer attempts to index all files, skipping those that it is unable to parse.

  • Optimized the MS Word parser, more than doubling its performance.

  • The MS Word parser is now faster and less prone to stopping for user input.

  • MS Word temporary files are now skipped.

Version (2003-09-19)

  • Documentation tidy ups.

  • Build script enhancements.

Version 0.1 (2003-09-16)

  • Initial release.