Change Log

mxTextTools Change Log

The change log includes a detailed description of all changes to this package in the recent releases.
Version: 3.2.1

Changes from 3.0.0 to 3.1.0

  • Fixed a segfault when trying to unpickle a UnicodeTagTable. Reported by Andrew Dalke.

Changes from 2.0.3 to 3.0.0

Version 3.0.0 introduces full Unicode support for mxTextTools and the Tagging Engine; this support was implemented for one of our customers. As a result, a few things had to be restructured and modified. Hopefully, the new design decisions will provide more room for future enhancements.

The new version is expected to be nearly 100% backward compatible with previous versions. Where needed, aliases or factory functions are provided to maintain interface compatibility.

  • Moved command constant definitions to the C extension.

  • Restructured tag commands and their numbering so that low-level commands come before the special ones. Old tag tables need to be "recompiled" because of this change.

  • Added a Tag-Table compiler. The tagging engine will now only work with compiled TagTables.

  • Restored Python 1.5.2 compatibility (all Unicode usages are optional).

  • Made the Tagging Engine (TE) polymorphic with respect to the underlying data type and created two versions: one for unsigned char and one for Py_UNICODE.

  • Wrote Tag-Table cache support.

  • tag() now accepts keyword arguments.

  • Merged BMS and FS into a new TextSearch object. The algorithm to use is now an argument to the constructor of this single object.

  • Passing an unknown search object type to the TE is now an error.

  • Nearly all instances where a SystemError could have been raised now raise an mxTextTools.Error instead.

  • Removed support for buffer-compatible input objects. This will probably be reintegrated in some future release.

  • Added new AllInCharSet and IsInCharSet commands.

  • Implemented Unicode support in search objects using a trivial algorithm. Translation is not supported for Unicode.

  • Added a huge set of regression tests for all the C APIs and the Tagging Engine.

  • Fixed a bug in the strip APIs which caused a core dump in situations where the complete string contents would have been stripped. Thanks to Jeffrey Chang for finding this one.

  • Fixed a bug in the handling of SubTable: the subtaglist entries of the tag table entries pointed recursively to the taglist containing them. This was updated to the documented behaviour of using None for the subtaglist entries.

  • Added support for a context object which is passed along while processing a tag table with the Tagging Engine.

  • Added more type casts to the C code to make some pedantic compilers happy (e.g. the Mac OS X one).

  • Fixed a bug found by Simon Cusack in the example.

  • Fixed a bug in tagdict(). Thanks to Joel Rosdahl for reporting this.

  • Fixed a bug in tag() which caused the IsNot/IsNotIn commands to scan beyond the end of the text slice (eventually causing a segfault). Thanks to Reinhard Engel for reporting this.

  • Added a test to the Tag Table Compiler to check for empty match strings. These are no longer allowed for low-level commands (which wouldn't match in such a case anyway). This allows the Tagging Engine to run faster, since it no longer has to check for this case.
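
The last item above moves a check from match time to compile time. The following is only a minimal pure-Python sketch of that idea and is not the mxTextTools implementation: compile_table(), run_table() and TableError are hypothetical names, and the table format is a simplified list of (tag, match string) pairs rather than a real tag table.

```python
# Hypothetical, simplified illustration; not the mxTextTools implementation.

class TableError(ValueError):
    """Raised when a table entry is rejected at compile time."""


def compile_table(entries):
    """Validate (tag, match) pairs once and return an immutable table.

    Empty match strings are rejected here, so the matching loop in
    run_table() never has to test for them.
    """
    compiled = []
    for index, (tag, match) in enumerate(entries):
        if not match:
            raise TableError("entry %d (%r): empty match string" % (index, tag))
        compiled.append((tag, match))
    return tuple(compiled)


def run_table(table, text, start=0, stop=None):
    """Match each entry in sequence against text[start:stop].

    Returns (success, taglist, position), where taglist holds a
    (tag, left, right) tuple for every entry that matched.
    """
    if stop is None:
        stop = len(text)
    taglist = []
    pos = start
    for tag, match in table:
        end = pos + len(match)
        # Bounds check first: never look beyond the end of the slice.
        if end > stop or text[pos:end] != match:
            return False, taglist, pos
        taglist.append((tag, pos, end))
        pos = end
    return True, taglist, pos
```

Because validation happens once in compile_table(), the loop in run_table() performs only the bounds check and the comparison for each entry.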

Changes from 2.0.2 to 2.0.3

  • Added isascii().
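
As a rough illustration only, and assuming isascii() reports whether a text consists solely of 7-bit ASCII characters, an equivalent check can be sketched in pure Python. The name isascii_sketch is hypothetical and this is not the C implementation shipped with the package.

```python
def isascii_sketch(text):
    """Return True if every character in text has a code point below 128."""
    return all(ord(char) < 128 for char in text)
```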

Changes from 2.0.0 to 2.0.2

  • Fixed a bug in the example. Thanks to Michael Husmann for finding this one.

  • Fixed a memory leak in the CallTag processing.