[egenix-users] Re: mxTextTools issues

M.-A. Lemburg mal at egenix.com
Wed Jan 18 18:02:20 CET 2006


Scot Wilcoxon wrote:
> I tried to use mxTextTools in a pywikipedia bot (Python script for
> improving Wikipedia.org articles).  I thought I'd point out several
> issues I noticed.
> 
> 1. No co-nested patterns.  I needed to handle several patterns which
> could be nested within other patterns, for example [[...]] could contain
> {{...}} and vice versa.  Having parsing rules for those two patterns
> worked OK, but I could not seem to create a forward declaration so the
> parsing table which was declared first could refer to the second parsing
> table.  (Creating a null or simple placeholder table just seemed to
> include that placeholder in the first table, rather than it being
> replaced by the second table when that was defined.)

You can have nesting by using nested tables. These then work
recursively, do rollback, etc.
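
To make the forward-declaration problem concrete, here is a minimal
sketch in plain Python. It is not the mxTextTools API: the names TABLES,
parse_delimited, parse_link and parse_template are hypothetical. The
point it illustrates is that the two "tables" refer to each other only
through a shared list that is read at match time, so neither has to
exist when the other is defined, and a failed inner match simply rolls
back.

```python
# Pure-Python illustration (assumption: not mxTextTools code).
# Each "table" is a function; tables find each other through TABLES,
# which is looked up while matching rather than at definition time.

TABLES = []  # filled in once both parsers are defined


def parse_delimited(text, pos, opener, closer, tag):
    """Match opener ... closer at pos.

    Returns ((tag, start, end, children), end) on success, else None.
    """
    if not text.startswith(opener, pos):
        return None
    children = []
    i = pos + len(opener)
    while i < len(text):
        if text.startswith(closer, i):
            end = i + len(closer)
            return (tag, pos, end, children), end
        # Late lookup: try every registered table at this position.
        for table in TABLES:
            result = table(text, i)
            if result is not None:
                node, i = result
                children.append(node)
                break
        else:
            i += 1  # ordinary character, skip it
    return None  # unterminated: report no match so the caller rolls back


def parse_link(text, pos):
    return parse_delimited(text, pos, "[[", "]]", "link")


def parse_template(text, pos):
    return parse_delimited(text, pos, "{{", "}}", "template")


# Register both tables after definition; order no longer matters.
TABLES.extend([parse_link, parse_template])
```

Calling parse_link("[[a {{b [[c]]}} d]]", 0) returns a nested tuple
with a template node inside the outer link and a link node inside that
template.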

> 2. Painful debugging.  I realize that when mxTextTools was created
> Python did not have the logging module available.  It is hard to find
> out how parsing is proceeding and to figure out why a parse pattern is
> not being recognized.  It doesn't help that neither the tree-printing
> function nor how to invoke it is documented (I had to figure it out
> from the source code).
>
> 3. Infinite loops.  Yes, I found the Charming Python jump_count loop
> detector.

mxTextTools uses a very low-level state machine to do the actual
parsing. That's why it's so fast, but it also makes programming
the patterns a bit cumbersome. That's the price to pay for
performance, I guess.

If you want low-level tracing, you'll have to compile mxTextTools
with debugging enabled. If you then run Python with the -d command
line flag, the module will create a log file, mxTextTools.log,
which has lots and lots of details.

Compiling the debug version is easy:

   python setup.py mx_autoconf --enable-debugging install

The 2.1.0 version of mxTextTools makes writing tag tables a
lot easier by supporting jump target strings and is fully
Unicode aware.

This is our latest snapshot:

http://www.egenix.com/files/python/egenix-mx-base-2.1.0-2005-05-01.zip

> 4. The documentation for WordStart and WordEnd at one point (I don't
> remember where) does not make the difference between them apparent.

WordStart leaves the head on the first char of the word, WordEnd
on the last.
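
As a rough illustration of that difference, here is a plain-Python
sketch built on str.find. It is not mxTextTools code: word_start and
word_end are hypothetical helpers that take the description above
literally, returning the index where the head would end up.

```python
# Plain-Python model of the head positions described above
# (assumption: these helpers are illustrative, not the mxTextTools API).

def word_start(text, word, pos=0):
    """Index of the first character of the next occurrence of word, or -1."""
    return text.find(word, pos)


def word_end(text, word, pos=0):
    """Index of the last character of the next occurrence of word, or -1."""
    start = text.find(word, pos)
    return -1 if start < 0 else start + len(word) - 1


text = "spam and eggs"
first = word_start(text, "and")  # head on the "a" of "and"
last = word_end(text, "and")     # head on the "d" of "and"
```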

> 5. Community.  This email address is the only apparent contact point.  A
> public forum of some sort might help make it apparent whether
> mxTextTools is a live or dead project, would allow people to help each
> other, and people with problems could search for similar previously
> solved problems.

egenix-users is our user mailing list; you can use that as a
forum. We usually listen to what our users have to say :-)

The archives are also indexed by Google, so it's easy to search
them.

-- 
Marc-Andre Lemburg
eGenix.com