[egenix-users] A question from a mx.TextTools newbie

M.-A. Lemburg mal at egenix.com
Fri Mar 6 23:14:16 CET 2009


On 2009-03-06 19:21, Wenping Wang wrote:
> Hello Friends,
> 
> I'm new to mx.TextTools & am learning this powerful tool by reading David Mertz's excellent book "Text Processing in Python".  Naturally I tried the examples in the book.  One particular example concerning mx.TextTools is the one under "mx.TextTools.AppendTagobj" on Page 307.
> 
> Unfortunately, that example from David's book doesn't work.  I was able to tweak the code to make it work, but I am puzzled as to why the original doesn't.  I attach the code segment here.
> 
> #--- example code starts
> from mx.TextTools import *
> 
> #--- 1st approach: failure
> words = (('word', AllIn+AppendTagobj, alpha),
> (None, AllIn, whitespace, MatchFail, -1))
> tag('this and that', words)
> 
> #--- 2nd approach: success
> words = (('word', AllIn, alpha),
> (None, AllIn, whitespace, +1),
> (None,EOF,Here,-2))
> tag('this and that', words)
> 
> #--- 3rd approach: success
> word = []
> def emiter(tl,txt,l,r,s):
>     word.append(txt[l:r])
> 
> words = ((emiter, AllIn+CallTag, alpha),
> (None, AllIn, whitespace, MatchFail, -1))
> tag('this and that', words)
> #--- example code ends
> 
> Can someone provide me some insight why the 1st approach doesn't work while the 2nd & 3rd work?  BTW, I'm using mx.TextTools 3.1.2 on Windows.

It fails because the tagging engine's pointer moves beyond the
last 't' in the string and there is no EOF check in the tag
table.

AllIn will never succeed on EOF, so in the 1st approach the
whitespace entry fails there and takes its MatchFail jump.
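
To make the failure concrete, here is a toy model of the jump logic in
plain Python. It is only a sketch and not mxTextTools itself: run_table,
all_in, eof and MATCH_FAIL are made-up names, the entries always spell out
both jump offsets, and flags such as AppendTagobj are ignored. It assumes
that a failed match follows the no-match jump and that running off the end
of the table counts as success.

```python
import string

MATCH_FAIL = "MatchFail"  # stand-in for the real MatchFail jump target


def all_in(text, pos, chars):
    """Match one or more characters from `chars`; return the new position or None."""
    end = pos
    while end < len(text) and text[end] in chars:
        end += 1
    return end if end > pos else None  # an empty run, e.g. at EOF, is a failure


def eof(text, pos, _arg):
    """Match only at the end of the text, without consuming anything."""
    return pos if pos >= len(text) else None


def run_table(text, table):
    """Run entries of the form (tagobj, command, arg, jump_no_match, jump_match)."""
    pos, index, tags = 0, 0, []
    while 0 <= index < len(table):
        tagobj, command, arg, jne, je = table[index]
        end = command(text, pos, arg)
        if end is None:
            if jne == MATCH_FAIL:
                return False, tags, pos
            index += jne
            continue
        if tagobj is not None:
            tags.append((tagobj, pos, end))
        pos = end
        index += je
    # Running off the end of the table counts as success in this model.
    return index >= len(table), tags, pos


alpha, whitespace = string.ascii_letters, string.whitespace

# 1st approach: no EOF check, so the whitespace entry fails after 'that'.
no_eof_check = (
    ("word", all_in, alpha, MATCH_FAIL, +1),
    (None, all_in, whitespace, MATCH_FAIL, -1),
)

# 2nd approach: a failed whitespace match falls through to an EOF test.
with_eof_check = (
    ("word", all_in, alpha, MATCH_FAIL, +1),
    (None, all_in, whitespace, +1, +1),
    (None, eof, None, -2, +1),
)

failed = run_table("this and that", no_eof_check)
passed = run_table("this and that", with_eof_check)
```

In this model both runs collect the same three word slices and stop at
position 13; only the success flag differs, False for the first table and
True for the second.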

BTW: mxTextTools has support for jump labels, so it's usually
better to use those (the compiler will then convert these to offsets),
e.g.

from mx.TextTools import *

words = (
  'parse_word',
  ('word', AllIn, alpha, 'parse_whitespace'),
  'parse_whitespace',
  (None, AllIn, whitespace, 'test_for_eof', 'parse_word'),
  'test_for_eof',
  (None, EOF, Here, 'parse_word'),
  )

print tag('this and that', words)
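
For illustration, the conversion from labels to offsets can be pictured
roughly like this. The helper below is hypothetical and is not the
mxTextTools compiler; it uses plain strings in place of the real command
constants so that it runs without the package, and it assumes the jump
targets sit in the fourth and fifth fields of each entry.

```python
def resolve_labels(table):
    """Replace string jump targets with relative offsets (illustrative only)."""
    # First pass: remember which entry index each label points at.
    positions, entries = {}, []
    for item in table:
        if isinstance(item, str):
            positions[item] = len(entries)
        else:
            entries.append(item)
    # Second pass: rewrite each label as (target index - own index).
    resolved = []
    for index, entry in enumerate(entries):
        head, jumps = entry[:3], entry[3:]
        jumps = tuple(
            positions[jump] - index if isinstance(jump, str) else jump
            for jump in jumps
        )
        resolved.append(head + jumps)
    return tuple(resolved)


# Same layout as the table above, with placeholder strings for the commands.
labelled = (
    'parse_word',
    ('word', 'AllIn', 'alpha', 'parse_whitespace'),
    'parse_whitespace',
    (None, 'AllIn', 'whitespace', 'test_for_eof', 'parse_word'),
    'test_for_eof',
    (None, 'EOF', 'Here', 'parse_word'),
)

offsets = resolve_labels(labelled)
# offsets == (('word', 'AllIn', 'alpha', 1),
#             (None, 'AllIn', 'whitespace', 1, -1),
#             (None, 'EOF', 'Here', -2))
```

The resulting offsets are close to the hand-written ones in the 2nd
approach from the question, which is the bookkeeping that labels save you
from doing by hand.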

-- 
Marc-Andre Lemburg
eGenix.com



