[egenix-users] Continuous searching of text thru all characters on line

M.-A. Lemburg mal at lemburg.com
Mon Jun 24 11:07:05 CEST 2002


Pekka Niiranen wrote:
> I am using the mxTextTool in mxBase 2.1.0b2.
> 
> I am parsing a line that may contain multiple non-overlapping matches:
> 
> ---- code starts ---
> import os
> import pprint
> from mx.TextTools import *
> 
> letter_set = set(alpha)
> linput = "aa?BB!aa?DD!aa"
> head_pos = None
> 
> 
> def pr(taglist,txt,l,r,subtag):
>     """Print matched string"""
>     print txt[l:r]
> 
> matchtable = ((pr, AllIn+CallTag, '?', +1),
>               (pr, AllInSet+CallTag, letter_set, +1),
>               (pr, AllIn+CallTag, '!', +1, MatchOk),
>               (None, Fail, Here)) # This is needed in order to avoid an infinite loop
> 
> 
> tagtable = ((None, AllInSet, letter_set, +1),
>             ('m', Table+AppendMatch, matchtable),
>             (None, Table, ThisTable)) # Continue searching after first match on line.
> 
> 
> result,taglist,next = tag(linput, tagtable)
> print taglist
> print "-------"
> 
> ---- code ends ---
> 
> The problem is that "print taglist" returns only ['?BB!'] instead of
> ['?BB!', '?DD!'], i.e. the match from the recursive call of tagtable is
> not added to taglist. However, as the function pr reveals, ?DD! is found
> by mxTextTool.

The reason is that failing sub table matches restore the tag list
to what it was before recursion. You should remove the (None, Fail, Here)
and replace (pr, AllIn+CallTag, '!', +1, MatchOk) with
(pr, AllIn+CallTag, '!', MatchFail, MatchOk).
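
To illustrate the rollback idea only, here is a minimal pure-Python toy
model. It does not use or reproduce mxTextTools; the names skip_letters,
match_item, scan and the fail_when_empty flag are hypothetical, and the
item matcher is a simplified stand-in for the poster's matchtable. It
assumes that a failed recursive call restores the tag list to its state
just before the recursion and then propagates the failure upwards.

```python
import string

LETTERS = set(string.ascii_letters)


def skip_letters(text, pos):
    """Advance past any run of letters starting at pos."""
    while pos < len(text) and text[pos] in LETTERS:
        pos += 1
    return pos


def match_item(text, pos):
    """Return the end index of a '?letters!' item at pos, or None."""
    if pos >= len(text) or text[pos] != "?":
        return None
    end = skip_letters(text, pos + 1)
    if end == pos + 1 or end >= len(text) or text[end] != "!":
        return None
    return end + 1


def scan(text, pos, taglist, fail_when_empty):
    """Skip letters, match one item, then recurse on the rest of the text.

    Toy assumption: when the recursive call fails, the tag list is cut
    back to what it was before the recursion and the failure is passed on.
    """
    pos = skip_letters(text, pos)
    end = match_item(text, pos)
    if end is None:
        # No further item: either report failure or simply stop.
        return not fail_when_empty
    taglist.append(text[pos:end])
    mark = len(taglist)          # state of the list before recursing
    if not scan(text, end, taglist, fail_when_empty):
        del taglist[mark:]       # failed recursion: restore the list
        return False
    return True


strict, lenient = [], []
scan("aa?BB!aa?DD!aa", 0, strict, True)    # trailing "aa" counts as failure
scan("aa?BB!aa?DD!aa", 0, lenient, False)  # trailing "aa" just ends the scan
print(strict)   # ['?BB!']
print(lenient)  # ['?BB!', '?DD!']
```

In the strict run the innermost call fails on the trailing "aa", so the
entry appended one level up is discarded again; only the top-level match
survives, which mirrors the output reported above.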

> Is it possible to add all the matched strings into a single list that
> does not contain sublists?
> (not ['?BB!', ['?DD!']])

Yes. The command SubTable does this for you.
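
The difference between a nested and a flat result can be sketched in plain
Python. This is a toy model, not mxTextTools code: it assumes that a
sub-table run with Table collects into a tag list of its own, while one
run with SubTable appends to the caller's tag list. The regular expression
ITEM, the function collect and the flag shared are hypothetical names
used only for this illustration.

```python
import re

# Stand-in for the poster's matchtable: '?', some letters, then '!'.
ITEM = re.compile(r"\?[A-Za-z]+!")


def collect(text, pos, taglist, shared):
    """Find the next item, record it, then recurse on the rest of the text."""
    m = ITEM.search(text, pos)
    if m is None:
        return
    taglist.append(m.group())
    if shared:
        # SubTable-like: the recursion writes into the same list.
        collect(text, m.end(), taglist, shared)
    else:
        # Table-like: the recursion gets its own list, stored as a sublist.
        sub = []
        collect(text, m.end(), sub, shared)
        if sub:
            taglist.append(sub)


nested, flat = [], []
collect("aa?BB!aa?DD!aa", 0, nested, False)
collect("aa?BB!aa?DD!aa", 0, flat, True)
print(nested)  # ['?BB!', ['?DD!']]
print(flat)    # ['?BB!', '?DD!']
```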

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
Meet us at EuroPython 2002:                 http://www.europython.org/



