[egenix-users] PROBLEM: eGenix.com mxBASE beta 3

Pekka Niiranen krissepu at vip.fi
Fri Jul 26 22:58:31 CEST 2002


Hi,

I tried the latest beta 3 in two ways:

a) compiling it myself from the sources, and
b) installing the precompiled package for Python 2.2.

Of the scripts below, only the one that uses Simpleparse returns
anything. The others run without errors but return [].

They all run fine with beta 2, though.

I am using Windows 2000 Professional, Python 2.2.1 and Winpython
v148.

-pekka-


Pekka Niiranen wrote:

> Thank you all for your help and inspiration! It is payback time ;)
>
> For the past two months I have been trying to create a parser that
> returns strings delimited by two different characters. The strings can
> be nested. I considered recursive regular-expression calls too slow
> and decided to use mxTextTools 2.1 beta2 and the latest alpha of
> Simpleparse 2.0.
>
> Below are the three solutions I found.
> Note that Simpleparse creates a different tag table from the ones I
> wrote by hand.
>
> Further ideas to be implemented:
>
> 1) Passing the delimiting characters in as parameters (easy)
> 2) Unicode support
> 3) Testing for equal numbers of opening and closing delimiters before
>    calling the parser (will this speed up the solution?)
> 4) Parsing one line at a time without looping through the lines of the
>    text with "while" or "for" (maybe "None, AllNotIn, '()\n'")
>
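
On idea 3 in the list above, a cheap pre-check could look like the sketch
below. It is plain Python and independent of mxTextTools; the function
name equal_counts and its parameters are made up for illustration. It only
compares counts, as the idea describes, so a string such as ")(" still
passes. Whether the check saves time overall probably depends on how often
the input is actually unbalanced.

```python
def equal_counts(text, opener="(", closer=")"):
    """Return True if text has as many openers as closers.

    Hypothetical helper: it compares counts only and does not
    check that each closer comes after its opener.
    """
    return text.count(opener) == text.count(closer)


text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
if equal_counts(text):
    print("counts match, worth calling the parser")
else:
    print("counts differ, skip the parser")
```
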
> One development idea for mxTextTools:
>
> 1) Instead of using a list of tables to recurse, would it be possible
>    to use a "global jump" to outside the current table?
>
> --- solution 1 starts (with limiting letters) ---
>
> from mx.TextTools import *
> text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> tables = [] # used for recursion only
>
> tab = ('start',
>        # If the next character is "(", recurse
>        (None,Is+LookAhead,'(',+1,'nesting'),
>        # If the current character is ")", stop or return from recursion
>        (None,Is,')',+1,MatchOk),
>        # Match all characters except "(" and ")"
>        (None,AllNotIn,'()',0,'start'),
>        'nesting',
>        # Since we have looked ahead, collect the "(" sign, then recurse
>        ('group',SubTable+AppendMatch,((None,Is,'(',0,+1),
>                                       (None,SubTableInList,(tables,0)))),
>        # After recursion, jump back to 'start'
>        (None,Jump,To,'start'))
>
> tables.append(tab) # Add tab to tables
>
> if __name__ == '__main__':
>
>     result, taglist, nextindex = tag(text,tab)
>     print taglist
>
> --- solution 1 ends ---
>
> --- solution 2 starts (without limiting letters) ---
>
> from mx.TextTools import *
>
> text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>
> tab = ('start',
>        # When the character ")" is seen, stop recursion
>        (None, Is+LookAhead, ')', +1, MatchOk),
>        (None, Is, '(', 'letters', +1),
>        # Recurse
>        ('group', SubTable+AppendMatch, ThisTable),
>        # The last character in the recursion was ")", so skip over it
>        # and jump back to 'start'
>        (None, Skip, 1, 0, 'start'),
>        'letters',
>        # Collect all characters except "(" and ")"
>        (None, AllNotIn, '()', 0, 'start'))
>
> result,taglist,next = tag(text, tab)
> print taglist
>
> --- solution 2 ends ---
>
> --- solution 3 starts (Simpleparse solution) ---
>
> from simpleparse.parser import Parser
> from mx.TextTools import *
>
> declaration = r'''
> >line<  := (a/match)+
> match   := '(', line, ')'
> <a>     := -[()]
> '''
> text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>
> parser = Parser(declaration)
> success, children, nextcharacter = parser.parse(text, production="line")
> print_tags(text,children)
>
> --- solution 3 ends ---
>
> -pekka-
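
To compare what the three solutions return (for example between beta 2
and beta 3), a plain-Python reference may help. The sketch below is not
one of the solutions above and does not use mxTextTools or Simpleparse.
The name nested_groups is hypothetical, and returning each group with its
delimiters, in the order the groups close, is an assumption about the
desired output rather than what the tag tables are known to produce.

```python
def nested_groups(text, opener="(", closer=")"):
    """Return every delimited substring of text, delimiters included.

    Groups are listed in the order in which they close, so an inner
    group always appears before the group that encloses it.
    Raises ValueError if the delimiters are not balanced.
    """
    stack = []   # positions of openers that have not been closed yet
    groups = []
    for index, ch in enumerate(text):
        if ch == opener:
            stack.append(index)
        elif ch == closer:
            if not stack:
                raise ValueError("unmatched closer at position %d" % index)
            start = stack.pop()
            groups.append(text[start:index + 1])
    if stack:
        raise ValueError("unmatched opener at position %d" % stack[-1])
    return groups


text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
for group in nested_groups(text):
    print(group)
```

An explicit stack is used instead of recursion so that deeply nested
input does not run into the interpreter's recursion limit.
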



