[egenix-users] PROBLEM: eGenix.com mxBASE beta 3

Mon Jul 29 23:23:33 CEST 2002

Fine,

but the line:

(None,EOF,Here,MatchOk)

will make text = "aa(AA" match too. If I analysed it correctly,
this is because EOF always matches at the end of the text, even when
a group is still open. Would it be possible to add an mxTextTools
option that makes EOF cause a failure when necessary?

Something like: "if EOF is encountered here, fail the whole subgroup"?
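
To make the behaviour I am asking for concrete, here is a minimal sketch
in plain Python. It does not use mxTextTools at all; the function name
parse_groups and its return format are made up for illustration only.
Reaching the end of the text inside an open "(" fails the whole group,
while reaching it at the top level is fine. Treating a stray ")" at the
top level as a failure is my own assumption, not something the tag
tables do.

```python
def parse_groups(text, pos=0, depth=0):
    """Return (ok, groups, next_pos) for text starting at pos.

    groups holds one (matched_text, subgroups) pair per "(...)" group
    found at this nesting level. Hypothetical helper, not mxTextTools.
    """
    groups = []
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            # Recurse just past the "(" to parse the group body.
            ok, children, end = parse_groups(text, pos + 1, depth + 1)
            if not ok:
                return False, [], pos  # propagate the failure upwards
            groups.append((text[pos:end], children))
            pos = end
        elif ch == ")":
            if depth == 0:
                return False, [], pos  # assumption: stray ")" is an error
            return True, groups, pos + 1  # close the current group
        else:
            pos += 1  # ordinary character, skip it
    # End of text: only acceptable when no group is still open.
    if depth > 0:
        return False, [], pos
    return True, groups, pos


if __name__ == "__main__":
    print(parse_groups("aa(AA)a"))
    print(parse_groups("aa(AA"))
```

With this sketch "aa(AA)a" succeeds, while "aa(AA" is rejected because
the end of the text is reached inside the group.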

-pekka-

"M.-A. Lemburg" wrote:

> Pekka Niiranen wrote:
> > Hi,
> >
> >         I tried the latest beta 3 by:
> >
> >         a) compiling it myself from sources and
> >         b) installing from the precompiled package for python v2.2
> >
> >         Of the scripts below, only the one that uses Simpleparse returns
> >         anything. The others run without errors, but return [].
> >
> >         They all run OK with the beta 2 though.
>
> If they did, then you've hit a bug in beta2. Here are the corrected
> versions. Note that the problem was with the EOF handling. If AllNotIn
> doesn't match at least one char it'll fail and using 0 as jne offset
> causes the same effect as MatchFail.
>
> #--- solution 1 starts (with limiting letters)---
>
> from mx.TextTools import *
>
> def test1():
>
>      text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>
>      tables = [] # used for recursion only
>
>      tab = ('start',
>             (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
>             (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
>             (None,AllNotIn,'()',+1,'start'), # Search all characters except "(" and ")"
>             (None,EOF,Here,MatchOk),
>             'nesting',
>             ('group',SubTable+AppendMatch,
>              ((None,Is,'(',MatchFail,+1), # Since we have looked ahead, collect "(" -sign
>               (None,SubTableInList, (tables,0)),  # Recurse
>               )
>              ),
>             (None,Jump,To,'start')) # After recursion jump back to 'start'
>
>      tables.append(tab) # Add tab to tables
>
>      result, taglist, nextindex = tag(text,tab)
>      print result, nextindex
>      print taglist
>
> #--- solution 2 starts  (without limiting letters) ---
>
> def test2():
>
>      text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>
>      tab = ('start',
>             (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
>             (None, Is, '(', 'letters', +1),
>             ('group', SubTable+AppendMatch, ThisTable), # Recurse
>             (None, Skip, 1, MatchFail, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
>             'letters',
>             (None, AllNotIn, '()', +1, 'start'),  # Collect all characters except "(" and ")"
>             (None, EOF, Here, MatchOk),
>             )
>
>      result,taglist,nextindex = tag(text, tab)
>      print result, nextindex
>      print taglist
>
> print 'Test 1:'
> test1()
> print
>
> print 'Test 2:'
> test2()
> print
>
> >         I am using Windows 2000 professional, Python 2.2.1 and Winpython
> > v148.
> >
> > -pekka-
> >
> >
> > Pekka Niiranen wrote:
> >
> >
> >>Thank you all for your help and inspiration! It is payback time ;)
> >>
> >>I have spent the past two months trying to create a parser that
> >>returns strings enclosed by two different limiting letters. The
> >>strings can be nested. I considered recursive calls of a regular
> >>expression to be too slow and decided to use mxTextTools 2.1 beta2
> >>and the latest alpha of Simpleparse 2.0.
> >>
> >>Below are the three solutions I found.
> >>Note that Simpleparse creates a different tag table from the ones
> >>I found "manually".
> >>
> >>Further ideas to be implemented:
> >>
> >>1) Pass the limiting letters in as parameters (easy)
> >>2) Unicode support
> >>3) Test for an equal number of opening and closing limiting letters
> >>before calling the parser (will this speed up the solution?)
> >>4) Parse one line at a time without looping through the lines of the
> >>text with "while" or "for"
> >>    (maybe "None, AllNotIn, '()\n'" )
> >>
> >>One development idea for mxTextTools:
> >>
> >>1) Instead of using a list of tables to recurse, would it be possible
> >>to use a "global jump" to outside of the current table?
> >>
> >>--- solution 1 starts (with limiting letters)---
> >>
> >>from mx.TextTools import *
> >>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> >>tables = [] # used for recursion only
> >>
> >>tab = ('start',
> >>       (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
> >>       (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
> >>       (None,AllNotIn,'()',0,'start'), # Search all characters except "(" and ")"
> >>       'nesting',
> >>       ('group',SubTable+AppendMatch,((None,Is,'(',0,+1), # Since we have looked ahead, collect "(" -sign
> >>                                      (None,SubTableInList,(tables,0)))), # Recurse
> >>       (None,Jump,To,'start')) # After recursion jump back to 'start'
> >>
> >>tables.append(tab) # Add tab to tables
> >>
> >>if __name__ == '__main__':
> >>
> >>    result, taglist, nextindex = tag(text,tab)
> >>    print taglist
> >>
> >>--- solution 1 ends ---
> >>
> >>--- solution 2 starts  (without limiting letters) ---
> >>
> >>from mx.TextTools import *
> >>
> >>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> >>
> >>tab = ('start',
> >>       (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
> >>       (None, Is, '(', 'letters', +1),
> >>       ('group', SubTable+AppendMatch, ThisTable), # Recurse
> >>       (None, Skip, 1, 0, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
> >>       'letters',
> >>       (None, AllNotIn, '()', 0, 'start')) # Collect all characters except "(" and ")"
> >>
> >>result,taglist,next = tag(text, tab)
> >>print taglist
> >>
> >>--- solution 2 ends ---
> >>
> >>--- solution 3 starts (Simpleparse solution) ---
> >>
> >>from simpleparse.parser import Parser
> >>from mx.TextTools import *
> >>
> >>declaration = r'''
> >>
> >>>line<  := (a/match)+
> >>
> >>match   := '(', line, ')'
> >><a>     := -[()]
> >>'''
> >>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
> >>
> >>parser = Parser(declaration)
> >>success, children, nextcharacter = parser.parse(text, production="line")
> >>print_tags(text,children)
> >>
> >>--- solution 3 ends ---
> >>
> >>-pekka-
> >
> >
> >
> > _______________________________________________________________________
> > eGenix.com User Mailing List                     http://www.egenix.com/
> > http://lists.egenix.com/mailman/listinfo/egenix-users
>
> --
> Marc-Andre Lemburg
> CEO eGenix.com Software GmbH
> _______________________________________________________________________
> eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
> Python Consulting:                               http://www.egenix.com/
> Python Software:                    http://www.egenix.com/files/python/