[egenix-users] PROBLEM: eGenix.com mxBASE beta 3

M.-A. Lemburg mal at lemburg.com
Sat Jul 27 12:41:42 CEST 2002


Pekka Niiranen wrote:
> Hi,
> 
>         I tried the latest beta 3 by:
> 
>         a) compiling it myself from sources and
>         b) installing from the precompiled package for python v2.2
> 
>         Of the scripts below only the script that uses Simpleparse returns
> anything.
>         The others run without errors, but return [].
> 
>         They all run OK with the beta 2 though.

If they did, then you've hit a bug in beta2. Here are the corrected
versions. The problem was with the EOF handling: AllNotIn fails unless
it matches at least one character, and using 0 as the jne offset has
the same effect as MatchFail. The tables below therefore use +1 as the
jne offset for AllNotIn and add an explicit EOF entry.
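
To picture the failure without mxTextTools at hand, here is a rough
pure-Python stand-in for what AllNotIn does when it reaches the end of
the text. The helper name all_not_in is made up for this illustration
and is not part of the mx.TextTools API; it only mimics the "at least
one character" rule described above.

```python
def all_not_in(text, pos, excluded):
    """Hypothetical stand-in for AllNotIn (not the real implementation).

    Consumes characters from text[pos:] until one from `excluded` is seen.
    Returns the new position, or None when nothing could be consumed,
    mirroring the rule that AllNotIn must match at least one character.
    """
    end = pos
    while end < len(text) and text[end] not in excluded:
        end += 1
    if end == pos:
        return None  # no character matched: the command fails
    return end


text = "aa(AA)aa"
matched_run = all_not_in(text, 0, "()")      # consumes the leading "aa"
blocked = all_not_in(text, 2, "()")          # next char is "(": fails
at_eof = all_not_in(text, len(text), "()")   # nothing left to read: fails too
```

At the end of the text the stand-in returns None, which is the case the
explicit EOF entry in the corrected tables is there to catch.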

#--- solution 1 starts (with limiting letters)---

from mx.TextTools import *

def test1():

     text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"

     tables = [] # used for recursion only

     tab = ('start',
            (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
            (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
            (None,AllNotIn,'()',+1,'start'), # Search all characters except "(" and ")"
            (None,EOF,Here,MatchOk),
            'nesting',
            ('group',SubTable+AppendMatch,
             ((None,Is,'(',MatchFail,+1), # Since we have looked ahead, collect "(" -sign
              (None,SubTableInList, (tables,0)),  # Recurse
              )
             ),
            (None,Jump,To,'start')) # After recursion jump back to 'start'

     tables.append(tab) # Add tab to tables

     result, taglist, nextindex = tag(text,tab)
     print result, nextindex
     print taglist

#--- solution 2 starts  (without limiting letters) ---

def test2():

     text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"

     tab = ('start',
            (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
            (None, Is, '(', 'letters', +1),
            ('group', SubTable+AppendMatch, ThisTable), # Recurse
            (None, Skip, 1, MatchFail, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
            'letters',
            (None, AllNotIn, '()', +1, 'start'),  # Collect all characters except "(" and ")"
            (None, EOF, Here, MatchOk),
            )

     result,taglist,nextindex = tag(text, tab)
     print result, nextindex
     print taglist

print 'Test 1:'
test1()
print

print 'Test 2:'
test2()
print
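
If you want something to compare the tag lists against, a plain
stack-based scan of the same test string gives the group boundaries
independently of mxTextTools. This is only a sketch: find_groups and
its (start, end, children) tuples are my own invention for the
comparison and do not reproduce the taglist format that tag() returns.

```python
def find_groups(text, open_ch="(", close_ch=")"):
    """Return nested (start, end, children) tuples for each balanced group.

    `start` is the index of the opening delimiter and `end` is one past
    the closing delimiter, so text[start:end] includes both delimiters.
    """
    root = []
    stack = [(None, root)]  # (index of opener, children collected so far)
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append((i, []))
        elif ch == close_ch:
            if len(stack) == 1:
                raise ValueError("unbalanced %r at index %d" % (close_ch, i))
            start, children = stack.pop()
            stack[-1][1].append((start, i + 1, children))
    if len(stack) != 1:
        raise ValueError("unclosed %r at index %d" % (open_ch, stack[-1][0]))
    return root


text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
groups = find_groups(text)
top_level = [text[start:end] for start, end, _ in groups]
```

The delimiters are ordinary parameters here, which also covers idea 1
from the quoted list for this pure-Python variant.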


>         I am using Windows 2000 professional, Python 2.2.1 and Winpython
> v148.
> 
> -pekka-
> 
> 
> Pekka Niiranen wrote:
> 
> 
>>Thank you all for your help and inspiration! It is payback time ;)
>>
>>I have spent the past two months trying to create a parser that returns
>>strings delimited by two different characters (the "limiting letters"
>>below). The strings can be nested.
>>I considered a recursive regular-expression call to be too slow
>>and decided to use mxTextTools 2.1 beta2 and the latest alpha of
>>Simpleparse 2.0.
>>
>>Below are three solutions I found.
>>Note that Simpleparse creates a different tag table from the ones I
>>built "manually".
>>
>>Further ideas to be implemented:
>>
>>1) Passing the limiting letters in as parameters (easy)
>>2) Unicode support
>>3) Checking that both limiting letters occur equally often before
>>calling the parser (will this speed up the solution?)
>>4) Parsing one line at a time without looping through the lines of the
>>text with "while" or "for"
>>    (maybe "None, AllNotIn, '()\n'")
>>
>>One development idea for mxTextTools:
>>
>>1) Instead of using a list of tables to recurse, would it be possible to
>>use a "global jump" to outside the current table?
>>
>>--- solution 1 starts (with limiting letters)---
>>
>>from mx.TextTools import *
>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>tables = [] # used for recursion only
>>
>>tab = ('start',
>>       (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
>>       (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
>>       (None,AllNotIn,'()',0,'start'), # Search all characters except "(" and ")"
>>       'nesting',
>>       ('group',SubTable+AppendMatch,((None,Is,'(',0,+1), # Since we have looked ahead, collect "(" -sign
>>                                      (None,SubTableInList,(tables,0)))), # Recurse
>>       (None,Jump,To,'start')) # After recursion jump back to 'start'
>>
>>tables.append(tab) # Add tab to tables
>>
>>if __name__ == '__main__':
>>
>>    result, taglist, nextindex = tag(text,tab)
>>    print taglist
>>
>>--- solution 1 ends ---
>>
>>--- solution 2 starts  (without limiting letters) ---
>>
>>from mx.TextTools import *
>>
>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>
>>tab = ('start',
>>       (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
>>       (None, Is, '(', 'letters', +1),
>>       ('group', SubTable+AppendMatch, ThisTable), # Recurse
>>       (None, Skip, 1, 0, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
>>       'letters',
>>       (None, AllNotIn, '()', 0, 'start')) # Collect all characters except "(" and ")"
>>
>>result,taglist,next = tag(text, tab)
>>print taglist
>>
>>--- solution 2 ends ---
>>
>>--- solution 3 starts (Simpleparse solution) ---
>>
>>from simpleparse.parser import Parser
>>from mx.TextTools import *
>>
>>declaration = r'''
>>
>>>line<  := (a/match)+
>>
>>match   := '(', line, ')'
>><a>     := -[()]
>>'''
>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>
>>parser = Parser(declaration)
>>success, children, nextcharacter = parser.parse(text, production="line")
>>print_tags(text,children)
>>
>>--- solution 3 ends ---
>>
>>-pekka-
> 
> 
> 
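
On idea 3 in the quoted list (testing for an equal number of limiting
letters before calling the parser): a count comparison is cheap, but it
only shows that the totals agree, not that the nesting is valid. The
sketch below contrasts the two checks; both function names are
hypothetical, and whether such a pre-check speeds anything up would
have to be measured.

```python
def counts_match(text, open_ch="(", close_ch=")"):
    """Cheap check: same number of opening and closing delimiters."""
    return text.count(open_ch) == text.count(close_ch)


def properly_nested(text, open_ch="(", close_ch=")"):
    """Stricter check: depth never drops below zero and ends at zero."""
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                return False  # a closer appeared before its opener
    return depth == 0


text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
sample_ok = counts_match(text) and properly_nested(text)
```

A string such as ")(" passes the count comparison but not the depth
check, so the count alone cannot replace the parser's own handling of
bad input.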


-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



