[egenix-users] PROBLEM: eGenix.com mxBASE beta 3

M.-A. Lemburg mal at lemburg.com
Mon Jul 29 22:37:21 CEST 2002


Pekka Niiranen wrote:
> Fine,
> 
> but the line:
> 
> (None,EOF,Here,MatchOk)
> 
> will make text = "aa(AA" match too. If I analysed it correctly,
> it is because EOF always matches. Would it be possible
> to add an mxTextTools parameter that makes EOF cause a failure when necessary?
> 
> Something like: "if EOF is encountered here, fail the whole subgroup"?

EOF only matches iff the head position is at the right end of the
text slice being processed (i.e. past its last character). If you
need balanced parens, you should rewrite the tag tables so that the
nesting table matches both the opening and the closing paren.
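The idea can be illustrated outside mx.TextTools. The following is a plain-Python recursive-descent sketch (a swapped-in technique, not a tag table and not the mxTextTools API) in which the nesting step must consume both "(" and its matching ")". An unclosed group such as "aa(AA" therefore makes the whole match fail instead of being accepted once the end of the text is reached. The names match_nested and show are hypothetical helpers for illustration only.

```python
def match_nested(text):
    """Return nested (start, end, children) spans for every parenthesised
    group in text, or None if the parentheses are unbalanced.
    Hypothetical helper, not part of mx.TextTools."""

    def seq(pos, depth):
        # Scan from pos. At depth > 0, stop at the ")" that closes the
        # current group and return its index; at depth 0, run to the end.
        # Returns (stop_index, children), or (None, None) on failure.
        children = []
        while pos < len(text):
            ch = text[pos]
            if ch == "(":
                # Nesting rule: a group must consume "(" AND its ")".
                close, inner = seq(pos + 1, depth + 1)
                if close is None:
                    return None, None
                children.append((pos, close + 1, inner))
                pos = close + 1
            elif ch == ")":
                if depth == 0:
                    return None, None  # stray ")" at top level
                return pos, children
            else:
                pos += 1
        # End of text: only acceptable if no group is still open.
        return (pos, children) if depth == 0 else (None, None)

    stop, groups = seq(0, 0)
    return None if stop is None else groups


def show(text, groups, indent=0):
    # Print each group's text, indented by nesting level.
    for start, end, inner in groups:
        print(" " * indent + text[start:end])
        show(text, inner, indent + 2)


sample = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
show(sample, match_nested(sample))
print(match_nested("aa(AA"))  # None: the "(" is never closed
```

The end-of-text check sits in one place (the depth test after the loop), which is the plain-Python counterpart of refusing to let EOF succeed while a group is still open.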

> -pekka-
> 
> 
> "M.-A. Lemburg" wrote:
> 
> 
>>Pekka Niiranen wrote:
>>
>>>Hi,
>>>
>>>        I tried the latest beta 3 by:
>>>
>>>        a) compiling it myself from sources and
>>>        b) installing from the precompiled package for python v2.2
>>>
>>>        Of the scripts below, only the one that uses Simpleparse returns
>>>        anything. The others run without errors, but return [].
>>>
>>>        They all run OK with beta 2, though.
>>
>>If they did, then you've hit a bug in beta2. Here are the corrected
>>versions. Note that the problem was with the EOF handling: if AllNotIn
>>doesn't match at least one char, it fails, and using 0 as the jne offset
>>has the same effect as MatchFail.
>>
>>#--- solution 1 starts (with limiting letters)---
>>
>>from mx.TextTools import *
>>
>>def test1():
>>
>>     text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>
>>     tables = [] # used for recursion only
>>
>>     tab = ('start',
>>            (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
>>            (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
>>            (None,AllNotIn,'()',+1,'start'), # Search all characters except "(" and ")"
>>            (None,EOF,Here,MatchOk),
>>            'nesting',
>>            ('group',SubTable+AppendMatch,
>>             ((None,Is,'(',MatchFail,+1), # Since we have looked ahead, collect "(" -sign
>>              (None,SubTableInList, (tables,0)),  # Recurse
>>              )
>>             ),
>>            (None,Jump,To,'start')) # After recursion jump back to 'start'
>>
>>     tables.append(tab) # Add tab to tables
>>
>>     result, taglist, nextindex = tag(text,tab)
>>     print result, nextindex
>>     print taglist
>>
>>#--- solution 2 starts  (without limiting letters) ---
>>
>>def test2():
>>
>>     text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>
>>     tab = ('start',
>>            (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
>>            (None, Is, '(', 'letters', +1),
>>            ('group', SubTable+AppendMatch, ThisTable), # Recurse
>>            (None, Skip, 1, MatchFail, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
>>            'letters',
>>            (None, AllNotIn, '()', +1, 'start'),  # Collect all characters except "(" and ")"
>>            (None, EOF, Here, MatchOk),
>>            )
>>
>>     result,taglist,nextindex = tag(text, tab)
>>     print result, nextindex
>>     print taglist
>>
>>print 'Test 1:'
>>test1()
>>print
>>
>>print 'Test 2:'
>>test2()
>>print
>>
>>
>>>        I am using Windows 2000 Professional, Python 2.2.1 and Winpython v148.
>>>
>>>-pekka-
>>>
>>>
>>>Pekka Niiranen wrote:
>>>
>>>
>>>
>>>>Thank you all for your help and inspiration! It is payback time ;)
>>>>
>>>>For the past two months I have been trying to create a parser that
>>>>returns strings delimited by two different characters. The strings
>>>>can be nested. I considered recursive regular expression calls to be
>>>>too slow and decided to use mxTextTools 2.1 beta2 and the latest alpha
>>>>of Simpleparse 2.0.
>>>>
>>>>Below are the three solutions I found.
>>>>Note that Simpleparse creates a different tag table from the ones
>>>>found "manually".
>>>>
>>>>Further ideas to be implemented:
>>>>
>>>>1) Input of the delimiting characters as parameters (easy)
>>>>2) Unicode support
>>>>3) Test for an equal number of delimiting characters before calling the
>>>>   parser (will this speed up the solution?)
>>>>4) Parsing one line at a time without looping through the lines of the
>>>>   text with "while" or "for" (maybe "None, AllNotIn, '()\n'")
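Regarding idea 3: comparing counts alone (text.count("(") == text.count(")")) is not sufficient, since ")(" contains one of each and is still unbalanced. Below is a hedged sketch of a single-pass depth counter that also rejects a ")" appearing before its "("; is_balanced is a hypothetical helper, and whether running it before the parser actually speeds anything up has not been measured.

```python
def is_balanced(text, open_ch="(", close_ch=")"):
    """Single pass over text: True iff every close_ch has an earlier
    matching open_ch and none is left open at the end.
    Hypothetical helper; the delimiters are parameters (cf. idea 1)."""
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:  # a closing char with no opener before it
                return False
    return depth == 0


print(is_balanced("aa(AA)a((BB))aa"))  # True
print(is_balanced(")("))               # False, although the counts are equal
print(is_balanced("aa(AA"))            # False: "(" left open
```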
>>>>
>>>>One development idea for mxTextTools:
>>>>
>>>>1) Instead of using a list of tables to recurse, would it be possible to
>>>>   use a "global jump" to outside of the current table?
>>>>
>>>>--- solution 1 starts (with limiting letters)---
>>>>
>>>>from mx.TextTools import *
>>>>
>>>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>>>tables = [] # used for recursion only
>>>>
>>>>tab = ('start',
>>>>      (None,Is+LookAhead,'(',+1,'nesting'), # If next character is "(" then recurse
>>>>      (None,Is,')',+1,MatchOk), # If current character is ")" then stop or return from recursion
>>>>      (None,AllNotIn,'()',0,'start'), # Search all characters except "(" and ")"
>>>>      'nesting',
>>>>      ('group',SubTable+AppendMatch,((None,Is,'(',0,+1), # Since we have looked ahead, collect "(" -sign
>>>>                                     (None,SubTableInList,(tables,0)))), # Recurse
>>>>      (None,Jump,To,'start')) # After recursion jump back to 'start'
>>>>
>>>>tables.append(tab) # Add tab to tables
>>>>
>>>>if __name__ == '__main__':
>>>>
>>>>   result, taglist, nextindex = tag(text,tab)
>>>>   print taglist
>>>>
>>>>--- solution 1 ends ---
>>>>
>>>>--- solution 2 starts  (without limiting letters) ---
>>>>
>>>>from mx.TextTools import *
>>>>
>>>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>>>
>>>>tab = ('start',
>>>>      (None, Is+LookAhead, ')', +1, MatchOk), # When character ")" is seen stop recursion
>>>>      (None, Is, '(', 'letters', +1),
>>>>      ('group', SubTable+AppendMatch, ThisTable), # Recurse
>>>>      (None, Skip, 1, 0, 'start'), # Last character in recursion was ")" so jump over it back to 'start'
>>>>      'letters',
>>>>      (None, AllNotIn, '()', 0, 'start')) # Collect all characters except "(" and ")"
>>>>
>>>>result,taglist,next = tag(text, tab)
>>>>print taglist
>>>>
>>>>--- solution 2 ends ---
>>>>
>>>>--- solution 3 starts (Simpleparse solution) ---
>>>>
>>>>from simpleparse.parser import Parser
>>>>from mx.TextTools import *
>>>>
>>>>declaration = r'''
>>>>
>>>>
>>>>>line<  := (a/match)+
>>>>
>>>>match   := '(', line, ')'
>>>><a>     := -[()]
>>>>'''
>>>>text = "aa(AA)a((BB))aa((CC)DD)aa(EE(FF))aa(GG(HH(II)JJ)KK)aa"
>>>>
>>>>parser = Parser(declaration)
>>>>success, children, nextcharacter = parser.parse(text, production="line")
>>>>print_tags(text,children)
>>>>
>>>>--- solution 3 ends ---
>>>>
>>>>-pekka-
>>>
>>>
>>>
>>>_______________________________________________________________________
>>>eGenix.com User Mailing List                     http://www.egenix.com/
>>>http://lists.egenix.com/mailman/listinfo/egenix-users
>>
>>--
>>Marc-Andre Lemburg
>>CEO eGenix.com Software GmbH
>>_______________________________________________________________________
>>eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
>>Python Consulting:                               http://www.egenix.com/
>>Python Software:                    http://www.egenix.com/files/python/
>>
> 

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



