[egenix-users] segfault in mxte_impl.h on tag()

Ivan Beschastnikh ivan at cs.washington.edu
Tue Mar 16 08:05:57 CET 2010

Yes, it is an unfortunate string for the stack. Ideally the code would
throw an exception near the stack recursion depth limit, but I'll find
some other way to detect this. Thanks for the work on this.
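
One way to detect this up front is to measure how deeply the {{-braces
nest before handing the text to tag(). The following is only a rough
sketch: the function names and the 1000 threshold are made up for
illustration, and how brace depth maps to recursion depth in the tagging
engine depends on the tag table, so the limit would need tuning.

```python
import re

# Matches either an opening "{{" or a closing "}}" token.
_TOKEN = re.compile(r"\{\{|\}\}")

# Hypothetical threshold, not a value taken from mxTextTools.
MAX_SAFE_DEPTH = 1000


def max_brace_depth(text):
    """Return the deepest level of open {{ ... }} nesting found in text."""
    depth = 0
    deepest = 0
    for match in _TOKEN.finditer(text):
        if match.group() == "{{":
            depth += 1
            deepest = max(deepest, depth)
        elif depth > 0:
            # Ignore a stray "}}" that has no matching opener.
            depth -= 1
    return deepest


def check_nesting(text, limit=MAX_SAFE_DEPTH):
    """Raise ValueError instead of risking a crash on deeply nested input."""
    depth = max_brace_depth(text)
    if depth > limit:
        raise ValueError(
            "brace nesting depth %d exceeds limit %d" % (depth, limit)
        )
    return depth


if __name__ == "__main__":
    print(max_brace_depth("{{a{{b}}c}}"))  # 2
```

Calling check_nesting(text) before tag() would turn the pathological
input into an ordinary exception that the caller can handle.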


On Tue, Mar 16, 2010 at 3:02 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> M.-A. Lemburg wrote:
>> M.-A. Lemburg wrote:
>>> Ivan Beschastnikh wrote:
>>>> Hi,
>>>> I'm attempting to run tag() on a string that has 1314211 chars in it
>>>> (~1.3MB). The same program works fine on other relatively large
>>>> strings, although they probably don't have as bad a structure as this
>>>> particular string.
>>>> Running the python program with gdb I'm getting:
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> 0xb7d17d2f in mxTextTools_TaggingEngine (textobj=0xb6b99008,
>>>> sliceleft=422374, sliceright=1314211, table=0x84c97f0,
>>>> taglist=0x85aa5cc, context=0x0, next=0xbf6ca178) at
>>>> mx/TextTools/mxTextTools/mxte_impl.h:62
>>>> Is the string too long, or is something else the culprit here?
>>>> Attached is tagit.tar.gz which includes tagit.py. You can run it to
>>>> reproduce the segfault.
>>> Thanks for reporting the problem. We will open a ticket for it
>>> and try to reproduce it.
>>> Regarding the size of the string: This is unlikely to be a problem,
>>> since mxTextTools can easily handle several GB of text.
>> So far, we've had one run with a segfault. All others (without
>> shell limits) completed without problems.
>> Unfortunately, that one run with the segfault did not generate
>> a core dump, so we'll have to keep on trying.
>> It's possible that the stack size limit is causing the core dump.
> Rerunning the script with just the coredumpsize limit
> set to unlimited results in a more or less immediate core
> dump, so the stack size must be the cause of the problem.
> FWIW: The script runs through fine with a stack size of 32MB.
> I haven't checked the complete stack depth at the time of
> the core dump with an 8MB limit, but it was certainly more
> than 5100 levels deep.
> Note that the tagging engine in mxTextTools does not apply
> any of the stack checks that Python does. Perhaps it should,
> to give a more meaningful error message than a seg fault.
> In any case, the input you are passing to the script opens
> more than 40000 {{-braces without closing them, so it's
> no surprise that you're hitting the stack limit while
> searching for balanced braces.
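
Since the script runs through with a 32MB stack, one possible workaround
(besides raising the shell limit) is to run the recursive call in a
thread that was created with a larger stack. This is a sketch using
Python's standard threading.stack_size(), not something mxTextTools
provides; run_with_stack is a made-up helper name, and the 32MB default
simply mirrors the figure mentioned above.

```python
import threading


def run_with_stack(func, *args, stack_bytes=32 * 1024 * 1024):
    """Call func(*args) in a new thread that has stack_bytes of stack."""
    result = {}

    def target():
        try:
            result["value"] = func(*args)
        except BaseException as exc:
            # Hand the exception back to the calling thread.
            result["error"] = exc

    # stack_size() applies to threads created afterwards and returns
    # the previous setting, which is restored once the thread exists.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(previous)

    worker.join()
    if "error" in result:
        raise result["error"]
    return result["value"]


if __name__ == "__main__":
    # Stand-in for the real call, e.g. run_with_stack(tag, text, table).
    print(run_with_stack(len, "{{" * 40000))  # 80000
```

This only moves the limit further out; sufficiently deep nesting would
still overflow the larger stack.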
> --
> Marc-Andre Lemburg
> eGenix.com
