[egenix-users] segfault in mxte_impl.h on tag()

M.-A. Lemburg mal at egenix.com
Tue Mar 16 11:02:19 CET 2010

M.-A. Lemburg wrote:
> M.-A. Lemburg wrote:
>> Ivan Beschastnikh wrote:
>>> Hi,
>>> I'm attempting to run tag() on a string that has 1314211 chars in it
>>> (~1.3MB). The same program works fine on other relatively large
>>> strings, although they probably don't have as bad a structure as this
>>> particular string.
>>> Running the Python program with gdb, I'm getting:
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0xb7d17d2f in mxTextTools_TaggingEngine (textobj=0xb6b99008,
>>> sliceleft=422374, sliceright=1314211, table=0x84c97f0,
>>> taglist=0x85aa5cc, context=0x0, next=0xbf6ca178) at
>>> mx/TextTools/mxTextTools/mxte_impl.h:62
>>> Is the string too long, or is something else the culprit here?
>>> Attached is tagit.tar.gz which includes tagit.py. You can run it to
>>> reproduce the segfault.
>> Thanks for reporting the problem. We will open a ticket for it
>> and try to reproduce it.
>> Regarding the size of the string: This is unlikely to be a problem,
>> since mxTextTools can easily handle several GB of text.
> So far, we've had one run with a segfault. All others (without
> shell limits) completed without problems.
> Unfortunately, that one run with the segfault did not generate
> a core dump, so we'll have to keep on trying.
> It's possible that the stack size limit is causing the core dump.

Rerunning the script with just the coredumpsize limit
set to unlimited results in a more or less immediate core
dump, so the stack size limit must be the cause of the problem.
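To repeat this kind of experiment from Python rather than from the
shell, the child process can be started with adjusted resource limits.
This is a minimal sketch, assuming a POSIX system and Python 3.7+;
run_with_limits is a hypothetical helper, not part of mxTextTools:

```python
import resource
import subprocess
import sys


def run_with_limits(cmd, stack_bytes=None):
    """Run cmd in a child process with core dumps enabled and, optionally,
    a different soft stack limit (roughly `ulimit -c unlimited; ulimit -s N`).

    Hypothetical helper for illustration only.
    """
    def set_limits():
        # Raise the soft core-file limit as far as the hard limit allows;
        # an unprivileged process cannot go beyond the hard limit.
        _, core_hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (core_hard, core_hard))
        if stack_bytes is not None:
            # Fails (and subprocess raises SubprocessError) if stack_bytes
            # exceeds the hard stack limit.
            _, stack_hard = resource.getrlimit(resource.RLIMIT_STACK)
            resource.setrlimit(resource.RLIMIT_STACK, (stack_bytes, stack_hard))

    # preexec_fn runs in the child after fork() and before exec().
    return subprocess.run(cmd, preexec_fn=set_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Show the soft stack limit the child actually sees.
    probe = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_STACK)[0])"]
    print(run_with_limits(probe, stack_bytes=32 * 1024 * 1024).stdout.strip())
```

With the attached tagit.py, one would compare e.g.
run_with_limits([sys.executable, "tagit.py"], stack_bytes=8 * 1024 * 1024)
against a 32 MB run; on POSIX, a child killed by SIGSEGV reports a
returncode of -11.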

FWIW: The script runs through fine with a stack size of 32MB.
I haven't checked the complete stack depth at the time of
the core dump with an 8MB limit, but it was certainly more
than 5100 levels deep.
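A possible workaround, sketched under assumptions: instead of raising
the process-wide limit with ulimit, the tag() call can be run in a
worker thread whose stack size is set with threading.stack_size().
run_with_stack is a hypothetical helper, and whether 32 MB is enough
depends entirely on how deeply the input nests:

```python
import threading


def run_with_stack(func, *args, stack_mb=32):
    """Call func(*args) in a new thread with a stack of stack_mb MiB.

    Hypothetical helper; re-raises any exception from func in the caller.
    """
    result = {}

    def target():
        try:
            result["value"] = func(*args)
        except BaseException as exc:  # hand the error back to the caller
            result["error"] = exc

    # stack_size() only affects threads created after the call,
    # so restore the previous setting once the worker has started.
    old = threading.stack_size(stack_mb * 1024 * 1024)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(old)
    worker.join()

    if "error" in result:
        raise result["error"]
    return result["value"]
```

Assuming tag is imported from mx.TextTools, the call in the script
would become something like result = run_with_stack(tag, text, table).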

Note that the tagging engine in mxTextTools does not apply
any of the stack checks that Python does. Perhaps it should,
so that it can give a more meaningful error message than a segfault.
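To illustrate the kind of check meant here (in Python rather than the
engine's C code; match_braces and MAX_DEPTH are hypothetical names), a
recursive matcher can track its own depth and raise RecursionError
before it runs out of stack:

```python
MAX_DEPTH = 200  # hypothetical nesting limit, kept small for the demo


def match_braces(text, pos=0, depth=0):
    """Return the index just past the '}}' closing the '{{' at text[pos]."""
    if depth > MAX_DEPTH:
        # Fail with a clear message instead of overflowing the stack.
        raise RecursionError(
            f"brace nesting deeper than {MAX_DEPTH} at offset {pos}")
    if not text.startswith("{{", pos):
        raise ValueError(f"no '{{{{' at offset {pos}")
    i = pos + 2
    while i < len(text):
        if text.startswith("{{", i):
            # Nested pair: recurse one level deeper.
            i = match_braces(text, i, depth + 1)
        elif text.startswith("}}", i):
            return i + 2
        else:
            i += 1
    raise ValueError(f"unclosed '{{{{' at offset {pos}")
```

The design choice is simply to make the depth limit explicit, so that
pathological input produces an exception rather than a crash.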

In any case, the input you are passing to the script opens
more than 40000 {{-braces without closing them, so it's
no surprise that you're hitting the stack limit while
searching for balanced braces.
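The nesting depth of an input can be measured without any recursion,
which shows how deep a recursive matcher would have to go. A minimal
sketch; brace_depth is a hypothetical helper that ignores stray '}}'
pairs:

```python
def brace_depth(text):
    """Return (unclosed, peak): '{{' pairs left open and maximum nesting."""
    depth = peak = 0
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            peak = max(peak, depth)
            i += 2
        elif pair == "}}":
            depth = max(depth - 1, 0)  # a stray closer does not go negative
            i += 2
        else:
            i += 1
    return depth, peak
```

For an input like the one described, brace_depth would report tens of
thousands of unclosed pairs, i.e. far more nesting levels than the
5100 frames seen at the time of the crash.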

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Mar 16 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
