[egenix-users] BeeDict memory usage

M.-A. Lemburg mal at lemburg.com
Tue Sep 17 11:59:45 CEST 2002


M.-A. Lemburg wrote:
> Daniel Naber wrote:
> 
>> On Monday 16 September 2002 17:53, you wrote:
>>
>>>> index that calls free_cache() on every 50th file gets fewer matches
>>>> when searching (yes, the call to free_cache() is really the only
>>>> difference in the program).
>>>
>>> That's strange indeed. Can you come up with a short demo which
>>> displays the problem ?
>>
>> Okay, this is not very short, as it seems you need a certain amount of 
>> data to trigger the problem. Call this script like this:
>>
>> ./FullText2.py /data/bigindex/test/ widget
>>
>> The first parameter is a directory, the second one a search term. Then
>> look for "####" in the script, uncomment the free_cache() call and
>> run the script again with the same parameters. You should get fewer
>> matches when free_cache() is called, and the data files are also
>> smaller. If it doesn't work, I can send you an archive of about 30 HTML
>> files that lets you reproduce the problem.
> 
> Thanks for the script. I can reproduce the problem here, but
> still don't understand what is causing it. The table index size
> is the same in both cases, but the file sizes differ.
> 
> This could relate to the way you store the data: using dictionaries
> of lists as values in the BeeDict. I'll have to investigate this
> some more.

Ok, I've tracked down the problem.

There are two things to watch out for:

1. When modifying mutable values in place, you have to explicitly
    reassign the dictionary item after all modifications have taken
    place. This is necessary to mark the item as modified so that
    a subsequent .commit() can write it back to the on-disk version,
    e.g.:

        # get value
        listvalue = d['key']
        # modify in place
        listvalue.append(1)
        # mark as modified
        d['key'] = listvalue

2. You should call .commit() before calling .free_cache() in order
    to free up more memory. Otherwise .free_cache() only removes
    items from the in-memory cache that have not been marked as
    modified. Since you are mostly adding new items in your script,
    almost all entries are marked as modified, so calling .free_cache()
    without a preceding .commit() frees very little.

In the egenix-mx-base 2.1 final release I'll add a new parameter
maxcachesize to BeeDicts which lets you tune the cache size
on a per-object basis.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/