[egenix-users] BeeDict memory usage

Tue Sep 17 21:36:18 CEST 2002

Daniel Naber wrote:
> On Tuesday 17 September 2002 10:59, you wrote:
>
>>Ok, I've tracked down the problem.
>>
>>There are two things to watch out for:
>
> That helps, thanks! Indexing now needs 25% of the memory it used to need, 
> but it's also 4 times as slow - but this had to happen I guess. I wonder 
> how search engines like htdig can have such a fast indexing. It's probably 
> because they have somehow heavily optimized their data structures for 
> full-text indexing.

I think that the solution is to use a specialized cache between
the on-disk dictionary and the indexer -- frequently used terms
should probably be kept in this cache and only written to disk
at the very end.

The fact that you can subclass the BeeDict class should help
with this: you can easily implement your own caching strategy.
For indexing, for example, you don't need .rollback transaction
support, so a priority-queue-driven cache strategy probably fits
better.
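
To make the idea more concrete, here is a rough sketch of such a
cache. It is only an illustration under a few assumptions: the
class TermCache and all of its method names are made up for this
example, and the "store" can be any object that supports store[key]
reads and writes, so a plain Python dict stands in for the on-disk
dictionary here. It does not use or subclass the BeeDict API itself.

```python
import heapq


class TermCache:
    """Write-back cache that keeps frequently used terms in memory.

    Hypothetical helper for illustration; not part of mxBeeBase.
    """

    def __init__(self, store, max_terms=10000, evict_fraction=0.25):
        # store: any mapping supporting store[key] get/set
        # (assumed stand-in for the on-disk dictionary)
        self.store = store
        self.max_terms = max_terms
        self.evict_count = max(1, int(max_terms * evict_fraction))
        self.postings = {}   # term -> document ids not yet written
        self.hits = {}       # term -> number of uses while cached

    def add(self, term, doc_id):
        """Record that term occurs in document doc_id."""
        if term not in self.postings:
            if len(self.postings) >= self.max_terms:
                self._evict()
            self.postings[term] = []
            self.hits[term] = 0
        self.postings[term].append(doc_id)
        self.hits[term] += 1

    def _write(self, term):
        """Merge the pending entries for term into the store."""
        pending = self.postings.pop(term)
        del self.hits[term]
        try:
            stored = list(self.store[term])
        except KeyError:
            stored = []
        self.store[term] = stored + pending

    def _evict(self):
        """Write out the least frequently used terms to make room."""
        victims = heapq.nsmallest(self.evict_count, self.hits,
                                  key=self.hits.get)
        for term in victims:
            self._write(term)

    def flush(self):
        """Write everything that is still cached; call this at the end."""
        for term in list(self.postings):
            self._write(term)
```

The indexer would then call cache.add(term, doc_id) for every term
it sees and cache.flush() once at the very end. Rarely used terms
get written out early in batches, while the often used ones stay in
memory and hit the disk only once.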

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/