[egenix-users] UCS4 vs UCS2 in Linux

M.-A. Lemburg mal at egenix.com
Thu Feb 25 10:10:40 CET 2010

Baiju M wrote:
> Hi,
>     Our application only requires support for latin-1 characters.
> Do you recommend using the UCS2 version of mxODBCZopeDA on Linux?
> If we use UCS4, is it going to use more memory?

The Zope DA 1.0 defaults to using 8-bit strings for text interaction
with the database. For 2.0, we are going to add an option to use
Unicode or a mixed setup as well.

UCS2 vs. UCS4 only affects Unicode objects, so if you don't use
Unicode, there's not a lot of difference between the two builds.

If you do intend to go with Unicode in the future, the question of
memory consumption and conversion performance arises. Both are
only relevant if you store lots of text in Unicode.

UCS2 uses 2 bytes to store a single character, UCS4 uses 4 bytes.
Most ODBC drivers on Linux use 2-byte characters for Unicode,
so if you use UCS4, the Zope DA would have to convert between
Python's UCS4 and the driver's UCS2.
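
To give a rough feel for the sizes involved, here is a small sketch
that uses the standard utf-16-le and utf-32-le codecs as stand-ins
for 2-byte and 4-byte storage. The sample string and the helper name
are made up for the example, and the per-object overhead of a Python
Unicode object is not counted:

```python
# Sketch: payload size of the same Latin-1 text at 2 vs. 4 bytes per
# character. The codecs stand in for UCS2/UCS4 storage; the overhead
# of the Python object itself is ignored.

def payload_bytes(text, codec):
    """Return the number of bytes needed to hold text in the given codec."""
    return len(text.encode(codec))

# Hypothetical sample text that fits into Latin-1
sample = u"Gr\u00fc\u00dfe aus D\u00fcsseldorf"

ucs2_size = payload_bytes(sample, "utf-16-le")   # 2 bytes per character
ucs4_size = payload_bytes(sample, "utf-32-le")   # 4 bytes per character

# The kind of conversion an adapter would have to do when Python uses
# 4-byte storage and the driver expects 2-byte characters:
wide_buffer = sample.encode("utf-32-le")
driver_buffer = wide_buffer.decode("utf-32-le").encode("utf-16-le")
```

For text that is all Latin-1, the 4-byte form is simply twice the size
of the 2-byte form, and the conversion is an extra copy of the data on
every round trip to the driver.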

Note that Python's default is UCS2 (on all platforms), because
it was found to be a good compromise between memory consumption
and flexibility.
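
If you are not sure which build you are running, sys.maxunicode tells
you. A minimal sketch (the function name is just for illustration;
note that newer Python versions without the narrow/wide distinction
always report the wide value):

```python
import sys

def unicode_build():
    """Return "UCS2" for a narrow build and "UCS4" for a wide build."""
    # sys.maxunicode is 0xFFFF on narrow builds, 0x10FFFF on wide builds.
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"

build = unicode_build()
```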

The glibc on Linux chose to go with UCS4, which is why many
Linux distributions use UCS4 as their default Python build. The
BSDs are still undecided: FreeBSD now ships with UCS4 as its
default build. Mac OS X uses UCS2.

UCS4 makes things easier for Asian scripts, since these sometimes
use characters outside the UCS2 range. In such a case, UCS2 has
to use 2 storage units to store a single character (the so-called
surrogates). UCS4 avoids this, since a single storage unit can
handle all Unicode characters.
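
The surrogate mechanism can be sketched as follows. The helper name
is invented for the example, and the codecs are again used as
stand-ins for 2-byte and 4-byte storage units:

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into its two 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the 16-bit range")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # upper 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits of the offset
    return high, low

# U+20000 is a CJK ideograph outside the 16-bit range
cjk = u"\U00020000"

units_16 = len(cjk.encode("utf-16-le")) // 2   # 2-byte storage units needed
units_32 = len(cjk.encode("utf-32-le")) // 4   # 4-byte storage units needed
pair = to_surrogates(0x20000)
```

With 2-byte units the character needs two units (0xD840, 0xDC00),
with 4-byte units it needs one.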

However, it's not the all-inclusive "doesn't break when sliced"
solution that some make it out to be. Just like UCS2, UCS4 has
problems with slicing when it comes to combining characters,
e.g. an "e" combined with a "´" to form "é".
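
A short sketch of the slicing problem, using the combining acute
accent U+0301 as the second code point. The variable names are only
for illustration; the result is the same on UCS2 and UCS4 builds:

```python
import unicodedata

decomposed = u"e\u0301"      # "e" followed by COMBINING ACUTE ACCENT
length = len(decomposed)     # two code points for one visible character
first = decomposed[:1]       # the slice keeps the "e" and loses the accent

# Normalizing to NFC merges the pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)
```

Normalizing helps for this pair, but not every combination has a
precomposed form, so slicing still has to be done with care.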

Hope that provides some guidelines.

Marc-Andre Lemburg

