[egenix-users] Data truncation by Microsoft ODBC driver for NVARCHAR

Wed Sep 27 11:02:09 CEST 2017

Hi Jan,

I'm glad this fixes your problem. We've seen several such issues
with ODBC drivers before.

In general, using NATIVE_UNICODE_STRINGFORMAT
is the best way to go, since it causes the fewest surprises (and
raises errors early in Python, e.g. when sending data to the
database that cannot be properly encoded).
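
A minimal sketch of applying that setting on a new connection. It assumes mxODBC's DriverConnect() entry point and the NATIVE_UNICODE_STRINGFORMAT constant exposed on the driver subpackage module (e.g. mx.ODBC.unixODBC); check the mxODBC documentation for your version. The helper name connect_native_unicode and any connect string are hypothetical. The module is passed in as a parameter so the snippet does not hard-code one mxODBC subpackage.

```python
# Sketch under assumptions: `odbc` is an mxODBC subpackage module such as
# mx.ODBC.unixODBC, exposing DriverConnect() and the
# NATIVE_UNICODE_STRINGFORMAT constant (check the docs for your version).
# The helper name and any connect string are hypothetical.

def connect_native_unicode(odbc, connect_string):
    """Open a connection and switch it to native Unicode string handling."""
    conn = odbc.DriverConnect(connect_string)
    # Per the thread: with this string format mxODBC requests text data
    # from the driver as Unicode, instead of relying on the driver's 8-bit
    # (here: UTF-8) conversion, which is where the truncation reported in
    # this thread appears to happen.
    conn.stringformat = odbc.NATIVE_UNICODE_STRINGFORMAT
    return conn
```

With mxODBC installed, a call might look like connect_native_unicode(mx.ODBC.unixODBC, "DSN=mssql;UID=...;PWD=..."), where the DSN name is a placeholder; cursor.fetchall() should then return Python Unicode strings for NVARCHAR columns.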

The only reason we're not making this the default is
backwards compatibility.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 27 2017)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/

On 27.09.2017 09:43, Jan Murre wrote:
> Hi Marc-Andre,
> 
> The NVARCHAR(30) field is filled with the string
> "GUARANTEE 2 mobilhomes côte à ", which is 30 characters.
> When it comes back from cursor.fetchall(), the resulting string is:
> 
> "GUARANTEE 2 mobilhomes c\xc3\xb4te \xc3".
> 
> So the UTF-8 bytes are in the result, but the result has been wrongly
> truncated to 30 bytes.
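
Jan's numbers can be checked in plain Python. A minimal sketch, assuming the accented letters are the precomposed characters U+00F4 ("ô") and U+00E0 ("à"): encoding the 30-character value as UTF-8 gives 32 bytes, and keeping only the first 30 bytes reproduces exactly the string Jan got back. The helpers truncate_bytes and truncate_chars are hypothetical and only model the buggy and the correct behavior.

```python
# Jan's value; the accented letters are written as escapes (U+00F4, U+00E0),
# assuming the original used these precomposed characters.
value = "GUARANTEE 2 mobilhomes c\u00f4te \u00e0 "

def truncate_bytes(text, limit, encoding="utf-8"):
    """Buggy behavior: encode first, then keep at most `limit` bytes."""
    return text.encode(encoding)[:limit]

def truncate_chars(text, limit, encoding="utf-8"):
    """Correct behavior: keep at most `limit` characters, then encode."""
    return text[:limit].encode(encoding)

print(len(value))                  # 30 characters
print(len(value.encode("utf-8")))  # 32 bytes: each accented letter takes 2
print(truncate_bytes(value, 30))   # b'GUARANTEE 2 mobilhomes c\xc3\xb4te \xc3'
print(truncate_chars(value, 30) == value.encode("utf-8"))  # True
```

The dangling \xc3 is the first byte of the two-byte sequence \xc3\xa0 for "à", which is why the fetched value can no longer be decoded cleanly.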
> 
> I tried NATIVE_UNICODE_STRINGFORMAT and that gives me correct
> results!
> I thought I had tried this already, but maybe not in the right combination
> with other parameters.
> I am glad this fixes our problem!
> 
> However, the truncation may still be a bug in the driver.
> The driver being used is the native MS SQL 13.0 (and also 13.1) driver
> for Linux (RHEL).
> 
> Regards, Jan
> 
> On Tue, Sep 26, 2017 at 11:06 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>     Hi Jan,
> 
>     Just to clarify: you have the field filled with 29 characters
>     and the last one is a character which needs a two-byte UTF-8
>     representation?
> 
>     You may want to try setting the connection's .stringformat
>     to NATIVE_UNICODE_STRINGFORMAT. This will result in mxODBC
>     requesting data as Unicode.
> 
>     However, please note that your specific case may also be
>     a bug in the driver, since drivers often use UTF-8 strings
>     internally to store Unicode data and then "forget" to
>     adjust the buffer lengths to accommodate the increase
>     in size when the strings have multi-byte representations.
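
The buffer arithmetic behind this explanation can be made concrete. This sketch assumes that SQL Server counts NVARCHAR(n) lengths in UTF-16 code units and that an ODBC character buffer must also hold a NUL terminator; it does not reproduce the Microsoft driver's actual code, and all helper names are hypothetical. A buffer sized at one byte per character (31 bytes for NVARCHAR(30)) cannot hold the 32 UTF-8 bytes of Jan's value plus the NUL, whereas three bytes per code unit always suffices.

```python
# Hypothetical helpers that model only the buffer arithmetic; they are not
# part of mxODBC or the Microsoft driver.

def utf16_units(text):
    """Length as SQL Server counts it for NVARCHAR(n): UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2

def naive_buffer_size(n):
    """Buggy sizing: one byte per character, plus the NUL terminator."""
    return n + 1

def safe_buffer_size(n):
    """UTF-8 worst case: a BMP character (1 UTF-16 unit) needs up to 3 bytes,
    a supplementary character (2 units) needs 4, so 3 bytes per unit always
    suffices; plus the NUL terminator."""
    return 3 * n + 1

def fits(text, buffer_size):
    """True if the UTF-8 form of `text` plus a NUL fits into `buffer_size`."""
    return len(text.encode("utf-8")) + 1 <= buffer_size

value = "GUARANTEE 2 mobilhomes c\u00f4te \u00e0 "
n = utf16_units(value)
print(n, fits(value, naive_buffer_size(n)), fits(value, safe_buffer_size(n)))
# prints: 30 False True
```
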
> 
>     If you could provide an example and specific driver and database
>     versions, we can try to replicate the problem.
> 
>     Thanks,
>     --
>     Marc-Andre Lemburg
>     eGenix.com
> 
> 
> 
>     On 26.09.2017 15:08, Jan Murre wrote:
>     > Hi,
>     >
>     > I am querying an MS SQL database from Red Hat Linux using the
>     > "Microsoft ODBC Driver 13 for SQL Server".
>     >
>     > There is an NVARCHAR(30) field in our database that is filled with data
>     > having a 2-byte UTF-8 character in the last position. When querying,
>     > the ODBC driver issues this warning:
>     >
>     > mx.ODBC.Error.Warning: ('01004', 0, '[Microsoft][ODBC Driver 13 for SQL Server]String data, right truncation', 8668)
>     >
>     > This results in corrupted data in the result set, because only the
>     > first byte of this 2-byte UTF-8 character ends up in the column.
>     >
>     > I tried several settings for 'connection.encoding' and
>     > 'connection.stringformat', but without success.
>     >
>     > Is this an ODBC driver issue? Would it be possible to work around
>     > this with certain mxODBC settings?
>     >
>     > Regards, Jan
>     >
>     > _______________________________________________________________________
>     > eGenix.com User Mailing List                   http://www.egenix.com/
>     > https://www.egenix.com/mailman/listinfo/egenix-users
>     >
> 
>