Python - Our key to Efficiency

Proposal: Decentralzing Numeric Argument Coercion


Version: 0.6
Introduction
When implementing numeric or other related operations, it is often desirable to provide not only operations between operands of one type only, e.g. integer + integer, but to generalize the idea behind the operation to other type combinations as well, e.g. integer + float.

A common approach to this mixed type situation is to provide a method of "lifting" the operands to a common type (coercion) and then use that type's operand method as execution mechanism. Yet, this strategy has a few drawbacks:

  • the "lifting" process creates at least one new (temporary) operand object,
  • since the coercion method is not being told about the operation that is to follow, it is not possible to implement operation specific coercion of types,
  • there is no elegant way to solve situations were a common type is not at hand, and
  • the coercion method will always have to be called prior to the operation's method itself.
  • A fix for this situation is obviously needed, since these drawbacks make implementations of types needing these features very cumbersome, if not impossible. As an example, have a look at the DateTime and DateTimeDelta types, the first being absolute, the second relative. You can always add a relative value to an absolute one, giving a new absolute value. Yet, there is no common type which the existing coercion mechanism could use to implement that operation.
    General Idea
    Instead of using a central coercion method, the process of handling different operand types is simply left to the operation. If the operation finds that it cannot handle the given operand type combination, it may return a special singleton as indicator.

    Note that "numbers" (anything that implements the number protocol, or part of it) written in Python already use the first part of this strategy - it is the C level API that we focus on here.

    Technical Implementation
    To maintain nearly 100% backward compatibility we have to be very careful to make numbers that don't know anything about the new strategy (old style numbers) work just as well as those that expect the new scheme (new style numbers). Furthermore, binary compatibility is a must, meaning that the interpreter may only access and use new style operations if the number indicates the availability of these.

    How does a new style number indicate that it is new style ?

    A new style number is considered by the interpreter as such if and only it the nb_coerce sub slot of the numeric slot is set to PY_NEWSTYLENUMBER. Old style numbers will either have set this slot to the address of a C function, or to NULL.

    How are new style number slots to be implemented ?

    The main difference between old style slots and new style ones is that the slot functions can no longer assume to be passed arguments of identical type. New style slots must check all arguments for proper type and implement the necessary conversions themselves. This may seem to cause more work on the behalf of the type implementor, but is in fact no more difficult than writing the same kind of routines for an old style coercion slot.

    If a new style slot finds that it cannot handle the passed argument type combination, it may return a new reference of the special singleton Py_NotImplemented to the caller. This will cause the caller to try the other operands operation slots until it finds a slot that does implement the operation for the specific type combination. If none of the possible slots succeed, it raises a TypeError.

    Here is a sample numeric slot (that of the standard integer object converted to new style):

    static PyNumberMethods int_as_number = {
     (binaryfunc)int_add, /*nb_add*/
     (binaryfunc)int_sub, /*nb_subtract*/
     (binaryfunc)int_mul, /*nb_multiply*/
     (binaryfunc)int_div, /*nb_divide*/
     (binaryfunc)int_mod, /*nb_remainder*/
     (binaryfunc)int_divmod, /*nb_divmod*/
     (ternaryfunc)int_pow, /*nb_power*/
     (unaryfunc)int_neg, /*nb_negative*/
     (unaryfunc)int_pos, /*nb_positive*/
     (unaryfunc)int_abs, /*nb_absolute*/
     (inquiry)int_nonzero, /*nb_nonzero*/
     (unaryfunc)int_invert, /*nb_invert*/
     (binaryfunc)int_lshift, /*nb_lshift*/
     (binaryfunc)int_rshift, /*nb_rshift*/
     (binaryfunc)int_and, /*nb_and*/
     (binaryfunc)int_xor, /*nb_xor*/
     (binaryfunc)int_or, /*nb_or*/
     PY_NEWSTYLENUMBER, /*nb_coerce*/
     (unaryfunc)int_int, /*nb_int*/
     (unaryfunc)int_long, /*nb_long*/
     (unaryfunc)int_float, /*nb_float*/
     (unaryfunc)int_oct, /*nb_oct*/
     (unaryfunc)int_hex, /*nb_hex*/

     /* New style slots: */
     (binaryfunc)int_cmp, /*nb_cmp*/
    };

    Only one new slot is currently necessary. The new slots are only accessed in case nb_coerce is set to PY_NEWSTYLENUMBER, thus retaining backward binary compatibility.

    How does the interpreter implement numeric operations ?

    To make the implementation easy to understand (the whole topic is esoteric enough), a new layer in the handling of numeric operations is introduced. This layer takes care of all the different cases that need to be taken into account when dealing with all the possible combinations of old and new style numbers. It is implemented by the two functions _PyNumber_BinaryOperation() and _PyNumber_TernaryOperation(), which both are internal functions that only the functions in Objects/abstract.c have access to. The numeric API (PyNumber_*) is easy to adapt to this new layer.

    As a side-effect all numeric slots can be NULL-checked (this has to be done anyway, so the added feature comes at no extra cost).

    The scheme used by the layer to execute a binary operation is as follows:
     

    Operand 1: 
    v
    Operand 2: 
    w
    Action taken
    new new v.op(v,w), w.op(v,w)
    new old v.op(v,w), coerce(v,w), v.op(v,w)
    old new w.op(v,w), coerce(v,w), v.op(v,w)
    old old coerce(v,w), v.op(v,w)
    Legend:
    new -- new style number
    old -- old style number

    The indicated action sequence is executed from left to right until either the operation succeeds and a valid result (!= Py_NotImplemented) is returned or an exception is raised. Exceptions are returned to the calling function as-is. If a slot returns Py_NotImplemented, the next item in the sequence is executed.

    Note that coerce(v,w) will use the old style nb_coerce slot methods via a call to PyNumber_Coerce().

    Ternary operations have a few more cases to handle:
     

    Operand 1: 
    v
    Operand 2: 
    w
    Operand 3: 
    z
    Action taken
    new new new v.op(v,w,z), w.op(v,w,z), z.op(v,w,z)
    new old new v.op(v,w,z), z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
    old new new w.op(v,w,z), z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
    old old new z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
    new new old v.op(v,w,z), w.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
    new old old v.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
    old new old w.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
    old old old coerce(v,w,z), v.op(v,w,z)
    The same notes as above, except that coerce(v,w,z) actually does:

        if z != Py_None:
            coerce(v,w), coerce(v,z), coerce(w,z)
        else:
            # treat z as absent variable
            coerce(v,w)

    The current implementation uses this scheme already (there's only one ternary slot: nb_pow(a,b,c)).

    Note that the numeric protocol is also used for some other related tasks, e.g. sequence concatenation. These can also benefit from the new mechanism by implementing right-hand operations for type combinations that would otherwise fail to work. As an example, take string concatenation: currently you can only do string + string. With the new mechanism, a new string-like type could implement new_type + string and string + new_type, even though strings don't know anything about new_type.

    What other changes are necessary to implement this new style scheme ?

    Since comparisons also rely on coercion (every time you compare an integer to a float, the integer is first converted to float and then compared...), a new slot to handle numeric comparisons is needed:

    PyObject *nb_cmp(PyObject *v, PyObject *w)
    This slot should compare the two objects and return an integer object stating the result. Currently, this result integer may only be -1, 0, 1. If the slot cannot handle the type combination, it may return a reference to Py_NotImplemented.
    This slot is still in flux since David Ascher's rich comparison proposal is supposed to work in this context too.

    Numeric comparisons are handled by a new numeric protocol API:

    PyObject *PyNumber_Compare(PyObject *v, PyObject *w)
    Compare the two objects as "numbers" and return an integer object stating the result. Currently, this result integer may only be -1, 0, 1. In case the operation cannot be handled by the given objects, a TypeError is raised.
    The PyObject_Compare() API needs to adjusted accordingly to make use of this new API.

    Other changes include adapting some of the built-in functions (e.g. cmp()) to use this API as well. Also, PyNumber_CoerceEx() will need to check for new style numbers before calling the nb_coerce slot. New style numbers don't provide a coercion slot and thus cannot be explicitly coerced.

    What does not work on new style numbers ?

    Obviously, since the coercion slot is gone, explicitly coercing new style numbers is not supported.

    What are the benefits of this proposal ?

    • Creation of temporary objects for mixed numeric operations is no longer necessary in most cases.
    • New types can implement operations that involve other types that do not know anything about the new type, e.g. integer * vector can now be implemented directly (by the vector type).
    • Numeric types can support type combinations only for certain operations, e.g. integer * vector works, while integer + vector doesn't.
    • Cleaner code, both in the type implementation and the abstract numeric layer.
    • A better understanding of what is going on...
    What about the four standard numeric types (integer, float, complex, long) ?

    All of them should be converted to the new style numbers, since it makes them faster (a tad) and more flexible.

    OK, I like the proposal, now who's going to write the code ?

    Most of the proposal is already implemented. All you have to do is download the patch and apply it (according to the included instructions) to a clean Python 1.5.1 source distribution. Please post any feedback either to me or the newsgroup.

    The patch includes a special feature that helps you while debugging new style numbers: if you compile the source with compiler switch '-DCOERCION_DEBUG', debug code will be enabled. Starting Python with switch '-d', e.g. 'python -d myscript.py', will then print lots of information about what the new style coercion mechanism is doing.

    Things that still need to be done:

    • convert complex and longs to new style
    • fix the comparison methods to enable David's proposal
    • fix the numeric slots of Python instance to make better use of the new operations
    • add the NotImplemented functionality to Python instances
    • clear up this document a little

    © 1998, Copyright by Marc-André Lemburg; All Rights Reserved. mailto: mal@lemburg.com