NEP 30 — Duck typing for NumPy arrays - implementation#

Author:

Peter Andreas Entschev <pentschev@nvidia.com>

Author:

Stephan Hoyer <shoyer@google.com>

Status:

Superseded

Replaced-By:

NEP 56 — Array API standard support in NumPy’s main namespace

Type:

Standards Track

Created:

2019-07-31

Updated:

2019-07-31

Resolution:

https://mail.python.org/archives/list/numpy-discussion@python.org/message/Z6AA5CL47NHBNEPTFWYOTSUVSRDGHYPN/

Abstract#

We propose the __duckarray__ protocol, following the high-level overview described in NEP 22, allowing downstream libraries to return arrays of their defined types, in contrast to np.asarray, that coerces those array_like objects to NumPy arrays.

Detailed description#

NumPy’s API, including array definitions, is implemented and mimicked in countless other projects. By definition, many of those arrays are fairly similar in how they operate to the NumPy standard. The introduction of __array_function__ allowed dispatching of functions implemented by several of these projects directly via NumPy’s API. This introduces a new requirement, returning the NumPy-like array itself, rather than forcing a coercion into a pure NumPy array.

For the purpose above, NEP 22 introduced the concept of duck typing to NumPy arrays. The suggested solution described in the NEP allows libraries to avoid coercion of a NumPy-like array to a pure NumPy array where necessary, while still allowing that NumPy-like array libraries that do not wish to implement the protocol to coerce arrays to a pure NumPy array via np.asarray.

Usage Guidance#

Code that uses np.duckarray is meant for supporting other ndarray-like objects that “follow the NumPy API”. That is an ill-defined concept at the moment – every known library implements the NumPy API only partly, and many deviate intentionally in at least some minor ways. This cannot be easily remedied, so for users of np.duckarray we recommend the following strategy: check if the NumPy functionality used by the code that follows your use of np.duckarray is present in Dask, CuPy and Sparse. If so, it’s reasonable to expect any duck array to work here. If not, we suggest you indicate in your docstring what kinds of duck arrays are accepted, or what properties they need to have.

To exemplify the usage of duck arrays, suppose one wants to take the mean() of an array-like object arr. Using NumPy to achieve that, one could write np.asarray(arr).mean() to achieve the intended result. If arr is not a NumPy array, this would create an actual NumPy array in order to call .mean(). However, if the array is an object that is compliant with the NumPy API (either in full or partially) such as a CuPy, Sparse or a Dask array, then that copy would have been unnecessary. On the other hand, if one were to use the new __duckarray__ protocol: np.duckarray(arr).mean(), and arr is an object compliant with the NumPy API, it would simply be returned rather than coerced into a pure NumPy array, avoiding unnecessary copies and potential loss of performance.

Implementation#

The implementation idea is fairly straightforward, requiring a new function duckarray to be introduced in NumPy, and a new method __duckarray__ in NumPy-like array classes. The new __duckarray__ method shall return the downstream array-like object itself, such as the self object, while the __array__ method raises TypeError. Alternatively, the __array__ method could create an actual NumPy array and return that.

The new NumPy duckarray function can be implemented as follows:

def duckarray(array_like):
    if hasattr(array_like, '__duckarray__'):
        return array_like.__duckarray__()
    return np.asarray(array_like)

Example for a project implementing NumPy-like arrays#

Now consider a library that implements a NumPy-compatible array class called NumPyLikeArray, this class shall implement the methods described above, and a complete implementation would look like the following:

class NumPyLikeArray:
    def __duckarray__(self):
        return self

    def __array__(self):
        raise TypeError("NumPyLikeArray can not be converted to a NumPy "
                         "array. You may want to use np.duckarray() instead.")

The implementation above exemplifies the simplest case, but the overall idea is that libraries will implement a __duckarray__ method that returns the original object, and an __array__ method that either creates and returns an appropriate NumPy array, or raises a``TypeError`` to prevent unintentional use as an object in a NumPy array (if np.asarray is called on an arbitrary object that does not implement __array__, it will create a NumPy array scalar).

In case of existing libraries that don’t already implement __array__ but would like to use duck array typing, it is advised that they introduce both __array__ and``__duckarray__`` methods.

用法

下面是一个示例,说明如何__duckarray__使用协议来编写 stack基于 的函数及其产生的结果。concatenate选择此处的示例不仅是为了演示该duckarray函数的用法,也是为了演示其对 NumPy API 的依赖关系(通过检查数组的shape属性来演示)。请注意,该示例仅仅是 NumPy 在第一轴上实际实现的简化版本 stack,并且假设 Dask 已实现该__duckarray__方法。

def duckarray_stack(arrays):
    arrays = [np.duckarray(arr) for arr in arrays]

    shapes = {arr.shape for arr in arrays}
    if len(shapes) != 1:
        raise ValueError('all input arrays must have the same shape')

    expanded_arrays = [arr[np.newaxis, ...] for arr in arrays]
    return np.concatenate(expanded_arrays, axis=0)

dask_arr = dask.array.arange(10)
np_arr = np.arange(10)
np_like = list(range(10))

duckarray_stack((dask_arr, dask_arr))   # Returns dask.array
duckarray_stack((dask_arr, np_arr))     # Returns dask.array
duckarray_stack((dask_arr, np_like))    # Returns dask.array

相比之下,仅使用np.asarray(在撰写本 NEP 时,这是库开发人员用来确保数组类似于 NumPy 的常用方法)会产生不同的结果:

def asarray_stack(arrays):
    arrays = [np.asanyarray(arr) for arr in arrays]

    # The remaining implementation is the same as that of
    # ``duckarray_stack`` above

asarray_stack((dask_arr, dask_arr))     # Returns np.ndarray
asarray_stack((dask_arr, np_arr))       # Returns np.ndarray
asarray_stack((dask_arr, np_like))      # Returns np.ndarray

向后兼容性#

该提案不会在 NumPy 中引起任何向后兼容性问题,因为它仅引入了一个新函数。然而,选择引入该协议的下游库可能会选择删除通过或 函数__duckarray__将数组强制返回 NumPy 数组的能力,以防止将此类数组强制返回纯 NumPy 数组的意外影响(正如一些库已经这样做的那样,例如如 CuPy 和 Sparse),但仍然留下未实现该协议的库,可以选择利用将对象提升为纯 NumPy 数组。np.arraynp.asarraynp.duckarrayarray_like

之前的提案和讨论#

这里提出的鸭子类型协议在NEP 22中进行了高层描述 。

此外,在numpy/numpy #13831中进行了有关协议和相关提案的更长时间的讨论