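Subclassing ndarray is relatively simple, but it has some complications compared to other Python objects. On this page we explain the machinery that allows you to subclass ndarray, and the implications for implementing a subclass.
View casting is the standard ndarray mechanism by which you take an ndarray of any subclass, and return a view of the array as another (specified) subclass: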
>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>
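New instances of an ndarray subclass can also come about by a mechanism very similar to view casting, when numpy finds it needs to create a new instance from a template instance. The most obvious place this happens is when you take slices of subclassed arrays. For example: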
>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False
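The slice is a view onto the original c_arr data. So, when we take a view from the ndarray, we return a new ndarray, of the same class, that points to the data in the original.
There are other points in the use of ndarrays where we need such views, such as copying arrays (c_arr.copy()), creating ufunc output arrays (see also __array_wrap__ for ufuncs and other functions), and reducing methods (like c_arr.mean()).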
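As a small sketch of the point above, the following checks that a slice of a subclassed array is a new instance of the same class that still shares memory with the original. The class name C mirrors the earlier example; np.shares_memory is used here only as a convenient way to confirm the shared data.

```python
import numpy as np

class C(np.ndarray):
    pass

c_arr = np.zeros((3,)).view(C)
v = c_arr[1:]  # new-from-template: a new instance of C

# The slice shares memory with the array it was taken from
shares = np.shares_memory(v, c_arr)

# Writing through the slice is visible in the original array
v[0] = 7.0
print(type(v).__name__, shares, c_arr[1])
```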
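These paths both use the same machinery. We make the distinction here because they result in different input to your methods. Specifically, view casting means you have created a new instance of your array type from any potential subclass of ndarray. Creating new from template means you have created a new instance of your class from a pre-existing instance, allowing you, for example, to copy across attributes that are particular to your subclass.
If we subclass ndarray, we need to deal not only with explicit construction of our array type, but also with view casting and creating new from template. NumPy has the machinery to do this, and it is this machinery that makes subclassing slightly non-standard.
There are two aspects to the machinery that ndarray uses to support views and new-from-template in subclasses.
The first is the use of the ndarray.__new__ method for the main work of object initialization, rather than the more usual __init__ method. The second is the use of the __array_finalize__ method to allow subclasses to clean up after the creation of views and new instances from templates.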
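__new__ and __init__
__new__ is a standard Python method, and, if present, is called before __init__ when we create a class instance. See the python __new__ documentation for more detail.
For example, consider the following Python code: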
class C(object):
    def __new__(cls, *args):
        print('Cls in __new__:', cls)
        print('Args in __new__:', args)
        # The `object` type __new__ method takes a single argument.
        return object.__new__(cls)

    def __init__(self, *args):
        print('type(self) in __init__:', type(self))
        print('Args in __init__:', args)
meaning that we get:
>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)
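When we call C('hello'), the __new__ method gets its own class as first argument, and the passed argument, which is the string 'hello'. After python calls __new__, it usually (see below) calls our __init__ method, with the output of __new__ as the first argument (now a class instance), followed by the passed arguments.
As you can see, the object can be initialized in the __new__ method or the __init__ method, or both, and in fact ndarray does not have an __init__ method, because all the initialization is done in the __new__ method.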
Why use __new__ rather than just the usual __init__? Because in some cases, as for ndarray, we want to be able to return an object of some other class. Consider the following:
class D(C):
    def __new__(cls, *args):
        print('D cls is:', cls)
        print('D args in __new__:', args)
        return C.__new__(C, *args)

    def __init__(self, *args):
        # we never get here
        print('In D __init__')
meaning that:
>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>
The definition of C is the same as before, but for D, the __new__ method returns an instance of class C rather than D. Note that the __init__ method of D does not get called. In general, when the __new__ method returns an object of class other than the class in which it is defined, the __init__ method of that class is not called.
This is how subclasses of the ndarray class are able to return views that preserve the class type. When taking a view, the standard ndarray machinery creates the new ndarray object with something like:
obj = ndarray.__new__(subtype, shape, ...
where subtype is the subclass. Thus the returned view is of the same class as the subclass, rather than being of class ndarray.
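A minimal sketch of that call, assuming a trivial subclass named C as in the earlier examples: calling ndarray.__new__ directly with the subclass as first argument returns an (uninitialized) array whose class is the subclass.

```python
import numpy as np

class C(np.ndarray):
    pass

# Ask ndarray.__new__ for an array of class C; the data are uninitialized
obj = np.ndarray.__new__(C, (3,))
print(type(obj).__name__, obj.shape)
```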
That solves the problem of returning views of the same type, but now we have a new problem. The machinery of ndarray can set the class this way, in its standard methods for taking views, but the ndarray __new__ method knows nothing of what we have done in our own __new__ method in order to set attributes, and so on. (Aside - why not call obj = subtype.__new__(... then? Because we may not have a __new__ method with the same call signature).
__array_finalize__
__array_finalize__ is the mechanism that numpy provides to allow subclasses to handle the various ways that new instances get created.
Remember that subclass instances can come about in these three ways:
- explicit constructor call (obj = MySubClass(params)). This will call the usual sequence of MySubClass.__new__ then (if it exists) MySubClass.__init__.
- view casting
- creating new from template
Our MySubClass.__new__ method only gets called in the case of the explicit constructor call, so we can't rely on MySubClass.__new__ or MySubClass.__init__ to deal with the view casting and new-from-template. It turns out that MySubClass.__array_finalize__ does get called for all three methods of object creation, so this is where our object creation housekeeping usually goes.
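To make the problem concrete, here is a small sketch with a hypothetical subclass named Tagged (the class, its tag attribute, and the calls list are illustrative assumptions, not part of NumPy). It sets an attribute in __new__ only, and shows that view casting and slicing never run that __new__, so the attribute is missing on those instances.

```python
import numpy as np

calls = []  # records each time Tagged.__new__ runs

class Tagged(np.ndarray):
    # Hypothetical subclass that sets its attribute only in __new__
    def __new__(cls, input_array, tag):
        calls.append('__new__')
        obj = np.asarray(input_array).view(cls)
        obj.tag = tag
        return obj

explicit = Tagged([1, 2, 3], tag='a')  # explicit constructor call
viewed = np.arange(3).view(Tagged)     # view casting
sliced = explicit[:2]                  # new-from-template

print(calls)
print(hasattr(explicit, 'tag'), hasattr(viewed, 'tag'), hasattr(sliced, 'tag'))
```

Only the explicit constructor call goes through Tagged.__new__; the other two instances are created by the ndarray machinery and never receive the attribute.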
For the explicit constructor call, our subclass will need to create a new ndarray instance of its own class. In practice this means that we, the authors of the code, will need to make a call to ndarray.__new__(MySubClass,...), a class-hierarchy prepared call to super(MySubClass, cls).__new__(cls, ...), or do view casting of an existing array (see below).
For view casting and new-from-template, the equivalent of ndarray.__new__(MySubClass,... is called, at the C level.
The arguments that __array_finalize__ receives differ for the three methods of instance creation above.
The following code allows us to look at the call sequences and arguments:
import numpy as np

class C(np.ndarray):
    def __new__(cls, *args, **kwargs):
        print('In __new__ with class %s' % cls)
        return super(C, cls).__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # in practice you probably will not need or want an __init__
        # method for your subclass
        print('In __init__ with class %s' % self.__class__)

    def __array_finalize__(self, obj):
        print('In array_finalize:')
        print('   self type is %s' % type(self))
        print('   obj type is %s' % type(obj))
Now:
>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'C'>
The signature of __array_finalize__ is:
def __array_finalize__(self, obj):
One sees that the super call, which goes to ndarray.__new__, passes __array_finalize__ the new object, of our own class (self) as well as the object from which the view has been taken (obj). As you can see from the output above, the self is always a newly created instance of our subclass, and the type of obj differs for the three instance creation methods:
- When called from the explicit constructor, obj is None.
- When called from view casting, obj can be an instance of any subclass of ndarray, including our own.
- When called in new-from-template, obj is another instance of our own subclass, that we might use to update the new self instance.
Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.
This may be clearer with an example.
import numpy as np

class InfoArray(np.ndarray):

    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                strides=None, order=None, info=None):
        # Create the ndarray instance of our type, given the usual
        # ndarray input arguments.  This will call the standard
        # ndarray constructor, but return an object of our type.
        # It also triggers a call to InfoArray.__array_finalize__
        obj = super(InfoArray, subtype).__new__(subtype, shape, dtype,
                                                buffer, offset, strides,
                                                order)
        # set the new 'info' attribute to the value passed
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # ``self`` is a new object resulting from
        # ndarray.__new__(InfoArray, ...), therefore it only has
        # attributes that the ndarray.__new__ constructor gave it -
        # i.e. those of a standard ndarray.
        #
        # We could have got to the ndarray.__new__ call in 3 ways:
        # From an explicit constructor - e.g. InfoArray():
        #    obj is None
        #    (we're in the middle of the InfoArray.__new__
        #    constructor, and self.info will be set when we return to
        #    InfoArray.__new__)
        if obj is None: return
        # From view casting - e.g arr.view(InfoArray):
        #    obj is arr
        #    (type(obj) can be InfoArray)
        # From new-from-template - e.g infoarr[:3]
        #    type(obj) is InfoArray
        #
        # Note that it is here, rather than in the __new__ method,
        # that we set the default value for 'info', because this
        # method sees all creation of default objects - with the
        # InfoArray.__new__ constructor, but also with
        # arr.view(InfoArray).
        self.info = getattr(obj, 'info', None)
        # We do not need to return anything
Using the object looks like this:
>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True
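The three routes into __array_finalize__ can also be told apart at run time. The following is only a sketch: the Traced class and its origin attribute are hypothetical names introduced for illustration. Note the simplification that view casting from an existing Traced instance would be reported as 'template' here, since obj is then also a Traced.

```python
import numpy as np

class Traced(np.ndarray):
    # Hypothetical subclass that records how each instance was created
    def __array_finalize__(self, obj):
        if obj is None:
            # explicit constructor: ndarray.__new__(Traced, ...)
            self.origin = 'constructor'
        elif isinstance(obj, Traced):
            # new-from-template (or a view cast from a Traced instance)
            self.origin = 'template'
        else:
            # view casting from some other kind of ndarray
            self.origin = 'view cast'

t = Traced((3,))                    # explicit constructor
cast = np.arange(3).view(Traced)    # view casting
part = t[:2]                        # new-from-template
print(t.origin, cast.origin, part.origin)
```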
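The InfoArray class is not very useful, because it has the same constructor as the bare ndarray object, including passing in buffers and shapes and so on. We would probably prefer the constructor to be able to take an already formed ndarray from the usual numpy calls to np.array and return an object.
Here is a class that takes a standard ndarray that already exists, casts it as our type, and adds an extra attribute.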
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)
So:
>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
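Because __array_finalize__ also runs for copies and for ufunc outputs, the info attribute should follow the array through those operations as well. The sketch below repeats the class so that it is self-contained. It assumes the default ufunc handling (no __array_ufunc__ override), and note that the attribute is carried over by reference, not deep-copied.

```python
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

obj = RealisticInfoArray(np.arange(5), info='information')
copied = obj.copy()   # copying goes through new-from-template
summed = obj + 1      # the ufunc output is wrapped using obj as template
print(copied.info, summed.info)
```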
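__array_ufunc__ for ufuncs
New in version 1.13.
A subclass can override what happens when executing numpy ufuncs on it by overriding the default ndarray.__array_ufunc__ method. This method is executed instead of the ufunc and should return either the result of the operation, or NotImplemented if the operation requested is not implemented.
The signature of __array_ufunc__ is:
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):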
- ufunc is the ufunc object that was called.
- method is a string indicating how the ufunc was called, either "__call__" to indicate it was called directly, or one of its methods: "reduce", "accumulate", "reduceat", "outer", or "at".
- inputs is a tuple of the input arguments to the ufunc.
- kwargs contains any optional or keyword arguments passed to the function. This includes any out arguments, which are always contained in a tuple.
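As a minimal sketch of these arguments in use, the hypothetical Logged class below (the name and its log list are assumptions for illustration) records the ufunc name and the method string, strips its own type from inputs and outputs to avoid recursion, and then defers to the ufunc. Unlike a full implementation, it returns plain ndarrays and does no back-conversion.

```python
import numpy as np

class Logged(np.ndarray):
    # Hypothetical subclass that logs every ufunc applied to it
    log = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        Logged.log.append((ufunc.__name__, method))
        # View our own instances as plain ndarrays, otherwise calling the
        # ufunc again would dispatch straight back to this method
        plain = tuple(x.view(np.ndarray) if isinstance(x, Logged) else x
                      for x in inputs)
        out = kwargs.get('out')
        if out is not None:
            kwargs['out'] = tuple(
                o.view(np.ndarray) if isinstance(o, Logged) else o
                for o in out)
        # method is '__call__', 'reduce', 'accumulate', ...
        return getattr(ufunc, method)(*plain, **kwargs)

a = np.arange(3.).view(Logged)
r = np.add(a, 1)          # called directly: method is '__call__'
total = np.add.reduce(a)  # called via a ufunc method: method is 'reduce'
print(Logged.log)
```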
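A typical implementation would convert any inputs or outputs that are instances of one's own class, pass everything on to a superclass using super(), and finally return the results after possible back-conversion. An example, taken from the test case test_ufunc_override_with_super in core/tests/test_umath.py, is the following.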
import numpy as np

class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = []
        in_no = []
        for i, input_ in enumerate(inputs):
            if isinstance(input_, A):
                in_no.append(i)
                args.append(input_.view(np.ndarray))
            else:
                args.append(input_)

        outputs = kwargs.pop('out', None)
        out_no = []
        if outputs:
            out_args = []
            for j, output in enumerate(outputs):
                if isinstance(output, A):
                    out_no.append(j)
                    out_args.append(output.view(np.ndarray))
                else:
                    out_args.append(output)
            kwargs['out'] = tuple(out_args)
        else:
            outputs = (None,) * ufunc.nout

        info = {}
        if in_no:
            info['inputs'] = in_no
        if out_no:
            info['outputs'] = out_no

        results = super(A, self).__array_ufunc__(ufunc, method,
                                                 *args, **kwargs)
        if results is NotImplemented:
            return NotImplemented

        if method == 'at':
            if isinstance(inputs[0], A):
                inputs[0].info = info
            return

        if ufunc.nout == 1:
            results = (results,)

        results = tuple((np.asarray(result).view(A)
                         if output is None else output)
                        for result, output in zip(results, outputs))
        if results and isinstance(results[0], A):
            results[0].info = info

        return results[0] if len(results) == 1 else results
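So, this class does not actually do anything interesting: it just converts any instances of its own to regular ndarray (otherwise, we'd get infinite recursion!), and adds an info dictionary that tells which inputs and outputs it converted. Hence, e.g.,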
>>> a = np.arange(5.).view(A)
>>> b = np.sin(a)
>>> b.info
{'inputs': [0]}
>>> b = np.sin(np.arange(5.), out=(a,))
>>> b.info
{'outputs': [0]}
>>> a = np.arange(5.).view(A)
>>> b = np.ones(1).view(A)
>>> c = a + b
>>> c.info
{'inputs': [0, 1]}
>>> a += b
>>> a.info
{'inputs': [0, 1], 'outputs': [0]}
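Note that another approach would be to use getattr(ufunc, method)(*inputs, **kwargs) instead of the super call. For this example, the result would be identical, but there is a difference if another operand also defines __array_ufunc__. E.g., let's assume that we evaluate np.add(a, b), where b is an instance of another class B that has an override. If you use super as in the example, ndarray.__array_ufunc__ will notice that b has an override, which means it cannot evaluate the result itself. Thus, it will return NotImplemented and so will our class A. Then, control will be passed over to b, which either knows how to deal with us and produces a result, or does not and returns NotImplemented, raising a TypeError.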
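If instead, we replace our super call with getattr(ufunc, method), we effectively do np.add(a.view(np.ndarray), b). Again, B.__array_ufunc__ will be called, but now it sees an ndarray as the other argument. Likely, it will know how to handle this, and return a new instance of the B class to us. Our example class is not set up to handle this, but it might well be the best approach if, e.g., one were to re-implement MaskedArray using __array_ufunc__.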
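The difference between the two routes can be observed with a toy setup. In the sketch below, B is a hypothetical class (not an ndarray subclass) whose override simply reports the types of the inputs it was handed; ASuper and AGetattr are stripped-down stand-ins for the example class A that differ only in how they defer. Both ignore any out= argument for brevity.

```python
import numpy as np

class B:
    # Hypothetical other class: its override reports what it was given
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return [type(x).__name__ for x in inputs]

def strip(inputs, cls):
    # View instances of cls as plain ndarrays to avoid infinite recursion
    return tuple(x.view(np.ndarray) if isinstance(x, cls) else x
                 for x in inputs)

class ASuper(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = strip(inputs, ASuper)
        # ndarray.__array_ufunc__ sees that b has an override and returns
        # NotImplemented, so the ufunc machinery then tries b itself
        return super().__array_ufunc__(ufunc, method, *args, **kwargs)

class AGetattr(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        args = strip(inputs, AGetattr)
        # Calls the ufunc again, now with a plain ndarray in our place
        return getattr(ufunc, method)(*args, **kwargs)

b = B()
seen_super = np.add(np.arange(3.).view(ASuper), b)
seen_getattr = np.add(np.arange(3.).view(AGetattr), b)
print(seen_super)    # B was handed the original ASuper instance
print(seen_getattr)  # B was handed a plain ndarray
```

In both cases B ends up producing the result; what changes is whether B has to know how to deal with our subclass or only with a regular ndarray.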
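As a final note: if the super route is suited to a given class, an advantage of using it is that it helps in constructing class hierarchies. E.g., suppose that our other class B also used super in its __array_ufunc__ implementation, and we created a class C that depended on both, i.e., class C(A, B) (with, for simplicity, not another __array_ufunc__ override). Then any ufunc on an instance of C would pass on to A.__array_ufunc__, the super call in A would go to B.__array_ufunc__, and the super call in B would go to ndarray.__array_ufunc__, thereby allowing A and B to collaborate.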
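A rough sketch of such a hierarchy follows. The classes here are simplified stand-ins that reuse the names A, B and C from the text; the order list is an assumption added purely to record which override ran. Each class strips its own instances from the inputs and then defers with super, and out= arguments are ignored for brevity.

```python
import numpy as np

order = []  # records which __array_ufunc__ implementations ran

class A(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        order.append('A')
        args = tuple(x.view(np.ndarray) if isinstance(x, A) else x
                     for x in inputs)
        return super().__array_ufunc__(ufunc, method, *args, **kwargs)

class B(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        order.append('B')
        args = tuple(x.view(np.ndarray) if isinstance(x, B) else x
                     for x in inputs)
        return super().__array_ufunc__(ufunc, method, *args, **kwargs)

class C(A, B):
    # No override of its own: the MRO is C, A, B, ndarray, object
    pass

c = np.arange(3.).view(C)
result = np.add(c, 1)
print(order, result)
```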
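__array_wrap__ for ufuncs and other functions
Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using __array_wrap__ and __array_prepare__. These two allowed one to change the output type of a ufunc, but, in contrast to __array_ufunc__, did not allow one to make any changes to the inputs. It is hoped to eventually deprecate these, but __array_wrap__ is also used by other numpy functions and methods, such as squeeze, so at the present time it is still needed for full functionality.
Conceptually, __array_wrap__ "wraps up the action" in the sense of allowing a subclass to set the type of the return value and update attributes and metadata. Let's show how this works with an example. First we return to the simpler example subclass, but with a different name and some print statements: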
import numpy as np

class MySubClass(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        print('In __array_finalize__:')
        print('   self is %s' % repr(self))
        print('   obj is %s' % repr(obj))
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __array_wrap__(self, out_arr, context=None):
        print('In __array_wrap__:')
        print('   self is %s' % repr(self))
        print('   arr is %s' % repr(out_arr))
        # then just call the parent (super() already binds self)
        return super(MySubClass, self).__array_wrap__(out_arr, context)
We run a ufunc on an instance of our new array:
>>> obj = MySubClass(np.arange(5