
NumPy 1.20 Released with Runtime SIMD Support and Type Annotations


The newly released NumPy 1.20 features performance and documentation improvements. Developers can now use type annotations for NumPy functions. Wider use of SIMD (Single Instruction, Multiple Data) instructions increases the execution speed of universal functions (ufuncs). NumPy’s documentation has also been significantly improved.

The NumPy library code is now annotated with type information, a move facilitated by NumPy no longer supporting Python 2. One contributor explained the rationale behind the move as follows:

When we started numpy-stubs a few years ago, putting type annotations in NumPy itself seemed premature. We still supported Python 2, which meant that we would need to use awkward comments for type annotations.
Over the past few years, using type annotations has become increasingly popular, even in the scientific Python stack. For example, off-hand I know that at least SciPy, pandas, and xarray have at least part of their APIs type annotated. Even without annotations for shapes or dtypes, it would be valuable to have near-complete annotations for NumPy, the project at the bottom of the scientific stack.

Developers can additionally use two new types, ArrayLike and DTypeLike. ArrayLike is used for objects that can be converted to arrays, and DTypeLike is used for objects that can be converted to dtypes. A data type object (numpy.dtype) describes how the fixed-size block of memory corresponding to an array item is interpreted, including the item's data type (e.g., integer, float), its size, its byte order (little-endian or big-endian), and more. The two new types let a type checker flag discouraged patterns and warn users. The documentation explains:

The DTypeLike type tries to avoid creation of dtype objects using dictionary of fields like below:

x = np.dtype({"field1": (float, 1), "field2": (int, 3)})

Although this is valid Numpy code, the type checker will complain about it, since its usage is discouraged.
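To make the description of dtype objects above more concrete, here is a minimal sketch that inspects the attributes mentioned (kind, size, byte order) and builds a structured dtype with the list-of-tuples form, which is another documented way to describe fields. The field names and the variable names are illustrative only.

```python
import numpy as np

# A big-endian, 32-bit signed integer dtype.
big_endian_int = np.dtype(">i4")
print(big_endian_int.kind)       # 'i' for signed integer
print(big_endian_int.itemsize)   # size of one item in bytes
print(big_endian_int.byteorder)  # byte-order character for this dtype

# A structured dtype described with a list of (name, type[, shape]) tuples
# instead of a dictionary of fields. Field names are illustrative.
record = np.dtype([("field1", np.float64), ("field2", np.int32, (3,))])
print(record.names)     # the field names, in order
print(record.itemsize)  # total bytes per record (unaligned by default)
```

Explicit fixed-width types (np.float64, np.int32) are used here so that the item size does not depend on the platform's default integer width.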

The new numpy.typing module contains the new type aliases and can be imported at runtime:

from numpy.typing import ArrayLike
x: ArrayLike = [1, 2, 3, 4]
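The aliases are perhaps most useful in function signatures. The following is a small sketch, not taken from the NumPy documentation, of a hypothetical helper named to_array that accepts anything convertible to an array and anything convertible to a dtype:

```python
from typing import Optional

import numpy as np
from numpy.typing import ArrayLike, DTypeLike


def to_array(values: ArrayLike, dtype: Optional[DTypeLike] = None) -> np.ndarray:
    """Convert any array-like input to an ndarray, optionally casting it.

    `to_array` is a hypothetical helper used only for illustration.
    """
    return np.asarray(values, dtype=dtype)


# Lists, nested lists, and scalars are all valid ArrayLike values.
from_list = to_array([1, 2, 3, 4])
from_nested = to_array([[1, 2], [3, 4]], dtype="float32")
from_scalar = to_array(1.5, dtype=np.float64)

print(from_list.shape, from_nested.dtype, from_scalar.ndim)
```

A static type checker such as mypy can then verify call sites against these annotations. At runtime the annotations have no effect on behavior.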

NumPy 1.20 also enables multi-platform SIMD compiler optimizations. NumPy can now detect at runtime which SIMD instructions the CPU makes available and dispatch to code optimized for them. Users can configure this behavior through several new build arguments. The --cpu-baseline argument specifies the minimal set of required optimizations. The --cpu-dispatch argument specifies the dispatched set of additional optimizations. Its default value, max -xop -fma4, enables all CPU features except AMD legacy features. With --disable-optimization, users may opt out of the new improvements.
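As a rough way to see what the runtime detection found on a given machine, the sketch below reads the feature table NumPy keeps internally. It assumes the private attribute __cpu_features__ on the _multiarray_umath extension module, which is not a public API and may move or disappear between releases. The helper name detected_cpu_features is hypothetical.

```python
import importlib


def detected_cpu_features():
    """Return {feature_name: bool} as reported by NumPy, or {} if unavailable.

    Relies on a private attribute, so treat the result as diagnostic only.
    """
    # Assumed private module paths: numpy._core in NumPy 2.x,
    # numpy.core in the 1.20-1.26 series.
    candidates = (
        "numpy._core._multiarray_umath",
        "numpy.core._multiarray_umath",
    )
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        features = getattr(module, "__cpu_features__", None)
        if features is not None:
            return dict(features)
    return {}


features = detected_cpu_features()
supported = sorted(name for name, present in features.items() if present)
print("SIMD features detected at runtime:", ", ".join(supported) or "unknown")
```

The list printed depends on the CPU and on how NumPy was built, so no particular output should be expected.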

Using NumPy 1.20 entails upgrading to Python 3.7 or newer. With a view to improving NumPy’s online presence and friendliness to new users, the new release significantly improves its documentation: the release notes mention merging 185 related pull requests in what is an ongoing effort.

NumPy 1.20 is a large release, with 684 merged pull requests contributed by 184 people. The full release notes are available online and include information about additional features and deprecations.

Some users have welcomed the new type annotations, while comparing them with Julia, an alternative dynamically typed programming language aimed specifically at performant scientific computing, machine learning, data mining, large-scale linear algebra, and distributed and parallel computing. One user said on Hacker News:

The type annotation story is indeed better with Julia, but having type annotations for NumPy is beneficial for many users for whom Julia isn’t a win, where number crunching isn’t the main thing going on and Python’s better library situation is important and you want to avoid the complication of calling Python from Julia.

NumPy is an open-source Python library adding support for large, multi-dimensional, homogeneously typed arrays and matrices. NumPy includes a set of mathematical functions to create and transform these arrays, linear algebra routines, and more. NumPy is at the core of SciPy, a Python-based ecosystem of open-source software for mathematics, science, and engineering. NumPy allows data scientists to use a productive scripting language for data analysis tasks.
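For readers new to the library, a brief sketch of the three ingredients just mentioned (a homogeneously typed array, an element-wise universal function, and a linear algebra routine) might look like this. The values are arbitrary.

```python
import numpy as np

# A 2x2 array: every element shares one dtype (float64 here).
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
print(coefficients.dtype, coefficients.shape)

# A universal function (ufunc) applied element-wise to the whole array.
roots = np.sqrt(coefficients)

# A linear algebra routine: solve coefficients @ x = rhs for x.
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(coefficients, rhs)
print(solution)
```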
