Array abstraction in Clojure

2013-01-08

A recent post by Mike Anderson motivated me to write down my own ideas about an ideal multi-dimensional array API in Clojure:

https://gist.github.com/4477420

TL;DR:

  • numpy.ndarray is a success; we need something similar: a homogeneous, fixed-size, multi-dimensional array,
  • arrays as views is a cool idea,
  • we need a generic multi-dimensional array API first (without linear algebra and statistics); the objective is to make any array-like output consumable by any Clojure library,
  • proposed core API: get (random access), shape (=> vector), slice (=> arrays as views), reshape and index-map (index space transforms), map (lifting scalar functions), stack (composing arrays => arrays as views), apply-dim (dimensionality reduction), copy (force views, make contiguous mutable copies), to-ndarray (conversion from other collections, no-op conversion from another array); I expect the core API to be a protocol (except for copy and to-ndarray),
  • candidates for the core API: element type query and conversion, search (argmin, argmax, etc.), zero-copy sorting (argsort), partial and complete reductions,
  • utility functions: all the high-level operations expressed through the core API operations (transpose, flatten, cyclic-shift, etc.), input-output functions (loadtxt, savetxt, imread, imwrite),
  • some implementations may provide linear algebra, statistics, image and signal processing algorithms; but they are optional (not part of the core API); it's a good idea to standardize on them too.
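
To make the protocol idea more concrete, here is a minimal sketch covering only three of the proposed operations. Everything in it is an assumption for illustration: the signatures, the name get-at (used instead of get to avoid shadowing clojure.core/get), and the helper functions view and dense. It shows an array as a view (no elements are copied) and a utility such as transpose expressed through index-map.

```clojure
;; Hypothetical protocol: operation names follow the list above,
;; signatures are assumed for the sake of the example.
(defprotocol NDArray
  (shape [a] "Vector of dimension sizes.")
  (get-at [a idx] "Element at index vector idx (the post's `get`).")
  (index-map [a new-shape f]
    "View of shape new-shape whose index i reads index (f i) of a."))

(defn view
  "Generic lazy view over any NDArray; no element is copied."
  [a new-shape f]
  (reify NDArray
    (shape [_] new-shape)
    (get-at [_ idx] (get-at a (f idx)))
    (index-map [this s g] (view this s g))))

(defn dense
  "Row-major array backed by a flat Clojure vector (illustrative only)."
  [dims data]
  (reify NDArray
    (shape [_] dims)
    (get-at [_ idx]
      ;; row-major offset, accumulated dimension by dimension
      (nth data (reduce (fn [off [i d]] (+ (* off d) i))
                        0
                        (map vector idx dims))))
    (index-map [this s f] (view this s f))))

;; Usage: transpose of a 2x3 array as a view.
(def a (dense [2 3] [0 1 2 3 4 5]))
(def t (index-map a [3 2] (fn [[i j]] [j i])))

(shape t)        ;; => [3 2]
(get-at t [2 1]) ;; => 5, read from (get-at a [1 2])
```

Any library that only calls shape and get-at can consume both a and t without knowing which one is a view.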

The proposed core API is not strictly orthogonal: reshape overlaps with index-map, but it's easier to implement as a separate function; apply-dim is redundant if we have slice and map, but I suppose some implementations may optimize it; slice can be merged with get; copy and to-ndarray can be merged too, and the no-op conversion may be moved to utilities.
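
The redundancy of apply-dim can be sketched as follows. This is one possible reading, not a definition: I assume apply-dim applies a function to the elements along one dimension and removes that dimension. The array representation (a plain map with :shape and :at), the names from-rows and map-array, and the argument order are all made up to keep the example short.

```clojure
;; Assumption: an array is a map {:shape [...], :at (fn [idx] ...)}.
(defn from-rows
  "2-D array over nested vectors (illustrative helper)."
  [rows]
  {:shape [(count rows) (count (first rows))]
   :at    (fn [idx] (get-in rows idx))})

(defn drop-nth [v n] (into (subvec v 0 n) (subvec v (inc n))))
(defn insert-nth [v n x] (-> (subvec v 0 n) (conj x) (into (subvec v n))))

(defn slice
  "View that fixes dimension dim at position i and drops that dimension."
  [a dim i]
  {:shape (drop-nth (:shape a) dim)
   :at    (fn [idx] ((:at a) (insert-nth idx dim i)))})

(defn map-array
  "Lifts scalar function f over arrays of equal shape (the post's `map`)."
  [f & arrays]
  {:shape (:shape (first arrays))
   :at    (fn [idx] (apply f (map #((:at %) idx) arrays)))})

(defn apply-dim
  "Applies f across dimension dim, using nothing but slice and map-array."
  [f dim a]
  (let [n (nth (:shape a) dim)]
    (apply map-array f (map #(slice a dim %) (range n)))))

;; Usage: column sums and row sums of a 2x3 array.
(def m (from-rows [[1 2 3] [4 5 6]]))
(def col-sums (apply-dim + 0 m))
(def row-sums (apply-dim + 1 m))

(mapv #((:at col-sums) [%]) (range 3)) ;; => [5 7 9]
(mapv #((:at row-sums) [%]) (range 2)) ;; => [6 15]
```

An implementation with contiguous storage could compute the same result in a single pass, which is the argument for keeping apply-dim in the core API anyway.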

Open questions:

  • what is the best choice for the "default" array implementation (what do copy and to-ndarray _usually_ produce?); ideally it should be able to accommodate any Clojure number as well as user-defined types like Complex,
  • if some functionality goes to utility functions (like transpose implemented via index-map), how do optimized array implementations override them?
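
One possible answer to the last question, offered only as a sketch: put each overridable utility into its own small protocol and extend that protocol to Object with the generic index-map version. An optimized implementation then implements the utility protocol directly, and Clojure's protocol dispatch prefers that over the Object fallback. The names Transposable, Rows, view and get-at, and all signatures, are assumptions.

```clojure
;; Hypothetical core protocol (signatures assumed).
(defprotocol NDArray
  (shape [a])
  (get-at [a idx])
  (index-map [a new-shape f]))

;; Hypothetical utility protocol, separate from the core API.
(defprotocol Transposable
  (transpose [a]))

(defn view
  "Generic lazy view over any NDArray."
  [a new-shape f]
  (reify NDArray
    (shape [_] new-shape)
    (get-at [_ idx] (get-at a (f idx)))
    (index-map [this s g] (view this s g))))

;; Generic fallback: works for anything that implements NDArray.
(extend-protocol Transposable
  Object
  (transpose [a]
    (index-map a (vec (reverse (shape a))) (comp vec reverse))))

;; An "optimized" implementation over nested vectors that
;; overrides transpose by rearranging its own storage.
(defrecord Rows [rows]
  NDArray
  (shape [_] [(count rows) (count (first rows))])
  (get-at [_ idx] (get-in rows idx))
  (index-map [this s f] (view this s f))
  Transposable
  (transpose [_] (->Rows (apply mapv vector rows))))

;; Usage: the record takes its own path, a plain view takes the fallback.
(def r (->Rows [[1 2 3] [4 5 6]]))
(def v (view r [2 3] identity))

(instance? Rows (transpose r)) ;; => true
(instance? Rows (transpose v)) ;; => false
(get-at (transpose r) [2 1])   ;; => 6
(get-at (transpose v) [2 1])   ;; => 6
```

The cost of this design is one protocol per overridable utility, so it probably only pays off for the operations that implementations actually want to specialize.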