Announcement for numarray

We have been working on a reimplementation of Numeric, the
numeric array manipulation extension module for Python. 
The reimplementation is virtually a complete rewrite.

While we think this version is not quite mature enough for
most to use in everyday projects, we are interested in
feedback on the user interface and the open issues mentioned
below. While we believe that performance is generally very
good for arrays larger then 100K elements, there are functions
and aspects of numarray which have not been optimized yet, so
please be careful about drawing conclusions about its efficiency
from limited test cases. We also welcome those that would like
to contribute to this effort by helping with the development,
or adding libraries.

This new version was developed for number of reasons.
To summarize, we regularly deal with large datasets and
the new version gives us the capabilities that we feel are
necessary for working with such datasets. In particular:

1) Avoiding promotion of array types in expressions involving
   Python scalars (e.g., 2.*<Float32 array> should not result
   in a Float64 array).
2) Ability to use memory mapped files [largely implemented].
3) Ability to access data in fields of arrays of records as
   numeric arrays without copying the data to a new array.
4) Ability to reference byteswapped data or non-aligned data
   (as might be found in record arrays) without producing new
   temporary arrays.
5) Reuse temporary arrays in expressions when possible
   [not implemented yet]
6) Provide more convenient use of index arrays (put and take).

A new implementation was decided upon since many of the
existing numpy developers agree that the existing implementation
is not suitable for massive changes and enhancements. Furthermore
Guido is very reluctant to accept Numeric into the Standard
Library unless it was rewritten.

This version has nearly the fully functionality of the basic
Numeric (only masked arrays and the convolve function are
missing, and the use of the ufunc methods reduce, accumulate,
and outer for complex types). No other libraries have been
adapted to to Numarray yet.

NUMARRAY IS NOT FULLY COMPATIBLE WITH NUMERIC. (But is very
similar in most respects). The incompatibilities will be
listed below. The C interface is completely different so
existing C extensions will not work with numarray.

Major Compatibility issues:

1) Coercion rules are different. Expressions involving
   scalars may not produce the same type of arrays
2) Types are represented by Type Objects rather than character
   codes (though the old character codes may still be used
   as arguments to the functions).
3) Arrays have no public attributes. Accessor functions must
   be used instead (e.g., to get shape for array x, one must
   use x.shape() instead of x.shape

There are more minor differences which we will try to catalog.

OPEN ISSUES:

There are still some open questions about the interface
which we hope to resolve with feedback from the Numeric 
commmunity. These issues are listed later. Depending on
what the consensus is, we may change aspects of the user
interface. It is not our intent to make the case for either
side. This should be done by the proponents of each view
in more detail on the numpy mailing list.

1) Currently a number of functions or methods (e.g., reduce)
   act on the first dimension rather than the last by default
   (as does Numeric). Some feel that it makes sense to always
   apply functions to the most rapidly varying dimension by 
   default and they are proposing that we change numarray to
   reflect this. Others say that the current behavior is more
   compatible with Python behavior, and that which dimension
   a function applies to depends on the function (reduce would
   apply to the first while FFT would apply to the last).
2) The ones and zeros functions should return Float64 arrays
   by default (currently Int32)
3) Complex comparisions should work, particularly for equality.
   (The current implementation does allow complex comparisions,
   even for <, >,  >=, etc.--only the real component is checked
   for those, both real and imaginary for =, !=.). Python
   does not permit complex comparisons for complex scalars.
   Allowing this avoids lots of special case code some say.
   But others say the incompatiblity with Python behavior is
   worse.
4) What is the necessary degree of optimization for small
   arrays? The current implementation has much of the code
   in Python. The performance on large arrays is generally 
   fast (or easy to optimize), but reducing the overhead for
   setting up array operations will require much more of the
   current implementation be rewritten in C or more complex
   algorithms in Python. Either approach involves significant
   work and a more difficult maintain module. We have not yet
   done any significant benchmarking lately, but we do have
   some order of magnitude estimates about relative performance.
   (The following numbers depend on the platform so do not take
   them too seriously). IDL is already at about half its
   asymptotic throughput for 200 element arrays. Numpy is at
   its half throughput for 2000 element arrays. We estimate
   that numarray is probably another order of magnitude worse,
   i.e., that 20K element arrays are at half the asymptotic 
   speed. How much should this be improved? We have some
   thoughts about other approaches that don't require this sort
   of optimization, but rather try casting iterative calculations
   on small arrays into a form that numarray can perform in 
   one call. These ideas will be elaborated on in a separate
   message.
5) The current implementation allows a user who messes with
   private attributes to crash Python. If they stick with
   the public interface, this should not be a problem. 
   Question: should it be bullet-proof enough so that it
   protects users from themselves?
6) We don't currently allow public array attributes to make
   the Python code simpler and faster (otherwise we will
   be forced to use __setattr__ and such). Is this change
   in interface acceptable.
7) What types are of highest importance to add: Int64, UInt32,
   Float128, Complex256.
8) What should the default be for behavior of index arrays with
   regard to negative indices? Out of range indices? What other
   options should be available.

New capabilities:

o Record arrays: arrays of records that have fixed format fields.
o Character arrays: arrays of fixed-length character strings.
o Support for memory mapped files.
o Use of index arrays within subscripts: e.g. 
  if ind = array([4, 4, 0, 2]) and x = 2*arange(6),
  x[inx] results in array([8, 8, 0, 4])
o Support for byteswapped representation.

Things yet to be done:

o Reimplement complex types in C.
o Masked Arrays
o More extensive C API; templates and examples for how to write
  C extensions.
o Add all the commonly available Numeric libraries to numarray.
o An array type for generic Python objects (various possibilities
  exist: one that requires them all to derive from a specified
  base class, and one that doesn't. Special purpose ones? (e.g.,
  one for Python strings?)
o Benchmarking and optimization. For example, a some of the
  functions could be reimplemented with much higher performance
  (e.g., array printing).

The early beta is available for download from ...

We are in the process of developing a revised version of the
Numeric manual for numarray (all in all, there is a great
deal of commonality).

We have developed a brief document summarizing the differences
between Numeric and numarray for use in the meantime.