My most used NumPy/SciPy functions

2013-01-04

I analyzed my recent Python scripts, and it appears that the most used NumPy/SciPy symbols are:

  1. asarray by a large margin
  2. linspace, a third as often as asarray
  3. mean
  4. dot
  5. sqrt
  6. roll
  7. hstack
  8. float32 (that's not a function, but a type)
  9. loadtxt
  10. linalg.norm
  11. arange (I use it a quarter as often as linspace)
  12. array (copy-by-default, much less used than asarray)
  13. where
  14. diff
  15. cumsum
  16. savetxt (half as often as loadtxt)
  17. max
  18. cross
  19. sin
  20. ones
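
The difference between asarray and array noted in the list is easy to see in a few lines. This is only a small illustration, with made-up variable names, of the copy-by-default behaviour:

```python
import numpy as np

a = np.arange(5, dtype=np.float32)

same = np.asarray(a)  # input is already an ndarray, so no copy is made
copy = np.array(a)    # array() copies its input by default

a[0] = 42.0           # modify the original in place

# `same` shares memory with `a` and sees the change; `copy` does not.
print(np.shares_memory(a, same), same[0])
print(np.shares_memory(a, copy), copy[0])
```

For lists and nested lists both functions have to build a new array anyway; the difference only matters when the input is already an array.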

Overall, the most used functions:

  • convert anything to an array (mostly lists and nested lists)
  • do simple file input-output (loadtxt, savetxt)
  • construct new arrays from existing blocks (hstack, roll...) and default arrays (ones, zeros, linspace, arange...)
  • do basic linear algebra (dot, cross, norm)
  • make common reductions (mean, cumsum, max, ...)
  • broadcast function application (sqrt, sin, ...)
  • define inter-element relations (roll, diff)
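
A short sketch touching most of these categories, using only functions from the list above. The data and variable names are invented for illustration:

```python
import numpy as np

# Default arrays: linspace includes the endpoint, arange excludes it.
x = np.linspace(0.0, 1.0, 5)     # 5 points from 0 to 1 inclusive
t = np.arange(0.0, 1.0, 0.25)    # 4 points, stops before 1

# Elementwise function application.
y = np.sqrt(x)

# Inter-element relations.
steps = np.diff(x)               # x[i+1] - x[i], one element shorter than x
shifted = np.roll(x, 1)          # circular shift, last element wraps to the front

# Building from existing blocks, then a reduction and some linear algebra.
xy = np.hstack([x, y])
average = np.mean(xy)
running = np.cumsum(steps)
length = np.linalg.norm(xy)
inner = np.dot(x, y)
```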

This may be useful to know when designing a custom array-like API (I still wish I had time to write a Clojure wrapper around Commons Math).

I counted only qualified and explicit imports ("import numpy as np", "from numpy import foo"). I didn't count scripts that use "from ... import *"; that's too much work for grep and sed. I also didn't count uses of array methods, which would require static analysis of Python code.

Do you know of any tools for such an analysis?