I don't like NumPy | slacker news

I skimmed the article and agree with it at a high level, though I haven't faced most of those problems personally.

I have several gripes with NumPy, or more broadly the notion of using Python to call a C/asm library that vectorizes math operations. A lot of people speak of NumPy like the solution to all your high-performance math needs in Python, which I think is disingenuous. The more custom logic you do, the less suitable the tools get. Pure Python numeric code is incredibly slow - like 1/30× compared to Java - and as you find parts that can't be vectorized, you have to drop back down to pure Python.

I would like to give the simple example of the sieve of Eratosthenes:

  def sieve_primeness(limit):
      result = [False] + [True] * limit
      for i in range(2, len(result)):
          if result[i]:
              for j in range(i * i, len(result), i):
                  result[j] = False

This code is scalar, and porting it to a language like C/C++/Rust/Java gives decent performance straight out of the box. Performance in Python is about 1/30× the speed, which is not acceptable.

People pretend that you can hand-wave the performance problem away by using NumPy. Please tell me how to vectorize this Python code. Go on, I'm waiting.

You can't vectorize the `if result[i]` because that controls whether the inner for-loop executes; so it must execute in pure Python. For the inner loop, you can vectorize that by creating a huge temporary array and then AND it with the result array, but that is extremely memory-inefficient compared to flipping bits of the result array in place, and probably messes up the cache too.

Alternatively, you can run the code in PyPy, but that usually gives a speed-up of maybe 3×, which is nowhere near enough to recover the 30× speed penalty.

Another example is that NumPy is not going to help you implement your own bigint library, because that also requires a lot of custom logic that executes between loop iterations. Meanwhile, I've implemented bigint in pure Java with acceptable performance and without issues.

Again, my point is that anytime you venture into custom numerical/logical code that is not the simple `C = A + B`, you enter a world of pain when it comes to Python performance or vectorization.