
I don't like NumPy

(dynomight.net)
480 points by MinimalAction | 1 comment
nayuki · No.43998105
I skimmed the article and agree with it at a high level, though I haven't faced most of those problems personally.

I have several gripes with NumPy, or more broadly with the notion of using Python to call a C/asm library that vectorizes math operations. A lot of people speak of NumPy as if it were the solution to all your high-performance math needs in Python, which I think is disingenuous. The more custom logic you have, the less suitable the tools get. Pure Python numeric code is incredibly slow - roughly 1/30 the speed of Java - and as you find parts that can't be vectorized, you have to drop back down to pure Python.

I would like to give the simple example of the sieve of Eratosthenes:

  def sieve_primeness(limit):
      # result[n] tells whether n is prime, for 0 <= n <= limit
      result = [False, False] + [True] * (limit - 1)
      for i in range(2, len(result)):
          if result[i]:
              for j in range(i * i, len(result), i):
                  result[j] = False
      return result
This code is scalar, and porting it to a language like C/C++/Rust/Java gives decent performance straight out of the box. In Python it runs at about 1/30 of that speed, which is not acceptable.

People pretend that you can hand-wave the performance problem away by using NumPy. Please tell me how to vectorize this Python code. Go on, I'm waiting.

You can't vectorize the `if result[i]` check because it controls whether the inner for-loop executes, so it has to run in pure Python. The inner loop itself you can vectorize, for example by creating a huge temporary array and then ANDing it with the result array, but that is extremely memory-inefficient compared to flipping bits of the result array in place, and probably messes up the cache too.
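
For concreteness, here is a rough sketch (mine, not the commenter's code, and only of a single inner-loop step) of the two options just described: building a temporary mask and ANDing it in, versus clearing bits in place with a strided slice assignment. The array size and the value of `i` are arbitrary illustrative choices.

    import numpy as np

    # Illustrative values: `result` is the sieve's boolean array,
    # `i` is the current prime being processed.
    result = np.ones(1_000_001, dtype=bool)
    i = 7

    # Option 1: build a full-size temporary mask and AND it in. This
    # allocates a second array as large as the sieve on every iteration.
    mask = np.ones_like(result)
    mask[i * i::i] = False
    result &= mask

    # Option 2: clear the composite entries in place with a strided slice
    # assignment; no temporary array is needed.
    result[i * i::i] = False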

Alternatively, you can run the code in PyPy, but that usually gives a speed-up of maybe 3×, which is nowhere near enough to recover the 30× speed penalty.

Another example: NumPy is not going to help you implement your own bigint library, because that also requires a lot of custom logic that executes between loop iterations. Meanwhile, I've implemented a bigint library in pure Java with acceptable performance and without issues.
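
To make the bigint point concrete, here's a small sketch (my own illustration, not the parent's Java code) of schoolbook addition over base-2^32 limbs. The carry produced in one iteration feeds the next, which is exactly the kind of loop-carried logic a single NumPy operation can't express.

    from itertools import zip_longest

    def bigint_add(a, b, base=2 ** 32):
        """Add two bigints stored as little-endian lists of limbs."""
        out, carry = [], 0
        for x, y in zip_longest(a, b, fillvalue=0):
            s = x + y + carry          # the carry couples this iteration to the previous one
            out.append(s % base)
            carry = s // base
        if carry:
            out.append(carry)
        return out

    # e.g. (2**32 - 1) + 1 == 2**32, i.e. limbs [0, 1]
    print(bigint_add([2 ** 32 - 1], [1]))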

Again, my point is that anytime you venture into custom numerical/logical code that is not the simple `C = A + B`, you enter a world of pain when it comes to Python performance or vectorization.
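
As a purely illustrative contrast (my sketch, not the commenter's code): the elementwise case NumPy is built for, next to a loop-carried recurrence that plain array operations don't express, so its loop body runs in the interpreter.

    import numpy as np

    A = np.arange(1_000_000, dtype=np.float64)
    B = np.arange(1_000_000, dtype=np.float64)
    C = A + B                  # the easy case: one vectorized call, no Python-level loop

    def running_decay(x, alpha=0.9):
        # Each output depends on the previous output, so this can't be
        # written as a single elementwise NumPy expression; the loop runs
        # in pure Python.
        out = np.empty_like(x)
        acc = 0.0
        for k, v in enumerate(x):
            acc = alpha * acc + v
            out[k] = acc
        return out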

replies(7): >>43998235 #>>43998289 #>>43998291 #>>43998312 #>>43998476 #>>43998592 #>>44006443 #
WCSTombs · No.43998476
> Please tell me how to vectorize this Python code. Go on, I'm waiting.

Here's a start:

    import numpy as np

    def sieve(limit):
        # result[n] is True while n is still a candidate prime
        result = np.ones(limit + 2, dtype=bool)
        result[0] = False
        result[1] = False
        for i in range(2, limit + 1):
            if result[i]:
                yield i
                # Cross off every multiple of i in one vectorized slice
                # assignment (index 0 is already False, and index i was
                # just yielded, so clearing them again is harmless).
                result[::i] = False

    for prime in sieve(100):
        print(prime)
As you said, you can't vectorize the outer loop because each iteration depends on the result of previous iterations, so I think you can't do too much better than that with this algorithm. (There are a few easy algorithm optimizations, but I think that's kind of orthogonal to your point, and anyway you can do them in any language.)
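
For what it's worth, here's a sketch of the kind of easy optimizations alluded to (my variant, not the parent's code): stop the outer loop at sqrt(limit), start each cross-off at i*i, and collect the primes at the end.

    import numpy as np

    def sieve_opt(limit):
        result = np.ones(limit + 1, dtype=bool)
        result[:2] = False
        # Any composite <= limit has a prime factor <= sqrt(limit).
        for i in range(2, int(limit ** 0.5) + 1):
            if result[i]:
                # Smaller multiples of i were already cleared by smaller primes.
                result[i * i::i] = False
        return np.flatnonzero(result)

    print(sieve_opt(100))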

I would expect there's still a significant performance gap between that and, say, a simple C implementation, but that shouldn't be surprising, and personally I haven't encountered these supposed droves of people claiming that NumPy fully solves the performance gap. From what I've seen there was a general consensus that vectorizing with NumPy can significantly tighten the gap but can rarely fully close it.
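
If one wanted to actually measure that gap, a minimal sketch might look like the following (illustrative only; it assumes `sieve_primeness` from the parent comment and the `sieve` generator above are both defined in the same module, and the numbers will of course vary by machine):

    import timeit

    n = 1_000_000
    pure_py = timeit.timeit(lambda: sieve_primeness(n), number=5)
    with_np = timeit.timeit(lambda: list(sieve(n)), number=5)
    print(f"pure Python: {pure_py:.2f}s   NumPy-assisted: {with_np:.2f}s")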

replies(1): >>44011585 #
thaumasiotes · No.44011585
> As you said, you can't vectorize the outer loop because each iteration depends on the result of previous iterations

I don't see why not. The check for whether result[i] is prime is just a performance optimization; if you'd rather apply a different performance optimization instead, you can do that without hurting correctness.

Note that while it's true that multiples of prime numbers are composite, it's also true that multiples of composite numbers are composite.
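
A sketch of that idea (mine, not the commenter's code): drop the `if result[i]` test entirely, so every iteration of the outer loop is the same branch-free slice assignment. Crossing off multiples of composite numbers is redundant but harmless.

    import numpy as np

    def sieve_no_check(limit):
        result = np.ones(limit + 1, dtype=bool)
        result[:2] = False
        for i in range(2, limit + 1):
            # No primality check: multiples of a composite i are already
            # False, so clearing them again changes nothing.
            result[2 * i::i] = False
        return np.flatnonzero(result)

    print(sieve_no_check(100))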