I don't like NumPy

(dynomight.net)
480 points MinimalAction | 1 comment
jamesblonde ◴[] No.43998616[source]
In data for ML, everything has switched from NumPy (Pandas) to Arrow (Polars, DuckDB, Spark, Pandas 2.x, etc.). However, Scikit-Learn is still a holdout, so it's Arrow from your data sources all the way to the pre-processing pipelines in Scikit-Learn, where you have to go back to NumPy. In practice, it now makes more sense to separate feature pipelines in Arrow from training pipelines with Pandas/NumPy and Scikit-Learn.*

*This is ML, not Deep Learning or Transformers.

replies(1): >>44008302 #
1. kccqzy ◴[] No.44008302[source]
Most Arrow arrays can be transformed into numpy arrays in a zero-copy manner. And having used both, I personally think Arrow is way more buggy than numpy: PyArrow segfaults for me about once a month when writing pure Python; numpy never segfaulted on me.