84 points mkalioby | 4 comments
1. sevensor ◴[] No.42190570[source]
Having seen a lot of work come to grief because of the decision to use pandas, anything that’s not pandas has my vote. Pandas: if you’re not using it interactively, don’t use it at all. This advice goes double if your use case is “read a csv.” Standard library in Python has you covered there.
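
For the "read a csv" case, here is a minimal sketch using only the standard library's csv module. The column names and sample data are made up for illustration, and an in-memory buffer stands in for a real file:

```python
import csv
import io

# Hypothetical sample data; in practice this would be an open file,
# e.g. open("data.csv", newline="").
sample = io.StringIO("name,points\nalice,3\nbob,5\n")

# DictReader treats the first row as field names and yields one dict per row.
rows = list(csv.DictReader(sample))

# Every value comes back as a string, so convert types explicitly.
total_points = sum(int(row["points"]) for row in rows)

print(rows[0]["name"], total_points)
```

The explicit int() conversion is the main trade-off versus pandas: the csv module does no type inference, which is more typing but leaves no surprises about dtypes.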
replies(2): >>42190621 #>>42190663 #
2. ttyprintk ◴[] No.42190621[source]
Since DuckDB can read and write Pandas DataFrames in memory, a team with varying Pandas fluency can benefit from learning DuckDB.
replies(1): >>42196654 #
3. c0balt ◴[] No.42190663[source]
Both DuckDB and especially Polars should also be mentioned here. Polars in particular is quite good, IME, if you want a pandas-like interface (and its interface is saner, too).
4. adolph ◴[] No.42196654[source]
Since Pandas 2, Apache Arrow can be used as an optional backend for Pandas alongside NumPy, which remains the default. Arrow is also used by Polars, DuckDB, Ibis, the list goes on.

https://arrow.apache.org/overview/

Apache Arrow solves most discussed problems, such as improving speed, interoperability, and data types, especially for strings. For example, the new string[pyarrow] column type is around 3.5 times more efficient. [...] The significant achievement here is zero-copy data access, mapping complex tables to memory to make accessing one terabyte of data on disk as fast and easy as one megabyte.

https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-d...