Query Your Python Lists

1. sevensor ◴[20 Nov 24 03:23 UTC] No.42190570[source]▶

Having seen a lot of work come to grief because of the decision to use pandas, anything that’s not pandas has my vote. Pandas: if you’re not using it interactively, don’t use it at all. This advice goes double if your use case is “read a CSV.” The Python standard library has you covered there.
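For the “read a CSV” case, a minimal sketch using only the standard library might look like the following. The sample data, the column names, and the `load_rows` helper are made up for illustration; with a real file you would pass `open("some_file.csv", newline="")` instead of the `io.StringIO` object.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a real CSV file.
SAMPLE = "name,age\nada,36\ngrace,45\nlinus,29\n"


def load_rows(handle):
    """Return every CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(SAMPLE))

# DictReader yields strings, so numeric columns need explicit conversion.
over_30 = [row["name"] for row in rows if int(row["age"]) > 30]
average_age = mean(int(row["age"]) for row in rows)

print(over_30)
print(round(average_age, 2))
```

Filtering and aggregating with plain comprehensions like this covers a lot of small scripts without pulling in a dataframe library.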

replies(2): >>42190621 #>>42190663 #

2. ttyprintk ◴[20 Nov 24 03:36 UTC] No.42190621[source]▶

>>42190570 (TP) #

Since DuckDB can read and write Pandas DataFrames directly in memory, a team with varying Pandas fluency can benefit from learning DuckDB.

replies(1): >>42196654 #

3. c0balt ◴[20 Nov 24 03:47 UTC] No.42190663[source]▶

>>42190570 (TP) #

Both DuckDB and especially Polars should also be mentioned here. Polars in particular is quite good, IME, if you want a pandas-like interface (and its API is saner, too).

4. adolph ◴[20 Nov 24 18:14 UTC] No.42196654[source]▶

>>42190621 #

Since Pandas 2, Apache Arrow can be used in place of NumPy as the backend for Pandas (it is opt-in; NumPy is still the default). Arrow is also used by Polars, DuckDB, Ibis; the list goes on.

https://arrow.apache.org/overview/

“Apache Arrow solves most discussed problems, such as improving speed, interoperability, and data types, especially for strings. For example, the new string[pyarrow] column type is around 3.5 times more efficient. [...] The significant achievement here is zero-copy data access, mapping complex tables to memory to make accessing one terabyte of data on disk as fast and easy as one megabyte.”

https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-d...