←back to thread

FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)
374 points sebg | 1 comments | | HN request time: 0.324s | source
Show context
thecleaner ◴[] No.42192790[source]
Sure but single node performance. This makes it not very useful IMO since quite a few data science folks work with Hadoop clusters or Snowflake clusters or DataBricks where data is distributed and querying is handled by Spark executors.
replies(2): >>42193202 #>>42193542 #
1. Kalanos ◴[] No.42193542[source]
Hadoop hasn't been relevant for a long time, which is telling.

Unless I had thousands of files to work with, I would be loathe to use cluster computing. There's so much overhead, cost, waiting for nodes to spin up, and cloud architecture nonsense.

My "single node" computer is a refurbished tower server with 256GB RAM and 50 threads.

Most of these distributed computing solutions arose before data processing tools started taking multi-threading seriously.