
FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)
398 points by sebg
OutOfHere (No.42195321)
Don't use it:

> By providing the beta version of FireDucks free of charge and enabling data scientists to actually use it, NEC will work to improve its functionality while verifying its effectiveness, with the aim of commercializing it within FY2024.

In other words, it's free only to trap you.

ritchie46 (No.42204627), replying to OutOfHere
I don't trust their benchmarks. I ran their benchmark source locally on my machine at TPC-H scale factor 10. Polars was orders of magnitude faster and didn't SIGABRT at query 10 (I wasn't OOM).

    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[SIGINT] $ SCALE_FACTOR=10.0 make run-polars
    .venv/bin/python -m queries.polars
    {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
    Code block 'Run polars query 1' took: 1.47103 s
    Code block 'Run polars query 2' took: 0.09870 s
    Code block 'Run polars query 3' took: 0.53556 s
    Code block 'Run polars query 4' took: 0.38394 s
    Code block 'Run polars query 5' took: 0.69058 s
    Code block 'Run polars query 6' took: 0.25951 s
    Code block 'Run polars query 7' took: 0.79158 s
    Code block 'Run polars query 8' took: 0.82241 s
    Code block 'Run polars query 9' took: 1.67873 s
    Code block 'Run polars query 10' took: 0.74836 s
    Code block 'Run polars query 11' took: 0.18197 s
    Code block 'Run polars query 12' took: 0.63084 s
    Code block 'Run polars query 13' took: 1.26718 s
    Code block 'Run polars query 14' took: 0.94258 s
    Code block 'Run polars query 15' took: 0.97508 s
    Code block 'Run polars query 16' took: 0.25226 s
    Code block 'Run polars query 17' took: 2.21445 s
    Code block 'Run polars query 18' took: 3.67558 s
    Code block 'Run polars query 19' took: 1.77616 s
    Code block 'Run polars query 20' took: 1.96116 s
    Code block 'Run polars query 21' took: 6.76098 s
    Code block 'Run polars query 22' took: 0.32596 s
    Code block 'Overall execution of ALL polars queries' took: 34.74840 s
    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch$ SCALE_FACTOR=10.0 make run-fireducks
    .venv/bin/python -m queries.fireducks
    {"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
    Code block 'Run fireducks query 1' took: 5.35801 s
    Code block 'Run fireducks query 2' took: 8.51291 s
    Code block 'Run fireducks query 3' took: 7.04319 s
    Code block 'Run fireducks query 4' took: 19.60374 s
    Code block 'Run fireducks query 5' took: 28.53868 s
    Code block 'Run fireducks query 6' took: 4.86551 s
    Code block 'Run fireducks query 7' took: 28.03717 s
    Code block 'Run fireducks query 8' took: 52.17197 s
    Code block 'Run fireducks query 9' took: 58.59863 s
    terminate called after throwing an instance of 'std::length_error'
      what():  vector::_M_default_append
    Code block 'Overall execution of ALL fireducks queries' took: 249.06256 s
    Traceback (most recent call last):
      File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 39, in <module>
        execute_all("fireducks")
      File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 22, in execute_all
        run(
      File "/home/ritchie46/miniconda3/lib/python3.10/subprocess.py", line 526, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/home/ritchie46/Downloads/deleteme/polars-tpch/.venv/bin/python', '-m', 'fireducks.imhook', 'queries/fireducks/q10.py']' died with <Signals.SIGABRT: 6>.
    make: *** [Makefile:52: run-fireducks] Error 1
    (.venv) [fireducks]  ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[2] $
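The per-query gap can be tabulated directly from the timings in the log above. This is a minimal sketch: the numbers are copied verbatim from the posted run, and only queries 1-9 are compared because FireDucks aborted on query 10. Nothing beyond the logged values is assumed.

```python
# Timings in seconds, copied from the posted log (TPC-H scale factor 10).
# FireDucks aborted on query 10, so only queries 1-9 can be compared.
polars = [1.47103, 0.09870, 0.53556, 0.38394, 0.69058,
          0.25951, 0.79158, 0.82241, 1.67873]
fireducks = [5.35801, 8.51291, 7.04319, 19.60374, 28.53868,
             4.86551, 28.03717, 52.17197, 58.59863]


def slowdowns(fast, slow):
    """Return slow/fast time ratios, one per query."""
    return [s / f for f, s in zip(fast, slow)]


ratios = slowdowns(polars, fireducks)
for q, r in enumerate(ratios, start=1):
    print(f"q{q}: FireDucks took {r:.1f}x as long as Polars")

total_polars = sum(polars)
total_fireducks = sum(fireducks)
print(f"q1-q9 total: Polars {total_polars:.2f} s, "
      f"FireDucks {total_fireducks:.2f} s "
      f"({total_fireducks / total_polars:.1f}x)")
```

On these numbers the gap runs from roughly 3.6x on query 1 to roughly 86x on query 2.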
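The tail of the log shows two things. First, the uncaught C++ `std::length_error` went through `std::terminate`, which calls `abort()` by default, so the child process died of SIGABRT. Second, `subprocess.run` raises `CalledProcessError` only when `check=True`, so the first crashing query stopped the whole suite and queries 11-22 never ran for FireDucks. The sketch below is POSIX-only. It shows how a signal death surfaces as a negative return code. The helper `run_query` and the `os.abort()` stand-in are hypothetical and are not part of the benchmark repository.

```python
import signal
import subprocess
import sys


def run_query(script_args):
    """Hypothetical helper: run one query in a child interpreter.

    Returns None on success, or a short note describing the failure,
    so a caller could record the crash and move on to the next query.
    """
    try:
        subprocess.run([sys.executable, *script_args], check=True)
    except subprocess.CalledProcessError as exc:
        # On POSIX, a child killed by signal N reports returncode == -N.
        if exc.returncode < 0:
            return f"killed by {signal.Signals(-exc.returncode).name}"
        return f"exited with status {exc.returncode}"
    return None


# Stand-ins: os.abort() raises SIGABRT in the child, the same signal an
# uncaught C++ exception produces via std::terminate -> abort().
ok = ["-c", "pass"]
crash = ["-c", "import os; os.abort()"]

results = {name: run_query(args) for name, args in [("ok", ok), ("crash", crash)]}
for name, outcome in results.items():
    print(name, "->", outcome or "success")
```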