
62 points | eneuman | 1 comment
refactor_master ◴[] No.43376912[source]
It seems like this hack would be fine for notebooks, but not something I’d be interested in for production code.

Why not just something like this?

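  import random
  import time
  from concurrent.futures import ThreadPoolExecutor
  import pandas as pd

  def f(n):
      time.sleep(random.uniform(0.1, 0.3))  # Simulate network delay
      return pd.DataFrame({"A": [n, n+1], "B": [n*2, (n+1)*2]})

  with ThreadPoolExecutor() as ex:
      # ex.map yields results in input order, so the row order is stable
      df = pd.concat(ex.map(f, range(3)), ignore_index=True)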
isoprophlex ◴[] No.43377514[source]
Indeed... the longer I write Python, the more I just try to solve stuff with a simple ThreadPoolExecutor.

I don't think this is the best choice for CPU-bound work, which is likely what you're running into with pandas, but nevertheless... I like how you can almost always slap a thread pool onto something and speed things up, with minimal cognitive overhead.
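
To make the I/O-bound case concrete, here is a rough sketch of the kind of speedup I mean. `fake_io`, `run_sequential`, `run_threaded`, and `timed` are made-up names for illustration, and the `time.sleep` call only stands in for a blocking network request, so treat the timings as indicative rather than a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n, delay=0.2):
    """Hypothetical stand-in for a network call: block, then return n squared."""
    time.sleep(delay)  # sleeping releases the GIL, much like real blocking I/O
    return n * n


def run_sequential(items):
    # One call after another: total time is roughly len(items) * delay
    return [fake_io(n) for n in items]


def run_threaded(items, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        # ex.map returns results in input order, even if calls finish out of order
        return list(ex.map(fake_io, items))


def timed(fn, *args):
    # Small helper: return (result, elapsed seconds)
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    items = range(4)
    seq, t_seq = timed(run_sequential, items)
    thr, t_thr = timed(run_threaded, items)
    print(f"sequential: {t_seq:.2f}s, threaded: {t_thr:.2f}s")
```

With four workers the waits overlap, so the threaded run should finish in about one delay instead of four. Swap the sleep for a tight pure-Python loop and I would expect most of that gain to disappear on standard CPython, since the GIL lets only one thread run Python bytecode at a time.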
