One possible hand optimization is to push the aggregation below the joins, which makes the latter a few hundred rows.
replies(1):
SELECT
pickup.zone AS pickup_zone,
dropoff.zone AS dropoff_zone,
cnt AS num_trips
FROM
(select pickup_location_id, dropoff_location_id, count(*) as cnt from taxi_data_2019 group by 1,2) data
INNER JOIN
(SELECT * FROM zone_lookups WHERE Borough = 'Manhattan') pickup
ON pickup.LocationID = data.pickup_location_id
INNER JOIN
(SELECT * FROM zone_lookups WHERE Borough = 'Manhattan') dropoff
ON dropoff.LocationID = data.dropoff_location_id
ORDER BY num_trips desc
LIMIT 5;
┌───────────────────────┬───────────────────────┬───────────┐
│ pickup_zone │ dropoff_zone │ num_trips │
│ varchar │ varchar │ int64 │
├───────────────────────┼───────────────────────┼───────────┤
│ Upper East Side South │ Upper East Side North │ 536621 │
│ Upper East Side North │ Upper East Side South │ 455954 │
│ Upper East Side North │ Upper East Side North │ 451805 │
│ Upper East Side South │ Upper East Side South │ 435054 │
│ Upper West Side South │ Upper West Side North │ 236737 │
└───────────────────────┴───────────────────────┴───────────┘
Run Time (s): real 0.304 user 1.791931 sys 0.132745
(unedited query is similar to theirs, about .9s)And, doing the same to the optimized query drops it to about .265s.
Thanks for the reference; this question has been on my mind for some time.