Ask HN: Cursor or Windsurf?

1. monster_truck ◴[12 May 25 07:17 UTC] No.43960400[source]▶

>>43959710 (OP) #

Void. I'd rather run my own models locally https://voideditor.com/

replies(1): >>43960805 #

2. traktorn ◴[12 May 25 08:31 UTC] No.43960805[source]▶

>>43960400 (TP) #

Which model are you running locally? Is it faster than waiting for Claude's generation? What gear do you use?

replies(2): >>43962562 #>>43964972 #

3. jhonof ◴[12 May 25 13:15 UTC] No.43962562[source]▶

>>43960805 #

Not OP, but for autocomplete I am running Qwen2.5-Coder-7B, which I quantized to Q2_K. I followed this guide:

https://blog.steelph0enix.dev/posts/llama-cpp-guide/#quantiz...

And I get autocomplete results fast enough for it to be useful. I have an NVIDIA RTX 4060 in a laptop with 8 gigs of dedicated memory that I use for it. I still use Claude for chat (pair programming) though, and I don't really use agents.

4. monster_truck ◴[12 May 25 16:42 UTC] No.43964972[source]▶

>>43960805 #

That's the fun part, you can use all of them! And you don't need to use browser plugins or console scripts to auto-retry failures (there aren't any) or queue up a ton of tasks overnight.

I have a 3950X w/ 32GB RAM, Radeon VII & 6900XT sitting in the closet hosting smaller models, then a 5800X3D/128GB/7900XTX as my main machine.

Almost any quantized model that fits in half of the VRAM of a single GPU (and ideally supports flash attention, optionally speculative decoding) will give you far faster autocompletes than waiting on a hosted model. This is especially the case with the Radeon VII thanks to its memory bandwidth.
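
If you want to sanity-check the "fits in half of the VRAM" rule of thumb before downloading anything, a rough sketch like the one below may help. It only estimates weight size as parameters times bits per weight, and it ignores KV cache and runtime overhead. The function names and the 3 bits-per-weight figure are my own assumptions for illustration, not measured values for any particular quant.

```python
def model_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB (parameters * bits / 8 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


def fits_in_half_vram(params_billion: float, bits_per_weight: float, vram_gib: float) -> bool:
    """True if the estimated weights take up at most half of the GPU's VRAM."""
    return model_size_gib(params_billion, bits_per_weight) <= vram_gib / 2


if __name__ == "__main__":
    # A 7B model on an 8 GiB card, at an assumed ~3 bits per weight
    # (hypothetical figure for a low-bit quant) and at unquantized 16-bit.
    for bits in (3, 16):
        size = model_size_gib(7, bits)
        print(f"7B @ {bits} bpw: {size:.2f} GiB, fits: {fits_in_half_vram(7, bits, 8)}")
```

Leaving the other half of the VRAM free is what gives you room for context and for things like a draft model if you try speculative decoding.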