    483 points mraniki | 26 comments
    1. bratao ◴[] No.43534359[source]
    For my use case, Gemini 2.5 is terrible. I have complex Cython code in a single file (1500 lines) for sequence labeling. Claude and o3 are very good at improving this code and following instructions. Gemini always tries to make unrelated changes. For example, I asked, separately, for small changes such as removing an unused function or caching array indexes. Every time it completely refactored the code and was obsessed with removing the GIL. The output code is always broken, because removing the GIL is not easy.
    replies(10): >>43534409 #>>43534423 #>>43534434 #>>43534511 #>>43534695 #>>43534743 #>>43535378 #>>43536361 #>>43536527 #>>43536933 #
    2. fl_rn_st ◴[] No.43534409[source]
    This reflects my experience 1:1... even telling 2.5 Pro to focus on the given tasks and ignore everything else leads to it changing unrelated code. It's a frustrating experience, because I believe that at its core it is more capable than Sonnet 3.5/3.7.
    3. ldjkfkdsjnv ◴[] No.43534423[source]
    Yup, Gemini 2.5 is bad.
    replies(1): >>43534715 #
    4. ekidd ◴[] No.43534434[source]
    How are you asking Gemini 2.5 to change existing code? With Claude 3.7, it's possible to use Claude Code, which gets "extremely fast but untrustworthy intern"-level results. Do you have a preferred setup for using Gemini 2.5 in a similar agentic mode, perhaps with a tool like Cursor or aider?
    replies(1): >>43534482 #
    5. bratao ◴[] No.43534482[source]
    For all LLMs, I'm using a simple prompt with the complete code in triple quotes and the command at the end, asking it to output the complete code of the changed functions. Then I use WinMerge to compare the changes and apply them. I feel more confident doing this than using Cursor.
    replies(1): >>43535408 #
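A minimal sketch of the workflow described above (the function name and wording are illustrative, not any particular tool's API): wrap the full source in triple quotes, append the instruction, and ask for complete changed functions back, which can then be diffed against the original in WinMerge.

```python
# Hypothetical helper mirroring the commenter's prompting workflow:
# full file in triple quotes, instruction at the end.
def build_prompt(source_code: str, command: str) -> str:
    """Assemble a single-message code-edit prompt for an LLM."""
    return (
        f'"""\n{source_code}\n"""\n\n'
        f"{command}\n"
        "Output the complete code of every function you changed, "
        "and nothing else."
    )

prompt = build_prompt("def unused():\n    pass\n",
                      "Remove this unused function.")
```

Asking for whole changed functions (rather than a whole rewritten file) keeps the manual diff-and-apply step small.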
    6. redog ◴[] No.43534511[source]
    For me, I had to upload the library's current documentation, because it was using outdated references, changing code that worked into code that was broken, and not focusing on the parts I was trying to build upon.
    replies(2): >>43534560 #>>43535477 #
    7. amarcheschi ◴[] No.43534560[source]
    Using outdated references and docs is something I've experienced with more or less every model I've tried, from time to time.
    replies(2): >>43534603 #>>43535145 #
    8. rockwotj ◴[] No.43534603{3}[source]
    I am hoping MCP will fix this. I am building an MCP integration with kapa.ai for my company to help devs here. I guess this doesn't work if you don't add in the tool, though.
    9. hyperbovine ◴[] No.43534695[source]
    Maybe the Unladen Swallow devs ended up on the Gemini team.
    10. itchyjunk ◴[] No.43534715[source]
    Were you also trying to edit the same code base as the GP or did you evaluate it on some other criteria where it also failed?
    replies(1): >>43534724 #
    11. ldjkfkdsjnv ◴[] No.43534724{3}[source]
    I take the same prompt and give it to 3.7, o1 pro, and Gemini. I do this for almost everything, and these are large 50k+ token prompts. Gemini is almost always behind.
    12. dagw ◴[] No.43534743[source]
    That matches my experience as well. Gemini 2.5 Pro seems better at writing code from scratch, but Claude 3.7 seems much better at refactoring my existing code.

    Gemini also seems more likely to come up with 'advanced' ideas (for better or worse). For example, I asked both for a fast C++ function to solve an on-the-surface fairly simple computational geometry problem. Claude solved it in a straightforward and obvious way: nothing obviously inefficient, it will perform reasonably well for all inputs, but it also left some performance on the table. I could also tell at a glance that it was almost certainly correct.

    Gemini on the other hand did a bunch of (possibly) clever 'optimisations' and tricks, plus made extensive use of OpenMP. I know from experience that those optimisations will only be faster if the input has certain properties, but will be a massive overhead in other, quite common, cases.

    With a bit more prompting and questioning on my part, I did manage to get both Gemini and Claude to converge on pretty much the same final answer.

    13. simonw ◴[] No.43535145{3}[source]
    That's expected, because they almost all have training cut-off dates from a year ago or longer.

    The more interesting question is if feeding in carefully selected examples or documentation covering the new library versions helps them get it right. I find that to usually be the case.

    14. pests ◴[] No.43535378[source]
    > The Gemini always try to do unrelated changes. For example, I asked, separately, for small changes such as remove this unused function

    For anything like this, I don't understand invoking AI. Just open the file and delete the lines yourself. What is AI going to do for you here?

    It's like you are relying 100% on AI when it's just one tool in your toolset.

    replies(2): >>43536131 #>>43537853 #
    15. pests ◴[] No.43535408{3}[source]
    Should really check out aider. Automates this but also does things like make a repo map of all your functions / signatures for non-included files so it can get more context.
    16. Jcampuzano2 ◴[] No.43535477[source]
    If you don't mind me asking how do you go about this?

    I hear people mention doing this all the time, but I can't imagine they are manually adding every page of the docs for the libraries or frameworks they're using, since unfortunately most docs aren't a single tidy page that's easy to copy-paste.

    replies(3): >>43535857 #>>43538489 #>>43540602 #
    17. dr_kiszonka ◴[] No.43535857{3}[source]
    If you have access to the documentation source, you can concatenate all files into one. Some software also has docs downloadable as PDF.
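The concatenation step above can be sketched in a few lines; the paths and the `.md` extension are assumptions for illustration (docs sources are often Markdown or reStructuredText):

```python
# Minimal sketch: merge every Markdown file under a docs tree into one
# file that can be pasted into a model's context window.
from pathlib import Path

def concat_docs(docs_dir: str, out_file: str) -> int:
    """Concatenate all .md files under docs_dir; return how many were merged."""
    files = sorted(Path(docs_dir).rglob("*.md"))
    parts = [f"\n\n--- {f} ---\n\n" + f.read_text(encoding="utf-8")
             for f in files]
    Path(out_file).write_text("".join(parts), encoding="utf-8")
    return len(files)
```

The `--- path ---` separators are just a readable convention so the model can tell which file each chunk came from.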
    18. joshmlewis ◴[] No.43536131[source]
    Playing devil's advocate here: removing a function is not always as simple as deleting the lines. Sometimes there are references to that function that you forgot about, which the LLM will notice and automatically update for you. Depending on your prompt, it will also find references outside the single file and remove those as well. Another possibility is that people are simply becoming used to interacting with their codebase through the "chat" interface and directing the LLM to do things, so that behavior carries over into all interactions, even perceived "simple" ones.
    replies(1): >>43537178 #
    19. therealmarv ◴[] No.43536361[source]
    Set the temperature to 0.4 or lower.
    replies(1): >>43536656 #
    20. kristopolous ◴[] No.43536527[source]
    I mean it's really in how you use it.

    The focus on benchmarks affords a tendency to generalize performance as if it's context and user independent.

    Each model really is a different piece of software with different capabilities. Really fascinating to see how dramatically different people's assessments are

    21. mrinterweb ◴[] No.43536656[source]
    Adjusting the temperature is something I often forget. I think Gemini's can range between 0.0 and 2.0 (1.0 default). Lowering the temp should give more consistent/deterministic results.
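Taking the 0.0–2.0 range stated above at face value, a tiny guard like this keeps a requested temperature valid before it goes into an API call (the function name is illustrative, not part of any SDK):

```python
# Clamp a sampling temperature into Gemini's stated 0.0-2.0 range
# (default 1.0) before passing it to the API.
def clamp_temperature(t: float, lo: float = 0.0, hi: float = 2.0) -> float:
    """Return t limited to the [lo, hi] sampling range."""
    return max(lo, min(hi, t))
```

For code-editing tasks the comments above suggest requesting something like `clamp_temperature(0.4)` rather than the default 1.0.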
    22. rom16384 ◴[] No.43536933[source]
    You can fix this with a system prompt that forces it to reply with just a diff. That makes generation much faster and much less prone to changing unrelated lines. Also try reducing the temperature, to 0.4 for example; I find the default temperature of 1 too high. For sample system prompts, see Aider Chat: https://github.com/Aider-AI/aider/blob/main/aider/coders/edi...
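To illustrate why diff-only replies touch fewer lines: a model that answers in unified-diff format (as aider's editing prompts request) only has to emit the changed hunk, not the whole file. The standard library's `difflib` shows what such a reply looks like; the file name and edit here are made up for the example.

```python
# What a diff-only model reply looks like for a one-function edit.
import difflib

before = ["def area(r):\n", "    return 3.14 * r * r\n"]
after_ = ["import math\n", "def area(r):\n", "    return math.pi * r * r\n"]

diff = "".join(difflib.unified_diff(before, after_, "a/geom.py", "b/geom.py"))
print(diff)
```

Only the added and removed lines (prefixed `+`/`-`) plus a little context appear, so unrelated code cannot be silently rewritten.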
    23. matsemann ◴[] No.43537178{3}[source]
    Any IDE will do this for you a hundred times better than current LLMs.
    24. Fr3ck ◴[] No.43537853[source]
    I like to code with an LLM's help, making iterative changes: first do this, then once that code is in a good place, do this, etc. If I ask it to make one change, I want it to make one change only.
    25. genewitch ◴[] No.43538489{3}[source]
    Have the AI write a quick script using bs4 or whatever to take the HTML dump and output JSON; then all the aider-likes can use that JSON as documentation. Or just use the HTML, but that wastes context window.
    26. SweetSoftPillow ◴[] No.43540602{3}[source]
    https://github.com/mufeedvh/code2prompt

    https://github.com/yamadashy/repomix