
265 points ctoth | 1 comments
plaidfuji ◴[] No.43748358[source]
Gemini 2.5 Pro is certainly a tipping point for me. Previous LLMs have been very impressive, especially on coding tasks (unsurprising, as there is a preponderance of publicly available data on the answers to these). But outside of coding assistance, LLMs until now felt like an extra-helpful and less garbage-filled Google search.

I just used 2.5 Pro to help write a large research proposal (with significant funding on the line). Without going into detail, it felt to me like the only reason it couldn’t write the entire thing itself is that I didn’t ask it to. And by “ask it”, I mean: enter into the laughably small chat box the entire grant solicitation + instructions, a paragraph of general direction for what I want to explore, and a bunch of unstructured artifacts from prior work, and turn it loose. I just wasn’t audacious enough to try that from the start.

But as the deadline approached, I got more and more unconstrained in how far back I would step and let it take the reins - doing essentially what’s described above but on isolated sections. It would do pretty ridiculously complex stuff, like generate project plans and timelines, cross reference that correctly with other sections of text, etc. I can safely say it was a 10x force multiplier, and that’s being conservative.

For scientific questions (ones that should have publicly available data, not ones relying on internal data), I have started going to 2.5 Pro over senior experts on my own team. And I’m convinced at this point if I were to connect our entire research data corpus to Gemini, that balance would shift even further. Why? Because I can trust it to be objective - not inject its own political or career goals into its answers.

I’m at the point where I feel the main thing holding back “AGI” is people’s audacity to push its limits, plus maybe context windows and compute availability. I say this as someone who’s been a major skeptic up until this point.

replies(9): >>43748425 #>>43749118 #>>43749224 #>>43751750 #>>43753576 #>>43755736 #>>43756318 #>>43756466 #>>43812541 #
valenterry ◴[] No.43751750[source]
And yet it fails at every second refactoring that I ask it to do in a moderately complicated codebase. What am I doing wrong?
replies(2): >>43756614 #>>43757018 #
1. oytis ◴[] No.43757018[source]
Might be because you are an expert in what you ask it to do, and actually care about the result. E.g. I'm not sure what a marketing or other business professional would say about the work it did on the cheese business. What caught my eye is that the projected cost of doing business (salaries) is unrealistically low, especially as the volumes are expected to grow.