
467 points mraniki | 2 comments
1. dsign No.43535155
I guess it depends on the task? I have very low expectations for Gemini, but I gave it a run with an easy signal-processing problem and it did well. It took 30 seconds to reason through a problem that would have taken me 5 to 10 minutes. Gemini's reasoning was sound (though it took me a couple of minutes to decide that), and it also wrote the functions with the changes (which took me an extra minute to verify). It's not a definitive win on time, but at least there was an extra pair of "eyes"--or whatever that's called with a system like this one.

All in all, I think we humans are well on our way to becoming legal flesh[1].

[1] The part of the system to whip or throw in jail when a human+LLM pair commits a mistake.

replies(1): >>43535240 #
2. vonneumannstan No.43535240
>I guess it depends on the task? I have very low expectations for Gemini, but I gave it a run with an easy signal-processing problem and it did well. It took 30 seconds to reason through a problem that would have taken me 5 to 10 minutes. Gemini's reasoning was sound (though it took me a couple of minutes to decide that), and it also wrote the functions with the changes (which took me an extra minute to verify). It's not a definitive win on time, but at least there was an extra pair of "eyes"--or whatever that's called with a system like this one.

I wonder if you treat code from a junior engineer the same way? It seems impossible to scale a team that way. You shouldn't need to verify every line; rather, you should have test harnesses that ensure adherence to the spec.
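
The "test harness over line-by-line review" idea can be sketched roughly as follows. This is a minimal illustration, not anything from the thread: `generated_moving_average` is a hypothetical stand-in for LLM- or junior-authored code, and the harness checks it against spec properties instead of reading every line.

```python
def generated_moving_average(xs, window):
    """Hypothetical stand-in for generated code under review."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]


def check_spec():
    # Spec 1: output length is len(xs) - window + 1
    assert len(generated_moving_average([1, 2, 3, 4], 2)) == 3
    # Spec 2: averaging a constant signal leaves it unchanged
    assert generated_moving_average([5.0] * 4, 3) == [5.0, 5.0]
    # Spec 3: a known value computed by hand
    assert generated_moving_average([1, 2, 3], 3) == [2.0]
    # Spec 4: invalid window sizes are rejected
    try:
        generated_moving_average([1], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for window=0")


check_spec()
```

The point is that the reviewer's effort goes into the `check_spec` assertions once, and any regenerated or rewritten implementation can be re-verified mechanically.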