
323 points steerlabs | 2 comments | | HN request time: 0.001s | source
toddmorey ◴[] No.46191994[source]
Confident idiot: I’m exploring using an LLM for diagram creation.

I’ve found that after about three prompts to edit an image with Gemini, it will randomly respond with an entirely new image. Another quirk: it will respond “here’s the image with those edits” with no edits made. It’s like a toaster that catches fire every eighth or ninth use.

I am not sure how to mitigate this behavior. Maybe an LLM-as-a-judge step with vision that evaluates the output before passing it on to the poor user.
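
A minimal sketch of what such a judge gate could look like, in Python. Everything here is hypothetical: `edit` and `judge` are placeholder callables standing in for the image-editing call and the vision-judge call, not any real Gemini API, and `Verdict` and `edit_with_judge` are names made up for illustration. The byte-equality check only catches exact no-ops; an unchanged image that was re-encoded would still need the judge.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool      # did the judge accept the edit?
    reason: str   # short explanation, useful for logging


def edit_with_judge(
    image: bytes,
    instruction: str,
    edit: Callable[[bytes, str], bytes],            # hypothetical: wraps the image-edit model
    judge: Callable[[bytes, bytes, str], Verdict],  # hypothetical: wraps the vision judge
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Return an edited image the judge accepted, or None if every attempt failed."""
    for _ in range(max_attempts):
        candidate = edit(image, instruction)
        # Cheap deterministic check first: identical bytes mean no edit was made.
        if candidate == image:
            continue
        # Ask the judge to compare before/after against the instruction.
        if judge(image, candidate, instruction).ok:
            return candidate
    return None
```

Returning `None` instead of the last candidate means the user sees an explicit failure rather than a silently wrong image, which seems closer to the goal here.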

replies(5): >>46193250 #>>46193673 #>>46194370 #>>46194578 #>>46195816 #
1. RationPhantoms ◴[] No.46193250[source]
What are your thoughts on the diagram-as-code movement? I'd prefer to have an LLM use those tools, since that can at least drive some determinism, rather than deal with the slippery layer that is prompt control for visual LLMs.
replies(1): >>46197292 #
2. toddmorey ◴[] No.46197292[source]
I think that's the right approach and what I've been experimenting with. Diagram as code first, then style transfer from the output diagram to the desired look. That's where I've had the most success.
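
To illustrate where the determinism comes from, here is a rough sketch in Python. It assumes the LLM emits a small structured spec (node ids, labels, edges) rather than pixels, and that Graphviz DOT is the target; Mermaid or similar would work the same way. `Diagram`, `validate`, and `to_dot` are made-up names for this example, and the style-transfer step is not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Diagram:
    nodes: Dict[str, str] = field(default_factory=dict)          # node id -> label
    edges: List[Tuple[str, str]] = field(default_factory=list)   # (source id, target id)


def validate(d: Diagram) -> List[str]:
    """Return a list of problems; an empty list means the spec is consistent."""
    return [
        f"edge {src}->{dst} references unknown node"
        for src, dst in d.edges
        if src not in d.nodes or dst not in d.nodes
    ]


def _quote(s: str) -> str:
    # DOT quoted strings only need embedded double quotes escaped.
    return '"' + s.replace('"', '\\"') + '"'


def to_dot(d: Diagram) -> str:
    """Render the spec as Graphviz DOT; sorting makes the output order-independent."""
    lines = ["digraph G {"]
    for node_id in sorted(d.nodes):
        lines.append(f"  {_quote(node_id)} [label={_quote(d.nodes[node_id])}];")
    for src, dst in sorted(d.edges):
        lines.append(f"  {_quote(src)} -> {_quote(dst)};")
    lines.append("}")
    return "\n".join(lines)
```

The same spec always produces the same text, so a bad edit shows up as a reviewable diff or a `validate` error instead of a surprise image.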