←back to thread

549 points thecr0w | 3 comments | | HN request time: 0.001s | source
Show context
thuttinger ◴[] No.46184466[source]
Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things. There are a lot of problems that are easy to get right for a junior web dev but impossible for an LLM. On the other hand, I was able to write a C program that added gamma color profile support to linux compositors that don't support it (in my case Hyprland) within a few minutes! A - for me - seemingly hard task, which would have taken me at least a day or more if I didn't let Claude write the code. With one prompt Claude generated C code that compiled on first try that:

- Read an .icc file from disk

- parsed the file and extracted the VCGT (video card gamma table)

- wrote the VCGT to the video card for a specified display via amdgpu driver APIs

The only thing I had to fix was the ICC parsing, where it would parse header strings in the wrong byte-order (they are big-endian).

replies(3): >>46184840 #>>46185379 #>>46185476 #
littlecranky67 ◴[] No.46184840[source]
> Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things

Because the rendered output (pixels, not HTML/CSS) is not fed as data in the training. You will find tons of UI snippets and questions, but they rarely included screenshots. And if they do, the are not scraped.

replies(2): >>46185301 #>>46187357 #
1. ubercow13 ◴[] No.46187357[source]
Why wouldn't they be?
replies(1): >>46189234 #
2. littlecranky67 ◴[] No.46189234[source]
Why would they be?
replies(1): >>46196677 #
3. ubercow13 ◴[] No.46196677[source]
Well, I don't know but many LLMs are multimodal and understand pictures and images. You can upload videos to Gemini and they're tokenised and fed into the LLM. If some programming blog post has a screenshot with the result of some UI code, why would that not be scraped and used for training? Is there some reason that wouldn't be possible?