The barriers to AI engineering are crumbling fast

(blog.helix.ml)

251 points lewq | 1 comments | 14 Nov 24 14:56 UTC

mark_l_watson ◴[14 Nov 24 16:44 UTC] No.42138010[source]▶

After just spending 15 minutes trying to get something useful accomplished, anything useful at all, with the latest beta of Apple Intelligence on an M1 iPad Pro (16G RAM), this article appealed to me!

I have been running the 32B-parameter qwen2.5-coder model on my 32G M2 Mac, and it is a huge help with coding.

The llama3.3-vision model does a great job processing screenshots. Small models like smollm2:latest can process a lot of text locally, very fast.

Open source front ends like Open WebUI are improving rapidly.

All the tools are lining up for do-it-yourself local AI.

The only commercial vendor right now that I think is doing a fairly good job at an integrated AI workflow is Google. Last month I had all my email directed to my Gmail account, and the Gemini Advanced web app did a really good job integrating email, calendar, and Google Docs. Job well done. That said, I am back to using ProtonMail and trying to build local AIs for my workflows.

I am writing a book on the topic of local, personal, and private AIs.

replies(5): >>42138175 #>>42139063 #>>42140813 #>>42141201 #>>42142652 #

bboygravity ◴[14 Nov 24 21:04 UTC] No.42141201[source]▶

>>42138010 #

Can llama 3.3 vision do things like "there's a textbox/form field at location 1000, 800 with the label 'address'"?

I did a quick and dirty prototype with Claude for this, but it returned all the coordinates offset and/or scaled.

Would be a killer app to be able to auto-fill any form using OCR.
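One common cause of that kind of offset/scaled result is that the model sees a resized (and sometimes padded) copy of the screenshot, so its coordinates live in a different pixel space. A minimal sketch of mapping a box back to screen pixels, assuming you know the size of the image the model actually saw and any symmetric padding added to it. The names `Box`, `model_to_screen`, and the `pad` parameter are hypothetical and not part of any vendor API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


def model_to_screen(box, model_size, screen_size, pad=(0.0, 0.0)):
    """Map a box from the model's image space back to screen pixels.

    model_size  -- (width, height) of the image as the model saw it (assumed known)
    screen_size -- (width, height) of the original screenshot
    pad         -- (left, top) padding, assumed to be mirrored on the right/bottom
    """
    model_w, model_h = model_size
    screen_w, screen_h = screen_size
    pad_left, pad_top = pad

    # Size of the real screenshot content inside the model's image.
    content_w = model_w - 2 * pad_left
    content_h = model_h - 2 * pad_top
    if content_w <= 0 or content_h <= 0:
        raise ValueError("padding leaves no image content")

    # Per-axis scale factors from model space to screen space.
    scale_x = screen_w / content_w
    scale_y = screen_h / content_h

    # Remove the padding offset first, then undo the scaling.
    return Box(
        x=(box.x - pad_left) * scale_x,
        y=(box.y - pad_top) * scale_y,
        width=box.width * scale_x,
        height=box.height * scale_y,
    )
```

For example, if a 2000x1600 screenshot was halved to 1000x800 before the model saw it, `model_to_screen(Box(500, 400, 100, 20), (1000, 800), (2000, 1600))` gives a box at (1000, 800) with size 200x40. This only corrects a uniform resize plus padding; it does nothing for a model that simply localizes the field wrongly.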

replies(1): >>42147091 #

MaxLeiter ◴[15 Nov 24 14:11 UTC] No.42147091[source]▶

>>42141201 #

Were you using Claude's computer use mode? It can do this.

replies(1): >>42156005 #

bboygravity ◴[16 Nov 24 12:08 UTC] No.42156005[source]▶

>>42147091 #

No, I used the regular Claude, which can also (somewhat) do this and, as far as I know, uses the same image processing backend as "computer use" (source: a recent Anthropic CEO interview with Lex Fridman).

Computer use is also not very good at it (often mis-clicking for example).

I'm guessing this will work flawlessly within 6 months to a year or so, but it doesn't seem ready yet.