AI coding and the peanut butter and jelly problem

(iamcharliegraham.substack.com)

kenjackson ◴[12 Apr 25 00:12 UTC] No.43660091[source]▶

This is actually no different than for humans once you get past the familiar. It's like the famous project management tree story: https://pmac-agpc.ca/project-management-tree-swing-story

If anything, LLMs have surprised at how much better they are than humans in understanding instructions for text-based activities. But they are MUCH worse than humans when it comes to creating images/videos.

replies(2): >>43662572 #>>43662984 #

barotalomey ◴[12 Apr 25 09:59 UTC] No.43662984[source]▶

>>43660091 #

> If anything, LLMs have surprised at how much better they are than humans in understanding instructions for text-based activities.

That's demonstrably false, as proven by both OpenAI's own research [1] and endless independent studies by now.

What is fascinating is how some people cling to false ideas about what an LLM is and isn't.

It's a recurring fallacy that's bound to get its own name any time soon.

1: https://news.ycombinator.com/item?id=43155825

replies(2): >>43663692 #>>43663986 #

kenjackson ◴[12 Apr 25 12:58 UTC] No.43663986[source]▶

>>43662984 #

You’re comparing an LLM to expert programmers. Compare an LLM on the same task versus the average college student. And try it for a math problem. A poetry problem. Ask it a more complex question about history or to do an analysis of an essay you wrote.

Put it this way — I’m going to give you a text based question to solve and you have a choice to get another human to solve it (randomly selected from adults in the US) or ChatGPT, and both will be given 30 minutes to read and solve the problem — which would you choose?

replies(1): >>43664125 #

1. aleph_minus_one ◴[12 Apr 25 13:18 UTC] No.43664125[source]▶

>>43663986 #

> Put it this way — I’m going to give you a text based question to solve and you have a choice to get another human to solve it (randomly selected from adults in the US) or ChatGPT, and both will be given 30 minutes to read and solve the problem — which would you choose?

You wouldn't randomly select an arbitrary adult from the USA to perform brain surgery on you, so this argument is rabulistic.

replies(2): >>43664319 #>>43666211 #

2. kenjackson ◴[12 Apr 25 13:46 UTC] No.43664319[source]▶

>>43664125 (TP) #

Brain surgery requires a license.

But I do expect an arbitrary adult to be able to follow instructions.

Ok. How about you give me a text based task where you would pick the random adult over the LLM?

replies(2): >>43664551 #>>43673244 #

3. aleph_minus_one ◴[12 Apr 25 14:12 UTC] No.43664551[source]▶

>>43664319 #

> Brain surgery requires a license.

This is rather a red-tape problem. :-)

4. daveguy ◴[12 Apr 25 17:10 UTC] No.43666211[source]▶

>>43664125 (TP) #

I would choose a random person from my company who was hired to work in that domain to solve problems in that domain. Yes, regardless of the position. Accountant in the domain, yes. Office organizer in the domain, yes. Essentially anyone in the domain, yes. No offense, but by restricting the selection to the general human population you're setting a low bar for LLMs here.

replies(1): >>43667233 #

5. kenjackson ◴[12 Apr 25 19:25 UTC] No.43667233[source]▶

>>43666211 #

If the bar is for LLMs to replace domain experts about four years after introduction then yes, they are failing miserably.

But if you were to go back to 2020 and ask whether you’d take a random human over the state-of-the-art AI to answer a text question, you’d take the random human every time except for arithmetic (and you’d have to write it in math notation and not plain English).

And if you were to ask AI experts when you would choose an AI, they’d say not for at least a decade or two, if ever.

replies(1): >>43674003 #

6. nyclounge ◴[13 Apr 25 14:55 UTC] No.43673244[source]▶

>>43664319 #

I think you and the parent may be talking about 2 different things.

Do I want to use an LLM to do it, from a business owner's perspective? Yeah, probably; it is cheaper and more convenient. Which one I want to use depends on the problem we are solving here, right?

I'm more concerned about the integrity of the current digital infrastructure. In that sense I would NOT trust ANYTHING really important to anything digital, much less to an LLM. Can I use it for exploration and then require an actual human expert's approval/edit? Absolutely!

As long as the digital doesn't result in significant physical or financial damage.

Edit: and for HN ppl, of course the LLM will have to be open weight and all, and run locally on an air-gapped GPU, preferably in a Faraday cage.

7. daveguy ◴[13 Apr 25 16:37 UTC] No.43674003{3}[source]▶

>>43667233 #

I wasn't talking about how impressive AI systems are, or how far they've come. I was talking about the fact that any random human with any experience in a specific field -- even though they are not a domain expert -- is going to do better than an LLM. Or, human common sense >>>> what LLMs are doing.

replies(1): >>43675229 #

8. kenjackson ◴[13 Apr 25 19:30 UTC] No.43675229{4}[source]▶

>>43674003 #

We will have to agree to disagree about your fundamental point.

replies(1): >>43677015 #

9. daveguy ◴[14 Apr 25 00:40 UTC] No.43677015{5}[source]▶

>>43675229 #

Fair enough. We will see.
