Andrej Karpathy: Software in the era of AI [video]

(www.youtube.com)

1479 points sandslash | 3 comments | 19 Jun 25 00:33 UTC

gchamonlive ◴[19 Jun 25 01:31 UTC] No.44314670[source]▶

I think it's interesting to juxtapose traditional coding, neural network weights and prompts because in many areas -- like the example of the self-driving module, where code is being replaced by neural networks tuned to a dataset representing the domain -- this will be quite useful.

However I think it's important to make it clear that given the hardware constraints of many environments the applicability of what's being called software 2.0 and 3.0 will be severely limited.

So instead of being replacements, these paradigms are more like extra tools in the tool belt. Code and prompts will live side by side, being used when convenient, but none a panacea.

replies(4): >>44315052 #>>44316337 #>>44322007 #>>44323973 #

karpathy ◴[19 Jun 25 02:58 UTC] No.44315052[source]▶

>>44314670 #

I kind of say it in words (agreeing with you), but I agree the versioning is a bit of a confusing analogy, because it usually also implies some kind of improvement, when I’m just trying to distinguish them as very different software categories.

replies(5): >>44315296 #>>44319138 #>>44319445 #>>44320206 #>>44320915 #

miki123211 ◴[19 Jun 25 03:57 UTC] No.44315296[source]▶

>>44315052 #

What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

To me, it's a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.

Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve the same kinds of problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data, or tuning hyperparameters necessary.

They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some products give you calories per serving and a serving size, calories per pound etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.
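A minimal sketch of that backend step, assuming only three of the formats. The class names (`Per100g`, `PerServing`, `PerPound`) and the function `kcal_per_100g` are made up for illustration; in practice the variants would be the Zod or Pydantic types handed to the model as alternatives.

```python
from dataclasses import dataclass
from typing import Union

GRAMS_PER_POUND = 453.59237


@dataclass
class Per100g:
    kcal: float


@dataclass
class PerServing:
    kcal: float
    serving_grams: float


@dataclass
class PerPound:
    kcal: float


# Hypothetical union of formats the model may pick from; a real schema would list more.
CalorieInfo = Union[Per100g, PerServing, PerPound]


def kcal_per_100g(info: CalorieInfo) -> float:
    """Do the arithmetic in code so the model only has to transcribe numbers."""
    if isinstance(info, Per100g):
        return info.kcal
    if isinstance(info, PerServing):
        if info.serving_grams <= 0:
            raise ValueError("serving_grams must be positive")
        return info.kcal * 100.0 / info.serving_grams
    if isinstance(info, PerPound):
        return info.kcal * 100.0 / GRAMS_PER_POUND
    raise TypeError(f"unsupported format: {info!r}")
```

The model's only job here is to pick the variant that matches the label and copy the numbers across; the conversion itself never touches the LLM.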

replies(3): >>44316175 #>>44319590 #>>44322019 #

solaire_oa ◴[19 Jun 25 15:28 UTC] No.44319590[source]▶

>>44315296 #

I also think that structured outputs are criminally underused, but they aren't perfect... and per your example, they might not even be good, because I've done something similar.

I was trying to make a decent cocktail recipe database, and scraped the text of cocktails from about 1400 webpages. Note that this was just the text of the cocktail recipe, and cocktail recipes are comparatively small. I sent the text to an LLM for JSON structuring, and the LLM routinely miscategorized liquor types. It also failed to normalize measurements, even with explicit instructions and the temperature set to zero. I gave up.

replies(2): >>44320897 #>>44322772 #

hellovai ◴[19 Jun 25 21:41 UTC] No.44322772[source]▶

>>44319590 #

have you tried schema-aligned parsing yet?

the idea is that instead of using JSON.parse, we create a custom Type.parse for each type you define.

so if you want a:

   class Job { company: string[] }

And the LLM happens to output:

   { "company": "Amazon" }

We can upcast "Amazon" -> ["Amazon"] since you indicated that in your schema.
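A rough Python sketch of that one coercion, to make the idea concrete. This is not the BAML implementation; `coerce` and `parse_job` are hypothetical helpers, and only the scalar-to-list upcast from the `Job` example is handled.

```python
import json
from typing import Any, get_args, get_origin


def coerce(value: Any, expected: Any) -> Any:
    """Coerce a parsed JSON value toward the expected type annotation."""
    if get_origin(expected) is list:
        (item_type,) = get_args(expected)
        # Upcast a bare scalar to a one-element list, as in the Job example.
        items = value if isinstance(value, list) else [value]
        return [coerce(item, item_type) for item in items]
    if not isinstance(value, expected):
        raise TypeError(f"cannot cast {value!r} to {expected.__name__}")
    return value


def parse_job(raw: str) -> dict:
    """Parse a Job using the schema instead of trusting the raw JSON shape."""
    schema = {"company": list[str]}  # mirrors: class Job { company: string[] }
    data = json.loads(raw)
    return {field: coerce(data[field], annotation) for field, annotation in schema.items()}


print(parse_job('{"company": "Amazon"}'))
```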

https://www.boundaryml.com/blog/schema-aligned-parsing

and since it's only post-processing, the technique will work on every model :)

for example, on BFCL benchmarks, we got SAP + GPT3.5 to beat out GPT4o ( https://www.boundaryml.com/blog/sota-function-calling )

replies(3): >>44323295 #>>44323460 #>>44323864 #

1. solaire_oa ◴[20 Jun 25 00:52 UTC] No.44323864[source]▶

>>44322772 #

Ok. Tried it, I'm not super impressed.

    Client: Ollama (phi4) - 90164ms. StopReason: stop. Tokens(in/out): 365/396
    ---PROMPT---
    user: Extract from this content:
    Grave Digger: 
     Ingredients
    
    - 1 1/2 ounces vanilla-infused brandy*
    
    - 3/4 ounce coffee liqueur
    
    - 1/2 ounce Grand Marnier
    
    - 1 ounce espresso, freshly brewed
    
    - Garnish: whipped cream
    
    - Garnish: oreo cookies, crushed
    
    Steps
    
    1.  Add all ingredients into a shaker with ice and shake until
        well-chilled.
    
    2.  Strain into a coupe.
    
    3.  Top with whipped cream and crushed Oreo cookies (discarding cream in
        center).
    
    *Vanilla-infused brandy: Cut 2 fresh vanilla pods lengthwise and place
    into a 750 mL bottle of brandy. Let sit for 3 to 5 days, shaking
    occasionally. Strain out pods and store.
    
    
    
    IngredientType
    ----
    - Alcohol
    - Sweetener
    - Sour
    - Aromatic
    - Bittering_agent
    - Food
    - Dilution
    
    LiquorType
    ----
    - Gin
    - Vodka
    - Rum
    - Whiskey
    - Tequila
    - Mezcal
    - Brandy
    - Cognac
    - Liqueur
    - Wine
    - Absinthe
    
    Answer in JSON using this schema:
    {
      Name: string,
      Ingredients: [
        {
          Text: string,
          Type: IngredientType,
          Liquor_type: LiquorType or null,
          Name_brand: string or null,
          Unit_of_measure: string,
          Measurement_or_unit_count: string,
        }
      ],
      Steps: [
        {
          Number: int,
          Instruction: string,
        }
      ],
      Garnishes: string[],
    }
    
    ---LLM REPLY---
    ```json
    {
      "Name": "Grave Digger",
      "Ingredients": [
        {
          "Text": "vanilla-infused brandy*",
          "Type": "Alcohol",
          "Liquor_type": "Brandy",
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1 1/2"
        },
        {
          "Text": "coffee liqueur",
          "Type": "Liqueur",
          "Liquor_type": "Liqueur",
          "Name_brand": null,
          "Unit_of_measure": "ounce",
          "Measurement_or_unit_count": "3/4"
        },
        {
          "Text": "Grand Marnier",
          "Type": "Liqueur",
          "Liquor_type": "Liqueur",
          "Name_brand": "Grand Marnier",
          "Unit_of_measure": "ounce",
          "Measurement_or_unit_count": "1/2"
        },
        {
          "Text": "espresso, freshly brewed",
          "Type": "Bittering_agent",
          "Liquor_type": null,
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1"
        }
      ],
      "Steps": [
        {
          "Number": 1,
          "Instruction": "Add all ingredients into a shaker with ice and shake until well-chilled."
        },
        {
          "Number": 2,
          "Instruction": "Strain into a coupe."
        },
        {
          "Number": 3,
          "Instruction": "Top with whipped cream and crushed Oreo cookies (discarding cream in center)."
        }
      ],
      "Garnishes": [
        "whipped cream",
        "oreo cookies, crushed"
      ]
    }
    ```
    ---Parsed Response (class Recipe)---
    {
      "Name": "Grave Digger",
      "Ingredients": [
        {
          "Text": "vanilla-infused brandy*",
          "Type": "Alcohol",
          "Liquor_type": "Brandy",
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1 1/2"
        },
        {
          "Text": "espresso, freshly brewed",
          "Type": "Bittering_agent",
          "Liquor_type": null,
          "Name_brand": null,
          "Unit_of_measure": "ounces",
          "Measurement_or_unit_count": "1"
        }
      ],
      "Steps": [
        {
          "Number": 1,
          "Instruction": "Add all ingredients into a shaker with ice and shake until well-chilled."
        },
        {
          "Number": 2,
          "Instruction": "Strain into a coupe."
        },
        {
          "Number": 3,
          "Instruction": "Top with whipped cream and crushed Oreo cookies (discarding cream in center)."
        }
      ],
      "Garnishes": [
        "whipped cream",
        "oreo cookies, crushed"
      ]
    }

Processed Recipe: { Name: 'Grave Digger', Ingredients: [ { Text: 'vanilla-infused brandy*', Type: 'Alcohol', Liquor_type: 'Brandy', Name_brand: null, Unit_of_measure: 'ounces', Measurement_or_unit_count: '1 1/2' }, { Text: 'espresso, freshly brewed', Type: 'Bittering_agent', Liquor_type: null, Name_brand: null, Unit_of_measure: 'ounces', Measurement_or_unit_count: '1' } ], Steps: [ { Number: 1, Instruction: 'Add all ingredients into a shaker with ice and shake until well-chilled.' }, { Number: 2, Instruction: 'Strain into a coupe.' }, { Number: 3, Instruction: 'Top with whipped cream and crushed Oreo cookies (discarding cream in center).' } ], Garnishes: [ 'whipped cream', 'oreo cookies, crushed' ] }

So, yeah, the main issue is that it dropped some ingredients that were present in the original LLM reply. Separately, the original LLM reply misclassified the `Type` field for `coffee liqueur`, which should have been `Alcohol`.

replies(2): >>44338389 #>>44338516 #

2. hellovai ◴[21 Jun 25 15:42 UTC] No.44338389[source]▶

>>44323864 (TP) #

appreciate you trying it. the reason it dropped the data is that the LLM you're using didn't understand your type system.

the model replied with

    {
      "Text": "coffee liqueur",
      "Type": "Liqueur",
      "Liquor_type": "Liqueur",
      "Name_brand": null,
      "Unit_of_measure": "ounce",
      "Measurement_or_unit_count": "3/4"
    },

but you expected a

    {
      Text: string,
      Type: IngredientType,
      Liquor_type: LiquorType or null,
      Name_brand: string or null,
      Unit_of_measure: string,
      Measurement_or_unit_count: string,
    }

there's no way to cast `Liqueur` -> `IngredientType`. but since the data model is an `Ingredient[]` we attempted to give you as many ingredients as possible.
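The behaviour described above can be imitated in a few lines of plain Python. This is only an illustration of "keep whatever casts, drop the rest", not how the parser is actually implemented; `parse_ingredients` is a hypothetical helper, and the enum values are copied from the prompt in the parent comment.

```python
from enum import Enum


class IngredientType(Enum):
    ALCOHOL = "Alcohol"
    SWEETENER = "Sweetener"
    SOUR = "Sour"
    AROMATIC = "Aromatic"
    BITTERING_AGENT = "Bittering_agent"
    FOOD = "Food"
    DILUTION = "Dilution"


def parse_ingredients(raw_items):
    """Return (kept, dropped): items whose Type casts to IngredientType, and the rest."""
    kept, dropped = [], []
    for item in raw_items:
        try:
            IngredientType(item.get("Type"))
        except ValueError:
            dropped.append(item)  # e.g. "Liqueur" is not a valid IngredientType
        else:
            kept.append(item)
    return kept, dropped


# Text/Type pairs taken from the LLM reply in the parent comment.
reply = [
    {"Text": "vanilla-infused brandy*", "Type": "Alcohol"},
    {"Text": "coffee liqueur", "Type": "Liqueur"},
    {"Text": "Grand Marnier", "Type": "Liqueur"},
    {"Text": "espresso, freshly brewed", "Type": "Bittering_agent"},
]
kept, dropped = parse_ingredients(reply)
```

Returning the dropped items alongside the kept ones is a choice made for this sketch; it makes the silent loss visible to the caller.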

The model itself being wrong isn't something we can do much about. that depends on 2 things (the capabilities of the model, and the prompt you pass in).

If you wanted to capture all of the items with more rigor you could write it in this way:

    class Recipe {
        name string
        ingredients Ingredient[]
        num_ingredients int
        ...

        // add a constraint on the type
        @@assert(counts_match, {{ this.ingredients|length == this.num_ingredients }})
    }
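Read this way, `num_ingredients` is the model's own count of what it saw, so the assert fails whenever the parser keeps fewer items than the model reported. A plain-Python equivalent of that check, as a sketch only (`counts_match` here is an ordinary function named after the constraint, not a BAML API):

```python
def counts_match(recipe: dict) -> bool:
    """True when the parsed ingredient list is as long as the model-reported count."""
    return len(recipe["ingredients"]) == recipe["num_ingredients"]


# The model reported four ingredients, but only two survived parsing.
lossy = {
    "name": "Grave Digger",
    "ingredients": ["vanilla-infused brandy*", "espresso, freshly brewed"],
    "num_ingredients": 4,
}
if not counts_match(lossy):
    print("ingredient count mismatch: retry or flag for review")
```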

And then if you want to be very wild, put this in your prompt:

   {{ ctx.output_format }}
   No quotes around strings

And it'll do some cool stuff

3. hellovai ◴[21 Jun 25 15:58 UTC] No.44338516[source]▶

>>44323864 (TP) #

if you share your prompt with me on promptfiddle.com i can play around with it and see how i can make it better!
