466 points 0x63_Problems | 2 comments
perrygeo ◴[] No.42138092[source]
> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them. In other words, the penalty for having a ‘high-debt’ codebase is now larger than ever.

This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (i.e., trying to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.

Put another way, LLMs make the easy stuff easier, but royally screw up the hard stuff. The gap does appear to be widening, not shrinking. They work best where we need them the least.

cheald ◴[] No.42139109[source]
The niche I've found for LLMs is for implementing individual functions and unit tests. I'll define an interface and a return (or a test name and expectation) and say "this is what I want this to do", and let the LLM take the first crack at it. Limiting the bounds of the problem to be solved does a pretty good job of at least scaffolding something out that I can then take to completion. I almost never end up taking the LLM's autocompletion at face value, but having it written out to review and tweak does save substantial amounts of time.

The other use case is targeted code review/improvement. "Suggest how I could improve this" fills a niche which is currently filled by linters, but can be more flexible and robust. It has its place.

The fundamental problem with LLMs is that they follow patterns, rather than doing any actual reasoning. This is essentially the observation made by the article; AI coding tools do a great job of following examples, but their usefulness is limited to the degree to which the problem to be solved maps to a followable example.

acrooks ◴[] No.42143531[source]
Yes, this is the same for me. I’ve shifted my programming style so now I just write function signatures and let the AI do the rest for me. It has been a dream and works consistently well.

I’ll also often add hints at the top of the file in the form of comments or sample data to help keep it on the right track.

eesmith ◴[] No.42145308{3}[source]
Here's one I wrote the other day which took a long time to get right. I'm curious about how well your AI can do, since I can't imagine it does a good job at it.

  # Given a data set of size `size` >= 0, and a `text` string describing
  # the subset size, return a 2-element tuple containing a text string
  # describing the complement size and the actual size as an integer. The
  # text string can be in one of four forms (after stripping leading and
  # trailing whitespace):
  #
  #  1) the empty string, in which case return ("", 0)
  #  2) a stringified integer, like "123", where 0 <= n <= size, in
  #   which case return (str(size-int(n)), size-int(n))
  #  3) a stringified decimal value like "0.25" where 0 <= x <= 1.0, in
  #   which case compute the complement string as str(1 - x) and
  #   the complement size as size - (int(x * size)). Exponential
  #   notation is not supported, only numbers like "3.0", ".4", and "3.14"
  #  4) a stringified fraction value like "1/3", where 0 <= x <= 1,
  #   in which case compute the complement string and value as #3
  #   but using a fraction instead of a decimal. Note that "1/2" of
  #   51 must return ("1/2", 26), not ("1/2", 25).
  #
  # Otherwise, return ("error", -1)

  def get_complement(text: str, size: int) -> tuple[str, int]:
    ...

For example:

  get_complement("1/2", 100) == ("1/2", 50)
  get_complement("0.6", 100) == ("0.4", 40)
  get_complement("100", 100) == ("0", 0)
  get_complement("0/1", 100) == ("1/1", 100)

Some of the harder test cases I came up with were:

get_complement("0.8158557553804697", 448_525_430): this tests that the underlying system uses decimal.Decimal rather than a float, because float64 ends up on a 0.5 boundary and applies round-half-even, resulting in a different value than the true decimal calculation, which does not end up with a 0.5. (The value is "365932053.4999999857944710")

get_complement("nan", 100): this is a valid decimal.Decimal but not allowed by the spec.

get_complement("1/0", 100): handle division-by-zero in fractions.Fraction

get_complement("0.", 100): this tests that the string complement is "1." or "1.0" and not "1"

get_complement("0.999999999999999", 100): this tests the complement is "0.000000000000001" and not "1E-15".

get_complement("0.5E0", 100): test that decimal parsing isn't simply done by decimal.Decimal(text) wrapped in an exception handler.
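The pitfalls behind these test cases can be reproduced directly with the standard library. This is only an illustrative sketch, not the author's test suite; the variable names are made up, and the float results are printed rather than asserted because they depend on float64 rounding.

```python
from decimal import Decimal

text, size = "0.8158557553804697", 448_525_430

# Exact decimal product versus the product computed in float64.
exact = Decimal(text) * size
approx = float(text) * size
print(exact, approx)
print(round(exact), round(approx))  # round() is half-even for both types

# str() switches to exponent notation for small values; format(..., "f") does not.
tiny = 1 - Decimal("0.999999999999999")
print(str(tiny), format(tiny, "f"))

# Decimal() accepts inputs the spec rejects, so a bare try/except is not enough.
print(Decimal("nan").is_nan(), Decimal("0.5E0"))
```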

Also, this isn't the full spec. The real code reports parse errors (like recognizing that "1/" is an incomplete fraction) and if the value is out of range it uses the range boundary (so "-0.4" for input is treated as "0.0" and the complement is "1.0"), along with an error flag so the GUI can display the error message appropriately.
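For reference, here is one possible sketch of the simplified spec as posted (the four forms plus the "error" fallback), without the parse-error reporting or range clamping of the real code. It is not the author's implementation. It makes three assumptions: the regular expressions define what counts as valid syntax, the fraction complement keeps the input denominator, and a whole-number decimal complement is written with a trailing ".0". It follows the posted spec in truncating with int(), not rounding.

```python
import re
from decimal import Decimal, localcontext
from fractions import Fraction

# ASCII-only patterns, so "nan", "0.5E0", "-0.4" and "1/" never match.
_INTEGER = re.compile(r"[0-9]+")
_DECIMAL = re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")
_FRACTION = re.compile(r"([0-9]+)/([0-9]+)")

ERROR = ("error", -1)


def get_complement(text: str, size: int) -> tuple[str, int]:
    text = text.strip()
    if text == "":
        return ("", 0)

    if _INTEGER.fullmatch(text):
        n = int(text)
        return (str(size - n), size - n) if n <= size else ERROR

    if _DECIMAL.fullmatch(text):
        with localcontext() as ctx:
            # Enough digits that 1 - x and x * size are computed exactly.
            ctx.prec = len(text) + len(str(size)) + 2
            x = Decimal(text)
            if x > 1:
                return ERROR
            complement = format(1 - x, "f")  # "f" never uses exponent notation
            if "." not in complement:
                complement += ".0"  # assumption: "0." gives "1.0", not "1"
            return (complement, size - int(x * size))

    match = _FRACTION.fullmatch(text)
    if match:
        num, den = int(match[1]), int(match[2])
        if den == 0 or num > den:
            return ERROR
        # Assumption: keep the input denominator, so "0/1" gives "1/1".
        return (f"{den - num}/{den}", size - int(Fraction(num, den) * size))

    return ERROR
```

Matching the syntax with explicit patterns before calling Decimal() or int() is what keeps "nan", "Infinity", and exponent notation out, since the constructors themselves would accept them.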

1. acrooks ◴[] No.42145682{4}[source]
I suspect certain domains have higher performance than others. My normal use cases involve API calls, database calls, and data transformation, and AI fairly consistently does what I want. But in that space there are very repeatable patterns.

Also, with your example above I would probably break the function down into smaller parts, for two reasons: 1) you can more easily unit test the components; 2) generally I find AI performs better with more focused problems.

So I would probably first write a signature like this:

  # input examples = "1/2" "100" "0.6" "0.99999" "0.5E0" "nan"
  def string_ratio_to_decimal(text: str) -> number

Pasting that into Claude, without any other context, produces this result: https://claude.site/artifacts/58f1af0e-fe5b-4e72-89ba-aeebad...
2. eesmith ◴[] No.42146546[source]
> I probably would break the function down into smaller parts

Sure. Internally I have multiple functions. Though I don't like unit testing below the public API as it inhibits refactoring and gives false coverage feedback, so all my tests go through the main API.

> Pasting that into Claude, without any other context

The context is the important part. Like the context which says "0.5E0" and "nan" are specifically not supported, and how the calculations need to use decimal arithmetic, not IEEE 754 float64.

Also, the hard part is generating the complement with correct formatting, not parsing float-or-fraction, which is a first-year CS assignment.

> # Handle special values

Python and C accept "Infinity" as an alternative to "Inf". The correct way is to defer to the underlying system and then check whether the returned value is infinite or a NaN. That is what ends up happening here anyway: when those string checks fail, and the check for "/" fails, the text is correctly processed through float().

Yes, this section isn't needed.
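A short sketch of the "defer, then check" approach, with a hypothetical helper name. It assumes the caller wants non-finite values rejected; that policy is not stated for the linked code, so treat it as an illustration only.

```python
import math
from typing import Optional


def parse_finite_float(text: str) -> Optional[float]:
    """Let float() decide what parses, then reject inf and nan afterwards."""
    try:
        value = float(text)
    except ValueError:
        return None
    # Covers "inf", "Infinity", "-INF", "nan" and any other accepted spelling.
    return value if math.isfinite(value) else None
```

Checking the parsed value avoids having to enumerate every string spelling that float() accepts.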

> # Handle empty string

My spec says the empty string is not an error.

> numerator, denominator = text.split("/"); num = float(numerator); den = float(denominator)

This allows "1.2/3.4" and "inf/nan", which were not in the input examples and therefore support for them should be interpreted as accidental scope creep.

They were also not part of the test suite, which means the tests cannot distinguish between these two clearly different implementations:

  num = float(numerator)
  den = float(denominator)

and:

  num = int(numerator)
  den = int(denominator)

Here's a version which follows the same style as the linked-to code, but is easier to understand:

  def string_ratio_to_decimal(text):
      if not isinstance(text, str):
          return None

      # Remove whitespace
      text = text.strip()

      # Handle empty string
      if not text:
          return None

      # Handle ratio format (e.g., "1/2")
      if "/" in text:
          try:
              numerator, denominator = text.split("/")
              num = int(numerator)
              den = int(denominator)
              if den == 0:
                  return float("inf") if num > 0 else float("-inf") if num < 0 else float("nan")
              return num / den
          except ValueError:
              return None

      # Handle regular numbers (inf, nan, scientific notation, etc.)
      try:
          return float(text)
      except ValueError:
          return None

It still doesn't come anywhere near handling the actual problem spec I gave.
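To illustrate the earlier point about the test suite, here is a hedged sketch of inputs that would tell the float()-based and int()-based fraction parsers apart. The two parser functions and the `accepts` helper are hypothetical names used only for this comparison.

```python
def parse_fraction_float(text):
    numerator, denominator = text.split("/")
    return float(numerator) / float(denominator)


def parse_fraction_int(text):
    numerator, denominator = text.split("/")
    return int(numerator) / int(denominator)


def accepts(parser, text):
    """Return True if `parser` handles `text` without raising."""
    try:
        parser(text)
    except (ValueError, ZeroDivisionError):
        return False
    return True


# Both agree on the sample input "1/2", so that case alone proves nothing.
# These two inputs are accepted by the float version and rejected by the int one.
for case in ("1.2/3.4", "inf/nan"):
    print(case, accepts(parse_fraction_float, case), accepts(parse_fraction_int, case))
```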