Here's one I wrote the other day which took a long time to get right. I'm curious on how well your AI can do, since I can't imagine it does a good job at it.
  # Given a data set of size `size' >= 0, and a `text` string describing
  # the subset size, return a 2-element tuple containing a text string
  # describing the complement size and the actual size as an integer. The
  # text string can be in one of four forms (after stripping leading and
  # trailing whitespace):
  #
  #  1) the empty string, in which case return ("", 0)
  #  2) a stringified integer, like "123", where 0 <= n <= size, in
  #   which case return (str(size-int(n)), size-int(n))
  #  3) a stringified decimal value like "0.25" where 0 <= x <= 1.0, in
  #   which case compute the complement string as str(1 - x) and
  #   the complement size as size - (int(x * size)). Exponential
  #   notation is not supported, only numbers like "3.0", ".4", and "3.14"
  #  4) a stringified fraction value like "1/3", where 0 <= x <= 1,
  #   in which case compute the complement string and value as #3
  #   but using a fraction instead of a decimal. Note that "1/2" of
  #   51 must return ("1/2", 26), not ("1/2", 25).
  #
  # Otherwise, return ("error", -1)
  def get_complement(text: str, size: int) -> tuple[str, int]:
    ...
For examples:
  get_complement("1/2", 100) == ("1/2", 50)
  get_complement("0.6", 100) == ("0.4", 40)
  get_complement("100", 100) == ("0", 0)
  get_complement("0/1", 100) == ("1/1", 100)
Some of the harder test cases I came up were:
get_complement("0.8158557553804697", 448_525_430): this tests the underlying system uses decimal.Decimal rather than a float, because float64 ends up on a 0.5 boundary and applies round-half-even resulting in a different value than the true decimal calculation, which does not end up with a 0.5. (The value is "365932053.4999999857944710")
get_complement("nan", 100): this is a valid decimal.Decimal but not allowed by the spec.
get_complement("1/0", 100): handle division-by-zero in fractions.Fraction
get_complement("0.", 100): this tests that the string complement is "1." or "1.0" and not "1"
get_complement("0.999999999999999", 100): this tests the complement is "0.000000000000001" and not "1E-15".
get_complement("0.5E0", 100): test that decimal parsing isn't simply done by decimal.Decimal(size) wrapped in an exception handler.
Also, this isn't the full spec. The real code reports parse errors (like recognizing the "1/" is an incomplete fraction) and if the value is out of range it uses the range boundary (so "-0.4" for input is treated as "0.0" and the complement is "1.0"), along with an error flag so the GUI can display the error message appropriately.