It makes intuitive sense for this form of introspection to emerge at higher capability levels.
GPT-2 writes sentences; GPT-3 writes poetry; ChatGPT can chat; Claude 4.1 can introspect. Maybe by testing which capabilities models of a given size have, we could build a “ladder of conceptual complexity” for every concept ever :)