You may be right, but there is another hypothesis that would need to be rejected: at issue is whether LLMs "do" language the same way we do. They certainly learn language very differently, with orders of magnitude more input data. It could be that they just string sentence fragments together, whereas (by hypothesis) we construct sentences hierarchically. The internal representation of semantics might also differ, being more compositional in humans.
If I had time and free use of an LLM, I'd like to investigate how well it understands constructional synonymy, like "the red car" versus "the car that is red" versus "John saw a car on the street yesterday. It was red." I guess models that can draw pictures could be used to test this sort of thing--surely someone has looked into this?
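For what it's worth, a minimal sketch of such a probe might look like the following. This is purely illustrative: it uses bag-of-words cosine similarity as a stand-in for a real model's sentence embeddings, so it runs anywhere, and every function name here is my own invention rather than any library's API. With an actual LLM you would swap `embed` for the model's embedding call and expect the discourse variant to score much higher than the toy version does.

```python
# Toy sketch of a constructional-synonymy probe. Bag-of-words vectors
# stand in for a real model's sentence embeddings; with an actual LLM,
# replace embed() with the model's embedding function.
import re
from collections import Counter
from math import sqrt

def embed(sentence):
    """Bag-of-words count vector (stand-in for a model embedding)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b))

# Three constructions that should be (near-)synonymous:
variants = [
    "the red car",
    "the car that is red",
    "John saw a car on the street yesterday. It was red.",
]
# A non-synonymous control phrase:
control = "the blue truck"

base = embed(variants[0])
for v in variants[1:] + [control]:
    print(f"{v!r}: {cosine(base, embed(v)):.2f}")
```

Notably, the bag-of-words stand-in already fails on the cross-sentence case (it scores barely above the control), which is exactly the kind of gap between surface overlap and constructional synonymy that a real embedding-based probe would be testing for.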