I think, strictly speaking, autoregressive LLMs are Markov chains of a very high order.
The trick (aside from the order) is the training process by which they are derived from their source data. Simply enumerating the states and transitions in the source data, along with the probability of each transition out of each state, doesn’t get you an LLM.
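To make the "simply enumerating" approach concrete, here is a minimal sketch of a count-based Markov chain of order k. The names `build_table`, `corpus`, and `table` are made up for illustration, and the toy corpus is an assumption, not anything from a real model. It shows the limitation I mean: the table only knows contexts that literally occur in the source.

```python
from collections import Counter, defaultdict


def build_table(tokens, order):
    """Hypothetical helper: enumerate every `order`-token context in the
    source and estimate next-token probabilities from raw counts."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        counts[context][tokens[i + order]] += 1
    # Normalize the counts for each context into a probability distribution.
    return {
        ctx: {tok: n / sum(nxt.values()) for tok, n in nxt.items()}
        for ctx, nxt in counts.items()
    }


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the dog sat on the rug".split()
table = build_table(corpus, order=2)

print(table.get(("sat", "on")))  # a context that occurs in the corpus
print(table.get(("dog", "on")))  # a context that never occurs: no entry at all
```

With a context window of thousands of tokens, almost every context you would query at inference time is one the table has never seen, so the lookup has nothing to return. Training a network instead gives you a function that assigns a distribution to any context, seen or not, which is roughly the difference I'm pointing at.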
replies(2):