We have already seen huge efficiency increases over the last two years. Small models have become increasingly capable, the minimum viable model size for simple tasks keeps shrinking, and proprietary model providers have long since stopped talking about new milestones in model size. Instead, they have achieved massive price cuts through methods they largely keep quiet about (but that almost certainly include smaller models and intelligent routing between different model sizes).
But so far this has just led to more induced demand. There are a lot of things we would use LLMs for if they were just cheap enough, and every increase in efficiency makes more of those use cases viable.