Does this operate only at the level of visual compression? For example, does it have any applications in understanding (being able to represent the actual meaning the content stands for) or in reasoning? Technically, it seems to have no connection to current techniques such as reinforcement learning. The model is quite small, yet there appears to be no explanation of its understanding capabilities. If it is merely for compression, what impact will it have on current large models?