This sounds like something more up the alley of linear genetic programming. There are some very interesting experiments out there that utilize UTMs (BrainFuck, Forth, et. al.) [0,1,2].
I've personally had some mild success getting these UTM variants to output their own children in a meta programming arrangement. The base program only has access to the valid instruction set of ~12 instructions per byte, while the task program has access to the full range of instructions and data per byte (256). By only training the base program, we reduce the search space by a very substantial factor. I think this would be similar to the idea of a self-hosted compiler, etc. I don't think there would be too much of a stretch to give it access to x86 instructions and a full VM once a certain amount of bootstrapping has been achieved.
[0]: https://arxiv.org/abs/2406.19108
[1]: https://github.com/kurtjd/brainfuck-evolved
[2]: https://news.ycombinator.com/item?id=36120286