I hope they're able to get this ISA-level feedback to the people at RVI.
replies(2):
We figured out yesterday [1] that the example in the article can already be done in four RISC-V instructions; it's just a bit trickier to come up with:
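# a0 = rax, a1 = rbx (rori needs the Zbb extension)
slli t0, a1, 64-8    # t0 = bl moved into the top byte, all lower bits zero
rori a0, a0, 16      # rotate rax right so that ah sits in the top byte
add a0, a0, t0       # top byte becomes ah + bl; the carry falls off bit 63
rori a0, a0, 64-16   # rotate back (right by 48 == left by 16)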
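For anyone who wants to convince themselves without hardware, here is a rough Python model of the four instructions above, checked against a plain "add the low byte of rbx into bits 8..15 of rax" reference. The function names (`rori`, `add_high_byte`, `reference`) are made up for this sketch, and it only models the register result, not the x86 flags.

```python
MASK64 = (1 << 64) - 1

def rori(x, n):
    """Rotate a 64-bit value right by n bits, like RISC-V `rori`."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64

def add_high_byte(rax, rbx):
    """Model of the four-instruction sequence (a0 = rax, a1 = rbx)."""
    t0 = (rbx << (64 - 8)) & MASK64   # slli t0, a1, 64-8
    a0 = rori(rax, 16)                # rori a0, a0, 16
    a0 = (a0 + t0) & MASK64           # add  a0, a0, t0
    return rori(a0, 64 - 16)          # rori a0, a0, 64-16

def reference(rax, rbx):
    """Straightforward version: 8-bit add into bits 8..15, rest untouched."""
    ah = (rax >> 8) & 0xFF
    bl = rbx & 0xFF
    cleared = rax & ~0xFF00 & MASK64
    return cleared | (((ah + bl) & 0xFF) << 8)

if __name__ == "__main__":
    rax, rbx = 0x1122334455667788, 0xAABBCCDDEEFF0099
    print(hex(add_high_byte(rax, rbx)))
    print(add_high_byte(rax, rbx) == reference(rax, rbx))
```

The trick is that once ah is rotated into the top byte, the carry out of the 8-bit add simply falls off the end of the register, so no masking is needed.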
[1] https://www.reddit.com/r/RISCV/comments/1f1mnxf/box64_and_ri...

Flag computation and conditional jumps are where the big optimization opportunities lie. Box64 uses a multi-pass decoder that computes liveness information for flags and then computes flags one by one. QEMU instead tries to store the original operands and computes flags lazily. Both approaches have advantages and disadvantages...
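To make the lazy approach concrete, here is a minimal Python sketch of the general idea: remember the last operation and its operands, and only derive a flag when something actually reads it. This is not QEMU's actual code; the `LazyFlags` class, its method names, and the 32-bit width are assumptions chosen for illustration, and it only covers add and sub.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF

@dataclass
class LazyFlags:
    """Hypothetical lazy flag state: store operands now, derive flags on demand."""
    op: str = "add"
    a: int = 0
    b: int = 0
    result: int = 0

    def record_add(self, a, b):
        # The emulated ADD only stores its inputs and result; no flag work yet.
        self.op, self.a, self.b = "add", a & MASK32, b & MASK32
        self.result = (self.a + self.b) & MASK32
        return self.result

    def record_sub(self, a, b):
        self.op, self.a, self.b = "sub", a & MASK32, b & MASK32
        self.result = (self.a - self.b) & MASK32
        return self.result

    # Each flag is computed only when a later instruction asks for it.
    def zf(self):
        return self.result == 0

    def sf(self):
        return bool(self.result >> 31)

    def cf(self):
        if self.op == "add":
            return self.result < self.a      # unsigned wrap-around
        return self.a < self.b               # borrow on subtract

    def of(self):
        if self.op == "add":
            bits = (self.a ^ self.result) & (self.b ^ self.result)
        else:
            bits = (self.a ^ self.b) & (self.a ^ self.result)
        return bool(bits >> 31)

if __name__ == "__main__":
    flags = LazyFlags()
    flags.record_add(0x7FFFFFFF, 1)
    print(flags.zf(), flags.sf(), flags.cf(), flags.of())
```

The cost is a few stores on every flag-setting instruction, and the payoff is that flags nobody reads are never computed. A liveness pass like the one Box64 uses gets a similar saving ahead of time, by proving at translation time which flags are dead.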