Box64 and RISC-V in 2024: What It Takes to Run the Witcher 3 on RISC-V


justahuman74 ◴[27 Aug 24 05:31 UTC] No.41364800[source]▶

I hope they're able to get this ISA-level feedback to people at RVI

dmitrygr ◴[27 Aug 24 05:38 UTC] No.41364827[source]▶

None of this is new. None of it.

In fact, bitfield extract is such an obvious oversight that it is my favourite example of how idiotic the RISCV ISA is (#2 is lack of sane addressing modes).

Some of the better RISCV designs, in fact, implement a custom instr to do this, eg: BEXTM in Hazard3: https://github.com/Wren6991/Hazard3/blob/stable/doc/hazard3....
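For readers unfamiliar with the complaint: without a dedicated instruction, an unsigned bitfield extract on base RV64I is usually done as a shift-left/shift-right pair (or a shift plus a mask). The sketch below models both forms in Python. The helper names and the 64-bit register width are assumptions for illustration, and the single-op function shows a generic "extract" operation, not the exact semantics of Hazard3's BEXTM.

```python
XLEN = 64                      # assumed register width (RV64)
MASK = (1 << XLEN) - 1

def slli(x, shamt):
    """Model of a logical left shift on an XLEN-bit register."""
    return (x << shamt) & MASK

def srli(x, shamt):
    """Model of a logical right shift on an XLEN-bit register."""
    return (x & MASK) >> shamt

def extract_two_shifts(value, lsb, width):
    """Unsigned bitfield extract as two shifts (requires width >= 1, lsb + width <= XLEN)."""
    left = XLEN - (lsb + width)   # push the field up to the top of the register
    right = XLEN - width          # bring it back down, zero-filling from the top
    return srli(slli(value, left), right)

def extract_single_op(value, lsb, width):
    """What a hypothetical single 'bitfield extract' instruction would compute."""
    return (value >> lsb) & ((1 << width) - 1)

print(hex(extract_two_shifts(0xABCD1234, 8, 8)))
print(hex(extract_single_op(0xABCD1234, 8, 8)))
```

Both calls return the same field; the difference is two dependent instructions versus one.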

replies(2): >>41364944 #>>41366113 #

renox ◴[27 Aug 24 06:12 UTC] No.41364944[source]▶

>>41364827 #

Whoa, someone else who doesn't believe that the RISC-V ISA is 'perfect'! I'm curious: how have the discussions on bitfield extract been going? Because it really does seem like an obvious oversight and something to add as a 'standard extension'.

What's your take on

1) unaligned 32bit instructions with the C extension?

2) lack of 'trap on overflow' for arithmetic instructions? MIPS had it..

replies(3): >>41364991 #>>41365621 #>>41367330 #

dmitrygr ◴[27 Aug 24 06:24 UTC] No.41364991[source]▶

>>41364944 #

1. aarch64 does this right. RISCV tries to be too many things at once, and predictably ends up sucking at everything. Fast big cores should just stick to fixed size instrs for faster decode. You always know where instrs start, and every cacheline has an integer number of instrs. Microcontroller cores can use compressed instrs, since code size matters there, while decoding instrs in parallel does not. Trying to have one arch cover it all is idiotic.

2. nobody uses it on mips either, so it is likely of no use.

replies(3): >>41365378 #>>41365968 #>>41366469 #

loup-vaillant ◴[27 Aug 24 10:15 UTC] No.41365968[source]▶

>>41364991 #

> Fast big cores should just stick to fixed size instrs for faster decode.

How much faster, though? RISC-V decode is not crazy like x86: you only need to look at the first byte to know how long the instruction is (the first two bits if you limit yourself to 16- and 32-bit instructions, 5 bits if you support 48-bit instructions, 6 bits if you support 64-bit instructions). Which means the serial part of the decoder is very, very small.
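The length rule above can be sketched in a few lines of Python. The bit patterns follow the instruction-length encoding as I understand it from the unprivileged spec; the function name is made up for illustration, and the encodings longer than 32 bits should be treated as an assumption, since as far as I know they are not ratified.

```python
def insn_length(first_halfword):
    """Return the instruction length in bytes from its lowest 16 bits."""
    if (first_halfword & 0b11) != 0b11:
        return 2                      # compressed (C extension) instruction
    if (first_halfword & 0b11100) != 0b11100:
        return 4                      # standard 32-bit instruction
    if (first_halfword & 0b111111) == 0b011111:
        return 6                      # 48-bit encoding (assumed, see note above)
    if (first_halfword & 0b1111111) == 0b0111111:
        return 8                      # 64-bit encoding (assumed, see note above)
    raise ValueError("longer or reserved encoding, not handled in this sketch")

print(insn_length(0x4501))   # low bits 01: 16-bit instruction
print(insn_length(0x0513))   # low halfword of 0x00000513 (addi a0, x0, 0)
```

A core that only supports 16- and 32-bit instructions never gets past the first two checks, which is why two bits are enough in that case.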

The bigger complaint about variable-length instructions is potentially misaligned instructions, which do not play well with cache lines (a single instruction may start in one cache line and end in the next, making the hardware a bit more hairy).
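To make the cache-line point concrete, here is a minimal sketch that checks whether an instruction crosses a line boundary. The 64-byte line size and the function name are assumptions for illustration; real cores differ.

```python
LINE_BYTES = 64  # assumed cache-line size

def straddles_line(addr, length, line_bytes=LINE_BYTES):
    """True if an instruction of `length` bytes at `addr` spans two cache lines."""
    first_line = addr // line_bytes
    last_line = (addr + length - 1) // line_bytes
    return first_line != last_line

# With the C extension, a 32-bit instruction only needs 2-byte alignment,
# so it can start at offset 62 and spill into the next line.
print(straddles_line(62, 4))

# With fixed 4-byte instructions on 4-byte boundaries this never happens.
print(any(straddles_line(a, 4) for a in range(0, 256, 4)))
```

Only 32-bit instructions starting at the last halfword of a line are affected, so the case is rare but still has to be handled in the fetch path.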

And there’s an advantage to compressed instructions even on big cores: less pressure on the instruction cache, and correspondingly fewer cache misses.

Thus, it’s not clear to me that fixed-size instructions are the obvious way to go for big cores.

replies(3): >>41366051 #>>41366073 #>>41366598 #