Box64 and RISC-V in 2024: What It Takes to Run the Witcher 3 on RISC-V

(box86.org)

366 points pabs3 | 1 comments | 27 Aug 24 04:23 UTC | HN request time: 0.292s | source

Show context

jokoon ◴[27 Aug 24 13:39 UTC] No.41367403[source]▶

Question for somebody who doesn't work in chips: what does a software engineer has to do differently when targeting software for RISC5?

I would imagine that executable size increases, meaning it has to be aggressively optimized for cache locality?

I would imagine that some types of softwares are better suited for either CISC or RISC, like games, webservers?

replies(3): >>41367499 #>>41368262 #>>41370208 #

dzaima ◴[27 Aug 24 13:50 UTC] No.41367499[source]▶

>>41367403 #

RISC-V with the compressed instruction extension actually ends up smaller than x86-64 and ARM on average.

There's not much inherent that needs to change in software approach. Probably the biggest thing vs x86-64 is the availability of 32 registers (vs 16 on x86-64), allowing for more intermediate values before things start spilling to stack, which also applies to ARM (which too has 32 registers). But generally it doesn't matter unless you're micro-optimizing.

More micro-optimization things might include:

- The vector extension (aka V or RVV) isn't in the base rv64gc ISA, so you might not get SIMD optimizations depending on the target; whereas x86-64 and aarch64 have SSE2 and NEON (128-bit SIMD) in their base.

- Similarly, no popcount & count leading/trailing zeroes in base rv64gc (requires Zbb); base x86-64 doesn't have popcount, but does have clz/ctz. aarch64 has all.

- Less efficient branchless select, i.e. "a ? b : c"; takes ~4-5 instrs on base rv64gc, 3 with Zicond, but 1 on x86-64 and aarch64. Some hardware can also fuse a jump over a mv instruction to be effectively branchless, but that's even more target-specific.

RISC-V profiles kind of solve the first two issues (e.g. Android requires rva23, which requires rvv & Zbb & Zicond among other things) but if linux distros decide to target rva20/rv64gc then they're ~forever stuck without having those extensions in precompiled code that hasn't bothered with dynamic dispatch. Though this is a problem with x86-64 too (much less so with ARM as it doesn't have that many extensions; SVE is probably the biggest thing by far, and still not supported widely (i.e. Apple silicon doesn't)).

replies(1): >>41367683 #

packetlost ◴[27 Aug 24 14:07 UTC] No.41367683[source]▶

>>41367499 #

That seems like something the compiler would generally handle, no? Obviously that doesn't apply everywhere, but in the general case it should.

replies(2): >>41367742 #>>41368271 #

vlovich123 ◴[27 Aug 24 14:57 UTC] No.41368271[source]▶

>>41367683 #

Vector stuff is typically hand coded with intrinsics or assembly. Autovectorization has mixed results because there’s no way to request the compiler to promise that it vectorized the code.

But for an emulator like this, box64 has to pick how to emulate vectorized instructions on RiscV (eg slowly using scalars or trying to reimplement using native vector instructions). The challenge of course is that typically you don’t get as good a performance unless the emulator can actually rewrite the code on the fly because a 1:1 mapping is going to be suboptimal vs noticing patterns of high level operations being performed and providing a more optimized implementation that replaces an alternate chunk of instructions at once instead to account for implementation differences on the chip (eg you may have to emulate missing instructions but a rewriter could skip emulation if there’s an alternate way to accomplish the same high level computation)

The biggest challenge for something like this from a performance perspective of course will be translating the GPU stuff efficiently to hit the native driver code and that Riscv likely is relying on OSS GPU drivers (and maybe wine to add another translation layer if the game is windows only )

replies(4): >>41368434 #>>41368685 #>>41368702 #>>41370359 #

1. tormeh ◴[27 Aug 24 15:33 UTC] No.41368702[source]▶

>>41368271 #

I'd assume it uses RADV, same as the Steam Deck. For most workloads that's faster than AMD's own driver. And yes, it uses Wine and DXVK. As dar as the game is concerned it's running on a DirectX-capable x86 Windows machine. That's a lot of translation layers.

↑