366 points pabs3 | 2 comments
jokoon ◴[] No.41367403[source]
Question from somebody who doesn't work in chips: what does a software engineer have to do differently when targeting software for RISC-V?

I would imagine that executable size increases, meaning it has to be aggressively optimized for cache locality?

I would imagine that some types of software are better suited to either CISC or RISC, like games or webservers?

replies(3): >>41367499 #>>41368262 #>>41370208 #
dzaima ◴[] No.41367499[source]
RISC-V with the compressed instruction extension actually ends up smaller than x86-64 and ARM on average.

Not much inherently needs to change in the software approach. Probably the biggest thing vs x86-64 is the availability of 32 registers (vs 16 on x86-64), allowing for more intermediate values before things start spilling to stack; the same applies to aarch64 (which also has 32 registers). But generally it doesn't matter unless you're micro-optimizing.

More micro-optimization things might include:

- The vector extension (aka V or RVV) isn't in the base rv64gc ISA, so you might not get SIMD optimizations depending on the target; whereas x86-64 and aarch64 have SSE2 and NEON (128-bit SIMD) in their base.

- Similarly, no popcount & count leading/trailing zeroes in base rv64gc (requires Zbb); base x86-64 doesn't have popcount, but does have clz/ctz. aarch64 has all.

- Less efficient branchless select, i.e. "a ? b : c"; takes ~4-5 instrs on base rv64gc, 3 with Zicond, but 1 on x86-64 and aarch64. Some hardware can also fuse a jump over a mv instruction to be effectively branchless, but that's even more target-specific.
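
To make the branchless select point concrete, here's a minimal C sketch. The function names are made up for illustration, and the actual instruction sequence is entirely up to the compiler and target flags, so treat the comments as a rough guide rather than guaranteed codegen.

```c
#include <stdint.h>

/* Plain ternary: the compiler picks the lowering (a conditional move/select
   where the target has one, Zicond ops, or a branch). */
uint64_t select_ternary(int cond, uint64_t b, uint64_t c) {
    return cond ? b : c;
}

/* Mask-based select: roughly the shape of a branchless lowering on a target
   without a conditional-select instruction. Written out by hand here only to
   show where the extra instructions come from. */
uint64_t select_mask(int cond, uint64_t b, uint64_t c) {
    uint64_t mask = -(uint64_t)(cond != 0); /* all ones if cond, else zero */
    return c ^ ((b ^ c) & mask);            /* b when mask is all ones, else c */
}
```

Both functions return the same result; in normal code you'd just write the ternary and let the compiler choose.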

RISC-V profiles kind of solve the first two issues (e.g. Android requires rva23, which requires rvv & Zbb & Zicond among other things), but if Linux distros decide to target rva20/rv64gc then they're ~forever stuck without those extensions in any precompiled code that hasn't bothered with dynamic dispatch. Though this is a problem with x86-64 too. It's much less so with ARM, as it doesn't have that many extensions; SVE is probably the biggest one by far, and it's still not widely supported (e.g. Apple silicon doesn't have it).
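
For anyone unfamiliar with dynamic dispatch, a rough C sketch of the idea using popcount follows. `cpu_has_fast_popcount` is a hypothetical placeholder (real detection is OS- and target-specific and isn't shown), the other names are made up too, and `__builtin_popcountll` assumes GCC or Clang. In real code the fast variant would also have to be compiled with the extension enabled, e.g. in its own translation unit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable fallback: needs no popcount instruction on the target. */
unsigned popcount_portable(uint64_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* GCC/Clang builtin: becomes a single instruction only if the compile
   target includes one; otherwise it is lowered to a software sequence. */
unsigned popcount_builtin(uint64_t x) {
    return (unsigned)__builtin_popcountll(x);
}

/* HYPOTHETICAL feature check: a stand-in for real CPU feature detection.
   Always reports "not available" in this sketch. */
bool cpu_has_fast_popcount(void) {
    return false;
}

typedef unsigned (*popcount_fn)(uint64_t);

/* Pick an implementation based on the (stubbed) feature check. */
popcount_fn resolve_popcount(void) {
    return cpu_has_fast_popcount() ? popcount_builtin : popcount_portable;
}

/* Public entry point: resolves once on first use, then calls through the
   pointer. Not thread-safe as written; a real version would resolve at
   startup or use an atomic. */
unsigned popcount64(uint64_t x) {
    static popcount_fn impl; /* NULL until the first call */
    if (!impl) {
        impl = resolve_popcount();
    }
    return impl(x);
}
```

The cost is an indirect call per use, which is why this tends to be done around whole hot loops rather than single operations.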

replies(1): >>41367683 #
packetlost ◴[] No.41367683[source]
That seems like something the compiler would generally handle, no? Obviously that doesn't apply everywhere, but in the general case it should.
replies(2): >>41367742 #>>41368271 #
dzaima ◴[] No.41367742[source]
It's something that the compiler would handle, but it can still moderately influence programming decisions, e.g. you can have a lot more temporary variables before things start slowing down due to spill stores/loads. That's especially so in, say, a loop with function calls, as more registers also means more non-volatile registers (i.e. those that are guaranteed to not change across function calls). But, yes, very limited impact even then.
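
A rough C illustration of the kind of loop meant here. `mix` and `accumulate` are made-up names, and the sketch assumes `mix` is an opaque call that doesn't get inlined. About ten values (the pointer, the length, the index, four accumulators, and three output pointers) stay live across the call, so they need non-volatile registers or stack slots. The standard RISC-V calling convention has 12 callee-saved integer registers (s0-s11) versus 6 in the x86-64 System V ABI, so spills are more likely on the latter, though what actually happens depends on the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an opaque callee; assume it is NOT inlined. */
uint64_t mix(uint64_t x) {
    return x * 0x9E3779B97F4A7C15ull + 1;
}

/* Keeps several accumulators live across a call on every iteration. */
uint64_t accumulate(const uint64_t *v, size_t n,
                    uint64_t *xor_out, uint64_t *min_out, uint64_t *max_out) {
    uint64_t sum = 0, xr = 0, lo = UINT64_MAX, hi = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t m = mix(v[i]); /* everything live here must survive the call */
        sum += m;
        xr ^= m;
        if (m < lo) lo = m;
        if (m > hi) hi = m;
    }
    *xor_out = xr;
    *min_out = lo;
    *max_out = hi;
    return sum;
}
```
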
replies(1): >>41368021 #
packetlost ◴[] No.41368021[source]
It's certainly something I would take into consideration when making a (language) runtime, but probably not at all in anything but the most performance-sensitive applications. Certainly a difference, but far lower level than what most applications require.
replies(1): >>41368149 #
1. dzaima ◴[] No.41368149[source]
Yep. Unfortunately I am one to be making language runtimes :)

It's just the potentially most significant thing I could come up with at first. Though perhaps RVV not being in rva20/rv64gc is more significant.

replies(1): >>41368734 #
2. packetlost ◴[] No.41368734[source]
Looks like an APL project? That's really cool!