A Superscalar Out-of-Order x86 Soft Processor for FPGA (2017)

1. Skunkleton [07 Mar 19 22:31 UTC] No.19333243

>>19329083 (OP)

I am by no means an expert in digital design (I have only worked with FPGAs as a SWE), but it seems to me that the use cases for a high-performance soft processor are few and far between. After all, if you want a fast processor, you can get a hard processor with excellent performance and support for less than the FPGA fabric would likely cost.

Still a cool piece of tech though.

2. monocasa [07 Mar 19 22:41 UTC] No.19333342

>>19333243 (TP)

Section 1.1 makes a pretty good argument in favor.

3. Skunkleton [08 Mar 19 02:05 UTC] No.19334533

>>19333342

What I am trying to say is that FPGA fabric is expensive, and if you use too much of it, you get pushed to a larger part. Most of the time I would guess it is cheaper to have a separate processor for more intensive tasks. Of course, if you have some IP that is partially implemented in a soft core, it might not be practical to offload it to an external CPU.

4. FullyFunctional [08 Mar 19 02:27 UTC] No.19334613

>>19334533

Actually, I think FPGAs are amazingly cheap compared to what it would cost you to fab a chip of similar specs. A leading-edge FPGA gets you near-leading-edge DSP blocks, memory blocks, SerDes, etc. What you _don't_ get is the clock frequency of a custom part across the full design (this is the FPGA overhead).

Simple RISC softcores run at 200-400 MHz in modern parts. Couple this with custom softcore accelerators and you can get very performant designs. The biggest issue with deploying FPGAs, however, is that they are much harder to design for, especially when dealing with hard blocks like memory controllers and PCIe.

5. andrewf [08 Mar 19 02:54 UTC] No.19334729

>>19334613

Are there detailed writeups of people building these things into their systems? (Something like https://blog.cloudflare.com/building-fast-interpreters-in-ru... but for FPGAs instead of Rust).

6. bogomipz [08 Mar 19 03:01 UTC] No.19334746

>>19334613

>"Couple this with custom softcore accelerators and you can get very performant designs."

Can you elaborate on what these softcore accelerators are or how they work? Might you have any links?

7. 0815test [08 Mar 19 03:01 UTC] No.19334747

>>19334533

This is not for commodity tasks, though. Think of reimplementing an old x86 system on an FPGA and using it to keep old software (games, utilities, cool demos) working properly, even where simple software emulation falls short. With Intel now planning to drop support for legacy BIOS and for 16-bit and 32-bit system boot in new x86 platforms, an independent reimplementation is something we should have.

8. gh02t [08 Mar 19 03:30 UTC] No.19334865

>>19333243 (TP)

The main use for soft processors is in hybrid designs: systems that need significant programmable logic for performance- or timing-sensitive work, but where other functionality is better implemented on an easier-to-program CPU. If you're going to use an FPGA anyway, it is frequently easier and cheaper to implement a soft core processor than to add a separate discrete processor (which involves more than a single chip: you need all the supporting circuitry, interconnects, board routing, etc.).

The other use case is sorta similar: a normal CPU with a few custom extensions. Sometimes no manufacturer's product fits your needs well, and ASICs are expensive (and difficult to change), so some companies just ship customized CPUs on FPGAs with whatever extensions they need.

Xilinx's Zynq chips (FPGA with an ARM core) have been very successful, which kinda demonstrates that this is an attractive combination.

9. FullyFunctional [08 Mar 19 05:03 UTC] No.19335239

>>19334746

I don't have good references handy where I am right now, but Hot Chips a couple of years ago had many examples, including a Memcache accelerator.

A more down-to-earth example shipped with an Arrow FPGA dev kit I got: they took an existing piece of software, MPG123 (MP3 decoding), and ported it. They then profiled it and isolated a candidate for acceleration (a moderately wide integer operation). The result saved a meaningful amount of CPU cycles and power. (The FPGA board was battery-powered, which is still unusual.)
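The profile-then-accelerate step pays off in proportion to how much runtime the offloaded routine accounts for. This is Amdahl's law, which the comment doesn't name. The following is a minimal sketch of that arithmetic. The 40% share and 10x kernel speedup are hypothetical numbers for illustration, not figures from the Arrow kit.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the offloaded
                          routine (between 0 and 1)
    kernel_speedup:       how many times faster the accelerator runs that routine
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Untouched work still takes its full time; the offloaded part shrinks.
    remaining_time = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    return 1.0 / remaining_time


# Hypothetical profile: the hot integer routine is 40% of decode time and the
# custom accelerator runs it 10x faster.
print(f"{overall_speedup(0.40, 10.0):.2f}x")  # prints 1.56x
```

Even an infinitely fast accelerator would cap this hypothetical at 1 / (1 - 0.40), about 1.67x. That is why profiling to find the biggest hot spot comes before building any custom logic.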

10. wtallis [08 Mar 19 06:12 UTC] No.19335503

>>19334865

Time to market is also sometimes a factor; putting a soft processor onto the unused parts of an FPGA is far easier than bringing up a SoC combining CPU cores with special-purpose compute or IO.

The high-end SSD market has had a lot of FPGA-based products for years, and recently many of them have been using leftover gates to add user-accessible CPUs (or occasionally ML-focused compute resources). It turns out that there are quite a few uses for having a CPU extremely close to your massive pile of data, rather than having a relatively narrow PCIe link between the storage and the CPU. These SSD controllers are usually forced to use pretty large FPGAs to get a high enough pin count to manage several TB of flash, and it seems that they often have logic elements to spare.

11. boomlinde [08 Mar 19 11:48 UTC] No.19336752

>>19333243 (TP)

Suppose that my board already has an FPGA, and that the choice between a soft CPU and an additional chip hinges on performance. You don't want a fast CPU for no particular reason. Rather, you want a CPU that is fast enough to perform its intended tasks.

12. monocasa [08 Mar 19 17:49 UTC] No.19340033

>>19334746

Most decent-sized designs, hard or soft, have little processors embedded in them in addition to their core logic. It's an interesting space/time/ease-of-development trade-off between a special little processor and doing the same work in plain logic.

GPUs are pretty well documented (relatively speaking), so they make a good case study. Generally, special processors handle FIFO pulling (i.e., the part that reads the command lists), DMA engines, power management, video codecs, DRM key management, and some other miscellaneous pieces (like 'run this code on the GPU' on certain interrupts, instead of interrupting the CPU). And that's all in addition to the shader cores you normally think of as 'the GPU'.
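The "FIFO pulling" job can be pictured as a small loop draining a ring buffer. The host writes commands and advances a write pointer, and the embedded processor consumes them and advances a read pointer. The following is a minimal sketch under stated assumptions. `CommandRing`, `pull_commands`, and the command strings are hypothetical; real GPUs use their own packet formats and register layouts.

```python
from dataclasses import dataclass, field


@dataclass
class CommandRing:
    """Hypothetical host-to-device command ring (not any real GPU's layout).

    The host writes entries and advances `wptr`; the embedded processor
    consumes entries and advances `rptr`.
    """
    size: int
    entries: list = field(init=False)
    rptr: int = 0
    wptr: int = 0

    def __post_init__(self) -> None:
        self.entries = [None] * self.size

    def push(self, cmd) -> bool:
        """Host side: enqueue a command, or return False if the ring is full.

        One slot is always left empty so that "full" and "empty" can be told
        apart using only the two pointers.
        """
        if (self.wptr + 1) % self.size == self.rptr:
            return False
        self.entries[self.wptr] = cmd
        self.wptr = (self.wptr + 1) % self.size
        return True


def pull_commands(ring: CommandRing, dispatch) -> int:
    """Embedded-processor side: drain every pending command, hand each one to
    `dispatch` (standing in for the fixed-function logic), and return how many
    were processed."""
    processed = 0
    while ring.rptr != ring.wptr:
        dispatch(ring.entries[ring.rptr])
        ring.rptr = (ring.rptr + 1) % ring.size
        processed += 1
    return processed


ring = CommandRing(size=4)
for op in ("SET_STATE", "DRAW", "FENCE"):
    ring.push(op)
seen = []
pull_commands(ring, seen.append)
print(seen)  # prints ['SET_STATE', 'DRAW', 'FENCE']
```

Putting this loop on a tiny processor instead of in hard logic is the trade-off described above. Parsing a new packet type then becomes a firmware change rather than a respin.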

In the past, I've used simple processor cores in FPGAs for motor control.