A Superscalar Out-of-Order x86 Soft Processor for FPGA (2017)

(tspace.library.utoronto.ca)

1. hak8or ◴[07 Mar 19 22:29 UTC] No.19333225[source]▶

>>19329083 (OP) #

How is this legally handled, considering AMD owns x86-64 and Intel owns x86?

As long as it's not used in commercial settings, will they pretend not to see it while users are in a legally gray area?

replies(1): >>19333272 #

2. Skunkleton ◴[07 Mar 19 22:31 UTC] No.19333243[source]▶

>>19329083 (OP) #

I am by no means an expert in digital design (I have only worked with them as a SWE), but it seems to me that the use cases for a high performance soft processor are pretty few and far between. After all, if you want a fast processor you can get a hard processor with excellent performance/support for less than the FPGA fabric likely costs.

Still a cool piece of tech though.

replies(4): >>19333342 #>>19333803 #>>19334865 #>>19336752 #

3. wk_end ◴[07 Mar 19 22:33 UTC] No.19333272[source]▶

>>19333225 #

This is x86 only, and a quick skim of the dissertation suggests that it doesn't implement any post-P6 (Pentium Pro) instructions. The P6 is 24 years old now so presumably all patents are expired?

replies(1): >>19334759 #

4. monocasa ◴[07 Mar 19 22:41 UTC] No.19333342[source]▶

>>19333243 #

Section 1.1 makes a pretty good argument in favor.

replies(1): >>19334533 #

5. basementcat ◴[07 Mar 19 23:24 UTC] No.19333690[source]▶

>>19329083 (OP) #

From section 6.8 (p. 69) "Note that the decoder’s microarchitecture design is complete, including the branch predictor design and micro-op sequences for nearly every x86 instruction and behaviour. Our circuit implementation is less complete than our microarchitecture design (implemented as a detailed pipeline simulation)"

Still an impressive work.

6. Skunkleton ◴[08 Mar 19 02:05 UTC] No.19334533{3}[source]▶

>>19333342 #

What I am trying to say is that FPGA fabric is expensive, and at some point if you use too much, then it will push you to a new part. Most of the time I would guess it to be cheaper to have a separate processor for more intensive tasks. Of course if you have some IP that is partially implemented in a soft core, then it might not be practical to offload it to an external cpu.

replies(2): >>19334613 #>>19334747 #

7. FullyFunctional ◴[08 Mar 19 02:27 UTC] No.19334613{4}[source]▶

>>19334533 #

Actually, I think FPGAs are amazingly cheap compared to what it would cost you to fab a chip of similar specs. A leading-edge FPGA will get you near-leading edge DSPs, memory blocks, SerDes, etc. What you _don't_ get is the frequency of a custom part for the full design (this is the FPGA overhead).

Simple RISC softcores run at 200-400 MHz in modern parts. Couple this with custom softcore accelerators and you can get very performant designs. The biggest issue with deploying FPGAs however is that it's much harder to design for, especially dealing with hard blocks, like memory controllers and PCIe.

replies(2): >>19334729 #>>19334746 #

8. andrewf ◴[08 Mar 19 02:54 UTC] No.19334729{5}[source]▶

>>19334613 #

Are there detailed writeups of people building these things into their systems? (Something like https://blog.cloudflare.com/building-fast-interpreters-in-ru... but for FPGAs instead of Rust).

9. bogomipz ◴[08 Mar 19 03:01 UTC] No.19334746{5}[source]▶

>>19334613 #

>"Couple this with custom softcore accelerators and you can get very performant designs."

Can you elaborate on what these softcore accelerators are or how they work? Might you have any links?

replies(2): >>19335239 #>>19340033 #

10. 0815test ◴[08 Mar 19 03:01 UTC] No.19334747{4}[source]▶

>>19334533 #

This is not for commodity tasks, though. Think reimplementing an old x86 system on FPGA and using it to keep old software (games, utilities, cool demos) properly functioning, even for things that might not work properly with simple software emulation. With Intel now planning to abandon support for the BIOS and for 16-bit and 32-bit system boot in new x86 architectures, an independent reimplementation is something we should have.

11. userbinator ◴[08 Mar 19 03:02 UTC] No.19334750[source]▶

>>19329083 (OP) #

Our microarchitecture achieves 2.7 times the per-clock performance of a performance-tuned Nios II/f, Altera's fastest (RISC-like, single-issue, pipelined) soft processor, and 0.8 times the frequency, for a total performance improvement of 2.2 times.

It'd be very interesting to compare this to RISC-V.
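For anyone checking the arithmetic in the quoted figures: total speedup is roughly the per-clock ratio multiplied by the frequency ratio. A minimal sketch in Python, assuming the two ratios simply multiply (the function name `total_speedup` is made up for illustration and is not from the dissertation):

```python
def total_speedup(per_clock_ratio: float, freq_ratio: float) -> float:
    """Relative performance = (work per clock) * (clocks per second),
    both expressed as ratios against the baseline processor."""
    return per_clock_ratio * freq_ratio


# Figures quoted above: 2.7x per-clock performance at 0.8x the frequency.
speedup = total_speedup(2.7, 0.8)
print(f"{speedup:.2f}x")  # the quote rounds this to one decimal place
```

The product comes out slightly below the quoted 2.2 because the quote rounds to one decimal place.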

replies(2): >>19334809 #>>19335267 #

12. userbinator ◴[08 Mar 19 03:04 UTC] No.19334759{3}[source]▶

>>19333272 #

The last time I looked, the MMX patents were close to expiry too, and by now they might be.

replies(1): >>19335466 #

13. pcwalton ◴[08 Mar 19 03:15 UTC] No.19334809[source]▶

>>19334750 #

Note that the Nios II/f is an in-order CPU, while this is a superscalar CPU. A more relevant benchmark would be the superscalar dual-issue ARM Cortex-A9, illustrated in figure 13.4. It's about the same performance as that one if you average all the benchmarks.

In theory, RISC-V should be at about the same performance as ARMv8 (note that Cortex-A9 is ARMv7): https://news.ycombinator.com/item?id=15343287

replies(1): >>19335276 #

14. gh02t ◴[08 Mar 19 03:30 UTC] No.19334865[source]▶

>>19333243 #

The main use for soft processors is for hybrid designs. Stuff that needs some significant programmable logic for really performance or timing sensitive applications, but where other functionality is better implemented in an easier to program CPU. If you're gonna have to use an FPGA anyway, it is frequently easier/cheaper to just implement a soft core processor versus adding a separate discrete processor (which is more involved than just adding a single chip, you need all the supporting circuitry, interconnects, routing on the board etc).

The other use case is sorta the same thing, but is as a normal CPU with a few custom extensions. Sometimes no manufacturer's product fits your needs well and ASICs are expensive (also difficult to change), so some companies just ship customized CPUs on FPGAs with whatever extensions they need.

Xilinx's Zynq chips (FPGA with an ARM core) have been very successful, which kinda demonstrates that this is an attractive combination.

replies(1): >>19335503 #

15. FullyFunctional ◴[08 Mar 19 05:03 UTC] No.19335239{6}[source]▶

>>19334746 #

I don't have good references handy where I am right now, but Hotchips a couple of years ago had many examples, including a Memcache accelerator.

A more down-to-earth example shipped with an Arrow FPGA dev kit I got: they took a piece of software, MPG123 (MP3 decoding), and ported it. They then profiled it and isolated a candidate for acceleration (some moderately wide integer operation). The result saved a meaningful amount of CPU cycles and power. (The FPGA board was battery-powered, which is still unusual.)

16. snvzz ◴[08 Mar 19 05:10 UTC] No.19335267[source]▶

>>19334750 #

>It'd be very interesting to compare this to RISC-V.

BOOM was close to Ivy Bridge performance in 2016.

17. snvzz ◴[08 Mar 19 05:11 UTC] No.19335276{3}[source]▶

>>19334809 #

>In theory, RISC-V should be at about the same performance as ARMv8

Or POWER9, the cpu with the fastest IPC. And it's RISC.

There's nothing to prevent RISC-V ISA from getting high performance implementations.

18. Const-me ◴[08 Mar 19 05:59 UTC] No.19335466{4}[source]▶

>>19334759 #

MMX is from P5.

P6 supports MMX and later revisions SSE 1.

The next one after P6 is NetBurst, it introduced SSE2 and later revisions SSE3.

19. wtallis ◴[08 Mar 19 06:12 UTC] No.19335503{3}[source]▶

>>19334865 #

Time to market is also sometimes a factor; putting a soft processor onto the unused parts of an FPGA is far easier than bringing up a SoC combining CPU cores with special-purpose compute or IO.

The high-end SSD market has had a lot of FPGA-based products for years, and recently many of them are using any leftover gates to add user-accessible CPUs (or occasionally ML-focused compute resources). It turns out that there are quite a few uses for having a CPU extremely close to your massive pile of data, rather than having a relatively narrow PCIe link between the storage and the CPU. These SSD controllers are usually forced to use pretty large FPGAs in order to have a high enough pin count to manage several TB of flash, and it seems that they often have logic elements to spare.

20. boomlinde ◴[08 Mar 19 11:48 UTC] No.19336752[source]▶

>>19333243 #

Suppose that my board already has an FPGA, and that the choice between a soft CPU and an additional chip hinges on performance. You don't want a fast CPU for no particular reason. Rather, you want a CPU that is fast enough to perform its intended tasks.

21. monocasa ◴[08 Mar 19 17:49 UTC] No.19340033{6}[source]▶

>>19334746 #

Most decent sized designs, hard or soft, have little processors embedded in them in addition to their core logic. It's an interesting space/time/ease of development trade-off between a special little processor, and doing the same work in just logic.

GPUs are pretty well documented (relatively speaking), so they make a good case study. Generally special processors will handle FIFO pulling (so the part that reads the command lists), DMA engines, power management, video codecs, DRM key management, and some other miscellaneous pieces (like 'run this code on the GPU' on certain interrupts, instead of interrupting the CPU). And that's all in addition to the shader cores you normally think of as 'the GPU'.

In the past, I've used simple processor cores in FPGAs for motor control.
