Notes on the Pentium's microcode circuitry

1. Aardwolf ◴[01 Apr 25 09:16 UTC] No.43544547[source]▶

I would love to know how multiplication and division work in modern chips to have such a low cycle count compared to addition. In theory, addition is linear in the number of bits, while multiplication and division are quadratic, or loglinear for large inputs. Part of that is solved by surface area rather than time, I guess, but that's also true of adders already, with their carry logic.

replies(2): >>43549294 #>>43549620 #

2. RiverCrochet ◴[01 Apr 25 17:19 UTC] No.43549294[source]▶

>>43544547 #

I remember reading somewhere--memory is hazy--that at least division uses a partial lookup table, kinda like how you'd do it in 6502 assembly back in the day. E.g., if you have to multiply something by 5, and you can get the range of inputs down to something reasonable, then you can just have a table of x*5 for that range and look it up.

Also I'm not sure multiplication/division are quadratic if your algorithm is not "add X to itself Y times." Look at this for 6502 16-bit multiply - https://www.llx.com/Neil/a2/mult.html - the number of steps depends on the bit width, not on the value of the multiplier or multiplicand. Of course this is for integers, not floating point.
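To make the bit-width point concrete, here is a minimal shift-and-add sketch in Python. It is not a transcription of the linked 6502 routine; the function name and the `bits` parameter are made up for illustration. The loop runs once per multiplier bit regardless of the operand values.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 16) -> int:
    """Multiply two unsigned integers with one shift-and-add step per multiplier bit."""
    mask = (1 << bits) - 1
    multiplicand &= mask           # treat both operands as unsigned `bits`-wide values
    multiplier &= mask
    product = 0
    for _ in range(bits):          # iteration count depends only on the bit width
        if multiplier & 1:         # low bit set: add the current shifted multiplicand
            product += multiplicand
        multiplicand <<= 1         # the next multiplier bit is worth twice as much
        multiplier >>= 1
    return product
```

Each step still adds a full-width number, so in software the total work is roughly (bit width) x (cost of one add), which is where the quadratic estimate comes from.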

3. kens ◴[01 Apr 25 17:48 UTC] No.43549620[source]▶

>>43544547 #

I'm working on the multiplication circuit in the Pentium; I've done a partial writeup: https://www.righto.com/2025/03/pentium-multiplier-adder-reve... The short answer is that multiplication uses a large tree of adders so it can add up all the long-multiplication terms (the partial products) at once. It also uses base-8 for the multiplier to reduce the number of terms. The adders are 4:2 carry-save compressors that take four numbers as inputs and produce two numbers as outputs.
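A rough software model of the carry-save idea, as a sketch only: the function names are invented here, the 4:2 compressor is built from two 3:2 stages (one common construction, not necessarily the Pentium's circuit), and the base-8 digits are plain unsigned 0..7 rather than whatever signed encoding the hardware uses.

```python
def carry_save_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three numbers to two with the same total, with no carry propagation."""
    sum_word = a ^ b ^ c                               # per-bit sum, carries ignored
    carry_word = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carries, moved up one place
    return sum_word, carry_word

def compress_4to2(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Reduce four numbers to two with the same total (two chained 3:2 stages)."""
    s1, c1 = carry_save_3to2(a, b, c)
    return carry_save_3to2(s1, c1, d)

def sum_terms(terms: list[int]) -> int:
    """Add many non-negative terms, using a carry-propagating add only at the end."""
    terms = list(terms)
    while len(terms) > 2:
        group, terms = terms[:4], terms[4:]
        group += [0] * (4 - len(group))                # pad a short final group with zeros
        terms.extend(compress_4to2(*group))
    return sum(terms)                                  # the one slow, full addition

def multiply_base8(x: int, y: int) -> int:
    """Multiply non-negative integers using one partial product per base-8 digit of y."""
    terms = []
    shift = 0
    while y:
        digit = y & 7                                  # next base-8 digit of the multiplier
        terms.append((digit * x) << shift)             # hardware would select a ready-made multiple of x
        y >>= 3
        shift += 3
    return sum_terms(terms)
```

A 64-bit multiplier has about 22 base-8 digits instead of 64 bits, so there are roughly a third as many terms to feed into the compressors.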

I also wrote about the Pentium's division circuitry and the infamous FDIV bug: https://www.righto.com/2024/12/this-die-photo-of-pentium-sho... The short answer is that the Pentium used base-4 SRT division, similar to long division but generating two bits of result per cycle. It used a lookup table to determine the two quotient bits; an error in this table resulted in the bug.
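For a feel of "two bits of result per cycle", here is a plain base-4 digit-by-digit division sketch on unsigned integers. To be clear, this is not SRT: as I understand it, SRT picks signed quotient digits from a table indexed by a few leading bits of the remainder and divisor, whereas this sketch compares against the divisor exactly. The function name and `bits` parameter are made up for illustration.

```python
def divide_radix4(dividend: int, divisor: int, bits: int = 32) -> tuple[int, int]:
    """Unsigned division that produces one base-4 quotient digit (two bits) per step."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if bits % 2:
        raise ValueError("bits must be even")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given width")
    quotient = 0
    remainder = 0
    for shift in range(bits - 2, -1, -2):              # bits/2 steps, not bits steps
        # Bring down the next two dividend bits, like bringing down a digit in long division.
        remainder = (remainder << 2) | ((dividend >> shift) & 3)
        # Pick the largest digit 0..3 whose multiple of the divisor still fits.
        digit = 0
        while digit < 3 and (digit + 1) * divisor <= remainder:
            digit += 1
        remainder -= digit * divisor
        quotient = (quotient << 2) | digit
    return quotient, remainder
```

The digit selection here needs full-width comparisons; replacing those with a small table lookup is the part that makes the hardware fast, and per the comment above it is also where the FDIV bug came from.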