I would love to know how multiplication and division are implemented in modern chips to get such low cycle counts relative to addition. In theory, addition is linear in the number of bits, while multiplication and division are quadratic, or log-linear for large inputs. I guess part of that gap is closed by spending surface area rather than time, but that's already true of adders with their carry logic.
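To make the "surface area rather than time" point concrete, here is a rough sketch of one textbook approach, a Wallace-style tree of carry-save adders. It is only an illustration under assumptions: the function names are made up, the inputs are assumed to be non-negative and to fit in `bits` bits, and real multipliers differ by design (many also use Booth recoding, which is not shown). The n partial products are still quadratic in total bits, but they are all generated at once and reduced in parallel, so the number of levels grows roughly logarithmically with n, and only the final addition needs carry propagation.

```python
from typing import List, Tuple


def partial_products(a: int, b: int, bits: int) -> List[int]:
    """One shifted copy of `a` per bit of `b`: n rows of n bits (the quadratic part)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(bits)]


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 compressor: turn three rows into two with the same total.

    Each bit position is handled independently, so no carry ripples
    across the word in this step.
    """
    sum_bits = x ^ y ^ z
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1
    return sum_bits, carry_bits


def tree_multiply(a: int, b: int, bits: int = 64) -> Tuple[int, int]:
    """Return (product, number of carry-save levels used)."""
    rows = partial_products(a, b, bits)
    levels = 0
    while len(rows) > 2:
        reduced: List[int] = []
        # In hardware, all groups of three on a level are compressed in parallel.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that did not fill a group of three pass through unchanged.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # Only this last step is an ordinary carry-propagating addition.
    return sum(rows), levels


product, levels = tree_multiply(12345, 6789)
print(product == 12345 * 6789, levels)
```

In this sketch, 64 rows are reduced to two in 10 levels, compared with 63 sequential additions for a naive shift-and-add loop. The cost is that every level is its own block of hardware, which is the area being traded for time.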
replies(2):