
317 points laserduck | 1 comments
mikewarot ◴[] No.42165008[source]
I agree LLMs aren't ready to design ASICs. It's likely that in a decade or less, they'll be ready for the times you absolutely need to squeeze out every square nanometer, picosecond, femtojoule, or nanowatt.

Garry Tan was right[1] that there is a fundamental inefficiency inherent in the von Neumann architecture we're all using. This gross impedance mismatch[4] is a great opportunity for innovation.

Once ENIAC was "improved" from its original structure to a general purpose compute device in the von Neumann style, it suffered an 83% loss in performance[2]. Everything since is 80 years of premature optimization that we need to unwind. It's the ultimate pile of technical debt.

Instead of throwing maximum effort into making specific workloads faster, why not build a chip that can make all workloads faster instead, and let economy of scale work for everyone?

I propose (and have for a while[3]) a general purpose solution.

A systolic array of simple 4-bit-in, 4-bit-out look-up tables (LUTs), latched so that timing issues are eliminated, could greatly accelerate computation in a far nearer timeframe.
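
To make the idea of a latched 4-in, 4-out LUT cell concrete, here is a minimal Python sketch. It is an illustration only: the names `make_table`, `LatchedCell`, and `step`, the bit ordering, and the one-dimensional chain of cells are all assumptions made for this example, not details given in the comment.

```python
from typing import Callable, List, Tuple

Bits4 = Tuple[int, int, int, int]


def make_table(fn: Callable[[int, int, int, int], Bits4]) -> List[Bits4]:
    """Precompute the 16-entry table: one 4-bit output per 4-bit input."""
    table = []
    for index in range(16):
        bits = [(index >> i) & 1 for i in range(4)]  # assumed: input i is bit i
        table.append(tuple(fn(*bits)))
    return table


class LatchedCell:
    """Hypothetical cell: a LUT whose output is held in a latch."""

    def __init__(self, table: List[Bits4]) -> None:
        self.table = table
        self.outputs: Bits4 = (0, 0, 0, 0)  # value neighbours currently see

    def clock(self, inputs: Bits4) -> None:
        index = sum(bit << i for i, bit in enumerate(inputs))
        self.outputs = self.table[index]


def step(cells: List[LatchedCell], external_in: Bits4) -> None:
    """Advance a chain of cells by one tick.

    All latched outputs are sampled before any cell updates, so each cell
    only ever sees values from the previous tick.
    """
    sampled = [external_in] + [c.outputs for c in cells[:-1]]
    for cell, inputs in zip(cells, sampled):
        cell.clock(inputs)


# Three pass-through cells: data moves one cell per tick.
identity = make_table(lambda a, b, c, d: (a, b, c, d))
chain = [LatchedCell(identity) for _ in range(3)]
step(chain, (1, 0, 1, 0))
step(chain, (0, 0, 0, 0))
step(chain, (0, 0, 0, 0))
```

Because every cell reads only latched values, the result of a tick does not depend on the order in which cells are evaluated, which is the sense in which the latching removes timing concerns in this toy model.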

The challenges are that it's a greenfield environment, with no compilers (though it's probable that LLVM could target it), and a bus number of 1.

[1] https://www.ycombinator.com/rfs-build#llms-for-chip-design

[2] https://en.wikipedia.org/wiki/ENIAC#Improvements

[3] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

[4] https://en.wikipedia.org/wiki/Impedance_matching

replies(1): >>42187533 #
1. therealcamino ◴[] No.42187533[source]
I find it hard to imagine how you'd implement various simple functions in the bitgrid. It would be interesting if you'd present some simple hand-worked examples.

For example, how would it implement a 1-bit full adder? Like the nitty-gritty details: which input on which cell represents input A, which represents input B, and which represents carry-in? Which output is sum and which is carry-out? What are the functions programmed into each node that it uses?

Then show how to build a 2-bit adder from there.
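
As a rough idea of what such a hand-worked example might look like, here is a Python sketch of a single 4-in, 4-out LUT cell programmed as a full adder, then two of them chained into a 2-bit adder. The pin assignment is purely an assumption for illustration (input 0 = A, input 1 = B, input 2 = carry-in, input 3 unused; output 0 = sum, output 1 = carry-out, outputs 2 and 3 tied to 0), and the sketch ignores latching, so it shows only the logic, not the tick-by-tick timing.

```python
def full_adder_table():
    """Build the 16-entry table for one cell under the assumed pin mapping."""
    table = []
    for index in range(16):
        a = index & 1            # assumed: input 0 is A
        b = (index >> 1) & 1     # assumed: input 1 is B
        cin = (index >> 2) & 1   # assumed: input 2 is carry-in
        total = a + b + cin      # input 3 is ignored
        table.append((total & 1, total >> 1, 0, 0))  # (sum, carry-out, 0, 0)
    return table


FA = full_adder_table()


def cell(table, a, b, c, d=0):
    """Look up the 4 outputs of a cell for the given 4 input bits."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


def add2(x, y, cin=0):
    """2-bit ripple adder: the carry-out of cell 0 feeds carry-in of cell 1."""
    s0, c0, _, _ = cell(FA, x & 1, y & 1, cin)
    s1, c1, _, _ = cell(FA, (x >> 1) & 1, (y >> 1) & 1, c0)
    return s0 | (s1 << 1), c1


result, carry = add2(3, 2)
```

In a latched grid, the second cell would see the first cell's carry one tick late, so the high-order operand bits would presumably need to be delayed by one tick to line up; that detail is left out here.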