
Stochastic computing

(scottlocklin.wordpress.com)
52 points by emmelaich | 1 comment
mikewarot No.45897988
The key thing I would watch out for with real stochastic computing hardware is crosstalk[1], the inevitable coupling between channels that is bound to happen at some level. Getting hundreds or thousands (or millions?) of independent noise sources to stay uncorrelated is going to be one of the biggest challenges in the process (a small simulation of why that matters is sketched below). For a small number of channels it should be manageable, but with LLM-sized problems, I think it's a deal killer.

[1] https://en.wikipedia.org/wiki/Crosstalk
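
To make the correlation worry concrete, here is a minimal Python sketch (mine, not from the post): stochastic computing typically encodes a value in [0,1] as the density of 1s in a bitstream, so multiplication is just a bitwise AND of two streams - but only if the streams are independent. A shared noise source, standing in for severe crosstalk, skews the product toward min(a, b).

    # Minimal sketch (not from the thread): stochastic-computing multiplication,
    # where a value p in [0,1] is a bitstream whose bits are 1 with probability p
    # and multiplication is a bitwise AND. The trick requires independent streams;
    # sharing a noise source (crosstalk taken to the extreme) biases the result.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000          # stream length
    a, b = 0.6, 0.5      # values to multiply; exact product is 0.30

    # Independent noise sources: AND of the streams estimates a*b.
    sa = rng.random(N) < a
    sb = rng.random(N) < b
    print("independent:", np.mean(sa & sb))        # ~0.30

    # Both streams driven by the same noise source: the streams are maximally
    # correlated and AND estimates min(a, b) instead of the product.
    shared = rng.random(N)
    sa_c = shared < a
    sb_c = shared < b
    print("fully coupled:", np.mean(sa_c & sb_c))  # ~0.50 == min(a, b)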

replies(2): >>45899564 #>>45902281 #
1. observationist No.45902281
https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem

You can precisely engineer arbitrary numbers of channels, design sampling and coding schemes that drive data integrity to whatever error rate you need, and so on. This gives you an accuracy/efficiency tradeoff dial, which can be useful - you can choose to spend more time or energy for higher fidelity where the cost justifies it.
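
As a toy illustration of that dial (my own Python sketch, using a plain repetition code with majority voting rather than anything capacity-achieving): spending more redundant samples per bit across a noisy binary symmetric channel steadily drives the residual error rate down.

    # Rough illustration (my own, not from the thread) of the fidelity/cost dial:
    # a repetition code over a binary symmetric channel. More redundant samples
    # per bit lowers the residual error rate; capacity-achieving codes do this far
    # more efficiently, but the shape of the tradeoff is the point.
    import numpy as np

    rng = np.random.default_rng(1)
    p_flip = 0.1                       # per-sample noise (crosstalk-like errors)
    bits = rng.integers(0, 2, 10_000)  # message bits

    for n in (1, 3, 9, 27):            # redundant samples per bit
        flips = rng.random((bits.size, n)) < p_flip
        received = bits[:, None] ^ flips           # noisy copies of each bit
        decoded = received.sum(axis=1) * 2 > n     # majority vote per bit
        err = np.mean(decoded != bits.astype(bool))
        print(f"n={n:>2}  residual error ~= {err:.4f}")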

Feedback and crosstalk that create chaotic relationships, unintended synchronization, and other coupling effects are non-trivial, however.

Neural networks are unordered sets: you can arbitrarily reorder the neurons in a layer so long as you maintain the links to the connected layers. If you permute the network to reorder a layer's neurons by some feature, the function of the network remains identical to the original, but you can highlight a particular function or feature of the layer; each such ordering, together with its weight vectors, is one point in a constellation of equivalent configurations. Cycling through all possible orderings enumerates possible states of a trained network, and stochastic optimizations for neural networks are playing around in this same space - it's effectively a combinatorial minefield.
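
Here is a quick Python sketch of that permutation symmetry (my own example with made-up layer sizes): permute a hidden layer's neurons together with its incoming weights and biases and the next layer's corresponding weight columns, and the network's output is unchanged.

    # Small sketch (my construction) of the permutation symmetry described above:
    # reorder the hidden units, permute the incoming weights/biases and the
    # outgoing weight columns consistently, and the network computes the same
    # function.
    import numpy as np

    rng = np.random.default_rng(2)
    W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # input -> hidden
    W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)    # hidden -> output

    def forward(x, W1, b1, W2, b2):
        h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
        return W2 @ h + b2

    x = rng.normal(size=8)
    perm = rng.permutation(16)             # arbitrary reordering of hidden units

    y_original = forward(x, W1, b1, W2, b2)
    y_permuted = forward(x, W1[perm], b1[perm], W2[:, perm], b2)
    print(np.allclose(y_original, y_permuted))   # True: identical function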

If you design a processing regime that samples a particular subset of those possible configurations, it might be possible to exploit a traversal of random orderings, keyed to signal amplitude, wherever they correlate and coincide with useful computation - selecting and ordering a set of addresses whose combined function approximates the desired value.

I see some possibilities and interesting spaces to explore with these systems, but they're going to need some heavy-duty number theorists just to eke out a set of useful primitives, and it's unclear to me that any of it can be generalized. You might be able to carefully handcraft an implementation of something like ChatGPT 5, for example, but I don't see how you could simply update it, fine-tune it, or otherwise modify it. You'd have to put in just as much effort to implement any other model, and any sort of dynamic online learning or training seems to hit a combinatorial explosion right out of the gate.