
77 points by cochlear | 2 comments

Hi, I'm the author of this little Web Audio toy which does physical modeling synthesis using a simple spring-mass system.
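
The core of the simulation is just Hooke's law integrated once per audio sample. A rough sketch of the idea (illustrative TypeScript, not the toy's actual code):

    interface MassNode {
      pos: number;    // displacement from rest
      vel: number;    // velocity
      fixed: boolean; // boundary masses stay pinned
    }

    const SAMPLE_RATE = 48000;
    const DT = 1 / SAMPLE_RATE;

    // Advance the whole chain by one audio sample (semi-implicit Euler).
    function step(nodes: MassNode[], stiffness: number, damping: number): void {
      const force = new Array<number>(nodes.length).fill(0);
      // Hooke's law between neighboring masses.
      for (let i = 0; i + 1 < nodes.length; i++) {
        const f = stiffness * (nodes[i + 1].pos - nodes[i].pos);
        force[i] += f;
        force[i + 1] -= f;
      }
      // Update velocity first, then position.
      for (let i = 0; i < nodes.length; i++) {
        if (nodes[i].fixed) continue;
        nodes[i].vel = (nodes[i].vel + force[i] * DT) * damping;
        nodes[i].pos += nodes[i].vel * DT;
      }
    }

"Plucking" is then just a sparse event: displace one mass, and read another mass's position each step as the output sample.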

My current area of research is in sparse, event-based encodings of musical audio (https://blog.cochlea.xyz/sparse-interpretable-audio-codec-pa...). I'm very interested in decomposing audio signals into a description of the "system" (e.g., room, instrument, vocal tract, etc.) and a sparse "control signal" that describes how and when energy is injected into that system. This toy was a great way to start learning about physical modeling synthesis, which seems to be the next stop in my research journey. I was also pleasantly surprised by what's possible these days with custom Audio Worklets!
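
If you haven't tried Audio Worklets, the plumbing really is small. A stub sketch (these are real Web Audio API calls, but the processor body is elided):

    // processor.ts — runs on the audio rendering thread.
    class SpringProcessor extends AudioWorkletProcessor {
      process(inputs: Float32Array[][], outputs: Float32Array[][]): boolean {
        const channel = outputs[0][0]; // typically 128 samples per call
        for (let i = 0; i < channel.length; i++) {
          // step(nodes, stiffness, damping); // advance the model one sample
          channel[i] = 0;                     // ...and write its output here
        }
        return true; // keep the processor alive
      }
    }
    registerProcessor('spring-processor', SpringProcessor);

    // Main thread:
    //   await ctx.audioWorklet.addModule('processor.js');
    //   const node = new AudioWorkletNode(ctx, 'spring-processor');
    //   node.connect(ctx.destination);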

xavriley ◴[] No.43370713[source]
This is cool! There's some similar work here (https://arxiv.org/pdf/2402.01571) that uses spiking neural networks (essentially Dirac pulses). I think the next step for this would be to learn a tonal embedding of the source alongside the event embedding, so that you don't have to rely on physically modelled priors. There's some interesting work on guitar amp tone modelling that's doing this already: https://zenodo.org/records/14877373
replies(1): >>43371388 #
1. cochlear ◴[] No.43371388[source]
How funny! I actually corresponded with one of the authors of the "Spiking Music..." paper when it first showed up on arXiv. I'll definitely give the amp-modeling paper a read, looks to be right up my alley!

Now that I understand the basics of how this works, I'd like to use a (much) more efficient version of the simulation as an infinite-dataset generator and try to learn a neural operator, or NeRF-like model, that, given a spring-mesh configuration, a sparse control signal, and a time, can produce an approximation of the simulation in a parallel and sample-rate-independent manner. This also (maybe) opens the door to spatial audio, such that you could approximate sound-pressure levels at a particular point in time _and_ space. At this point, I'm just dreaming out loud a bit.
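
Concretely, I imagine wrapping the simulator in something like this (all names hypothetical; simulate() stands in for the real spring-mass simulation):

    interface Example {
      mesh: Float32Array;   // flattened spring/mass parameters
      events: Float32Array; // sparse control: (time, node, gain) triples
      t: number;            // query time in seconds
      target: number;       // ground-truth sample at time t
    }

    type Sim = (mesh: Float32Array, events: Float32Array) => Float32Array;

    // Endless stream of supervision for a model f(mesh, events, t) -> sample.
    function* infiniteDataset(simulate: Sim, sampleRate: number): Generator<Example> {
      for (;;) {
        // Sample a random "instrument" and a random sparse excitation...
        const mesh = Float32Array.from({ length: 16 }, Math.random);
        const events = Float32Array.from({ length: 9 }, Math.random);
        // ...run the expensive ground-truth simulation once...
        const signal = simulate(mesh, events);
        // ...then supervise at random query times. The learned model can
        // later be evaluated at any t, in parallel, at any sample rate.
        for (let k = 0; k < 64; k++) {
          const n = Math.floor(Math.random() * signal.length);
          yield { mesh, events, t: n / sampleRate, target: signal[n] };
        }
      }
    }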

replies(1): >>43372760 #
2. blovescoffee ◴[] No.43372760[source]
This is possible but very, very hard! Actually getting the model to converge on something that sounds reasonable will make you pull your hair out. It's definitely a fun and worthwhile project, though. I attempted something similar a few years ago. Good luck!