
268 points by prashp | 1 comment
dawnofdusk No.39216343
This is one of the cool things about various neural network architectures that I've found in my own work: you can make a lot of dumb mistakes in coding certain aspects, but because the model has so many degrees of freedom it can actually "learn away" your mistakes.
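
A minimal sketch of the effect, assuming PyTorch and a made-up toy regression task (none of this is from the comment above): the inputs get scrambled by a fixed, unintended linear transform before the model ever sees them, and training simply folds the inverse of that mistake into the first layer's weights.

    # Toy demo: the "bug" multiplies every input by a fixed random matrix,
    # yet the network still fits, because the first Linear layer can absorb
    # the inverse of the scramble. (Hypothetical example, not from the thread.)
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_features = 16
    true_w = torch.randn(n_features, 1)
    bug_matrix = torch.randn(n_features, n_features)   # the unintended transform

    def make_batch(n=256):
        x = torch.randn(n, n_features)
        y = x @ true_w + 0.01 * torch.randn(n, 1)
        return x @ bug_matrix, y                       # model only ever sees scrambled inputs

    model = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(2000):
        xb, yb = make_batch()
        loss = nn.functional.mse_loss(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final loss: {loss.item():.4f}")            # small, despite the scramble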
replies(1): >>39218043
CamperBob2 No.39218043
It's also one of the scariest things about NNs. Traditionally, if you had a bug that was causing serious performance or quality issues, it was a safe bet that you'd eventually discover it and fix it. It would fail one test or another, crash the program, or otherwise come up short against the expectations you'd have for a working implementation. Now it's almost impossible to know if what you've implemented is really performing at its best.

IMO the ability of a NN to compensate for bugs and unfounded assumptions in the model isn't a Good Thing in the slightest. Building latent-space diagnostics that can determine whether a network is wasting time working around bugs sounds like a worthwhile research topic in itself (and probably already is).
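
One toy diagnostic along those lines -- my own sketch, not an established method: fit a linear probe from a layer's activations back to the clean, pre-bug inputs. If a plain linear map recovers them almost perfectly, that layer is probably spending capacity undoing an upstream linear mistake rather than computing anything new.

    # Sketch of a probe-style diagnostic (hypothetical helper, assuming PyTorch).
    # A high R^2 suggests the layer is mostly re-deriving the clean inputs,
    # i.e. working around an upstream bug.
    import torch

    def linear_probe_r2(activations: torch.Tensor, clean_inputs: torch.Tensor) -> float:
        """How much of `clean_inputs` is linearly recoverable from `activations`?"""
        ones = torch.ones(activations.shape[0], 1)
        A = torch.cat([activations, ones], dim=1)           # add a bias column
        coef = torch.linalg.lstsq(A, clean_inputs).solution
        pred = A @ coef
        ss_res = ((clean_inputs - pred) ** 2).sum()
        ss_tot = ((clean_inputs - clean_inputs.mean(dim=0)) ** 2).sum()
        return float(1.0 - ss_res / ss_tot)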

replies(3): >>39218802, >>39222080, >>39227182
nullc No.39227182
It's a common problem for network protocols, IO subsystems, etc. and really even any software with error handling.

It's been a few years since I worked on any program using boost asio, but at least back then, if you straced it you'd find it constantly attempting to malloc hundreds of TB of RAM, failing harmlessly, then continuing on with its life. (I bet that will be fun when someone tries to run it on a system that actually supports that much virtual address space.)
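
Not boost asio's actual code, just a toy Python illustration of the pattern (the read_chunk helper and the size bug are invented): a length computation goes wildly wrong, the oversized allocation fails, a broad error handler shrugs, and the program carries on as if nothing happened.

    # Hypothetical example of a bug hidden by a harmlessly failing allocation.
    def read_chunk(stream, expected_len: int) -> bytes:
        buggy_len = expected_len << 40            # the bug: petabyte-scale buffer request
        try:
            buf = bytearray(buggy_len)            # fails with MemoryError on most systems
        except (MemoryError, OverflowError):
            buf = bytearray(expected_len)         # fail harmlessly, fall back, keep going
        n = stream.readinto(buf)
        return bytes(buf[:n])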

Similarly with anything that has any kind of feedback correction: PID controllers, codecs that code residuals-- you can get things horribly wrong and the later steps will paper it over.
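
A small made-up example of the feedback case (plain Python, invented gains and plant): this proportional-integral loop has a feedforward gain that's wrong by a factor of three, yet the integral term quietly soaks up the error and the output still settles on the setpoint.

    # Toy PI loop driving a first-order plant; the bogus feedforward term is
    # "papered over" by the integrator. (Illustrative only, not from the thread.)
    def simulate(steps=500, setpoint=1.0, dt=0.01):
        kp, ki = 2.0, 5.0
        buggy_feedforward = 3.1                   # should be ~1.0 for this plant
        y, integral = 0.0, 0.0
        for _ in range(steps):
            error = setpoint - y
            integral += error * dt
            u = buggy_feedforward * setpoint + kp * error + ki * integral
            y += dt * (-y + u)                    # plant: dy/dt = -y + u
        return y

    print(simulate())                             # ~1.0 despite the wrong feedforward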

Taking a step back, you can even say that common software development practices-- a kind of meta program-- have this issue: a drunk squirrel sends you a patch full of errors, your test suite flags some, which you fix. Then you ship all the bugs you didn't catch, because the test suite caused you to fix some issues but didn't change the fact that you were accepting code from a dubious source.

So I would say that the ML world is only special in that it consists almost entirely of self-correcting mechanisms, and that inconsistent performance is broadly expected to a vastly greater degree, so when errors leak through you still may not react. If a calculator app told you that 2+2=5 you'd immediately be sure that something is actually broken, while if some LLM does it, it could just be an expected limitation (or even just sampling bad luck).