(discrete-distribution-networks.github.io)

392 points diyer22 | 1 comments | 10 Oct 25 09:01 UTC

I invented Discrete Distribution Networks (DDN), a novel generative model with simple principles and unique properties, and the paper has been accepted to ICLR 2025!

Modeling data distribution is challenging; DDN adopts a simple yet fundamentally different approach compared to mainstream generative models (Diffusion, GAN, VAE, autoregressive model):

1. The model generates multiple outputs simultaneously in a single forward pass, rather than just one output.
2. It uses these multiple outputs to approximate the target distribution of the training data.
3. These outputs together represent a discrete distribution. This is why we named it "Discrete Distribution Networks".
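To make the three points above concrete, here is a toy sketch of one layer that emits K candidate outputs at once and then picks one of them. It is not the paper's implementation: the additive "heads", the helper names, and the choice of squared distance to a training target as the selection rule are all assumptions made for illustration.

```python
def forward(heads, x):
    """Return K candidate outputs, one per head, from a single call.

    Each head here is just an offset vector added to x (a stand-in
    for a real network branch; this is an assumption, not DDN code).
    """
    return [[xi + hi for xi, hi in zip(x, head)] for head in heads]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def select_closest(candidates, target):
    """Index of the candidate nearest to the target (assumed selection rule)."""
    return min(
        range(len(candidates)),
        key=lambda i: squared_distance(candidates[i], target),
    )


if __name__ == "__main__":
    heads = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]  # K = 3 hypothetical heads
    candidates = forward(heads, [0.0, 0.0])
    chosen = select_closest(candidates, target=[0.9, 1.2])
    print(len(candidates), chosen)
```

The K candidates, taken together with equal weight, are what stands in for the "discrete distribution" in this sketch.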

Every generative model has its unique properties, and DDN is no exception. Here, we highlight three characteristics of DDN:

- Zero-Shot Conditional Generation (ZSCG).
- One-dimensional discrete latent representation organized in a tree structure.
- Fully end-to-end differentiable.
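One way to picture the tree-structured latent: if each of L layers picks one of K outputs, a sample can be described by a list of L indices, which is also a path from the root of a K-ary tree to one of its K^L leaves. The sketch below assumes exactly that reading; `path_to_leaf` and `leaf_to_path` are hypothetical helper names, not part of any DDN code.

```python
def path_to_leaf(path, k):
    """Encode a root-to-leaf path (one index per layer) as a leaf number."""
    leaf = 0
    for idx in path:
        if not 0 <= idx < k:
            raise ValueError(f"index {idx} is outside range(0, {k})")
        leaf = leaf * k + idx  # base-k positional encoding
    return leaf


def leaf_to_path(leaf, k, depth):
    """Decode a leaf number back into its list of per-layer indices."""
    path = []
    for _ in range(depth):
        leaf, idx = divmod(leaf, k)
        path.append(idx)
    return path[::-1]  # digits come out last layer first, so reverse


if __name__ == "__main__":
    k, depth = 3, 3
    path = [1, 0, 2]
    leaf = path_to_leaf(path, k)
    print(leaf, leaf_to_path(leaf, k, depth), k ** depth)
```

The latent itself stays one-dimensional: it is just the short index list, and the tree only describes how those indices nest.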

Reviews from ICLR:

> I find the method novel and elegant. The novelty is very strong, and this should not be overlooked. This is a whole new method, very different from any of the existing generative models.

> This is a very good paper that can open a door to new directions in generative modeling.


f_devd ◴[10 Oct 25 11:13 UTC] No.45537537[source]▶

>>45536694 (OP) #

Pretty interesting architecture, seems very easy to debug, but as a downside you effectively discard K-1 computations at each layer since it's using a sampler rather than a MoE-style router.

The best way I can summarize it is a Mixture-of-Experts combined with an 'x0-target' latent diffusion model. The main innovations are the guided sampler (rather than a router) and the split-and-prune optimizer, which make it easier to train.

replies(2): >>45537638 #>>45540788 #

yorwba ◴[10 Oct 25 11:27 UTC] No.45537638[source]▶

>>45537537 #

Since the sampling probability is 1/K independent of the input, you don't need to compute K different intermediate outputs at each layer during inference. You can instead decide ahead of time which of the outputs you want to use and only compute that one.

(This is mentioned in Q1 in the "Common Questions About DDN" section at the bottom.)
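A rough sketch of that idea, under stated assumptions: each layer is modeled as a list of K independent callables ("heads"), and the per-layer index is drawn uniformly before any computation happens. All names here (`make_head`, `build_layers`, `sample_path`, `generate`) are hypothetical and only illustrate the counting argument, not the actual DDN architecture.

```python
import random


def make_head(layer_idx, head_idx, counter):
    """Build a toy head that records each evaluation in counter[0]."""
    def head(x):
        counter[0] += 1
        return x + (layer_idx + 1) * 10 + head_idx
    return head


def build_layers(k, depth, counter):
    """depth layers, each holding k toy heads."""
    return [
        [make_head(layer, j, counter) for j in range(k)]
        for layer in range(depth)
    ]


def sample_path(k, depth, rng):
    """Pick one index per layer uniformly, i.e. with probability 1/K each."""
    return [rng.randrange(k) for _ in range(depth)]


def generate(layers, x, path):
    """Evaluate only the pre-selected head in every layer."""
    for heads, idx in zip(layers, path):
        x = heads[idx](x)
    return x


if __name__ == "__main__":
    counter = [0]
    layers = build_layers(k=4, depth=3, counter=counter)
    path = sample_path(4, 3, random.Random(0))
    out = generate(layers, 0, path)
    # One head evaluation per layer, instead of K per layer.
    print(path, out, counter[0])
```

With the path fixed up front, the cost per sample is one head per layer, so the K-1 discarded computations never have to be run at inference time.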

replies(2): >>45539642 #>>45541272 #

1. kevmo314 ◴[10 Oct 25 14:42 UTC] No.45539642[source]▶

>>45537638 #

This is a very clever insight, nice work!


Show HN: I invented a new generative model and got accepted to ICLR