
381 points by diyer22 | 3 comments

I invented Discrete Distribution Networks (DDN), a novel generative model with simple principles and unique properties, and the paper has been accepted to ICLR 2025!

Modeling a data distribution is challenging; DDN adopts a simple approach that is fundamentally different from mainstream generative models (diffusion, GAN, VAE, autoregressive models):

1. The model generates multiple outputs simultaneously in a single forward pass, rather than just one (see the sketch after this list).
2. It uses these multiple outputs to approximate the target distribution of the training data.
3. Together, these outputs represent a discrete distribution. This is why we named it "Discrete Distribution Networks".
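
To make that loop concrete, here is a minimal PyTorch sketch of a single DDN-style layer. The layer sizes, the `trunk`/`heads` split, and all names are my assumptions for illustration, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class DDNLayerSketch(nn.Module):
    """Sketch of one layer: emit K candidate outputs in a single forward
    pass; during training, keep the candidate nearest the ground truth
    and backpropagate only through it."""

    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.trunk = nn.Linear(dim, dim)
        self.heads = nn.Linear(dim, dim * k)  # K candidates at once

    def forward(self, h: torch.Tensor, target: torch.Tensor):
        b, d = h.shape
        cands = self.heads(torch.relu(self.trunk(h))).view(b, self.k, d)
        # Squared distance of each candidate to the training sample.
        dists = ((cands - target.unsqueeze(1)) ** 2).mean(dim=-1)  # (B, K)
        idx = dists.argmin(dim=1)               # chosen branch = discrete latent
        chosen = cands[torch.arange(b), idx]    # (B, D)
        loss = ((chosen - target) ** 2).mean()  # only the winner is trained
        return chosen, idx, loss
```

Stacking such layers and feeding `chosen` into the next one progressively refines the output; the sequence of selected indices `idx` is the tree-structured discrete latent highlighted below.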

Every generative model has its unique properties, and DDN is no exception. Here, we highlight three characteristics of DDN:

- Zero-Shot Conditional Generation (ZSCG), sketched below.
- A one-dimensional discrete latent representation organized in a tree structure.
- Fully end-to-end differentiable.
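
As I understand ZSCG, a condition (say, a partially observed image) simply replaces the training sample as the per-layer selector, so a pretrained DDN can be steered without any gradient updates. A hedged sketch, where the masked-L2 matching rule and the name `zscg_select` are my assumptions:

```python
import torch

def zscg_select(cands: torch.Tensor, condition: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
    """Among K candidate outputs, pick the one closest to the condition
    on its observed region. No gradients or fine-tuning are involved:
    the model is guided purely through the per-layer selection."""
    # cands: (K, D); condition, mask: (D,), mask = 1 where observed.
    err = (((cands - condition) * mask) ** 2).sum(dim=1)  # (K,)
    return cands[err.argmin()]
```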

Reviews from ICLR:

> I find the method novel and elegant. The novelty is very strong, and this should not be overlooked. This is a whole new method, very different from any of the existing generative models.

> This is a very good paper that can open a door to new directions in generative modeling.

VoidWhisperer:
I don't have a super deep understanding of the underlying algorithms involved, but going off the demo and that page: is this mainly a model for image-related tasks, or could it also be trained to do what GPT/Claude/etc. do (chat conversations)?
diyer22:
Yes, it's absolutely possible. Just as diffusion models have been adapted into diffusion LLMs, the same can be done with DDN to build DDN LLMs.

I made an initial attempt to combine [DDN with GPT](https://github.com/Discrete-Distribution-Networks/Discrete-D...), aiming to remove tokenizers and let LLMs model binary strings directly. In each forward pass, the model adaptively adjusts the byte length of the generated content based on generation difficulty (which naturally supports speculative sampling).

vintermann:
This is what I find most impressive: it's a naturally hierarchical method that seems very general, yet is actually quite competitive. I feel like the machine learning community has been looking for something like that for a long time. Non-generative uses (like hierarchical embeddings, maybe? Dewey-decimal-style embeddings for anything!) are even more exciting.
diyer22:
Exactly! The "Efficient Data Compression Capability" paragraph in the original paper highlights the same point:

> To our knowledge, Taiji-DDN is the first generative model capable of directly transforming data into a semantically meaningful binary string which represents a leaf node on a balanced binary tree.

This property excites me just as much.
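
To make the quoted property concrete: if each layer emits K = 2 candidates, the per-layer selections form a bit string, i.e. a root-to-leaf path in a balanced binary tree. A hedged sketch of such an encoder, assuming each layer maps the current reconstruction to two candidates (the greedy nearest-candidate rule mirrors the training-time selection above; all names here are mine):

```python
import torch
import torch.nn as nn

def encode_to_bits(layers, x: torch.Tensor) -> str:
    """Greedily descend the candidate tree: at every layer keep the
    candidate nearest to the data x. The chosen indices form the
    binary string that addresses a leaf of a balanced binary tree."""
    bits, h = [], torch.zeros_like(x)
    for layer in layers:
        cands = layer(h).view(2, -1)          # two candidates per layer
        dist = ((cands - x) ** 2).sum(dim=1)  # distance to the data
        b = int(dist.argmin())
        bits.append(str(b))
        h = cands[b]                          # refine from the chosen branch
    return "".join(bits)

# Toy usage: 8 untrained layers give an 8-bit code (a depth-8 leaf).
layers = [nn.Linear(16, 32) for _ in range(8)]  # each emits 2 * 16 values
print(encode_to_bits(layers, torch.randn(16)))
```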