
354 points misonic | 7 comments
cherryteastain No.42469706
There are a lot of papers using GNNs for physics simulations (e.g. computational fluid dynamics) because the unstructured meshes used to discretize the problem domain for such applications map very neatly to a graph structure.

In practice, every such mesh/graph is used once to solve a particular problem. Hence it makes little sense to train a GNN for a specific graph. However, that's exactly what most papers did because no one found a way to make a GNN that can adjust well to a different mesh/graph and different simulation parameters. I wonder if there's a breakthrough waiting just around the corner to make such a generalization possible.

replies(2): >>42470241, >>42470602
1. magicalhippo No.42470241
Naive question:

Words in sentences kinda form graphs: they reference other words, or are leaves being referenced, both within sentences and between sentences.

Given the success of the attention mechanism in modern LLMs, how well would they do if you trained an LLM to process an actual graph?

I guess you'd need some alternate tokenizer for optimal performance.

replies(4): >>42470363, >>42470817, >>42472814, >>42474565
2. cherryteastain No.42470363
For physics sims, I'd say it's useless.

Imagine you discretize a cube into 1000 gridpoints in each direction; that's 1000^3 = 1 billion nodes/"tokens". Plus you typically time-march some sort of equation, so you need the solutions from the previous 3-5 timesteps as well; that's 3-5 billion tokens. If you are gonna do that in the first place, you may as well just use the traditional solver. Traditional solvers usually set up and solve a matrix equation like Ax=b with an iterative method like multigrid, which is O(n) as opposed to the transformer's O(n^2). It'll give you a much more accurate answer, much quicker than a transformer can do attention on a sequence of length 3 billion.
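Back-of-envelope, using the numbers above (a rough sketch in Python; constant factors ignored, this is just the n vs n^2 scaling):

    # rough cost comparison: one O(n) multigrid-style sweep vs O(n^2) dense attention
    n = 1000**3 * 4               # ~1e9 gridpoints x ~4 timesteps of history
    linear_ops = n                # multigrid-style iteration: O(n)
    attention_ops = n**2          # dense self-attention: O(n^2)
    print(f"{attention_ops / linear_ops:.0e}")  # ~4e+09x more work per layer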

The entire point of using GNNs/CNNs in this field is that people rely on their ability to make inferences using local information. That means the value at each gridpoint/node can be inferred from neighbouring nodes only, which is O(n) like multigrid. The idea in most papers is that the GNN can do this faster than multigrid. Results so far are mixed, however [1].

[1] https://arxiv.org/abs/2407.07218
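To make the local-information point concrete, here's a minimal pure-Python sketch of one message-passing step (the mean aggregation and the toy graph are my own illustration, not from [1]):

    # one round of local message passing: each node's new value depends
    # on its neighbours only, so a full sweep over the graph is O(n + edges)
    def message_passing_step(values, neighbours):
        new_values = {}
        for node, nbrs in neighbours.items():
            # mean aggregation; a trained GNN would use learned update functions here
            new_values[node] = sum(values[v] for v in nbrs) / len(nbrs)
        return new_values

    # toy 4-node path graph: 0 - 1 - 2 - 3
    neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    values = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
    values = message_passing_step(values, neighbours)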

replies(1): >>42470591
3. magicalhippo No.42470591
Ah yes, for dense problems like that I wouldn't expect it to work well. The example graphs in the submission were mostly quite sparse, hence why I thought of LLMs. But perhaps that was just for illustrative purposes.
4. disattention No.42470817
This is actually a good insight. It turns out that transformers are indeed a form of graph network, precisely because of the attention mechanism. Graph attention networks are a very popular GNN architecture. Generally, the issue with using an LLM-style architecture for generic graphs is modeling the sparsity, but it is possible by using the graph adjacency matrix to mask the attention matrix. There are a number of papers and articles which address this connection, and plenty of research into mechanisms for sparsifying attention in transformers.
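A minimal numpy sketch of that masking trick (a toy example of my own, not from any particular paper): attention scores are set to -inf wherever the adjacency matrix has no edge, so each node can only attend to its neighbours.

    import numpy as np

    def graph_masked_attention(X, A):
        # X: (n, d) node features; A: (n, n) adjacency matrix (1 = edge)
        scores = X @ X.T / np.sqrt(X.shape[1])   # dense attention scores
        scores[A == 0] = -np.inf                 # mask out non-edges
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)        # row-wise softmax
        return w @ X                             # aggregate neighbour features

    n, d = 5, 8
    X = np.random.randn(n, d)
    A = np.eye(n, dtype=int)                     # self-loops keep every row non-empty
    A[0, 1] = A[1, 0] = 1                        # one undirected edge
    out = graph_masked_attention(X, A)

With A all-ones this reduces to ordinary dense self-attention, which is the sense in which a transformer is a fully-connected graph network.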

There are also graph tokenizers for using more standard transformers on graphs for doing things like classification, generation, and community detection.
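One simple (hypothetical) version of such a tokenizer just serializes the edge list into a token sequence; real schemes are fancier, but the shape of the idea is:

    # serialize a graph into tokens a standard transformer can consume
    def graph_to_tokens(edges):
        tokens = ["<graph>"]
        for u, v in edges:
            tokens += [f"n{u}", f"n{v}", "<edge>"]
        tokens.append("</graph>")
        return tokens

    print(graph_to_tokens([(0, 1), (1, 2)]))
    # ['<graph>', 'n0', 'n1', '<edge>', 'n1', 'n2', '<edge>', '</graph>']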

replies(1): >>42471502
5. algo_trader No.42471502
Any canonical papers on GNN for code graphs?
6. whimsicalism No.42472814
yep, you're now pretty much at the state of the art
7. 6gvONxR4sf7o No.42474565
That's the kind of thing I could imagine dramatically speeding up certain tasks, but not enabling particularly new abilities. A ton of the challenge is converting a sequence into a graph. So if you need a huge clever model to turn a sequence into a graph, then your potential gain downstream is either (a) making certain computationally hard queries easier, or (b) answering tons and tons of follow-up queries dramatically faster (enough to make the initial graphification overhead okay).

For (a), any imperfections in the graphification make the problem super hard and researchy.