354 points | misonic
cherryteastain
There are a lot of papers using GNNs for physics simulations (e.g. computational fluid dynamics) because the unstructured meshes used to discretize the problem domain for such applications map very neatly to a graph structure.

In practice, every such mesh/graph is used once to solve a particular problem, so it makes little sense to train a GNN for one specific graph. Yet that's exactly what most papers did, because no one has found a way to make a GNN that generalizes well to a different mesh/graph and different simulation parameters. I wonder if there's a breakthrough waiting just around the corner to make such generalization possible.
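For readers unfamiliar with the mesh-to-graph mapping, here's a minimal sketch (my illustration, not from the comment). It assumes element connectivity stored as triples of node indices, as in a typical FEM/CFD mesh file; the (2, num_edges) layout matches what PyTorch Geometric calls edge_index.

    # Toy example: two triangles sharing an edge, nodes numbered 0-3.
    import numpy as np

    elements = np.array([[0, 1, 2],
                         [1, 2, 3]])  # node indices per triangle

    edges = set()
    for tri in elements:
        for i in range(3):
            a, b = tri[i], tri[(i + 1) % 3]
            edges.add((a, b))  # add both directions so message
            edges.add((b, a))  # passing is symmetric

    edge_index = np.array(sorted(edges)).T  # shape (2, num_edges)
    print(edge_index)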

magicalhippo
Naive question:

Words in sentences kinda form graphs: they reference other words, or are leaves being referenced, both within sentences and across sentences.

Given the success of the attention mechanism in modern LLMs, how well would they do if you trained an LLM to process an actual graph?

I guess you'd need some alternate tokenizer for optimal performance.
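A rough sketch of what that might look like (my own illustration, with made-up toy data): standard dot-product attention, masked so each token/node attends only to its graph neighbours. With a full mask this is a vanilla transformer layer; restricted to an adjacency matrix it behaves like a graph-attention (GAT-style) layer.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 8                          # 5 nodes with 8-dim features
    x = rng.standard_normal((n, d))

    # Adjacency of a toy path graph 0-1-2-3-4, with self-loops.
    adj = np.eye(n)
    for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
        adj[a, b] = adj[b, a] = 1

    scores = x @ x.T / np.sqrt(d)        # untrained Q/K projections
    scores[adj == 0] = -np.inf           # forbid attention off-graph

    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)    # row-wise softmax
    out = w @ x                          # each node mixes its neighbours
    print(out.shape)                     # (5, 8)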

cherryteastain
For physics sims, I'd say it's useless.

Imagine you discretize a cube into 1000 gridpoints in each direction: that's 1000^3 = 1 billion nodes/"tokens". Plus, you typically time-march some sort of equation, so you need the solutions from the previous 3-5 timesteps as well; that's 3-5 billion tokens. If you're going to do that in the first place, you may as well just use a traditional solver. Traditional solvers usually set up and solve a matrix equation like Ax = b with an iterative method like multigrid, which is O(n), as opposed to the transformer's O(n^2). It'll give you a much more accurate answer far sooner than a transformer could do attention over a sequence of length 3 billion.
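The back-of-the-envelope arithmetic behind that comparison (my numbers, assuming 4 stored timesteps, in the middle of the 3-5 mentioned above):

    n = 1000**3 * 4        # ~4 billion "tokens" (grid x timesteps)
    attention = n**2       # dense attention is O(n^2)
    multigrid = n          # multigrid is O(n) per solve
    print(f"{attention / multigrid:.0e}x more work")  # ~4e+09x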

The entire point of using GNNs/CNNs in this field is their ability to make inferences using local information: the value at each gridpoint/node can be inferred from its neighbouring nodes only, which is O(n), like multigrid. The idea in most papers is that the GNN can do this faster than multigrid. Results so far are mixed, however [1].
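A sketch of that locality argument (illustrative only, not the scheme of any particular paper): one round of neighbour-only aggregation touches each edge once, so on a mesh with bounded node degree the cost is O(|E|) ~ O(n), the same scaling as a multigrid smoothing sweep.

    import numpy as np

    # Toy path graph 0-1-2-3 as directed edge pairs (both directions).
    edge_index = np.array([[0, 1, 1, 2, 2, 3],   # source nodes
                           [1, 0, 2, 1, 3, 2]])  # target nodes
    x = np.array([1.0, 2.0, 3.0, 4.0])           # per-node values

    agg = np.zeros_like(x)
    deg = np.zeros_like(x)
    for s, t in edge_index.T:         # one pass over edges: O(|E|)
        agg[t] += x[s]
        deg[t] += 1

    x_new = agg / np.maximum(deg, 1)  # neighbour mean, a Jacobi-like update
    print(x_new)                      # [2. 2. 3. 3.]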

[1] https://arxiv.org/abs/2407.07218

magicalhippo
Ah yes, for dense problems like that I wouldn't expect it to work well. The example graphs in the submission were mostly quite sparse, which is why I thought of LLMs. But perhaps that was just for illustrative purposes.