
247 points nabla9 | 4 comments
gcanyon ◴[] No.41833456[source]
One that isn't listed here, and which is critical to machine learning, is the idea of near-orthogonality. When you think of 2D or 3D space, you can only have 2 or 3 orthogonal directions, and allowing for near-orthogonality doesn't really gain you anything. But in higher dimensions you can reasonably work with directions that are only somewhat orthogonal, and "somewhat" gets pretty generous once you get to thousands of dimensions -- something like 75 degrees is fine (I'm writing this from memory, don't quote me). And the number of orthogonal-enough directions you can have scales as maybe as much as 10^sqrt(dimension_count), meaning that yes, if your embeddings have 10,000 dimensions, you might be able to have literally 10^100 different orthogonal-enough directions. This is critical for turning embeddings + machine learning into LLMs.
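
A rough numerical illustration of this (the dimension, the number of sampled directions, and the seed below are arbitrary choices, not from the comment): random unit vectors in 10,000 dimensions already land very close to 90° apart from one another.

    # Sample random unit vectors in a high-dimensional space and check how close
    # their pairwise angles are to 90 degrees.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 10_000, 200                               # dimension, number of directions
    v = rng.standard_normal((n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)    # make them unit vectors

    cos = v @ v.T                                    # pairwise cosines
    off_diag = cos[~np.eye(n, dtype=bool)]
    angles = np.degrees(np.arccos(np.clip(off_diag, -1, 1)))
    print(angles.min(), angles.max())                # typically all within ~87-93 degrees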
replies(5): >>41833539 #>>41834446 #>>41835280 #>>41835565 #>>41861970 #
westurner ◴[] No.41833539[source]
Does distance in feature space require orthogonality?

With real space (x, y, z) the units are the same for every feature, so we omit them as redundant when describing the distance in feature space.

But distance is just a metric, and often the space or paths through it are curvilinear.

By Taxicab distance, it's 3 cats, 4 dogs, and 5 glasses of water away.

Python now has math.dist() for Euclidean distance, for example.
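
A small sketch of those two metrics (using the cats/dogs/water numbers above as made-up feature coordinates):

    import math

    a = (3, 4, 5)    # e.g. 3 cats, 4 dogs, 5 glasses of water away from b
    b = (0, 0, 0)

    euclidean = math.dist(a, b)                          # sqrt(3**2 + 4**2 + 5**2) ~= 7.07
    taxicab = sum(abs(x - y) for x, y in zip(a, b))      # 3 + 4 + 5 = 12
    print(euclidean, taxicab)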

replies(1): >>41834302 #
1. epistasis ◴[] No.41834302[source]
Near-orthogonality allows fitting in more directions for distinct concepts than the dimension of the space. So even though the dimension of an LLM might be <2000, far far more than 2000 distinct directions can fit into that space.

The term most often used is "superposition." Here's some material on it that I'm working through right now:

https://arena3-chapter1-transformer-interp.streamlit.app/%5B...
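
A toy version of that packing argument (the sizes below are arbitrary, not from the linked material): random directions in a 512-dimensional space stay roughly 75° or more apart from each other even when there are 8x as many of them as there are dimensions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 512, 4096                        # 8x more directions than dimensions
    v = rng.standard_normal((n, d)).astype(np.float32)
    v /= np.linalg.norm(v, axis=1, keepdims=True)

    cos = v @ v.T
    np.fill_diagonal(cos, 0.0)
    worst = np.abs(cos).max()               # largest pairwise |cosine|
    print(np.degrees(np.arccos(worst)))     # typically ~75 degrees for the worst pair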

replies(2): >>41836687 #>>41873650 #
2. gcanyon ◴[] No.41836687[source]
Nice, thanks!
3. westurner ◴[] No.41873650[source]
Skew coordinates aren't orthogonal.

Skew coordinates: https://en.wikipedia.org/wiki/Skew_coordinates
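
A minimal sketch of why that matters for distances (a toy basis of my own, not from the wiki page): with a skew basis, the plain sum-of-squares formula over the coordinates no longer gives the right length unless you bring in the metric (Gram) matrix of the basis vectors.

    import numpy as np

    b1 = np.array([1.0, 0.0])
    b2 = np.array([np.cos(np.radians(60)), np.sin(np.radians(60))])  # 60 degrees to b1
    B = np.column_stack([b1, b2])
    G = B.T @ B                                  # metric tensor g_ij = b_i . b_j

    u = np.array([1.0, 1.0])                     # skew coordinates of the point b1 + b2
    naive = np.sqrt(u @ u)                       # pretends the basis is orthonormal: ~1.414
    correct = np.sqrt(u @ G @ u)                 # true length of b1 + b2: ~1.732
    print(naive, correct, np.linalg.norm(B @ u)) # the last two agree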

Are the features described with high-dimensional spaces really all 90° geometrically orthogonal?

How does the distance metric vary with feature order?

Do algorithmic outputs diverge or converge given variance in the sequence order of the orthogonal axes? Does it matter which order the dimensions are stated in; that is, is the output sensitive to feature order, or does it converge regardless?
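
For the standard coordinate-wise metrics, at least, feature order doesn't matter; a toy check with made-up values:

    import math
    import random

    a = [3.0, 4.0, 5.0, 1.0]
    b = [0.0, 1.0, 2.0, 7.0]

    order = list(range(len(a)))
    random.shuffle(order)                  # reorder the axes the same way for both points
    a2 = [a[i] for i in order]
    b2 = [b[i] for i in order]

    # Euclidean (and taxicab) distance is a sum over coordinates, so permuting the
    # axes consistently leaves it unchanged.
    print(math.isclose(math.dist(a, b), math.dist(a2, b2)))   # True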

Re: superposition in this context, too

Are there multiple particles in the same space, or is it measuring a point-in-time sampling of the possible states of one particle?

(Can photons actually occupy the same point in spacetime? Can electrons? The plenoptic function, though, describes all of the light passing through a point, or through all of a space.)

Expectation values may or may not be good estimators of the wave function outputs of discrete quantum circuits and real quantum systems.

To describe the products of the histogram PDFs

replies(1): >>41876465 #
4. westurner ◴[] No.41876465[source]
> Are the features described with high-dimensional spaces really all 90° geometrically orthogonal?

If the features are not statistically independent, I don't think it's likely that they're truly orthogonal, though that might not affect the utility of a distance metric that assumes they all are.
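
A toy illustration of that caveat (my own made-up data): two strongly correlated features are not statistically independent directions, yet plain Euclidean distance, which implicitly treats the axes as orthogonal, is still a usable metric; a correlation-aware alternative such as Mahalanobis distance simply weighs the same displacement differently.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal((1000, 2))
    x = z @ np.array([[1.0, 0.9], [0.0, 0.436]])    # two features with correlation ~0.9
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))

    p, q = x[0], x[1]
    d_euclid = np.linalg.norm(p - q)                # treats the axes as orthogonal
    d_mahal = np.sqrt((p - q) @ cov_inv @ (p - q))  # accounts for the correlation
    print(d_euclid, d_mahal)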