46 points petethomas | 1 comment
1. cadamsdotcom No.44400039
The tiniest nudge pushes a complex system (ChatGPT’s LLM) from a delicate, hard-won state (alignment) to something very undesirable.

The space of possible end states for trained models must be a minefield: an endless expanse of undesirable states dotted with a tiny number of desired ones. If so, the state these researchers found is one of a great many.

It shows how hard it was to achieve alignment in the first place.