>The model uses this internal error signal (the gradient) as a mathematical equivalent of saying, "This is unexpected and important!" This allows the Titans architecture to selectively update its long-term memory only with the most novel and context-breaking information
So one can break a model by consistently feeding it with random, highly improbable junk? Everything would be registered as a surprise and get stored, impacting future interactions
replies(6):