
534 points | BlueFalconHD | 2 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) that protects the safety filters for Apple Intelligence models, and I have extracted the decrypted filters into a repository. I encourage you to take a look around.
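For the curious, once you have the key the decryption step itself is tiny. A minimal sketch in Swift, assuming a symmetric AES-GCM scheme with a key embedded in the framework (the real algorithm, key derivation, and file layout may differ; the path and key below are placeholders, not the actual ones):

    import Foundation
    import CryptoKit

    // Hypothetical sketch of the "Obfuscation" layer: assumes AES-GCM
    // with a key extracted from the framework. The real scheme may differ.
    func deobfuscateFilter(blob: Data, keyBytes: Data) throws -> Data {
        let key = SymmetricKey(data: keyBytes)
        // combined layout: 12-byte nonce || ciphertext || 16-byte tag
        let box = try AES.GCM.SealedBox(combined: blob)
        return try AES.GCM.open(box, using: key)
    }

    // Placeholder path and key, purely for illustration.
    let blob = try Data(contentsOf: URL(fileURLWithPath: "/tmp/safety_overrides.bin"))
    let key  = Data(repeating: 0x00, count: 32) // stand-in for the extracted key
    let rules = try deobfuscateFilter(blob: blob, keyBytes: key)
    print(String(decoding: rules, as: UTF8.self))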
torginus (No.44484236)
I find it funny that AGI is supposed to be right around the corner, while these supposedly super-smart LLMs still need their outputs filtered by regexes.
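For concreteness, the mechanism being mocked is roughly a deny-list of regexes run over the model's output. A toy sketch (the patterns are invented, not Apple's):

    import Foundation

    // Toy post-generation filter: block model output that matches any
    // pattern in a deny-list. Patterns here are made up for illustration.
    let denyList = try [#"(?i)\bidiot\b"#, #"(?i)hotwire a car"#]
        .map { try NSRegularExpression(pattern: $0) }

    func passesFilter(_ output: String) -> Bool {
        let range = NSRange(output.startIndex..., in: output)
        return denyList.allSatisfy { $0.firstMatch(in: output, range: range) == nil }
    }

    print(passesFilter("You are an idiot"))  // false: blocked
    print(passesFilter("Have a nice day"))   // true: allowed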
1. fl0id (No.44487765)
Actually, even if there were AGI, it would be even more necessary to control it.
2. mailund (No.44489484)
I feel like if teenagers can trivially bypass banned-word filters by substituting words that obviously mean the same thing, an AGI wouldn't be too inhibited by them either.
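A quick illustration of why literal matching is so easy to sidestep (the pattern and the synonym are invented for the example):

    import Foundation

    // The weakness of a literal deny-list: a trivial synonym walks past it.
    let banned = try NSRegularExpression(pattern: #"(?i)\bidiot\b"#)

    func isBlocked(_ s: String) -> Bool {
        banned.firstMatch(in: s, range: NSRange(s.startIndex..., in: s)) != nil
    }

    print(isBlocked("you absolute idiot"))      // true: caught
    print(isBlocked("you absolute nincompoop")) // false: same meaning, passes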