(github.com)

534 points BlueFalconHD | 1 comments | 06 Jul 25 19:50 UTC

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.

bombcar ◴[06 Jul 25 20:38 UTC] No.44483830[source]▶

>>44483485 (OP) #

There’s got to be a way to turn these lists of “naughty words” into shibboleths somehow.

replies(2): >>44484345 #>>44485000 #

1. immibis ◴[06 Jul 25 23:12 UTC] No.44485000[source]▶

>>44483830 #

Like asking sensitive employment candidates about Kim Jong Un's roundness to check if they're North Korean spies, we could ask humans what they think about Trump and Palestine to check if they're computers.

However, I think about half of real humans would also fail the test.

I extracted the safety filters from Apple Intelligence models