(github.com)

536 points BlueFalconHD | 2 comments | 06 Jul 25 19:50 UTC

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.

trebligdivad ◴[06 Jul 25 20:56 UTC] No.44483981[source]▶

>>44483485 (OP) #

Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(11): >>44483999 #>>44484073 #>>44484095 #>>44484410 #>>44484636 #>>44486072 #>>44487916 #>>44488185 #>>44488279 #>>44488362 #>>44488856 #

grues-dinner ◴[06 Jul 25 21:09 UTC] No.44484073[source]▶

>>44483981 #

Interesting that it didn't seem to include "unalive".

Which, as a phenomenon, is so very telling that no one actually cares what people are really saying. Everyone, including the platforms, knows what that means. It's all performative.

replies(11): >>44484164 #>>44484360 #>>44484635 #>>44484665 #>>44485033 #>>44485034 #>>44486246 #>>44487244 #>>44488055 #>>44488114 #>>44500918 #

qingcharles ◴[06 Jul 25 21:22 UTC] No.44484164[source]▶

>>44484073 #

It's totally performative. There's no way to stay ahead of the new language that people create.

At what point do the new words become the actual words? Are there many instances of people using unalive IRL?

replies(17): >>44484171 #>>44484218 #>>44484614 #>>44484958 #>>44484970 #>>44484989 #>>44485202 #>>44485277 #>>44485309 #>>44486128 #>>44486394 #>>44487625 #>>44487839 #>>44487936 #>>44488097 #>>44488704 #>>44493436 #

joquarky ◴[07 Jul 25 03:11 UTC] No.44486394[source]▶

>>44484164 #

I feel like we can call our society mature when we no longer need safety alignment in AI.

replies(1): >>44486457 #

scarface_74 ◴[07 Jul 25 03:22 UTC] No.44486457[source]▶

>>44486394 #

You never tried some of the earlier pre-aligned chatbots. Some of the early ones would go off on racist, homophobic rants from the most innocent conversations without any explicit prompting. If you train on all the data on the internet, you have to have some type of alignment.

replies(1): >>44486488 #

1. decremental ◴[07 Jul 25 03:29 UTC] No.44486488[source]▶

>>44486457 #

You say that as if it stands as truth on its own. We actually don't need to filter out how people actually talk and think. Otherwise you just end up with yet another enforcer against wrong-think. I wonder if you even think that deeply about it or if you're just wired at this point to conform.

replies(2): >>44487497 #>>44489427 #

2. scarface_74 ◴[07 Jul 25 12:01 UTC] No.44489427[source]▶

>>44486488 (TP) #

Really? You would want every conversation, no matter what you were talking about, to immediately devolve into something you would see on 4chan?

I extracted the safety filters from Apple Intelligence models