
534 points BlueFalconHD | 1 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
trebligdivad ◴[] No.44483981[source]
Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...

replies(11): >>44483999 #>>44484073 #>>44484095 #>>44484410 #>>44484636 #>>44486072 #>>44487916 #>>44488185 #>>44488279 #>>44488362 #>>44488856 #
grues-dinner ◴[] No.44484073[source]
Interesting that it didn't seem to include "unalive".

Which, as a phenomenon, is very telling: no one actually cares what people are really saying. Everyone, including the platforms, knows what that means. It's all performative.

replies(11): >>44484164 #>>44484360 #>>44484635 #>>44484665 #>>44485033 #>>44485034 #>>44486246 #>>44487244 #>>44488055 #>>44488114 #>>44500918 #
qingcharles ◴[] No.44484164[source]
It's totally performative. There's no way to stay ahead of the new language that people create.

At what point do the new words become the actual words? Are there many instances of people using unalive IRL?

replies(17): >>44484171 #>>44484218 #>>44484614 #>>44484958 #>>44484970 #>>44484989 #>>44485202 #>>44485277 #>>44485309 #>>44486128 #>>44486394 #>>44487625 #>>44487839 #>>44487936 #>>44488097 #>>44488704 #>>44493436 #
Terr_ ◴[] No.44484970[source]
> There's no way to stay ahead of the new language that people create.

I'm imagining a new exploit: After someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous and indirectly bans any discussion of that topic.

replies(5): >>44485038 #>>44485110 #>>44485356 #>>44486827 #>>44486843 #
grues-dinner ◴[] No.44486843[source]
The first half of that already happened with the OK gesture: https://www.bbc.co.uk/news/newsbeat-49837898.

Though it would be fun to see what happens if an LLM is used to ban anything that tends to generate heated exchanges. It would presumably learn to ban racial terms, politics and politicians and words like "immigrant" (i.e. basically the list in this repo), but what else could it be persuaded to ban? Vim and Emacs? SystemD? Anything involving cyclists? Parenting advice?

replies(3): >>44487593 #>>44488628 #>>44488746 #
weinzierl ◴[] No.44488628{3}[source]
The OK gesture has always been very inappropriate in most parts of the world.
replies(2): >>44488774 #>>44500652 #
chmod775 ◴[] No.44488774{4}[source]
> The OK gesture has always been very inappropriate in most parts of the world.

No, it isn't, and especially hasn't been historically. The negative connotations are overwhelmingly modern.

The areas where it is very inappropriate right now tally up to maybe 1 billion people*. That's pretty far from "most". For everyone else it is mostly positive, neutral, or meaningless.

*Brazil, Turkey, Iran, Iraq, Saudi Arabia, Greece, Italy, Spain, Russia, Ukraine, Belarus, other parts of Eastern Europe

replies(3): >>44489651 #>>44499210 #>>44500689 #
weinzierl ◴[] No.44489651{5}[source]
"No, it isn't, and especially hasn't been historically. The negative connotations are overwhelmingly modern."

Maybe that is what Richard Nixon thought as well when he caused a little scandal using it in South America in 1950. In 1992, when the Chicago Tribune published "HANDS OFF" mentioning said episode, the negative connotations still seemed to be in place[1].

In 1996 The New York Times stated "What's A-O.K. in the U.S.A. Is Lewd and Worthless Beyond"[2] as title of an article confirming the negative connotations.

It is worth mentioning that this article lists Australia amongst the places where the gesture is inappropriate. I always thought it was something used only in the English-speaking world but it seems in reality it is more like a North American plus diving world thing.

If you don't believe the press, I traveled around the world for more than 30 years and I can assure you in most parts using your thumb and index finger for a visual OK is not OK.

[1] https://www.chicagotribune.com/1992/01/26/hands-off-34/

[2] https://www.nytimes.com/1996/08/18/weekinreview/what-s-a-ok-...*

replies(2): >>44490415 #>>44490432 #
1. mopsi ◴[] No.44490432{6}[source]
That might have been the case decades ago. For example, in the USSR, various finger gestures usually implied something related to a penis and were considered extremely offensive. But that hasn't been the case since at least the early 1990s, when VCRs became widely available, people saw Hollywood movies for the first time and got used to the westernized meaning of thumbs-up and OK gestures. Nowadays, when backing a truck towards a trailer, a thumbs-up would be taken as "good job" and an OK gesture (often paired with a kiss) as "exceptionally well done".