
534 points BlueFalconHD | 2 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
trebligdivad ◴[] No.44483981[source]
Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...

grues-dinner ◴[] No.44484073[source]
Interesting that it didn't seem to include "unalive".

Which, as a phenomenon, is so very telling that no one actually cares what people are really saying. Everyone, including the platforms, knows what that means. It's all performative.

qingcharles ◴[] No.44484164[source]
It's totally performative. There's no way to stay ahead of the new language that people create.

At what point do the new words become the actual words? Are there many instances of people using unalive IRL?

Terr_ ◴[] No.44484970[source]
> There's no way to stay ahead of the new language that people create.

I'm imagining a new exploit: After someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous and indirectly bans any discussion of that topic.

grues-dinner ◴[] No.44486843[source]
The first half of that already happened with the OK gesture: https://www.bbc.co.uk/news/newsbeat-49837898.

Though it would be fun to see what happens if an LLM is used to ban anything that tends to generate heated exchanges. It would presumably learn to ban racial terms, politics and politicians and words like "immigrant" (i.e. basically the list in this repo), but what else could it be persuaded to ban? Vim and Emacs? SystemD? Anything involving cyclists? Parenting advice?

weinzierl ◴[] No.44488628[source]
The OK gesture has always been very inappropriate in most parts of the world.
chmod775 ◴[] No.44488774[source]
> The OK gesture has always been very inappropriate in most parts of the world.

No, it isn't, and especially hasn't been historically. The negative connotations are overwhelmingly modern.

The areas where it is very inappropriate right now tally up to maybe 1 billion people*. That's pretty far from "most". For everyone else it is mostly positive, neutral, or meaningless.

*Brazil, Turkey, Iran, Iraq, Saudi Arabia, Greece, Italy, Spain, Russia, Ukraine, Belarus, other parts of Eastern Europe

weinzierl ◴[] No.44489651[source]
> No, it isn't, and especially hasn't been historically. The negative connotations are overwhelmingly modern.

Maybe that is what Richard Nixon thought as well when he caused a little scandal using it in South America in the 1950s. In 1992, when the Chicago Tribune published "HANDS OFF", which mentions that episode, the negative connotations still seemed to be in place[1].

In 1996 The New York Times stated "What's A-O.K. in the U.S.A. Is Lewd and Worthless Beyond"[2] as title of an article confirming the negative connotations.

It is worth mentioning that this article lists Australia amongst the places where the gesture is inappropriate. I always thought it was something used only in the English-speaking world but it seems in reality it is more like a North American plus diving world thing.

If you don't believe the press, I have traveled around the world for more than 30 years and I can assure you in most parts using your thumb and index finger for a visual OK is not OK.

[1] https://www.chicagotribune.com/1992/01/26/hands-off-34/

[2] https://www.nytimes.com/1996/08/18/weekinreview/what-s-a-ok-...

chmod775 ◴[] No.44490415[source]
Care to add any country to the list then? Did I miss anything? Let's see if we can push it past half of the world's population, but I don't think we will.

> I can assure you in most parts using your thumb and index finger for a visual OK is not OK.

You're moving goal posts. Of course it doesn't just mean "OK" in some places.

What you actually claimed was "The OK gesture has always been very inappropriate in most parts of the world."

Which is plain wrong. In India, for instance, it can refer to "money", while in China it can nowadays also be seen as a distress signal when performed a certain way (thanks to Chinese social media popularizing that use). There are some ways you can mess this up, like making it seem you're attempting to bribe someone, or signalling you're in distress when you aren't, but in neither country is the gesture inherently anywhere near "very inappropriate", and people in both will even understand it as "OK" if you perform it correctly and in the appropriate context.

That's already almost 3 billion people, but let's say 2.5 billion because there are regional variations in both countries and I'm sure you could find some northern Chinese village that will take offense.

I can easily push the number of people to whom it is not inappropriate past 4 billion by adding smaller populations (Indonesia, Japan, western Europe, USA, Taiwan, South Africa, Kenya, Nigeria, ...), so your claim that "[it] has always been very inappropriate in most parts of the world" cannot possibly be true.

1. weinzierl ◴[] No.44490882[source]
> I can assure you in most parts using your thumb and index finger for a visual OK is not OK.

>>You're moving goal posts. Of course it doesn't mean "OK" in many

I said the gesture is "not OK" to use (meaning inappropriate), not that it doesn’t mean "OK". Those are two different things. The gesture can mean OK in some places while still being not OK (inappropriate) to use in many others.

Also, I always said "parts of the world". You introduced population into the argument.

2. chmod775 ◴[] No.44490985[source]
> I said the gesture is "not OK" to use (meaning inappropriate), not that it doesn’t mean "OK". Those are two different things. The gesture can mean OK in some places while still being not OK (inappropriate) to use in many others.

Fair. That's clearly how I should've read that.

Though it does not materially affect this conversation, since demonstrably there are over 4 billion people to whom the gesture is not inappropriate. The claim "[it] has always been very inappropriate in most parts of the world" is wrong, whatever reasonable definition of "most" you use.

You edited your comment to add this, so I'll respond here:

> Also, I always said "parts of the world". You introduced population into the argument.

Right. And you're being vague on how you actually arrive at your claim of "most", which conveniently keeps the waters muddy while you attack attempts to turn this into something measurable.

So what other measure would you use? Most others are nonsense.

For example "places" isn't a useful measure, but even then: It can only be offensive to people. If I dropped you on a random point on the globe and you made that gesture, there's about a 99% chance nobody would be around to be offended.

By land area and predominant culture? Just Antarctica (hardly anyone there to take offense), the US, China, Canada, Australia, and India together are going to dwarf the opposition.

Counting countries? It's clearly inappropriate in around 10, with about another 20-30 where it can be misunderstood easily (Arab world, some of eastern Europe, scattered ones). A far cry from ~195 countries.

Either way, there needs to be someone to take offense, so population is a pretty good measure.

You may disagree, but the onus was always on you, the one making the claim, to pick a measure and a definition of "most", then show that the bar is met. Feel free to now make more of an argument than "trust me I traveled".