
534 points by BlueFalconHD | 12 comments

I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
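
For the curious, here's a rough idea of what "obfuscation" tends to mean in this context. This is purely an illustrative sketch: the repeating-key XOR, the key, and the file layout below are assumptions for demonstration, not the actual scheme Apple uses (that's documented in the repository).

    # Toy sketch of an "obfuscation" layer: a repeating-key XOR over a
    # JSON payload. Trivially reversible, which is why it counts as
    # obfuscation rather than real encryption. Key and layout invented.
    import json

    def deobfuscate(blob: bytes, key: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

    def load_filter_file(path: str, key: bytes) -> dict:
        with open(path, "rb") as f:
            return json.loads(deobfuscate(f.read(), key))
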
trebligdivad No.44483981
Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...
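
To illustrate how death-avoidance rules and brand-capitalisation rules can live in one filter set, here's a hypothetical sketch. The rule names and structure are guesses for illustration, not the actual schema in the repo.

    # Hypothetical filter set mixing two rule kinds: "reject" patterns
    # that block an output outright, and "replace" patterns that quietly
    # fix brand capitalisation. Patterns invented for demonstration.
    import re

    RULES = {
        "reject": [r"\bkill (him|her|them|yourself)\b"],
        "replace": {r"\biphone\b": "iPhone", r"\bmacbook\b": "MacBook"},
    }

    def apply_safety_rules(text: str):
        for pattern in RULES["reject"]:
            if re.search(pattern, text, re.IGNORECASE):
                return None  # block the generation entirely
        for pattern, fixed in RULES["replace"].items():
            text = re.sub(pattern, fixed, text)  # silent brand cleanup
        return text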

grues-dinner No.44484073
Interesting that it didn't seem to include "unalive".

Which, as a phenomenon, is so very telling: no one actually cares what people are really saying. Everyone, including the platforms, knows what that word means. It's all performative.

1. martin-t No.44484665
No-one cares yet.

There's a very scary potential future in which mega-corporations start actually censoring topics they don't like. For all I know, the Chinese government is already doing it, and there's no reason the British or US governments won't follow suit and mandate such censorship. To protect children, defend against terrorists, fight drugs, or stop the spread of misinformation, of course.

2. lazide No.44485059
They already clearly do on a number of topics?
3. os2warpman No.44490303
HN has censorship that makes those Apple rules look like anarchy.

Write a spicy comment and a mod will memory-hole it and someone, usually dang, will reply "tHat'S nOt OuR vIsIon FoR hAcKeR nEwS, pLeAsE bE cIvIl" and we all swallow it like a delicious hot cocoa.

If YC can control their product (and hn IS a product) to annihilate any criticism of their activity or (even former) staff, then Apple is perfectly within their rights to make sure Siri doesn't talk about violence.

No, there's no difference.

4. martin-t No.44495862
Do you mean that HN censors topics/comments detected by advanced filters that search for meaning, even when people self-censor and phrase things to evade simplistic filters like regexes?

HN also has a flagging system, and some people really, really hate some kinds of speech. Usually they get more offended the more visible it is. A single "bad" word - very offensive to them. A phrase which implies someone is of lesser intelligence or acting in bad faith - sometimes gets a pass, sometimes gets reported. But covert actions like lying, arguing with fallacies, or systematic downvoting seem to almost never get punished.
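
To make the regex point concrete, here's a toy demonstration (word lists invented for illustration) of how a naive keyword filter misses euphemisms that any human reader decodes instantly:

    # A naive keyword filter vs. self-censored euphemisms.
    import re

    BANNED = re.compile(r"\b(kill|suicide|die)\b", re.IGNORECASE)

    for comment in ["he wants to kill himself",    # flagged
                    "he wants to unalive himself", # allowed
                    "he wants to k1ll himself"]:   # allowed
        verdict = "flagged" if BANNED.search(comment) else "allowed"
        print(comment, "->", verdict)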

5. martin-t No.44495889
Can you give examples?

The closest I've seen is autodetection of certain topics related to death and suicide, which then promotes some kind of "help" hotline. A friend also said Google allows an interview with a pedophile on YouTube but penalizes it in search results so much that it's (almost?) impossible to find even when searching the exact name.

But of course, if a topic is shadowbanned, it's hard to find out about it in the first place - by design.
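
The hotline behaviour mentioned above is mechanically simple; a minimal sketch, with the trigger list and banner text invented for illustration:

    # Topic autodetection that annotates rather than blocks.
    SENSITIVE = ("suicide", "self-harm", "kill myself")
    BANNER = "If you're struggling, help is available: <hotline>"

    def annotate(post: str) -> str:
        if any(term in post.lower() for term in SENSITIVE):
            return BANNER + "\n\n" + post
        return post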

6. lazide No.44495971
Guns (specific elements). Drugs (manufacture). Sexual topics. Cursing (too much). Large swathes of political topics. Crypto.

It’s flip-flopped on specifics numerous times over the years, but these policies are easy to find, ranging from demonetization to channel bans (direct and shadow) to creator bans.

We can of course argue until we’re blue in the face about whether they’re correct (most are not unreasonable by some societal definition!), but they’re definitely censorship.

7. martin-t No.44498575
Yeah, those topics are definitely censored on big platforms, but I have the impression that it relies on manual reporting.

At least Reddit feels like that, because what you can say depends on the subreddit - not just on the mods but on what kinds of people visit it and what they report.

No idea about YouTube; videos are definitely censored by some automated means, but it's still possible to get around them. E.g. some gun youtubers avoided saying "full-auto" by saying "more-semi-auto". So I don't think they use very sophisticated models, or they don't care to yet. This kind of thing is obvious to a human, and even LLMs generate responses saying it's tongue-in-cheek phrasing meant to avoid censorship.

Comments are also generally less censored. After that health insurance CEO got punished for mass murder and repeated bodily harm with an extra-legal death penalty, many people were openly supporting it. I can say it here too and nobody will care. Even LLMs (both US and Chinese, except Claude because Claude is trained by eggshell-walking suckers) readily generate estimates of how many people he caused to die or suffer.

The internet would look very different if companies started using state-of-the-art models to detect speech they find undesirable. But people would also fight back more, so it might just be a case of boiling the frog slowly.
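
A sketch of the kind of two-stage pipeline this is gesturing at: a cheap keyword pass, then a model pass for anything that survives. classify_with_llm is a stand-in stub for illustration, not any real API.

    # Hypothetical two-stage moderation: keywords first, model second.
    KEYWORDS = {"full-auto", "unalive"}

    def classify_with_llm(text: str) -> bool:
        # Stand-in for a call to a moderation model asking whether the
        # text is a euphemism for restricted content. Canned answer here.
        return "more-semi-auto" in text

    def moderate(text: str) -> str:
        if any(k in text.lower() for k in KEYWORDS):
            return "blocked (keyword)"
        if classify_with_llm(text):
            return "blocked (model)"
        return "allowed"

    print(moderate("this mod makes it more-semi-auto"))  # blocked (model)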

8. lazide No.44499687
All of these platforms except perhaps Reddit are using LLMs (and other ML/AI) for censoring and automated anti-abuse.

Including the LLM platforms themselves.

Manual reporting is an additional method, and it feeds into the training data set after whatever manual intervention occurs, too.
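
A minimal sketch of that feedback loop, with field names invented for illustration: each report, once a human moderator resolves it, becomes a labelled training row for the next model version.

    # Resolved manual reports become labelled training data.
    import csv

    def record_resolved_report(text: str, decision: str,
                               path: str = "moderation_train.csv") -> None:
        # decision is the human outcome ("removed" / "kept"), which
        # serves as the label the classifier is later trained on.
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([text, decision])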

9. martin-t No.44501153
Not to sound like I'm rejecting the possibility, but can you tell me how you got that information? It would be very helpful for convincing people in general to have something more concrete to go on than a random comment.
10. lazide No.44501232
I build those systems at a company that you definitely are aware of. I can’t discuss it further due to my NDA.

Feel free to ignore that any of this exists of course - it makes our lives easier. It’s a constant arms race regardless.

11. martin-t No.44503811
Then I have 2 questions:

- Why are they not flagging more content? Am I right that they're boiling the frog slowly? Or do they lack an end goal because management does not yet understand the power of these tools?

- Do you do your job poorly on purpose? Did you take it so somebody else wouldn't build an even better system? Did you think you could influence it in a direction which does not lead to total surveillance? (I assume any reasonably intelligent person would be against further increasing the power imbalance corporations have over individuals, for both moral reasons and because they are individuals themselves, who understand the machine can and will be used against them too.)

12. lazide No.44515836
Have you stopped beating your wife yet?

Cut the bullshit.