    534 points BlueFalconHD | 19 comments

    I managed to reverse-engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted the filters into a repository. I encourage you to take a look around.
    1. binarymax ◴[] No.44483936[source]
    Wow, this is pretty silly. If things are like this at Apple, I’m not sure what to think.

    https://github.com/BlueFalconHD/apple_generative_model_safet...

    EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson” => “B0ris Johnson” will skip right over the regex and will still be recognized just fine by an LLM (a minimal sketch of this is below).

    replies(7): >>44484127 #>>44484154 #>>44484177 #>>44484296 #>>44484501 #>>44484693 #>>44489367 #
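    A minimal sketch of the bypass described above, assuming a literal-phrase block list (the phrase, function name, and matching logic here are illustrative, not taken from the extracted filter files):

        import Foundation

        // Toy block list -- the phrase is just the example from the comment above.
        let blockedPhrases = ["Boris Johnson"]

        // Returns true if the text passes the literal filter.
        func passesFilter(_ text: String) -> Bool {
            for phrase in blockedPhrases {
                // Case-insensitive substring match, much like a hardcoded regex.
                if text.range(of: phrase, options: .caseInsensitive) != nil {
                    return false
                }
            }
            return true
        }

        print(passesFilter("Boris Johnson resigned"))   // false -- caught by the filter
        print(passesFilter("B0ris Johnson resigned"))   // true  -- sails right past it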
    2. deepdarkforest ◴[] No.44484127[source]
    It's not silly. I would bet 99% of users won't bother to do that. A hardcoded regex like this is a good first layer/filter, and it's very efficient.
    replies(2): >>44484514 #>>44484896 #
    3. miohtama ◴[] No.44484154[source]
    Sounds like UK politics is taboo?
    replies(1): >>44484977 #
    4. tpmoney ◴[] No.44484177[source]
    I doubt the purpose here is so much to prevent someone from intentionally sidestepping the block. It's more likely meant to avoid the sort of headlines you would expect if someone's phone suggested “I wish ${politician} would die” as a response to an email mentioning that politician. In general you should view these sorts of broad word filters as trying to short-circuit the “think of the children” reactions to Tiny Tim's phone suggesting not that God should “bless us, every one”, but that God should “kill us, every one”. A dumb filter like this is more than enough for that sort of thing.
    replies(1): >>44484332 #
    5. bigyabai ◴[] No.44484296[source]
    > If things are like this at Apple I’m not sure what to think.

    I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.

    6. XorNot ◴[] No.44484332[source]
    It would also substantially disrupt the generation process: a model that sees B0ris rather than Boris is going to struggle to associate that input with the politician, since the misspelling won't be well represented in the training set (and the same on the output side: if it does make the association, a reasoning model, for example, would include the proper name in its output first, at which point the supervisor process can reject it).
    replies(3): >>44484499 #>>44484952 #>>44485371 #
    7. quonn ◴[] No.44484499{3}[source]
    I don't think so. My impression with LLMs is that they correct typos well. I would imagine this happens in early layers without much impact on the remaining computation.
    8. stefan_ ◴[] No.44484501[source]
    Why are these things always so deeply unserious? Is there no one working on "safety in AI" (an oxymoron in itself, of course) who has a meaningful understanding of what they are actually working with and an ability beyond an intern's weekend project? It reminds me of the cybersecurity field, which got the 1% of people able to turn a double free into code execution while the other 99% peddle checklists, "signature scanning", and CVE numbers.

    Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.

    9. BlueFalconHD ◴[] No.44484514[source]
    Yep. These filters are applied first, before the safety model runs (I'm still figuring out the architecture, but I'm pretty confident it's an LLM combined with some text classification). A rough sketch of that layering is below.
    replies(1): >>44484674 #
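    A rough sketch of the layering described above, assuming a cheap literal/regex stage in front of a heavier model stage (the type and member names are assumptions for illustration, not Apple's API):

        import Foundation

        // Hypothetical two-stage check: hardcoded patterns run first, and only
        // text that survives them reaches the more expensive safety model.
        struct SafetyPipeline {
            let blockedPatterns: [NSRegularExpression]
            let safetyModel: (String) -> Bool   // stand-in for the LLM/classifier stage

            func allows(_ text: String) -> Bool {
                let range = NSRange(text.startIndex..., in: text)
                // Stage 1: regex filters, effectively free compared to a model call.
                for pattern in blockedPatterns where pattern.firstMatch(in: text, range: range) != nil {
                    return false
                }
                // Stage 2: the safety model only runs on text that passed stage 1.
                return safetyModel(text)
            }
        }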
    10. brookst ◴[] No.44484674{3}[source]
    All commercial LLM products I’m aware of use dedicated safety classifiers and then alter the prompt sent to the LLM if a classifier is tripped (sketched below).
    replies(1): >>44485031 #
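    A hedged sketch of that prompt-alteration pattern; the wording of the steering instruction and the function name are purely illustrative:

        // If a dedicated classifier trips, the prompt handed to the main LLM is
        // altered (here by prepending a steering instruction) rather than refused.
        func buildPrompt(userText: String, classifierTripped: Bool) -> String {
            if classifierTripped {
                return "Respond cautiously and do not dwell on the flagged topic.\n\n" + userText
            }
            return userText
        }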
    11. Aeolun ◴[] No.44484693[source]
    The LLM will. But the image generation model that is trained on a bunch of pre-specified tags will almost immediately spit out unrecognizable results.
    12. twoodfin ◴[] No.44484896[source]
    Efficient at what?
    13. lupire ◴[] No.44484952{3}[source]
    "Draw a picture of a gorgon with the face of the 2024 Prime Minister of UK."
    replies(1): >>44488039 #
    14. immibis ◴[] No.44484977[source]
    All politics is taboo, except the sort that helps Apple get richer. (Or any other company, in that company's "safety" filters)
    15. latency-guy2 ◴[] No.44485031{4}[source]
    The safety filter appears on both ends (or at multiple points, depending on the complexity of your application): input and output.

    I can tell you from using Microsoft's products that safety filters appear in a bunch of places. In M365, for example, your prompts are never totally your prompts; every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...

    There's a more illuminating image of the Copilot architecture here: https://i.imgur.com/2vQYGoK.png which I was able to find from https://labs.zenity.io/p/inside-microsoft-365-copilot-techni...

    The above appears to have been scrubbed, but it used to be available from the learn page months ago. Your messages get additional context data from Microsoft Graph, which powers the enterprise version of M365 Copilot. There are significant benefits to this, and downsides. And given the way Microsoft wants to control things, you will get results overindexed toward what happens inside your organization rather than what is happening on the near-real-time web.
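    A minimal sketch of that both-ends arrangement, assuming a single allows(_:) check standing in for whatever filter stack is in use (none of these names are a real vendor API):

        // The same kind of safety check runs once on the user's input and again
        // on the model's output before anything is shown to the user.
        func respond(to input: String,
                     model: (String) -> String,
                     allows: (String) -> Bool) -> String? {
            guard allows(input) else { return nil }    // input-side filter
            let output = model(input)
            guard allows(output) else { return nil }   // output-side filter
            return output
        }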

    16. binarymax ◴[] No.44485371{3}[source]
    No, it doesn't disrupt it. This is a well-known capability of LLMs. Most models don't even point out the mistake; they just carry on.

    https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...

    17. chgs ◴[] No.44488039{4}[source]
    There were two.
    18. Lockal ◴[] No.44489367[source]
    What prevents Apple from applying a quick anti-typo LLM which restores B0ris, unalive, fixs tpyos, and replaces "slumbering steed" with a "sleeping horse", not just for censorship, but also to improve generation results?
    replies(1): >>44491272 #
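    A toy sketch of the normalization idea in the question above, assuming a lookup table in place of a small model (the table and function are hypothetical):

        import Foundation

        // Map obvious obfuscations back to a canonical spelling before the
        // literal filters run; a real implementation would use a small model.
        let normalizations = ["b0ris": "boris", "unalive": "kill"]  // illustrative table

        func normalize(_ text: String) -> String {
            let words = text.lowercased().components(separatedBy: " ")
            return words.map { normalizations[$0] ?? $0 }.joined(separator: " ")
        }

        print(normalize("B0ris Johnson"))  // "boris johnson" -- now the filter can match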
    19. the_mar ◴[] No.44491272[source]
    Why do you think this doesn't already exist?