I haven’t found them particularly useful but I also don’t get bombarded with notifications.
I honestly feel Apple should lean into the weirdness by allowing people to change the prompt or allowing people to install alternate prompts from the App Store. So you could have your messages summarized as a haiku or poem, or in the style of Shakespeare or a movie character. I think there would be a market for that.
This simply means they do not work.
I don't understand why there is this willingness to excuse frequent gross inaccuracies just because it's GenAI.
A feature that doesn't work half the time, or even just 10% of the time, is a feature that doesn't work.
Most notifications are pretty terse anyway. Emails are very short these days. I don't use the socials but aren't they all character limited?
Me: M3 MacBook Pro owner with an Android phone. I'm 'eligible' for Apple Intelligence but haven't requested it.
This effect of smaller models being bad at negation is most obvious in image generators, most of which are only a handful of gigabytes in size. If you ask one for “don’t show an elephant next to the circus tent!” then you will definitely get an elephant.
In some ways it reminds me of the titles that the OpenAI interface applies to our conversations. It has gotten better over time, but I still have it do weird things like provide titles in Spanish for Rust programming questions that used no language other than English.
When I wrote an AI assistant forever ago, I kept tweaking the prompt to ask it for title summaries. At some point I had to start threatening the assistant with passive-aggressive instructions so it would give me the format I wanted, like "Including semicolons or subtitles will mean you failed your task. You don't want to fail, do you?"
Granted, that was with GPT-3.5, so today's models should perform much better.
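For what it's worth, one alternative to threatening the model is to check the title after the fact and fall back if it breaks the format. A minimal sketch, not what I actually ran back then: `is_clean_title`, `pick_title`, the 8-word limit, and treating a colon as a "subtitle" marker are all assumptions made up for illustration.

```python
import re


def is_clean_title(title: str, max_words: int = 8) -> bool:
    """Return True for a short single-line title with no semicolon or colon."""
    title = title.strip()
    if not title or "\n" in title:
        return False
    # Assumption: a semicolon or colon signals the unwanted "subtitle" format.
    if re.search(r"[;:]", title):
        return False
    return len(title.split()) <= max_words


def pick_title(candidates, fallback="Untitled chat"):
    """Return the first candidate that passes the check, else the fallback."""
    for candidate in candidates:
        if is_clean_title(candidate):
            return candidate.strip()
    return fallback
```

In practice `candidates` would be a few retries from the model, so a bad first answer costs one more request instead of a longer prompt.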
It’s not just negation that models struggle with, but also reversing the direction of any arrow connecting facts, or wandering too far from established patterns of any kind. It’s been studied scientifically and is one of the most fascinating aspects of these models, because it also reveals the weaknesses and flaws of human thinking.
Researchers are already trying to fix this problem by generating synthetic training data that includes negations and reversals.
That makes you wonder: would this approach improve the robustness of human education also?
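To make the synthetic-data idea concrete, here is a toy sketch of expanding one fact into a forward statement, a reversed statement, and true negations. This is not any particular paper's method; the `Fact` fields and the `augment` helper are hypothetical, and it assumes each object belongs to exactly one subject so the generated negations stay true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str  # forward phrasing, e.g. "is the author of"
    inverse: str   # reversed phrasing, e.g. "was written by"
    negated: str   # negated phrasing, e.g. "is not the author of"
    obj: str


def augment(facts):
    """Expand each fact into forward, reversed, and negated sentences."""
    sentences = []
    for fact in facts:
        sentences.append(f"{fact.subject} {fact.relation} {fact.obj}.")
        sentences.append(f"{fact.obj} {fact.inverse} {fact.subject}.")
        # Negations pair the subject with objects of *other* subjects,
        # so every generated sentence remains a true statement.
        for other in facts:
            if other.subject != fact.subject and other.relation == fact.relation:
                sentences.append(f"{fact.subject} {fact.negated} {other.obj}.")
    return sentences


facts = [
    Fact("Jane Austen", "is the author of", "was written by",
         "is not the author of", "Emma"),
    Fact("Herman Melville", "is the author of", "was written by",
         "is not the author of", "Moby-Dick"),
]
for line in augment(facts):
    print(line)
```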
"Don't think of an elephant."
It's actually interesting how often we have to guess that someone dropped a "not" in conversation based on the context.
It wouldn't be hard to have an iMessage bot (e.g. on a Mac) running to test some of this out on the fly.
Either it's "Apple releases after everyone else, but gets it right", or, when that doesn't work, like now, it's "Oh, it's understood to be less than perfect, but it will be getting better."
And Apple Maps was released, in part, because Apple didn't want Google getting user data (I don't like calling it "their users' data" because that implies those users are owned by Apple). So they released a terrible experience for their own benefit, while continuing the narrative of "we're the only ones who care about your experience".
Would be nice if they had an option only to summarise multiple notifications in a stack, and not to summarise once you expand them.
Especially since so often it's summarising a message that is barely longer than the summary. It seems to sometimes decide not to do that, but still so often does.
Not GenAI, but Apple. If this was Google there would have been 5 front page HN stories a day with everyone dragging them through the mud.
An incoming "no" could be so much better summarized when combined with my outgoing message (possibly days ago) that prompted that "no".
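Roughly what I have in mind, as a sketch: before summarizing, pair the incoming reply with my most recent outgoing message in the same thread. The `Message` type and `build_summary_input` are made-up names, not anything Apple exposes, and the code assumes the thread is non-empty and that its last entry is the incoming message.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str  # "me" or the contact's name
    text: str


def build_summary_input(thread, me="me"):
    """Pair the newest incoming message with my latest outgoing one."""
    incoming = thread[-1]
    # Walk backwards through the earlier messages to find what I last said.
    prompt_msg = next(
        (m for m in reversed(thread[:-1]) if m.sender == me), None
    )
    if prompt_msg is None:
        return f"{incoming.sender}: {incoming.text}"
    return f"{me}: {prompt_msg.text}\n{incoming.sender}: {incoming.text}"


thread = [
    Message("me", "Can you still make dinner on Friday?"),
    Message("Sam", "no"),
]
print(build_summary_input(thread))
```

A summarizer fed both lines has a chance of producing "Sam can't make Friday dinner" rather than just "no".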
And I suspect that they were more between a rock and a hard place with respect to "it has been hyped so much, everyone is expecting something, we've promised something. But so far, it ain't great."
Apple of old wouldn't release. Apple now does. But I don't think it should necessarily be painted as some kind of deliberate strategy of theirs.
Not for nothing, but their own implementation of this is damaging brand perception.
Because .rs → rsrsrs, which is lol in Portuguese. Which would be a genius move.
Sure, Apple Maps was bad at release, but I can't think of any other product that was released so badly and then improved. That's the exception to the rule.