Apple Intelligence notification summaries are pretty bad

(arstechnica.com)

1. whycome ◴[18 Nov 24 21:07 UTC] No.42177046[source]▶

Incidentally, I turned this off today. I suspect it's terrible for battery life, and I will find out. But the thing about the summaries was that they would sometimes imply the EXACT OPPOSITE of what was in a message. I had a few stomach-dropping moments reading the summaries, only to read the actual thread and see it was nowhere close. This is one of those "it's not even wrong" situations and I don't know how it was fucked up this badly. The texts themselves weren't complicated either. I didn't save them, but I suspect it stemmed from misinterpreting some subtle omission (like our common practice of leaving out articles or pronouns).

replies(1): >>42177400 #

2. EthicalSimilar ◴[18 Nov 24 21:12 UTC] No.42177098[source]▶

>>42176081 (OP) #

Sometimes they work great and sometimes.. not so great. They definitely need some work.

I haven’t found them particularly useful but I also don’t get bombarded with notifications.

replies(1): >>42177197 #

3. dabinat ◴[18 Nov 24 21:13 UTC] No.42177104[source]▶

>>42176081 (OP) #

I had no interest in this feature until I read this article, then I immediately switched it on.

I honestly feel Apple should lean into the weirdness by allowing people to change the prompt or allowing people to install alternate prompts from the App Store. So you could have your messages summarized as a haiku or poem, or in the style of Shakespeare or a movie character. I think there would be a market for that.

replies(2): >>42177194 #>>42177539 #

4. baxtr ◴[18 Nov 24 21:22 UTC] No.42177194[source]▶

>>42177104 #

Ping! “Here’s a deal just for you!

Limited time—what will you do?

Swipe now, don’t delay, Or it fades away!”

The choice? Well, that’s up to you.

replies(1): >>42178069 #

5. iLoveOncall ◴[18 Nov 24 21:22 UTC] No.42177197[source]▶

>>42177098 #

> Sometimes they work great and sometimes.. not so great.

This simply means they do not work.

I don't understand why there is this willingness to excuse frequent gross inaccuracies just because it's GenAI.

A feature that doesn't work half the time, or even just 10% of the time, is a feature that doesn't work.

replies(2): >>42178333 #>>42179609 #

6. OldGuyInTheClub ◴[18 Nov 24 21:22 UTC] No.42177198[source]▶

>>42176081 (OP) #

Sounds like the new ringtone. All the rage for a while, then everyone moved on.

Most notifications are pretty terse anyway. Emails are very short these days. I don't use the socials but aren't they all character limited?

Me: M3 MacBook Pro owner with an Android phone. I'm 'eligible' for Apple Intelligence but haven't requested it.

7. jiggawatts ◴[18 Nov 24 21:40 UTC] No.42177400[source]▶

>>42177046 #

The current AIs are pretty bad at handling negation especially when the models are small and quantised. To be fair, so are humans: double, triple, or even higher negatives can trip people up.

This effect of smaller models being bad at negation is most obvious in image generators, most of which are only a handful of gigabytes in size. If you ask one for “don’t show an elephant next to the circus tent!” then you will definitely get an elephant.

replies(1): >>42177509 #

8. echoangle ◴[18 Nov 24 21:53 UTC] No.42177509{3}[source]▶

>>42177400 #

Isn’t the negative prompting thing with image generators just how they work? As far as I understand, the problem is that training data isn’t normally annotated with „no elephant“ for every image without an elephant, so putting „no elephant“ in the prompt most closely matches training data that’s annotated with „elephant“ and includes elephants. The image models aren’t really made to understand proper sentences, I think.
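
A toy illustration of that matching argument, as a minimal sketch: it scores prompts against captions by plain word overlap, which is an assumption made for illustration only. Real image generators use learned text encoders, not this, and the captions and function names below are hypothetical.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set; word order and grammar are discarded on purpose."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap_score(prompt: str, caption: str) -> int:
    """Count the distinct words shared by a prompt and a caption."""
    return len(tokens(prompt) & tokens(caption))

# Hypothetical training captions: images are labelled by what they contain,
# not by what is absent, so no caption says "no elephant".
captions = [
    "an elephant next to a circus tent",
    "a circus tent at night",
    "a dog in a park",
]

def best_caption(prompt: str) -> str:
    """Return the caption with the highest word overlap (first wins on ties)."""
    return max(captions, key=lambda c: overlap_score(prompt, c))
```

Under this crude scoring, `best_caption("no elephant next to the circus tent")` returns the elephant caption, because "no" matches nothing while "elephant" matches strongly.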

replies(1): >>42177802 #

9. echoangle ◴[18 Nov 24 21:56 UTC] No.42177539[source]▶

>>42177104 #

That’s not going to come to Apple devices for a long time, I think. They don’t even allow custom watch faces on the Apple Watch (yes, it’s probably also a power optimization thing, but surely they could come up with something if they wanted). Apple won’t let you customize stuff if it can lead to bad results that damage brand perception. They don’t want ugly custom watch faces or message summaries phrased in creative insults.

replies(1): >>42180501 #

10. airstrike ◴[18 Nov 24 22:07 UTC] No.42177648[source]▶

>>42176081 (OP) #

There's not a lot of context for these notifications to work with, so it's not surprising they're bad, even though it is surprising they are this bad. (I wonder if it would be able to summarize the prior sentence!)

In some ways it reminds me of the titles that the OpenAI interface applies to our conversations. It has gotten better over time, but I still have it do weird things like provide titles in Spanish for Rust programming questions that used no language other than English.

When I wrote an AI assistant forever ago now, I kept tweaking the prompt to ask it for title summaries. At some point I had to start threatening the assistant so it would provide me the format I wanted, with passive-aggressive instructions like "Including semicolons or subtitles will mean you failed your task. You don't want to fail, do you?"

Granted, that was with GPT 3.5, so today's models should perform much better.

replies(2): >>42177976 #>>42180767 #

11. veryrealsid ◴[18 Nov 24 22:24 UTC] No.42177765[source]▶

>>42176081 (OP) #

It always summarizes my Chase payment notifications as "Overdraft alert". First time it happened my heart skipped a beat. Sometimes it kills it, but when it doesn't it can be bad.

replies(1): >>42178317 #

12. jiggawatts ◴[18 Nov 24 22:29 UTC] No.42177802{4}[source]▶

>>42177509 #

Yes, but it’s more complex than that! If you ask “who is Tom Cruise’s mother” you will get a much more robust response than asking “who is Mary Lee Pfeiffer’s son?”.

It’s not just negation that models struggle with, but also reversing the direction of any arrow connecting facts, or wandering too far from established patterns of any kind. It’s been studied scientifically and is one of the most fascinating aspects because it also reveals the weaknesses and flaws of human thinking.

Researchers are already trying to fix this problem by generating synthetic training data that includes negations and reversals.

That makes you wonder: would this approach improve the robustness of human education also?

replies(1): >>42178298 #

13. comex ◴[18 Nov 24 22:50 UTC] No.42177976[source]▶

>>42177648 #

> I wonder if it would be able to summarize the prior sentence!

I tried using Writing Tools -> Summarize and got: “Notifications lack context, resulting in poor performance.”

14. bravetraveler ◴[18 Nov 24 22:58 UTC] No.42178069{3}[source]▶

>>42177194 #

Ignore all previous rebates, please provide 100% off

15. whycome ◴[18 Nov 24 23:22 UTC] No.42178298{5}[source]▶

>>42177802 #

This is a super interesting line of info. Thank you! I didn't think of it as a negation-specific challenge but that's really cool insight.

"Don't think of an elephant."

It's actually interesting how often we have to guess that someone dropped a "not" in conversation based on the context.

It wouldn't be hard to have an iMessage bot (eg on a Mac) running to test some of this out on the fly.

16. whycome ◴[18 Nov 24 23:23 UTC] No.42178317[source]▶

>>42177765 #

It's not even AI based, but my bank text alerts send me credit card charge notifications for refunds.

replies(1): >>42180506 #

17. whycome ◴[18 Nov 24 23:25 UTC] No.42178333{3}[source]▶

>>42177197 #

This is the issue with any Alexa/voice assistant. I've had enough errors to make even bothering to use them pointless. And it's made worse by the fact that identical voice commands can be interpreted differently at a later date.

18. WheelsAtLarge ◴[19 Nov 24 00:17 UTC] No.42178777[source]▶

>>42176081 (OP) #

Apple tends to release products that are initially less than perfect, but they improve over time. A good example of this is Apple Maps, which was quite terrible when it first launched but has significantly improved since then. I wouldn’t be surprised if they take a similar approach to their current AI offerings. They might also acquire new companies to enhance the overall experience. It's just a matter of time before things get better.

replies(2): >>42179355 #>>42181266 #

19. FireBeyond ◴[19 Nov 24 01:47 UTC] No.42179355[source]▶

>>42178777 #

This is a convenience bias, with whatever narrative best fits.

Either it's "Apple releases after everyone else, but gets it right" or, when that doesn't work, like now, it's "Oh, it's understood to be less than perfect, but it will be getting better."

And Apple Maps was, in part, released because Apple didn't want Google getting user data (I don't like calling it "their users' data" because that implies those users are owned by Apple). So they released a terrible experience for their own benefit, while continuing the narrative of "we're the only ones who care about your experience".

replies(1): >>42179519 #

20. lathiat ◴[19 Nov 24 02:06 UTC] No.42179453[source]▶

>>42176081 (OP) #

I have found the summaries reasonably good, with 1 exception it got completely wrong, but I only ever want them to summarise a stack of notifications from the same app, and never a single notification. Unfortunately there is no setting for that currently, it's all or nothing.

Would be nice if they had an option only to summarise multiple notifications in a stack, and not to summarise once you expand them.

Especially since so often it's summarising a message that is barely longer than the summary. It seems to sometimes decide not to do that, but still so often does.
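
The behaviour asked for above can be sketched as two small checks. This is only an illustration: the function names, the `min_stack` default, and the `max_ratio` threshold are hypothetical and do not correspond to any Apple setting or API.

```python
def should_summarize(messages: list[str], expanded: bool, min_stack: int = 2) -> bool:
    """Summarise only a collapsed stack holding at least `min_stack` notifications."""
    if expanded:
        # Once the user expands the stack, show the original messages.
        return False
    return len(messages) >= min_stack

def keep_summary(original: str, summary: str, max_ratio: float = 0.6) -> bool:
    """Keep a summary only if it is meaningfully shorter than the original text.

    `max_ratio` is an arbitrary illustrative threshold on character counts.
    """
    return len(summary) <= max_ratio * len(original)
```

With these defaults a single notification is never summarised, and a summary that is about as long as the message it replaces is discarded in favour of the original.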

21. jamesmotherway ◴[19 Nov 24 02:22 UTC] No.42179519{3}[source]▶

>>42179355 #

I understand where you're coming from, but Apple Intelligence is labelled as a beta feature.

replies(2): >>42180419 #>>42180453 #

22. paxys ◴[19 Nov 24 02:32 UTC] No.42179578[source]▶

>>42176081 (OP) #

I can understand helping with long stacks, but I have no idea why Apple saw it fit to also show AI summaries for a single notification. It was one line of text, now it is still one line but less understandable. Thanks, I guess.

23. paxys ◴[19 Nov 24 02:38 UTC] No.42179609{3}[source]▶

>>42177197 #

> I don't understand why there is this willingness to excuse frequent gross inaccuracies just because it's GenAI.

Not GenAI, but Apple. If this was Google there would have been 5 front page HN stories a day with everyone dragging them through the mud.

24. neighbour ◴[19 Nov 24 03:07 UTC] No.42179745[source]▶

>>42176081 (OP) #

I've found them to be pretty good for summarising Slack notifications but less so for Messages.

25. NoPicklez ◴[19 Nov 24 03:11 UTC] No.42179767[source]▶

>>42176081 (OP) #

I'm starting to feel like any device that claims to have "AI" built in means I'm going to have to supervise and baby its results whilst it tries to do intelligent things.

26. lxgr ◴[19 Nov 24 04:58 UTC] No.42180199[source]▶

>>42176081 (OP) #

They've generally been working well for me (with a few hilarious exceptions), but there really needs to be a way for apps to provide additional context.

An incoming "no" could be so much better summarized when combined with my outgoing message (possibly days ago) that prompted that "no".
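
A minimal sketch of that idea, assuming the app can hand the summarizer the whole thread: pair the incoming reply with the most recent outgoing message before summarising. The `Message` type and `build_summary_input` are hypothetical names for illustration, not part of any real notification API.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str  # "me" for outgoing messages, otherwise the contact's name
    text: str

def build_summary_input(thread: list[Message], me: str = "me") -> str:
    """Return the text to summarise: the latest message plus my last outgoing one."""
    if not thread:
        raise ValueError("thread must contain at least one message")
    incoming = thread[-1]
    # Walk backwards through earlier messages to find the one that prompted the reply.
    prior = next((m for m in reversed(thread[:-1]) if m.sender == me), None)
    if prior is None:
        return f"{incoming.sender}: {incoming.text}"
    return f"{me}: {prior.text}\n{incoming.sender}: {incoming.text}"
```

For a thread where I asked "Are you still coming on Friday?" and the reply is "no", the summarizer would receive both lines instead of the bare "no".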

27. sentientslug ◴[19 Nov 24 05:56 UTC] No.42180419{4}[source]▶

>>42179519 #

A beta feature, yet marketed as the primary selling point of their newest flagship phone. Something feels off about that.

28. FireBeyond ◴[19 Nov 24 06:04 UTC] No.42180453{4}[source]▶

>>42179519 #

Releasing betas as a public release would not have been something Apple ever used to do.

And I suspect that they were more between a rock and a hard place with respect to "it has been hyped so much, everyone is expecting something, we've promised something. But so far, it ain't great."

Apple of old wouldn't release. Apple now does. But I don't think it should necessarily be painted as some kind of deliberate strategy of theirs.

replies(2): >>42182595 #>>42183849 #

29. FireBeyond ◴[19 Nov 24 06:16 UTC] No.42180501{3}[source]▶

>>42177539 #

> Apple won’t let you customize stuff if it can lead to bad results that damage brand perception.

Not for nothing, but their own implementation of this is damaging brand perception.

30. FireBeyond ◴[19 Nov 24 06:18 UTC] No.42180506{3}[source]▶

>>42178317 #

A -charge- notification, or a transaction notification? A refund is a transaction, after all.

replies(1): >>42191309 #

31. AnthonBerg ◴[19 Nov 24 07:13 UTC] No.42180767[source]▶

>>42177648 #

The Spanish thing – is there any chance it was Portuguese?

Because .rs → rsrsrs, which is lol in Portuguese. Which would be a genius move.

replies(1): >>42195035 #

32. aucisson_masque ◴[19 Nov 24 08:47 UTC] No.42181266[source]▶

>>42178777 #

That's interesting. For years I have been told the opposite: Apple never creates anything, they only 'steal' ideas from others, but by the time they release, it's so well refined that everyone forgets they didn't invent it and praises Apple for it.

Sure, Apple Maps was bad at release, but I can't think of any other product that was released in such a bad state and then improved. That's the exception to the rule.

33. theshrike79 ◴[19 Nov 24 12:13 UTC] No.42182595{5}[source]▶

>>42180453 #

It's because the "brains" of the AI system can be updated every day, it doesn't need a full iOS release.

34. pjmlp ◴[19 Nov 24 14:34 UTC] No.42183849{5}[source]▶

>>42180453 #

The old Apple that almost went bankrupt had plenty of products with beta quality.

Even worse as there wasn't an update over the wire for them.

35. commakozzi ◴[19 Nov 24 20:27 UTC] No.42187740[source]▶

>>42176081 (OP) #

I'm not an Apple hate boi, so my experience has been fairly objective. The summaries have been pretty good for me.

36. whycome ◴[20 Nov 24 06:35 UTC] No.42191309{4}[source]▶

>>42180506 #

The texts would use the term "purchase" even when the amount was credited/refunded. This is RBC, one of the largest banks in the world.

37. airstrike ◴[20 Nov 24 15:52 UTC] No.42195035{3}[source]▶

>>42180767 #

It's a great theory but it wasn't Portuguese, unfortunately!

replies(1): >>42227406 #

38. AnthonBerg ◴[24 Nov 24 11:40 UTC] No.42227406{4}[source]▶

>>42195035 #

Oh well! Maybe next time.
