
353 points mohi-kalantari | 1 comments
ZeroConcerns (No.46195233)
Well, the major problem Microsoft is facing is that its AI products are not only shoddier than average, which is nothing new for them in many categories, but that this time the competition can actually easily leapfrog them.

Like, I have a 'Copilot' button prominently displayed in my New Outlook on macOS (the only platform where the app with that designation is sort-of usable), and it's a dropdown menu, and it has... zero items when expanded.

I asked my 'Microsoft 365 Bing Chat AI Bot Powered By ChatGPT<tm>' about that, and it wasn't able to tell me how to make that button actually do something, ending the conversation with "yeah, that's sort-of a tease, isn't it?"...

Oh, well, and I actually also have a dedicated Copilot button on my new Lenovo laptop powered-by-Windows-11. And, guess what, it does exactly nothing! I can elect to either assign this button to 'Search', which opens a WebView2 to bing.com (ehhm, yeah, sure, thanks!) or to 'Custom', in which case it informs me that 'nothing' meets the hardware requirements to actually enable that.

So, my question to anyone in the Microsoft C-suite: have you ever tried to, like, actually use, like, anything that you're selling? Because if you had, the failings would have been obvious, right? Right??

throw310822 (No.46195308)
The other day I clicked one of Outlook calendar's Copilot prefilled questions: "Who are the main attendees of this meeting?" It started a long, winding speech that went nowhere, so I typed "but WHO are the attendees", and it finally admitted, "I don't know, I can't see that."
artrockalter (No.46196430)
It's so easy to ship completely broken AI features because you can't really unit test them, and unit tests have long been the main standard for judging whether code works.

The most successful AI companies (OpenAI, Anthropic, Cursor) are all dogfooding their products as far as I can tell, and I don't really see any other reliable way to make sure the AI feature you ship actually works.

sk7 (No.46196643)
In the AI product development world, tests are called "evals" (evaluations). Basically, you either have humans review LLM output or feed it to another LLM along with instructions on how to evaluate it.

https://www.lennysnewsletter.com/p/beyond-vibe-checks-a-pms-...
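A minimal sketch of what such an eval harness looks like in practice. Everything here is hypothetical: `fake_model` and `judge` are stand-ins for real LLM calls, and the judge is a simple programmatic check (real setups often use a second LLM with grading instructions instead). The second case mirrors the Outlook failure mode from upthread, where the model has no calendar data to work with.

```python
def fake_model(prompt, context):
    # Stand-in for a real LLM call over some meeting context.
    if "attendees" in prompt and context.get("attendees"):
        return ", ".join(context["attendees"])
    return "I don't know, I can't see that."

def judge(expected, answer):
    # Programmatic judge: did the answer mention every expected name?
    # In an LLM-as-judge setup this would be another model call.
    return all(name in answer for name in expected)

def run_evals(cases, model):
    results = []
    for case in cases:
        answer = model(case["prompt"], case["context"])
        results.append({"prompt": case["prompt"],
                        "passed": judge(case["expected"], answer)})
    return results

cases = [
    {"prompt": "Who are the main attendees of this meeting?",
     "context": {"attendees": ["Alice", "Bob"]},
     "expected": ["Alice", "Bob"]},
    {"prompt": "Who are the main attendees of this meeting?",
     "context": {},  # calendar data missing: the Outlook failure mode
     "expected": ["Alice"]},
]

results = run_evals(cases, fake_model)
pass_rate = sum(r["passed"] for r in results) / len(results)
```

The point of the harness is the aggregate: you track `pass_rate` across a fixed suite of cases as the product changes, rather than asserting exact strings the way a unit test would.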

azemetre (No.46197113)
Interesting. I'd never really thought about it outside of this comment chain, but I'm guessing approaches like this undercut the typical automated testing devs would do. And seeing how this is MSFT (which stopped having dedicated testing roles a good while ago, RIP SDET roles), I can only imagine the quality culture is even worse on the "AI" teams.