    306 points mohi-kalantari | 19 comments
    ZeroConcerns ◴[] No.46195233[source]
    Well, the major problem Microsoft is facing is that its AI products are not only shoddier than average, which is nothing new for them in many categories, but that this time the competition can actually easily leapfrog them.

    Like, I have a 'Copilot' button prominently displayed in my New Outlook on macOS (the only platform where the app-with-that-designation is sort-of usable), and it's a dropdown menu, and it has... zero items when expanded.

    I asked my 'Microsoft 365 Bing Chat AI Bot Powered By ChatGPT<tm>' about that, and it wasn't able to tell me how to make that button actually do something, ending the conversation with "yeah, that's sort-of a tease, isn't it?"...

    Oh, well, and I actually also have a dedicated Copilot button on my new Lenovo laptop powered-by-Windows-11. And, guess what, it does exactly nothing! I can elect to either assign this button to 'Search', which opens a WebView2 to bing.com (ehhm, yeah, sure, thanks!) or to 'Custom', in which case it informs me that 'nothing' meets the hardware requirements to actually enable that.

    So, my question to anyone in the Microsoft C-suite: have you ever tried to, like, actually use, like, anything that you're selling? Because if you had, the failings would have been obvious, right? Right??

    replies(13): >>46195308 #>>46195429 #>>46195463 #>>46195557 #>>46195648 #>>46195673 #>>46196109 #>>46196188 #>>46196233 #>>46196367 #>>46196502 #>>46196573 #>>46196837 #
    1. throw310822 ◴[] No.46195308[source]
    The other day I clicked on one of Outlook calendar's Copilot prefilled questions: "who are the main attendees of this meeting". It started a long-winded speech that went nowhere, so I typed in "but WHO are the attendees" and finally it admitted "I don't know, I can't see that".
    replies(7): >>46195481 #>>46195920 #>>46196018 #>>46196178 #>>46196184 #>>46196430 #>>46196505 #
    2. ZeroConcerns ◴[] No.46195481[source]
    Absolutely! There are so many scenarios where they could actually add some value, and they're fulfilling, like, exactly none of those?

    Even in Visual Studio Enterprise, their flagship developer product, the GPT integration mostly just destroys code regardless of model output. I truly cannot fathom how any of that made it past even a cursory review. Or how that situation would last for over 6 months, but, yet, here we are.

    And, again, it's fine with me: I'll just use Claude Code, but if I were a Microsoft VP-or-above, the lack of execution would sort-of, well, concern me? But maybe I'm just focused on the wrong things. I mean, Cloudflare brought down, like, half the Internet twice in the past two weeks, and they're still a tech darling, so possibly incompetence is the new hotness now?

    replies(4): >>46195798 #>>46196321 #>>46196572 #>>46196628 #
    3. Sanzig ◴[] No.46195798[source]
    I have Copilot at work, it feels so useless sometimes. As an example, I had a report which I needed to make some batch edits to. I figured why not let the robot take a crack at it, so I clicked the Copilot button and spent a couple minutes describing what I needed changed.

    Copilot tells me it can't edit my current document, but it can create a new one. I figured okay, Microsoft doesn't want to set it loose on the original, guess it makes sense that it requires a copy. So I said yes.

    Nope. Instead of creating a copy of my document and editing it, it created an entirely new document which excised basically everything in the original report and replaced it with a very short summary - I'm talking 5000 words down to 500. All my tables and figures were gone, as was the standard report template my employer uses.

    What utter garbage. Office productivity is a major use case for LLMs, and here the largest vendor of productivity software on the planet is happy to fuck it up.

    4. HarHarVeryFunny ◴[] No.46195920[source]
    Sounds like Siri - unable to control much of anything on the iPhone outside of reading/sending text messages and setting alarms.
    replies(1): >>46196177 #
    5. burningChrome ◴[] No.46196018[source]
    I've fooled around with some vibe coding on several LLMs like Claude, Gemini, and ChatGPT with some pretty decent results.

    Since I have a full Copilot license at my corporate day gig, I figured I would try using Copilot for a basic static site. Nothing too hard, and something that's been handled easily with the other LLMs.

    The prompt was pretty basic just to get something to start working with. "Build a four page template. With a home or index page, two pages of content and a contact page with a responsive slide out menu from the left hand side of the page."

    It ran and put everything in a folder. I opened the home page and everything was broken. I opened the files in VS Code and saw this:

        <ul class="drawer__list">
          <li>index.htmlHome</a></li>
          <li>services.htmlServices</a></li>
          <li><a class="nav-linkeduling</a></li>
          <li>contact.htmlContact</a></li>
        </ul>
    
    
    And then this:

        <head>
        <meta charset="utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <title>Home · Acme Web</title>
        <meta name="description" content="Accessible, responsive starter template with a slide-out menu."/>
        <linkts/css/styles.css
        /assets/css/styles.css
        </head>
    
    I mean, if you can't even get this right, I don't have much hope it can do anything more complicated. To say this was pretty sad is an understatement, and it clarified how far Microsoft is behind the other LLMs.
    6. donkey_brains ◴[] No.46196177[source]
    And that’s ok. Those are core features of the phone that absolutely must work reliably and consistently. Far better to do a few important things really well than a hundred things execrably.
    replies(2): >>46196224 #>>46196347 #
    7. estetlinus ◴[] No.46196178[source]
    Kudos to you. I never use new buttons out of fear of something irreversible happening, like sending a random email or deleting something. I still feel uncomfortable with the Gmail UX, I would _never_ use a "hello iz magic ai"-button.
    8. outside2344 ◴[] No.46196184[source]
    I asked Microsoft 365 Copilot to create a new Word document for me (since they have hidden the link on office.com) and... it refused to do that.

    Edit: Just tried again. It refused to do it. I mean WTF.

    9. throw310822 ◴[] No.46196224{3}[source]
    However the other day I asked the Gemini assistant on my phone to check the birthdays in my calendar, get all their dates, then make a graph of how many fall in each period with a 15-day moving average. It did everything as instructed including writing a python script to generate the graph, then discussed the results with me :)
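For anyone curious what the core of such a script might look like, here is a minimal sketch of the 15-day moving average step. It is not the script Gemini produced; the function name `birthday_moving_average` and the sample dates are made up for illustration, and the plotting step is left out.

```python
from datetime import date


def birthday_moving_average(birthdays: list[date], window: int = 15) -> list[float]:
    """Centered moving average of birthdays per day of year, wrapping at year end."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a day")
    days = 366  # keep a slot for Feb 29
    counts = [0] * days
    for b in birthdays:
        # Map every birthday onto a leap year so each date has a fixed slot.
        idx = date(2000, b.month, b.day).timetuple().tm_yday - 1
        counts[idx] += 1
    half = window // 2
    # Average the counts in [d - half, d + half], wrapping around New Year.
    return [
        sum(counts[(d + off) % days] for off in range(-half, half + 1)) / window
        for d in range(days)
    ]


# Hypothetical sample data: three birthdays clustered around New Year.
sample = [date(1990, 1, 1), date(1985, 1, 3), date(2001, 12, 30)]
averages = birthday_moving_average(sample)
print(round(averages[0], 3))  # smoothed value for Jan 1
```

The wrap-around matters here: without it, late-December and early-January birthdays would never land in the same window.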
    10. Angostura ◴[] No.46196321[source]
    I’ve found it fairly useful in Excel. The suggestions to clean up data are pretty good and it’s spat out some quite gnarly formulas on request
    11. HarHarVeryFunny ◴[] No.46196347{3}[source]
    I would expect Siri to be able to do anything on the iPhone that I can - change settings, report stats, kill/launch apps, etc.

    It would be nice if it could control 3rd party apps too, like Gmail, but being able to control the stuff that Apple themselves have built doesn't seem a lot to ask.

    12. artrockalter ◴[] No.46196430[source]
    It's so easy to ship completely broken AI features because you can't really unit test them and unit tests have been the main standard for whether code is working for a long time now.

    The most successful AI companies (OpenAI, Anthropic, Cursor) are all dogfooding their products as far as I can tell, and I don't really see any other reliable way to make sure the AI feature you ship actually works.

    replies(2): >>46196476 #>>46196643 #
    13. bn-l ◴[] No.46196476[source]
    Microsoft: What? You want us to eat this slop? Are you crazy?!
    replies(1): >>46196581 #
    14. flkiwi ◴[] No.46196505[source]
    Me: Can you access my inbox and Teams messages?

    Copilot: Yep!

    Me: Please find any items in my inbox or sent items indicating (a) that I have agreed to take on a task or (b) identifying me as the person responsible for a task, removing duplicates and any items that I have unambiguously replied to via email or Teams. Time window is preceding 7 days.

    Copilot: Prints a list with, at best, 5% accuracy

    I know some folks have the peculiar idea that search is dead in favor of AI, but if AI can't accurately find information, it is useless. As near as I can tell, Copilot finds 3-4 items (but rarely the SAME 3-4 items across runs) and calls it a day. It just seems like nobody is actually testing any of this stuff. Microsoft is actively destroying its credibility because it's offering a tool that has a party trick but is utterly unreliable. I will, therefore, not rely on it.

    replies(1): >>46196564 #
    15. PLenz ◴[] No.46196564[source]
    It's a generalization problem. We can train LLMs that 'know' a lot of stuff in the global sense but the tasks that are interesting to people require the LLM to know a lot about you and your world in a very specific sense. The technical problem is that it's all corner cases and that's impossible to scale right now. No amount of context window is going to get you there either.
    16. danudey ◴[] No.46196572[source]
    Imagine a circumstance where Windows Search was as good as Apple's Spotlight, and could integrate with cloud services to index documents, browser bookmarks, web history, maybe podcasts, etc.

    Hey Copilot, where is that document I was reading about the new network diagramming software Jacob is testing out?

    Or Hey Copilot, my disk is getting pretty full. What software is taking up a lot of space that I haven't used for a while? Or are there any files I can move to cloud storage to free up space?

    But no, instead it's just 'we're going to take screenshots of all your windows, OCR it, and index it, so that when someone infects your machine they can see your credit card numbers and pornography habits.'

    17. danudey ◴[] No.46196581{3}[source]
    50% of our code is being written by AI! Or at least, autocompleted by AI. And then our developers have to fix 50% of THAT code so that it does what they actually wanted to do in the first place. But boy, it sure produces a lot of words!
    18. rickydroll ◴[] No.46196628[source]
    > I truly cannot fathom how any of that made it past even a cursory review.

    Maybe it's a fifth column group working to destroy Microsoft.

    19. sk7 ◴[] No.46196643[source]
    Tests are called "evals" (evaluations) in the AI product development world. Basically you let humans review LLM output or feed it to another LLM with instructions on how to evaluate it.

    https://www.lennysnewsletter.com/p/beyond-vibe-checks-a-pms-...
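
As a rough illustration of the idea, an eval harness can be as small as a list of prompts, each paired with a grader. This is only a sketch under stated assumptions: `EvalCase`, `run_evals`, and `fake_model` are hypothetical names, not part of any real library, and the graders here are simple string rules standing in for a human verdict or a judge-LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    # Returns True when the output is acceptable. This could be a plain rule,
    # a stored human verdict, or a call to a second "judge" model.
    grader: Callable[[str], bool]


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output the grader accepts."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if case.grader(model(case.prompt)))
    return passed / len(cases)


# Hypothetical stand-in for a real model call.
def fake_model(prompt: str) -> str:
    if "attendees" in prompt:
        return "I don't know, I can't see that."
    return "Created document: report.docx"


cases = [
    EvalCase("Who are the attendees of this meeting?",
             grader=lambda out: "can't see" not in out),
    EvalCase("Create a new Word document.",
             grader=lambda out: out.startswith("Created document")),
]

pass_rate = run_evals(fake_model, cases)
print(f"pass rate: {pass_rate:.0%}")
```

Unlike a unit test, the result is a pass rate tracked over time rather than a single green/red signal, since model output varies between runs.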