With current technology (LLM), how can an agent ever be sure about its confidence?
Calibrated Language Models Must Hallucinate
These models are tools, and LLM products bundle these tools with other tools, and 90% of UX amounts to bundling these well. The article here gives a great sense of what this takes.
Ok, but can you please make your substantive points without putting others down? Your comment would be fine without this bit.
In my book they ideally focus on understanding scope, user needs and how to measure success, while implementation details such as orchestration strategies, evaluation and making sure your system delivers the capabilities you want in general, are engineering responsibilities.
There's a dichotomy in development where bad PMs can prosper in a way bad engineers can't.
There's no skill test for PMs, unlike engineers. Bad PMs can look like good PMs to senior management simply because they hold tons of meetings, kiss ass, overpromise or steal credit. Any of those bad traits can fool senior management. But those are bad PMs.
On top of that, when you have a bad PM, there's a good chance the Devs themselves will step into the role and still deliver a product.
The bad PM will still take credit, obviously. A bad PM is often circumvented instead of exposed.
Conversely, the opposite doesn't work: a good PM + bad Devs turns into never-ending dev cycles. The PM looks bad even though there's nothing he can really do, unless he can fire/hire. The good PM cannot circumvent bad engineers.
And in the end, to find bad engineers you can just look at their code. If you don't have the skill to do that, or don't employ someone you know that can, you probably shouldn't be in the software development business.
I challenge the idea that there is no skill test for PMs, though - take a PM interview at a serious product company some day.
And the PM role is of course more than just delivery. If they dropped dead the product would still get shipped. But then what? Someone would need to talk to customers, dig into data and figure out the roadmap. Other people can do it, but in a sufficiently complex company you might as well get people who are good at it and want to devote their time to it.
I understand why some engineers don’t like PMs. But it is exactly the same reason as why some PMs (and C-suites) view engineers as fungible resources who waste time on abstractions instead of shipping, and pad estimates and refuse to discuss practical tradeoffs to move quicker - it’s an unfair generalisation based on bad experiences.
I just think more respect all around wouldn’t hurt.
In short: nice industry roadmap, but we are nowhere near robust, trustworthy multi-agent systems yet.
My view is that you need to transition slowly and carefully to AI first customer support.
1. Know the scope of problems an AI can solve with high probability. Related prompt: "You can ONLY help with the following issues."
2. Escalate to a human immediately if it's out of scope: "If you cannot help, escalate to a human immediately by CCing bob@smallbiz.co"
3. Have an "unlocked agent" that your customer service person can use to answer a question and evaluate how well the agent performs in helping. Use this to drive your development roadmap.
4. If the "unlocked agent" becomes good at solving a problem, add that to the in-scope solutions.
Finally, you should probably have some way to test existing conversations when you make changes. (It's on my TODO list)
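A minimal sketch of the steps above, including the "test existing conversations" idea. Everything here is an assumption for illustration: `ScopedAgentConfig`, `SavedConversation`, `replay` and `stub_agent` are hypothetical names, and `stub_agent` is a keyword matcher standing in for a real model call. Only the two prompt strings and the escalation address come from the comment.

```python
from dataclasses import dataclass, field
from typing import Callable

ESCALATION_ADDRESS = "bob@smallbiz.co"  # from the example prompt in step 2


@dataclass
class ScopedAgentConfig:
    """Hypothetical holder for the list of issues the agent may handle."""
    in_scope: list[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        # Steps 1 and 2: state the scope, then the escalation rule.
        issues = "\n".join(f"- {issue}" for issue in self.in_scope)
        return (
            "You can ONLY help with the following issues.\n"
            f"{issues}\n"
            "If you cannot help, escalate to a human immediately "
            f"by CCing {ESCALATION_ADDRESS}"
        )

    def promote(self, issue: str) -> None:
        # Step 4: an issue the "unlocked agent" handles well becomes in-scope.
        if issue not in self.in_scope:
            self.in_scope.append(issue)


@dataclass
class SavedConversation:
    customer_message: str
    should_escalate: bool  # what a human reviewer decided was correct


def replay(
    agent: Callable[[str, str], str],
    config: ScopedAgentConfig,
    conversations: list[SavedConversation],
) -> list[SavedConversation]:
    """Return the saved conversations where escalation behaviour changed."""
    prompt = config.system_prompt()
    failures = []
    for conv in conversations:
        reply = agent(prompt, conv.customer_message)
        escalated = ESCALATION_ADDRESS in reply
        if escalated != conv.should_escalate:
            failures.append(conv)
    return failures


def stub_agent(system_prompt: str, message: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    scope = [
        line[2:].lower()
        for line in system_prompt.splitlines()
        if line.startswith("- ")
    ]
    if any(issue in message.lower() for issue in scope):
        return "Here is how to fix that."
    return f"Escalating; CCing {ESCALATION_ADDRESS}"
```

Running `replay` after every prompt or scope change gives a cheap regression check: any conversation it returns is one where the agent now answers something it used to escalate, or the other way round.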
I've implemented this for a few small businesses, and the process is so seamless that no one has suspected interaction with an AI. For one client, there's not even a visible escalation step: they get pinged on their phone and take over the chat!
'Routing through increasingly specialised agents' was my approach, and the only thing that would've done the job (in MVP form) at the time. There weren't many models that would fit our (v good) CS & Product teams' dataset of "probable queries from customers" into a single context window.
I never personally got my MVP beyond sitting with it beside the customer support inbox, talking to customers. And AFAIK it never moved beyond that after I left.
Nor should it have been, probably - there are (wild, & mostly ineffable) trade-offs that you make the moment you stop actually talking to users at the very moment they get in touch. I don't remember ever making a trade-off like that where it was worthwhile.
I _do_ remember it as perhaps the most worthwhile time I ever spent doing product-y work.
I say that because: To consider a customer support query type that might be 0.005% of all queries received by the CS team, even my trash MVP had to walk a path down a pretty intricate tree of agents and possible query types.
So - if you believe that 'solving the problems users have with your product' = 'making a better product', then talking to an LLM that was an advocate for a tiny subset of users, and knew very intimately the details of their issue with your product, that felt really good. It felt like it was a very pure version of what _I_ should be to devs, as any kind of interface between them and our users.
It was very hard to stay a believer in the idea of a 'PM' after seeing that, at least. As a person who preferred to just let people get on with things.
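For anyone wondering what 'routing through increasingly specialised agents' might look like, here is a rough sketch, not the MVP described above. It assumes a tree of agents where each level only has to choose among its own children, so no single prompt needs the whole dataset of probable queries. `AgentNode`, `route`, `keyword_classifier` and the example tree are all hypothetical; the classifier is a keyword matcher standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical signature: given the message and the child labels,
# return one of the labels, or None if nothing fits.
Classifier = Callable[[str, list[str]], Optional[str]]


@dataclass
class AgentNode:
    name: str
    children: dict[str, "AgentNode"] = field(default_factory=dict)


def route(root: AgentNode, message: str, classify: Classifier) -> list[str]:
    """Walk from the general agent to the most specialised one that fits."""
    path = [root.name]
    node = root
    while node.children:
        label = classify(message, list(node.children))
        if label not in node.children:
            break  # nothing more specialised fits; stay at this level
        node = node.children[label]
        path.append(node.name)
    return path


def keyword_classifier(message: str, labels: list[str]) -> Optional[str]:
    """Stand-in for a model call: first label mentioned in the message."""
    text = message.lower()
    return next((label for label in labels if label in text), None)


# Made-up query types, purely for illustration.
tree = AgentNode("support", {
    "billing": AgentNode("billing", {
        "refund": AgentNode("refund"),
        "invoice": AgentNode("invoice"),
    }),
    "login": AgentNode("login"),
})
```

With this tree, `route(tree, "I need a refund on my billing", keyword_classifier)` walks `support`, then `billing`, then `refund`. A rare query type just means a longer path, and each hop only needs the context for its own branch.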
I enjoyed the linked post; it's really interesting to see how far things have come. I'm surprised nobody has built 'talk to your customers at scale', yet - this feels like a far more interesting problem than 'avoid talking to your customers at scale'.
I'm also not surprised, I guess, since it's an incredibly bespoke job to do properly, I imagine, for most products.
Even assuming you've correctly auth'd the user contacting you (big assumption!), allowing that user to very literally prompt a 'semi-confident thing with tools' - however many layers of abstraction away the tool is - feels very, very far away from a real-world, sensible implementation right now.
Just shoot the tool prompts over to a human operator, if it's so necessary! Sense-check!
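A minimal sketch of that sense-check, assuming the agent can only propose a tool call and nothing runs until a human operator approves it. `ProposedToolCall`, `run_with_approval` and the `approve` callback are hypothetical names, not any particular framework's API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ProposedToolCall:
    """What the agent wants to do, in a form a human can review."""
    tool: str
    arguments: dict[str, Any]
    reason: str  # the agent's stated justification, shown to the operator


def run_with_approval(
    call: ProposedToolCall,
    tools: dict[str, Callable[..., str]],
    approve: Callable[[ProposedToolCall], bool],
) -> str:
    """Execute the tool only if it is known and the operator says yes."""
    if call.tool not in tools:
        return "rejected: unknown tool"
    if not approve(call):
        return "rejected: operator declined"
    return tools[call.tool](**call.arguments)
```

In a real deployment `approve` would block on a person (a chat ping, a queue, a button); here it is just a callback so the gate itself is easy to test.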
It’s pretty simple. When a non-tech person sees faked demos of what it can do - it looks epic and everyone extrapolates results and thinks AI is that good.
LLMs' ability to give convincing-sounding answers is like catnip for service desk managers who have never actually been on the desk itself.
Using GenAI is a huge breakthrough in this field, because it is a socially acceptable way to tell someone you don't care about their issue.
This sounds hard to pull off in a very similar way to getting good data through surveys.
I generally don't want to talk to my tools. If I'm motivated to talk to you, it's probably because something went wrong. And even if I talked to you when not annoyed, I'd struggle to articulate more than "it's working good" at any given moment - when what you really want as a product person is to know "it's working good, but I had to internalize this workaround for something for my use case that now I don't even think about but originally I found offputting and almost bounced because of" or whatever.
The purpose has been achieved, in that there is a large drop rate. The product manager has met their goals, cut costs, and might be looking forward to their bonus.
It would be far more expensive to make the LLM behave effectively than it would be to do nothing. Any product manager that sincerely cared about customer support wouldn't be inflicting a personalised callous disregard for service. Instead they'd be focusing on improving documentation, help, and processes. But that's not innately quantifiable in a way that leads to bonuses, and therefore goes unnoticed.
I get the feeling there's going to be either 1) a great revert of the features, 2) a bunch of hurried patches, or 3) a bunch of legacy systems operating on MCP v0.00-beta (metaphorically speaking).
:lol_sob:
Far better to focus on enhancing human capabilities with agents.
For example, while a human talks to a customer on the phone, AI is fetching useful context about the customer and suggesting talking points to improve the human conversation.
One example of a direct benefit for businesses using AI this way is reducing onboarding times for new employees.
It is a generalization, but it's not unfair. That's the mistake you're making. Is it "unfair" to call British people Roast Beef, or French people Froggies? Those are generalizations but are fair (or were at least). British people genuinely eat a disproportionate amount of roast beef and French people genuinely eat frogs' legs.
And there are genuinely more bad PMs than good ones and lots of developers have experience "managing" their PM and trying to ensure they don't do too much harm, like the GP that started this discussion.
Don't worry, most engineers will quickly realize when a PM is good and let them do their job without "managing" them. In fact, it's a delight working with one as they do genuinely make the dev process so much better.
Don’t worry, PMs are also used to working with engineers who view their profession as the only special one. Managing that is part of how to get good outcomes.
If you’ve mainly encountered bad PMs, then hey I’m sorry for you. Find somewhere to work with better colleagues?
But you’ll not convince me that one profession is just inherently better than another. That’s silly, and speaks to a lack of empathy that is, if you’re still looking for a checkbox test for the role, the type of thing that would cause you to fail it immediately.
I agree with that.
But what you originally wrote was, "The AI bundling problem is over. The user interface problem is over." It would probably make more sense to say "...will be over."
People tend to be sensitive to those kinds of claims because there's a lot of hype around all this at the moment. So when people seem to imply that what we have right now is much more capable than it actually is, there tends to be pushback.
If you look at development methodologies such as agile, scrum, XP, etc. what you'll notice is that originally, for all of these, there's no non-technical PM. Often the PM role is entirely absent. Why do you think that was?
Because the industry doesn't value that role. It is felt that the process generally works better without it. But C-Suite think they can delegate what should be their decision making to non-technical middle managers, and that's why PMs keep getting forced back in.
If you feel your role is so key and it's this empathy that's so important, why does your post drip with constant snark? Why can't you acknowledge other people's point of view? You've ignored most of my points, and strawmanned one, a classic narcissistic response. All I'm doing is explaining why the PM in tech workflows is generally disdained, and you're shooting the messenger. That all seems more like a lack of empathy than anything.