On average, developers overestimated how long tasks would take when not using AI: they finished faster than they predicted. The opposite happened with AI-assisted tasks, which took longer than estimated.
The conclusion isn't just that "estimates are hard" (they can be), but that AI assistance can lead people to believe they're more productive than they actually are, because they mistakenly think they've spent less time.
The graphs in the paper tell part of that story: the time saved comes out of the actual programming work ("Reading & Searching", "Testing & Debugging"), but that time is spent elsewhere, notably on activities specific to LLMs: reviewing output, writing prompts, and waiting for the AI to spit out results.