https://developers.google.com/maps/billing-and-pricing/prici...
https://developers.google.com/maps/billing-and-pricing/prici...
> WeatherNext 2 can generate forecasts 8x faster and with resolution up to 1-hour. This breakthrough is enabled by a new model that can provide hundreds of possible scenarios.
As an end user, all I care about is that there's one accurate forecast scenario.
Quite a lot of weather sites offer this data in an easily digestible visual format.
Obviously all I have is anecdata here, but from a consumer perspective I don't feel like these model enhancements are making average folks feel the weather is any better understood than it was decades ago.
tl;dr: Weather forecasts have improved a lot.
Different models have different strengths, though. Some are shorter range (72h) or longer range (1-3 weeks). Some are higher resolution for where you live (the size of an area which it assigns a forecast to, so your forecast is more local).
Some governments will have their own weather model for your country that is the most accurate for where you live. What I did for a long time was use Windy with HRDPS (a Canadian short-range model with higher resolution over Canada, so I get more accurate forecasts). Now I just use the Government of Canada weather app.
I genuinely wonder what the Weather Channel, the official iPhone/Android weather apps, etc. use under the hood for global models. My gut says ECMWF (a European model with global coverage) mixed with a little magic.
For example, in Apple's Weather app a "rainy" day means a high chance of rain at any point during the day. If there's an 80% chance of rain at 5am and it's sunny the rest of the day, that counts as rainy. You can see an hourly report for more info, and generally this is pretty accurate. You have to learn how to find the right data, know your local area, and interpret it yourself.
Then you have to consider what effects this has on your plans and it gets more complicated. Finding a window to walk the dog, choosing a day to go sailing, or determining conditions for backcountry skiing all have different requirements and resources. What I'd like AI to do is know my own interests and highlight what the forecast means for me.
Sure, those big physics-based models are very computationally intensive (national weather bureaus run them on sizeable HPC clusters), but you only need to run them every few hours in a central location and then distribute the outputs online. It's not like every forecaster in a country needs to run a model, they just need online access to the outputs. Even if they could run the models themselves, they would still need the mountains of raw observation data that feeds the models (weather stations, satellite imagery, radars, wind profilers...). And these are usually distributed by... the national weather bureau of that country. So the weather bureau might as well do the number crunching as well and distribute that.
The standard graph that most people look at to get an idea about today and tomorrow: https://www.yr.no/en/forecast/graph/1-72837/Norway/Oslo/Oslo...
The live weather radar which shows where it is raining right now and prediction/history for rain +/- 90 minutes. This is accurate enough that you can use it to time your walk from the office to the subway and avoid getting wet: https://www.yr.no/en/map/radar/1-72837/Norway/Oslo/Oslo/Oslo
Then you have more specialised forecasts of course. Dew point, feels like temperature, UV, pollution, avalanche risks, statistics, sea conditions, tides, ... People tend to geek out quite heavily on these.
Developing an ensemble of possible scenarios has been the central insight of weather forecasting since the 1960s, when Edward Lorenz discovered that tiny differences in initial conditions can grow exponentially (the "butterfly effect"). Since it became computationally feasible in the 90s, all competitive forecasts have been based on ensemble models.
When you hear "a 70% chance of rain," it more or less means "there was rain in 70 of the 100 scenarios we ran."[0] There is no "single accurate forecast scenario."
[0] Acknowledging this dramatically oversimplifies the models and the location where the rain could occur.
The accuracy improvement is provable. A four-day forecast today is as accurate as a one-day forecast 30 years ago. And this is supremely impressive, because the difficulty of predicting the weather grows exponentially, not linearly, with time.
You are welcome to your feelings - and to be fair, I'm not sure that our understanding of the weather has improved as much as our computational power to extend predictions has.
Models all use a "current world state" of all sensors available to bootstrap their runs.
A similar thing happened at the beginning of Covid-19: forecasters use modified cargo/passenger planes to gather weather data during their routine trips, and suddenly this huge data source was gone (it was partially replaced by the experimental ADM-Aeolus satellite, which turned out to be a huge global game changer thanks to its unexpectedly high-quality data).
Like if I wanted to simulate a scenario where something like Hurricane Melissa had gone through a handful of southern US states: what would the effect have been, from an insurance or resiliency standpoint?
Apple even bought Dark Sky, which purported to do this but never released any information - so I doubt they really did do it. And if they did, I doubt Apple continued the practice.
Been waiting a long time to hear Google announce they'll use your barometer to give you a better forecast. Still waiting I guess.
For WeatherNext, the answer is 'no'. The paper (https://arxiv.org/abs/2506.10772) describes in detail what data the model uses, and direct assimilation of user barometric data is not on the list.
Again, you, as an end user, don't need to know any of that. The CRPS scorecard is a very specific measure of error. I don't expect them to reveal the technical details of the model, but an industry expert instantly knows what WeatherBench[1] is, the code it runs, the data it uses, and how that CRPS scorecard was generated.
By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts (aka the ones you get on your phone). These are a piece of the puzzle, though, and not one that you will ever actually encounter as a layperson.
https://arstechnica.com/science/2025/11/googles-new-weather-...
Essentially you add random noise to the inputs and train by minimizing the regular loss (like L1) while simultaneously maximizing the difference between two members with different random noise initialisations. I wonder if this will be applied to more traditional genAI at some point.
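A minimal numpy sketch of that training idea, with all names, shapes, and constants invented for illustration (this is not Google's actual code): run two forward passes with different noise draws, minimize each member's L1 error, and reward spread between the members. The `fit - weight * spread` shape echoes the standard two-sample CRPS estimator, E|X - y| - ½E|X - X'|.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, noise, w):
    # stand-in "model": a linear map perturbed by the seed noise
    return x @ w + noise

def two_member_loss(x, y, w, noise_scale=0.1, div_weight=0.5):
    # two members, differing only in their random noise initialisation
    n1 = noise_scale * rng.standard_normal(y.shape)
    n2 = noise_scale * rng.standard_normal(y.shape)
    pred1, pred2 = model(x, n1, w), model(x, n2, w)
    # minimize the regular per-member error (L1 here)...
    fit = np.abs(pred1 - y).mean() + np.abs(pred2 - y).mean()
    # ...while maximizing the difference between the two members
    spread = np.abs(pred1 - pred2).mean()
    return fit - div_weight * spread

x = rng.standard_normal((8, 4))
w = rng.standard_normal((4, 2))
y = x @ w  # toy targets the deterministic part can fit exactly
loss = two_member_loss(x, y, w)
```

In a real training loop the noise would feed into hidden layers and the loss would be backpropagated; the point here is just the fit-minus-spread structure.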
Sorry - not sure this is a reasonable take-away. The models here are all still initialized from analysis performed by ECMWF; Google is not running an in-house data assimilation product for this. So there's no feedback mechanism between ensemble spread/uncertainty and the observation itself in this stack. The output of this system could be interrogated using something like Ensemble Sensitivity Analysis, but there's nothing novel about that and we can do that with existing ensemble forecast systems.
Yes, _in aggregate_, forecasts are objectively, quantifiably better in 2025 than they were in 2005, let alone 1985. But any given, specific forecast may have unique and egregious failure modes. Look no further than the GFS' complete inability to lock on to the forecast track for Hurricane Melissa a month ago. This is dramatically compounded when you look at mesoscale forecasts, where higher spatial resolution is a liability that leads to double-penalty errors (e.g. setting up a mesoscale snow squall band just slightly south of where it actually develops).
And keep in mind that the benchmarks shared from this model product are evaluating an ensemble mean, which further confounds things. Even if the ensemble mean is well-calibrated and accurate, there can be critical spread from the ensemble members themselves.
> By incorporating WeatherNext technology, we’ve now upgraded weather forecasts in Search, Gemini, Pixel Weather and Google Maps Platform’s Weather API. In the coming weeks, it will also help power weather information in Google Maps.
* 90 degree day => more air conditioning usage => power goes up
* 70 degree sunny day => that's also July 4th (holiday, not a work day when factories or heavy industry are running) => lots of people go outside + it's a holiday => power consumption goes DOWN
* 10 degree difference colder/hotter => impacts resistance of power lines => impacts transmission congestion credits => impacts power prices
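A toy sketch of the relationships in those bullets. Every coefficient and threshold here is invented for illustration; real load forecasting models are far richer.

```python
# Naive load model: demand rises with cooling/heating need,
# and drops on holidays when factories and offices idle.
def expected_load_mw(temp_f, is_holiday, base=1000.0):
    cooling = max(temp_f - 65.0, 0.0) * 20.0   # AC ramps up above ~65F
    heating = max(50.0 - temp_f, 0.0) * 15.0   # heating below ~50F
    holiday_drop = 150.0 if is_holiday else 0.0
    return base + cooling + heating - holiday_drop

print(expected_load_mw(90, False))  # hot workday: 1000 + 25*20 = 1500.0
print(expected_load_mw(70, True))   # mild July 4th: 1000 + 100 - 150 = 950.0
```

Even this toy version shows why a 70-degree sunny holiday can draw less power than a plain 90-degree workday.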
It's a fascinating industry. One power trading company that I consulted for had a meteorologist who was also a trader. They literally hired the dude from a news channel if I remember it correctly.
As a layperson, what _is_ useful is to look at the difference between models. My long-range favourite is to compare ECMWF and GFS27, and if the deviation is high (the Windy app shows this) then you can bet that at least one of them is likely wrong.
But knowing "there will be a massive drop in temperature between 1pm->2pm" doesn't help much anymore, you need to know which 15-minute or 5-minute block all those heat pumps will kick on in, to align with markets moving to 15-min and 5-min contracts.
Major forecasts like ECMWF don't have anything like that resolution; they model the planet at a 3-hour time scale, with a 1-hour "reanalysis" model called ERA5. I'm hoping to find good info on what's available at higher resolution.
And I say that as a huge fan of AI, but being vocally self-critical is an important attribute for professional success in AI and elsewhere.
Kenneth Arrow and his statisticians found that their long-range forecasts were no better than numbers pulled out of a hat. The forecasters agreed and asked their superiors to be relieved of this duty. The reply was: "The Commanding General is well aware that the forecasts are no good. However he needs them for planning purposes."
We recently had a situation where we specifically wanted to generate 2 "different" outputs from an optimization task and struggled to come up with a good heuristic for doing so. Not at all a GenAI task, but this technique probably would have helped us.
[edit: "without", not "with"]
I use these and Windy: https://www.windy.com/
In my experience, these forecasts are really good up to 5-7 days out, and then degrade in reliability (as you would expect from predictions of chaotic systems). The apps that show you a rain cloud and a percentage number are always terrible in my experience, even if the origin of the data is the same. I'm not sure why that might be.
I am personally not interested in predicting the weather as end users expect it; rather, I am interested in representative evolutions of wind patterns. I.e. specify some location (say somewhere in the North Sea, or perhaps on mainland Western Europe) and a date (say Nov 12) without specifying a year, and get the wind patterns at different heights for that location for, say, half an hour. Basically, running with different seeds, I want representative evolutions of the wind vector field (without specifying starting conditions other than location and date, i.e. NO prior weather).
Are there any ML models capable of delivering realistic and representative wind gust models?
(The context is structural stability analysis of hypothetical megastructures)
https://arstechnica.com/science/2025/11/googles-new-weather-...
At least for the US NWS: if 30 of 100 scenarios result in 50% shower coverage, and 70 out of 100 result in 0%, this is reported as 15% chance of rain. Which is exactly the same as 15 with 100% coverage and 85 with 0% coverage, or 100 with 15% coverage.
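That definition, probability of precipitation as confidence times expected areal coverage, is small enough to sketch. The helper below is hypothetical, not NWS code:

```python
def pop(scenarios):
    """Probability of precipitation from (member_count, areal_coverage) pairs."""
    total = sum(n for n, _ in scenarios)
    return sum(n * cov for n, cov in scenarios) / total

# 30 of 100 members with 50% coverage, 70 with 0% coverage -> 15%
print(pop([(30, 0.5), (70, 0.0)]))  # 0.15
# ...identical to 15 members at 100% coverage and 85 at 0%
print(pop([(15, 1.0), (85, 0.0)]))  # 0.15
```

The single number collapses two very different situations, which is exactly why digging into the forecast discussion tells you more than the percentage alone.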
Understanding this, and digging further into the forecast, gives a better sense of whether you're likely to encounter widespread rainfall or spotty rainfall in your local area.
https://www.windy.com/?hrrrConus
Also checkout HRDPS model if you're in Canada/northern US
In the bottom right hand corner you can switch between different models and it points out their resolution levels
What makes that funny is that historically, weather forecasters have been less than 90% accurate.
Now, I will say that today's weather models are pretty dang amazing. The 10-day forecast is rarely wrong for me.
But it is funny that humans put a great lot of weight on social contracts and being given explicit orders, maybe even publicly, must help pursuing action instead of rumination. Especially in a world where things seemed to happen randomly anyway.
Weather is three-dimensional and I would guess that the difference between sphere and (appropriate) spheroid could impact predictions. It seems possible that, at least for local and hyperlocal forecasts, geoids would be worthwhile. But as you go from plane -> sphere -> spheroid -> geoid, computing resources must increase pretty quickly.
And even if a geoid is used, that doesn't mean the weather user sees a geoid or a section of one. Every consumer weather application displays a plane, afaict. Maybe nautical or aeronautical weather maps display spheres?
E.g. the weather app tells me there's drizzle all day, yet right now it's entirely dry. The opposite happens too.
Predicted rainy days also often shift by a day or two just one or two days beforehand.
I'd say how accurate the predictions are is location specific.
What do you mean?
Many nonfiction books have it to some extent and it's usually fine (like 5% of the content, either relevant or easy to let go in one ear and out the other), but this sounds like it devotes a good chunk of the book to who's-who lists and (former) meteorological celebrities.
What's your take on this? Does it spend more than, say, 20% talking about the people as compared to the content matter about weather forecast mechanisms and innovations?
You are being a bit misleading here. The model is trained on historical data, but each run is initialized from new instrument readings and generated several times as an ensemble.
Definitely. Training on the historical data creates compelling forecasts but it comes off as a magic box. Where are the missing physics for the high performance cluster?
Concrete if anecdotal example: weather forecasts in SF are fairly accurate, but the weather patterns are also simple to predict, with the Pacific High and simpler high-level mechanics at play. Weather forecasts in Seoul are quite often completely wrong, but the weather patterns there are also much more dynamic at a macro level, with competing large systems from China/the Gobi desert and the Western Pacific.
I'm not a meteorologist, just a sailor who likes to look at weather.
But to expand: the US flagship forecast model just had its worst year predicting hurricanes since 2005. The trend of errors over the last few years hasn't been great.
Forecasting that what is happening today, will happen tomorrow, was an almost insurmountable baseline for early forecasters.
A bit like trying to come up with psych drugs that can beat a placebo. Although, placebos are particularly effective for psych treatments.
What is more interesting for meteorological forecasting is the time-sensitive details such as:
1. We know severe storms will impact city X at approximately Ypm tomorrow. Will they include large hailstones? A severe and destructive downdraft or tornado? Along what path will the most damage occur, and how much notice can we give those in the path, even if it's just 30min before the storm arrives?
2. A large wildfire breaks out near city X and is starting to form its own weather patterns.[3] What are the possible scenarios for fire tornadoes, lightning, etc. to form, and when/where? Is the wind direction change more likely to happen at Ypm or Y+2pm?
I'm skeptical that AI models would excel in these areas because of the time sensitivity of input data as well as the general lack of accurate input data (impacting human analysis too).
Maybe AI models would be better than humans at making longer term climate predictions such as "If [particular type of ENSO/IOD/etc event] is occurring, the number of cloudy days in [city] is expected to be [quantity]/month in [month] versus [quantity2]/month if the event was not occurring." It's not that humans would be unable to arrive at these type of results -- just that it would be tedious and resource intensive to do so.
[1] https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_dur...
[2] https://imagehunter.apollomapping.com/search/90e4893eeeaa48a...
[3] https://en.wikipedia.org/wiki/Cumulonimbus_flammagenitus
On the advice of someone here on hackernews I tried out weawow, and though it is a terrible name it is _very_ accurate. So much better and consistent. Love it so far.
Here in Berlin, predictions that it will rain, or when it will rain, are often too pessimistic because the city is a bit warmer and drier than the surrounding areas, which is where the airports are. Tegel, now closed, is in the North West; Brandenburg airport is in the South East. They are about 20km apart. The long-decommissioned Tempelhof is actually in the middle of the city, but I doubt there is still a weather station there.
Airports are the big consumers of, and important sources of weather data used for making predictions (in addition to satellite data, and weather stations elsewhere). It's more important that the predictions are correct there than 10-15 km away in the downtown areas.
Additionally, many weather apps aren't really precise about where their focus is. You set the city typically; not a postal code. So they'll predict it will rain in Berlin. But it's a big city and that doesn't mean it's going to rain everywhere in the city. It won't do neighborhood by neighborhood predictions. It's technically correct even if not a drop falls where you are. And of course professional users of weather predictions mainly care about the type of weather they need to plan for, which for airports is things like Thunderstorms, poor visibility, etc.
For short term planning, weather radar apps are popular here. Great stuff for guestimating whether you can get home by bike without getting caught up in a big shower. Thunderstorms are very common here throughout the summer but you can see the systems moving west to east hours in advance on the radar apps.
One of the major upgrades to the platform was to allow "day of use I-Loads." Effectively, they could update some constants in the shuttle software image, by literally patching new binary values into the code, while the vehicle was loaded and ready on the launch pad.
Then the game was to launch rockets to measure the upper atmosphere wind properties, convert them into usable constants, and then to update the software. It took the shuttle from having launch opportunities 30% of the time to having them 70% of the time later in the program.
Anyways..
This reminds me of variational noise (https://www.cs.toronto.edu/~graves/nips_2011.pdf).
If it is random noise on the input, it would be like many of the SSL methods, e.g. DINO (https://arxiv.org/abs/2104.14294), right?
Like, they consistently called for freezing seasonal overnight lows many weeks before that was remotely probable. You'd get better predictions asking anyone who's lived here a couple of years. In fairness, I'm in a region that's notoriously difficult to forecast, but the popular non-Google sources seem to be generating better predictions.
I wonder if the rollout of this new model is related (either occurred and made it worse, or will come and make it better).
I'd love to get some hard data. Are there any sites out there where you can compare past performance of different prediction models at a very localized scale?
For example, I have just added rainbow.ai short term precipitation forecast into https://weathergraph.app, and it's the best short term forecast I have ever used - based on radar data + AI prediction based on wind etc.
It sounds simple, but there is surprising complexity in even just getting (and quickly updating) the 'ground truth' from the radar data: each radar is noisy, updates at a different time, might not be working at a given moment... so even the "current precipitation according to radars" is not a raw reading but the result of an ML model.
Making it vague-ish was a design choice to help curtail complaints of inaccuracy while still giving near-enough-to-accurate information to be useful generally speaking.
ForecastAdvisor will show you the accuracy of the major weather forecasters, including AerisWeather, Foreca, Microsoft, the National Weather Service, OpenWeather, The Weather Channel, Wetter.com, WeatherBit, World Weather Online, and others. They also provide links to your city's weather forecast from all the other weather forecasters, so you can compare for yourself.
FGN (and NVIDIA's FourCastNet-v3) show a new path forward that balances inference/training cost without sacrificing the sharpness of the outputs. And you get well-calibrated ensembles if you run them with random seeds to their noise vectors, too!
This is a much bigger deal than people realize.
However, an increase in the mean error at the same time out year over year (or between 2005 and 2025) is an indication of an issue, and that’s what we see.
Calculating the stability and structural requirements for a super-chimney reaching the tropopause would require representative wind fields at higher temporal frequency.
Do you know if I can extract such a high time resolution from LENS since a cursory look at ERA5 showed a time resolution of just 1 hour?
The advantage of an ML model is that it's usually possible to calculate the joint probability of a wind field, or to selectively generate a dataset of N-th percentile wind fields, etc.
If it's differentiable, and the structural stress assumptions are known, then one can "optimize" towards wind profiles that are simultaneously more dangerous and more probable, to identify what needs addressing. That's why an ML model of local wind patterns would be desirable.

ML is more than just LLMs. The typical complaint about LLMs, that there are no error bars on the output, is not entirely correct: just like differentiable ML models of physical and other phenomena, they allow calculating the joint probability of sentences, except instead of modeling a natural phenomenon they model what humans uttered in the corpus (or the implicit corpus after RLHF etc.). A base model LLM can quite accurately predict the likelihood of a human expressing a certain phrase, but that models human expression, not its validity. An ML model trained on actual weather data, or fine-grained simulated weather data, yields comparatively more accurate probability distributions, because physics isn't much of an opinion.
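A toy sketch of that "simultaneously more dangerous and more probable" search. Real wind fields are high-dimensional; this substitutes a made-up 1-D Gaussian wind climatology and a quadratic stress term, and gradient-ascends on log-probability plus weighted stress:

```python
# Toy problem: wind speed w ~ N(mu, sigma), stress ~ w^2.
# Ascend on  J(w) = log p(w) + lam * w^2  to find a wind speed
# that balances "probable" against "dangerous". All constants invented.
mu, sigma, lam = 10.0, 3.0, 0.02

def objective_grad(w):
    # d/dw [ log N(w | mu, sigma^2) + lam * w^2 ]
    return -(w - mu) / sigma**2 + 2.0 * lam * w

w = mu  # start at the most probable wind speed
for _ in range(5000):
    w += 0.1 * objective_grad(w)

# Fixed point: -(w - mu)/sigma^2 + 2*lam*w = 0
#           =>  w = mu / (1 - 2*lam*sigma^2) = 10 / 0.64
print(round(w, 3))  # 15.625
```

With a real differentiable wind model and a real stress function, the same gradient would point at whole wind profiles worth stress-testing, rather than a single scalar.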