[1]https://blog.google/products/google-cloud/ironwood-tpu-age-o...
ModernBERT with its extended context has solved natural language web search. I mean it as no exaggeration that _everything_ Google does for search is now obsolete. The only reason Google Search isn't dead yet is that it takes a while to index all web pages into a vector database.
And yet it wasn't Google that released the architecture update; it was Hugging Face, as a summer collaboration between a dozen people. Google's version came out in 2018 and languished for a decade because it would destroy their business model.
Google is too risk-averse to do anything, but completely doomed if they don't cannibalize their cash-cow product. Web search is no longer a crown jewel, but plumbing that answering services, like Perplexity, need. I don't see Google being able to pull off an iPhone moment, where Apple killed the iPod to win the next 20 years.
The web UI for people using search may be obsolete, but search itself is hot: all AIs need it, both web and local, because models don't have recent information in them and are unable to reliably quote from memory.
Google's cash-cow product is relevant ads. You can display relevant ads in LLM output or natural language web-search. As long as people are interacting with a Google property, I really don't think it matters what that product is, as long as there are ad views. Also:
> Web search is no longer a crown jewel, but plumbing that answering services, like perplexity, need
This sounds like a gigantic competitive advantage if you're selling AI-based products. You don't have to give everyone access to the good search via API, just your in-house AI generator.
Bryce Bayer worked for Kodak when he invented and patented the Bayer pattern filter used in essentially every colour image sensor to this day.
But the problem was: Kodak had a big film business - with a lot of film factories, a lot of employees, a lot of executives, and a lot of recurring revenue. And jumping into digital with both feet would have threatened all that.
So they didn't capitalise on their early lead - and now they're bankrupt, reduced to licensing their brand to third-party battery makers.
> You can display relevant ads in LLM output or natural language web-search.
Maybe. But the LLM costs a lot more per response.
Making half a cent is very profitable if it only takes 0.2 s of CPU time to do it. Making half a cent with 30 seconds on multiple GPUs consuming 1000 W of power... isn't.
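A rough back-of-the-envelope sketch of that gap (the per-hour prices below are made-up illustrative assumptions, not actual Google costs):

```python
# Illustrative margin comparison; prices are assumptions, not real figures.
ad_revenue = 0.005                 # half a cent of revenue per answered query

cpu_price_per_hour = 0.04          # assumed cost of one CPU core-hour
cpu_cost = cpu_price_per_hour / 3600 * 0.2          # 0.2 s of CPU time

gpu_price_per_hour = 3.00          # assumed cost of one datacenter GPU-hour
gpu_cost = gpu_price_per_hour / 3600 * 30 * 2       # 30 s on two GPUs

print(f"CPU-served answer margin: {ad_revenue - cpu_cost:+.5f} USD")   # ~ +0.00500
print(f"GPU-served answer margin: {ad_revenue - gpu_cost:+.5f} USD")   # ~ -0.04500
```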
As a business Google's interest is in showing ads that make it the most money - if they quickly show just the relevant information then Google loses advertising opportunities.
To an extent, it is the web equivalent of IRL supermarkets intentionally moving stuff around and having impulse displays at the checkout.
This is just a question of UX: the purpose of their search engine was already to show the most relevant information (i.e. links), but they just put some semi-relevant information (i.e. sponsored links) first, and make a fortune. They can just do the same with AI results.
I do think Google is a little different to Kodak, however; their scale and influence are on another level. GSuite, Cloud, YouTube and Android are pretty huge diversifications from Search in my mind, even if Search is still the money maker...
> I’m forgetting something. Oh, of course, Google is also a hardware company. With its left arm, Google is fighting Nvidia in the AI chip market (both to eliminate its former GPU dependence and to eventually sell its chips to other companies). How well are they doing? They just announced the 7th version of their TPU, Ironwood. The specifications are impressive. It’s a chip made for the AI era of inference, just like Nvidia Blackwell
The improvement has been steady and impressive. The entire integration is becoming a product that I want to use.
Google has their own cloud with their data centers with their own custom designed hardware using their own machine learning software stack running their in-house designed neural networks.
The only thing Google is missing is designing a computer memory that is specifically tailored for machine learning. Something like processing in memory.
People like to believe CEOs aren't worth their pay package, and sometimes they're not. But looking at a couple of these failures, and considering that a different CEO at Kodak might have avoided what happened, makes me think that sometimes some of them do deserve it.
Once the space settles down, the balance might tip towards specialized accelerators but NVIDIA has plenty of room to make specialized silicon and cut prices too. Google has still to prove that the TPU investment is worth it.
But I am not sure how AWS and Google Cloud match up in terms of making this vertical integration work to their competitive advantage.
Any insight there? I'd be curious to read up on it.
I guess Microsoft has also been investing for that matter -- we heard about the latest quantum breakthrough that was reported as creating a fundamentally new physical state of matter. Not sure if they also have some traction with GPUs and other hardware with more immediate applications.
When a fool inevitably takes the throne, disaster ensues.
I can't say for sure that a different system of government would have saved Kodak. But when one man's choices result in disaster for a massive organization, I don't blame the man. I blame the structure that laid the power to make such a mistake on his shoulders.
Even on the few Vaios that had MD drives on them, they're pretty much just an external MD player permanently glued to the device instead of being a full and deeply integrated PC component.
Now, for the life of me, I still haven't been able to understand what a TPU is. Is it Google's marketing term for a GPU? Or is it something different entirely?
1. Search ads (at risk of disintermediation)
2. Display ads (not going anywhere)
3. Ad-supported YouTube
4. Ad-supported YouTube TV
5. Ad-supported Maps
6. Partnership/Ad-supported Travel, YouTube, News, Shopping (and probably several more)
7. Hardware (ChromeOS licensing, Android, Pixel, Nest)
8. Cloud
There are probably more ad-supported or ad-enhanced properties, but what's been shifting over the past few years is the focus on subscription-supported products:
1. YouTube TV
2. YouTube Premium
3. GoogleOne (initially for storage, but now also for advanced AI access)
4. Nest Aware
5. Android Play Store
6. Google Fi
7. Workspace (and affiliated products)
In terms of search, we're already seeing a renaissance of new options, most of which are AI-powered or enhanced, like basic LLM interfaces (ChatGPT, Gemini, etc), or fundamentally improved products like Perplexity & Kagi. But Google has a broad and deep moat relative to any direct competitors. Its existential risk factors are mostly regulation/legal challenge and specific product competition, but not everything on all fronts all at once.
They started becoming available internally in mid 2015.
It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (https://openxla.org/xla/operation_semantics), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).
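To make that concrete, here is a minimal sketch (assuming JAX is installed; it runs on CPU or GPU too, and would target a TPU transparently on a TPU host):

```python
# Anything expressible in XLA HLO can run on a TPU; JAX lowers through XLA.
import jax
import jax.numpy as jnp

def layer(x, w):
    # A matmul plus an elementwise op: bread-and-butter TPU work.
    return jax.nn.relu(x @ w)

x = jnp.ones((128, 512))
w = jnp.ones((512, 256))

compiled = jax.jit(layer)                     # trace and compile via XLA
print(jax.devices())                          # CpuDevice/GpuDevice/TpuDevice, depending on host
print(compiled.lower(x, w).as_text()[:300])   # the HLO/StableHLO the backend executes
```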
It's not a GPU, as there is no graphics hardware there anymore. Just memory and very efficient cores, capable of doing massively parallel matmuls on the memory. The instruction set is tiny, basically only capable of doing transformer operations fast.
Today, I'm not sure how much graphics an A100 GPU still can do. But I guess the answer is "too much"?
So are the electric and cooling costs at Google's scale. Improving perf-per-watt efficiency can pay for itself. The fact that they keep iterating on it suggests it's not a negative-return exercise.
While Nv does have an unlimited money printer at the moment, the fact that at least some potential future competition exists does represent a threat to that.
Crawling the web has a huge moat because a huge number of sites have blocked 'abusive' crawlers except Google and possibly Bing.
For example just try to crawl sites like Reddit and see how long before you're blocked and get a "please pay us for our data" message.
Also worth noting that its Ads division is the largest, heaviest user of TPUs. Thanks to them, Ads can afford to run a bunch of different expensive models that you cannot realistically afford with GPUs. The revenue delta from this is more than enough to pay off the entire investment history for TPU.
> The revenue delta from this is more than enough to pay off the entire investment history for TPU.
Possibly; such statements were common when I was there too but digging in would often reveal that the numbers being used for what things cost, or how revenue was being allocated, were kind of ad hoc and semi-fictional. It doesn't matter as long as the company itself makes money, but I heard a lot of very odd accounting when I was there. Doubtful that changed in the years since.
Regardless the question is not whether some ads launches can pay for the TPUs, the question is whether it'd have worked out cheaper in the end to just buy lots of GPUs. Answering that would require a lot of data that's certainly considered very sensitive, and makes some assumptions about whether Google could have negotiated private deals etc.
So GPUs have ~120 small systolic arrays, one per SM (aka a tensor core), plus passable off-chip bandwidth (16 lanes of PCIe).
Whereas TPUs have one honking big systolic array, plus large amounts of off-chip bandwidth.
This roughly translates to GPUs being better if you're doing a bunch of different small-ish things in parallel, but TPUs are better if you're doing lots of large matrix multiplies.
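Roughly, the two workload shapes look like this (a sketch with arbitrary illustrative sizes, again assuming JAX):

```python
# Many independent small matmuls vs. one large matmul.
# GPUs (many small tensor cores) suit the first pattern; a TPU's single
# large systolic array shines on the second.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)

# "Bunch of different small-ish things in parallel": 1024 independent 64x64 matmuls.
a_small = jax.random.normal(key, (1024, 64, 64))
b_small = jax.random.normal(key, (1024, 64, 64))
many_small = jax.jit(jax.vmap(jnp.matmul))
print(many_small(a_small, b_small).shape)   # (1024, 64, 64)

# "Lots of large matrix multiplies": one big 8192x8192 matmul.
a_big = jax.random.normal(key, (8192, 8192))
b_big = jax.random.normal(key, (8192, 8192))
one_big = jax.jit(jnp.matmul)
print(one_big(a_big, b_big).shape)          # (8192, 8192)
```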
I'm not sure what point you're trying to make here. Following your logic, even if you have a fab you would still need to compete for rare metals, ASML, etc. That's logic built for nothing but its own sake. In the real world, it is much easier to compete once you're outside Nvidia's allocation, because you get rid of the critical bottleneck. And Nvidia has every incentive to control the supply to maximize its own profit, not to meet demand.
> Possibly; such statements were common when I was there too but digging in would often reveal that the numbers being used for what things cost, or how revenue was being allocated, were kind of ad hoc and semi-fictional.
> Regardless the question is not whether some ads launches can pay for the TPUs, the question is whether it'd have worked out cheaper in the end to just buy lots of GPUs.
Of course everyone can build their own narrative in favor of their launch, but I've been involved in some of those ads quality launches and can say pretty confidently that most of them would not have been launchable without TPUs at all. This was especially true in the early days of the TPU, when the supply of datacenter GPUs was extremely limited and immature.
Could more GPUs solve it? Companies are talking about 100k~200k H100s as a massive cluster, and Google already has much larger TPU clusters, with compute capacity in a different order of magnitude. The problem is, you cannot simply buy more compute even if you have lots of money. I've been pretty clear about how relying on Nvidia's supply could be a critical limiting factor from a strategic point of view, but you're trying to move the point. Please don't.
Today a consumer-grade >8B decoder-only model does a better job of predicting whether some (long) string of text matches a user query than any bespoke algorithm would.
The only reason encoder-only models are better than decoder-only models is that you can cache the results against the corpus ahead of time.
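A minimal sketch of that caching argument (the `embed` helper here is a hypothetical stand-in for any encoder model):

```python
# Encoder-style retrieval: embed the corpus once offline, then score each
# query with a single cheap encoder call plus a dot product.
import numpy as np

def embed(texts):
    # Hypothetical placeholder: a real system would call a BERT-style
    # encoder here and return one vector per input text.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

# Offline: embed and cache the corpus.
corpus = ["page one text ...", "page two text ..."]
corpus_vecs = embed(corpus)
corpus_vecs /= np.linalg.norm(corpus_vecs, axis=1, keepdims=True)

# Online: embed only the query and rank by cosine similarity.
query_vec = embed(["user query"])[0]
query_vec /= np.linalg.norm(query_vec)
scores = corpus_vecs @ query_vec
print(corpus[int(np.argmax(scores))])
```

A decoder-only model, by contrast, would have to re-read every candidate document for every query, which is what makes the cached-embedding approach so much cheaper to serve.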
You're talking about small-money bets. The technical infrastructure group at Google makes a lot of them, to explore options or hedge risks, but they only scale the things that make financial sense. They aren't dumb people after all.
The TPU was a small-money bet for quite a few years until this latest AI boom.
I've been wondering for some time what sustainable advantage will end up looking like in AI. The only obvious thing is that whoever invents an AI that can remember who you are and every conversation it's had with you -- that will be a sticky product.
Google is catching up fast on product though.
95% of our load is from crawlers, so we have to pick who to serve.
If they want our data all they need to do is offer a way for us to send it, we're happy to increase exposure and shopping aggregation site updates are our second highest priority task after price and availability updates.
Edit: And btw, another question I'd had before was what the difference is between a tensor core and a GPU. Based on your answer, my speculative answer would be that the tensor core is the part inside the GPU that actually does the matmuls.
The cost delta was massive and really quite astounding to see spelled out because it was hardly talked about internally even after the paper was written. And if you took into account the very high comp Google engineers got, even back then when it was lower than today, the delta became comic. If Gmail had been a normal business it'd have been outcompeted on price and gone broke instantly, the cost disadvantage was so huge.
The people who built Gmail were far from dumb but they just weren't being measured on cost efficiency at all. The same issues could be seen at all levels of the Google stack at that time. For instance, one reason for Gmail's cost problem was that the underlying shared storage systems like replicated BigTables were very expensive compared to more ordinary SANs. And Google's insistence on being able to take clusters offline at will with very little notice required a higher replication factor than a normal company would have used. There were certainly benefits in terms of rapid iteration on advanced datacenter tech, but did every product really need such advanced datacenters to begin with? Probably not. The products I worked on didn't seem to.
Occasionally we'd get a reality check when acquiring companies and discovering they ran competitive products on what was for Google an unimaginably thrifty budget.
So Google was certainly willing to scale things up that only made financial sense if you were in an environment totally unconstrained by normal budgets. Perhaps the hardware divisions operate differently, but it was true of the software side at least.
I think they've jumped the shark and need to give me more control, because currently I actively avoid watching videos I think MIGHT be interesting, because the risk is too high. This is a terrible position to put your users in, both from a specific-experience perspective and from a "how they feel about your product" perspective.
I've built RAG systems that index tokens in the 1e12 range, and the main thing stopping us from having a super-search that will make Google look like the library card catalogue is the copyright system.
A country that ignores that and builds the first XXX-billion-parameter encoder-only model will do for knowledge work what the high-pressure steam engine did for muscle work.
I see search engines as a dripfeed from a firehose, not some magical thing that's going to get me the 100% correct 100% accurate result.
Humans are the most prolific liars; I could never fully trust search results anyway, since Google may find something that looks right while the author is heavily biased, uninformed, or all manner of other things.
Constantly I see them dodging responsibility or resigning (as an "apology") during a crisis they caused and then moving on to the next place they got buddies at for another multi-mil salary.
Many here would defend 'em tho. HN/SV tech people seem to aspire to such things from what I've seen. The rest of us just really think computers are super cool.
The CEO takes the blame, the board picks a new one (Unless the CEO has special shares that make them impossible to dismiss), and we go on hoping that the king isn't an idiot this time.
My reading of history is that some people are fools - we can blame them for their incompetence or we can set out to build foolproof systems. (Obviously, nothing will be truly foolproof. But we can build systems that are robust against a minority of the population being fools/defectors.)
"It was there."
Everyone understands that such naysaying is effectively an accusation of lying. In any case it was a totally low effort utterly inappropriate comment. Clearly you aren't going to learn the lesson.
You can remove videos from your viewing history. I do this when I start watching something but the content turns out not to be what I expected. It seems to prevent polluting my recommendations.