Given $3 billion, xAI would choose to invest $2.5 billion in GPUs and $0.5 billion in talent; DeepSeek would invest $1 billion in GPUs and $2 billion in talent.
I would argue that the latter approach (DeepSeek's) is more scalable. It's extremely difficult to increase raw compute 100x, but with sufficient investment in talent, achieving the equivalent of a 10x compute gain through efficiency is far more feasible.
TSMC has a factory in the USA now, ASML too. OpenAI, Google, xAI and Nvidia are natively in the USA.
Meanwhile, no other country is even close to being able to build AI on its own.
Is the USA going to "own" the world by becoming the keeper of AI? Or is there an alternative future that has a probability > 0?
> Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.
e.g.: “the study of linguistics doesn’t help you build an LLM” or “you don’t need to know about chicken physiology to make a vision system that tells you how old a chicken is”
The author then uses a narrow and _unusual_ definition of what computation _means_: simply access to fast chips, rather than the work you can perform on them, which would obviously include how efficiently you use them.
In short, this article misuses two terms to more simply say “looks like the scaling laws still work”.
So far, I haven't seen any indication that this is the case. And I'd say hyped-up speculation by people financially incentivized to hype AI should be taken with an entire mine full of salt.
In particular, at a strategic level, consider all the benefits of Mahan's "Influence of Sea Power"[0] and the advantage that nations with suitable sea power and access to global markets have in building the necessary infrastructure at scale, quickly enough, to support such endeavours. And this isn't just about raw "Sea Power" but all the concomitant requirements to achieve it, in particular the "Number of population", "Character of the people", and "Character of the government".
[0] Alfred Thayer Mahan, "The Influence of Sea Power Upon History"
The article explains how in reality the opposite is true. Especially when you look at it long term. Compute power grows exponentially, humans do not.
If one country moves along this direction faster than the others, no country will stand a chance to compete with them militarily or economically.
Having hardware and software suppliers all together makes it more likely even if you assume (like I do) that we're at least one paradigm shift away from the right architecture, despite how impressively general Transformers have been.
But software is easy to exfiltrate, so I think anyone with hardware alone can catch up extremely fast.
Currently there is no hard ROI on LLMs, for example, other than forced bundling, leveraging them for soft outcomes (layoffs), and generating trash. User interest and revenue drop off fairly quickly. And there are regulations coming in elsewhere.
It’s really not looking good.
Second, how could AI not be the deciding geopolitical factor of the future? You expect progress to stop and AI not to achieve and surpass human intelligence?
A big lesson seems to be that one can rapidly close the gap, with much less compute, once paths have been blazed by others. There’s a first-mover disadvantage.
It's a lazy blog post that should be thrown out after a minute of thought by anyone in the field.
A word generator is not intelligence. There’s no “thinking” involved here.
To surpass human intelligence, you’d first need to actually develop intelligence, and LLMs will not be it.
In reality pushing the frontier on datacenters will tend to attract the best talent, not turn them away.
And in talent, it is the quality rather than the quantity that counts.
A 10x breakthrough in algorithms will compound with a 10x scale-out in compute, not hinder it.
I am a big fan of Deepseek, Meta and other open model groups. I also admire what the Grok team is doing, especially their astounding execution velocity.
And it seems like Grok 2 is scheduled to be open-sourced as promised.
Another nit-pick: I don't think DeepSeek had 50k Hopper GPUs. Maybe they have 50k now, after getting the world's attention and having a nationally sponsored grey market back them, but that 50k number is certainly dreamed up. During the past year DeepSeek's intern recruitment ads always just mentioned "unlimited access to 10k A100s", suggesting that they may have had very limited H100/H800s, and most of their research ideas were validated on smaller models on an Ampere cluster. The 10k A100 number matches a cluster their parent hedge fund announced a few years ago. All in all, my estimate is they had somewhat more (maybe 20k) A100s, and single-digit thousands of H800s.
The number of humans who could feasibly work on this problem is pretty high, and the labs could grow an order of magnitude, and still only be tapping into the top 1-2% of engineers & mathematicians. They could grow two orders of magnitude before they've absorbed all of the above-average engineers & mathematicians in the world.
As far as I know, Waymo is still not even remotely able to operate in any kind of difficult environment, even though insane amounts of money have been poured into it. You are vastly overstating its capabilities.
Is it cool tech? Sure. Is it safely going to replace all drivers? Doubt, very much so.
Secondly, this only works if progress in AI does not stagnate. And, again, you have no grounds to actually make that claim. It's all built on the fanciful imagination that we're close to AGI. I disagree heavily and think it's much further away than people profiting financially from the hype tend to claim.
Even China has not yet managed to even remotely catch up with this hardware stack. Even though the trail has been blazed by ASML, TSMC and Nvidia.
https://en.wikipedia.org/wiki/Exception_that_proves_the_rule...
Also, not for nothing, scaling compute 100x or even 1000x is much easier than scaling talent 10x or even 2x, since you don’t need workers, you need discovery.
Easily. Natural resources, human talent, land and supply chains all are, and will remain, more important factors than AI.
> You expect progress to stop
no
> and AI not to achieve and surpass human intelligence
yes
It really depends on how they go about it. It can easily instead end up with lots of people without work, no social security and disillusioned with the country. Instead of being economically great, the country may end up fighting uprisings and sabotage.
But how are animals with nerve-centres or brains different? What do we think we humans do differently such that we are not just very big probabilistic prediction systems?
A completely different tack: if we develop the technology to engineer animal-style nerves and form them into big lumps called 'brains', in what way is that not artificial intelligence? And if we can do that, what is to stop that manufactured brain from being twice or ten times larger than a human's?
People think LLMs are intelligent because intelligence is latent within the text they digest, process and regurgitate. Their performance reflects this trick.
ASI also will not be magic. Like what exactly would it be doing that enables the country to subject the others? Develop new weapons? We already have the capability to destroy Earth. Actually, come to think of it, if ASI is an existential threat to other nations, maybe the rational action would be to nuke whichever country develops it first. To save the world.
You see what I am saying? There is such a thing as the real world with real constraints.
In any AI R&D operation the bulk of the compute goes on doing experiments, not on the final training run for whatever models they choose to make available.
Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:
> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.
https://x.com/skdh/status/1892432032644354192
Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".
Or, as is already being done, use them to influence opinion and twist perception within tools and services that people already use, such as social media.
On the other hand if the economic benefit isn't shared across the whole of society it will become a destabilising factor and hence reduce the overall economic benefit it might have otherwise borne.
Reasoning models are trying to address that now, but monologuing in token-space still feels more like a hack than a real solution, though it does improve their performance a good bit nonetheless.
In practical terms all this means is that current LLMs still need a hell of a lot of hand holding and fail at anything more complex, even if their "System 1" thinking is good enough for the task (e.g. they can write Tetris in 30sec no problem, but they can't write SuperMarioBros at all, since that has numerous levels that would blow the context window size).
Their technical report on DeepSeek-V3 says that it "is trained on a cluster equipped with 2048 NVIDIA H800 GPUs." If they had even high-single-digit thousands of H800s they would have probably used more computing power instead of waiting almost two months.
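As a rough sanity check of what "2048 GPUs for almost two months" amounts to (my own back-of-the-envelope arithmetic; the ~57-day figure is just an assumption for illustration, not from the report):

```python
# Back-of-the-envelope: 2048 H800s running for roughly two months.
# The 57-day duration is an assumed value for illustration.
gpus = 2048
days = 57
gpu_hours = gpus * days * 24
print(f"{gpu_hours:,} GPU-hours")  # 2,801,664, i.e. roughly 2.8M GPU-hours
```

So the whole run is on the order of a few million GPU-hours, which is small compared to what a 100K-GPU cluster burns through in the same period.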
People believe that LLM progress will become the foundation of future economic expansion the same way microelectronics did. But for now there are few signs of that economic benefit from AI/LLM stuff. If one does the math on what productivity increase the tech should deliver in order to have positive ROI, one would be surprised how far reality is from making the investment feasible https://www.bruegel.org/working-paper/tension-between-explod.... Yes, anecdotally people tell stories about how they can code twice/thrice/ten times faster, or how they automated their whole workflow or replaced support with an LLM. But that's far from enough to make AI investment feasible in existing businesses (AI startups will flourish for a while on venture money). Also, anecdotally, there are many failed attempts to replace people with LLMs (like McDonald's ordering, which resulted in crazy orders).
So what we have is hype on top of belief in progress as a continuous phenomenon. But progress itself has slowed greatly. Where are all the breakthroughs that change our way of living? Pocket computers and consumer electronics (which were not a discovery so much as an optimisation) and the internet (also more about scaling than inventing) were the last. 3D printing, cancer treatment and robotics were thought to be the new factors. Then came AI/LLMs. Now AI/LLMs are the last resort for believers in progress and techno-optimists like Musk.
The post you quoted does not describe a Grok problem if other LLMs are also failing; it seems to me to be a fundamental failure in the current approach to AI model development.
DeepSeek didn’t have option A or B available; they only had the extreme-optimisation option to work with.
It’s weird that people present those two approaches as mutually exclusive ones.
> Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks
That's something I always wondered about; Goodhart's law so obviously applies to each new AI release. Even the fact that writers and journalists don't mention that possibility makes me instantly skeptical about the quality of the article I'm reading.
There is a reason why startups innovate and large companies follow.
I think we can split the two.
a) I don't think there's anyone seriously doubting there's a lot of hype going around. But that's to be expected. Trust not the trust me bros, and so on. I think we agree here.
b) Progress hasn't slowed, and at the same time progress (in research) could stop tomorrow and we'd still have work to do for 20 years, simply using what's on the table today. LLMs have solved "intent comprehension". RL seems to have solved "verifiable problems". The two can and will be used together to tackle "open-ended questions". And together we'll see advancements in every aspect of our lives, on every repetitive, mundane and boring task we have ever done.
2 anecdotes here:
- just before Grok 2 was released, they put it on the arena under a pseudonym. If you read the threads (reddit, x, etc.) when that hit, everyone was raving about the model. People were saying it's the next 4o, that it's so good, hyped, so on. Then it launched, and they revealed the pseudonym, and everyone started shitting on it. There is a lot of bias in this area, especially with anything touching bad spaceman, so take "many people doubt" with a huge grain of salt. People be salty.
- there are benchmarks that seem to correlate very well with end-to-end results on a variety of tasks. Livebench is one of them. Models scoring highly there have proven to perform well on general tasks, and don't feel like they cheated. This is supported by the paper that found models like Phi and Qwen lose ~10-20% of their benchmark scores when checked against newly built, unseen but similar tasks. Models scoring strongly on Livebench didn't see that big of a gap.
Ah, yes. Google? Meta? Amazon? Microsoft? Hahaha. You're right, they aren't doing it in the open, and some certainly aren't doing a bad job about it. But they are all playing at world domination.
China has a realistic prospect of developing an independent stack.
It'll be very difficult, especially at the level of developing good enough semiconductor fabs with EUV. However, they're not starting from scratch in terms of a domestic semiconductor industry. And their software development / AI research capabilities are already near par with the US.
But they do have a whole of nation approach to this, and are willing to do whatever it takes.
A lot of them could be solved with pre-AI things. Many were self-inflicted by people badly designing and approving existing processes. I really don't see how AI is going to get us out of this one. I've been paid money to automate a few existing self-inflicted repetitive, mundane and boring tasks, and the companies that created them are not even interested in talking about solving that. Some work the opposite way - in the US the value keeps being extracted from the tax filing system, for example, even though we know how to remove the issue itself.
It's weird to discuss how the AI will automate everything when we're actively fighting simplification right now.
How could Musk's LLM train on data that does not yet exist?
In current LLM neural networks, the signal proceeds in one direction, from input, through the layers, to output. To the extent that LLMs have memory and feedback loops, it's that they write the output of the process to text, and then read that text and process it again through their unidirectional calculations.
Animal brains have circular signals and feedback loops.
There are Recurrent Neural Network (RNN) architectures, but current LLMs are not these.
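To make the distinction concrete, here is a toy, framework-free sketch (my own illustration, not how production LLMs or RNNs are actually implemented): a feedforward pass pushes the signal through the layers once, while a recurrent cell feeds its own state back in at every step.

```python
# Toy illustration of the difference described above (not a real model).
from typing import List

def feedforward(x: float, weights: List[float]) -> float:
    # Signal flows one way: input -> layer 1 -> ... -> layer N -> output.
    for w in weights:
        x = max(0.0, w * x)  # toy "layer": one weight plus a ReLU
    return x

def recurrent(xs: List[float], w_in: float, w_rec: float) -> float:
    # The hidden state h loops back on itself: a feedback loop inside the net.
    h = 0.0
    for x in xs:
        h = max(0.0, w_in * x + w_rec * h)
    return h

print(feedforward(1.0, [0.5, 2.0, 1.5]))      # one pass, no internal loop
print(recurrent([1.0, 0.5, 0.25], 0.8, 0.9))  # state carried across steps
```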
A fundamentally new approach is needed, such as training AIs in phases, where instead of merely training them to learn to parrot their inputs, the first AI is used to critique and analyse the inputs, which is then used to train another model in a second pass, which is used to critique the data again, and so on, probably for half a dozen or more iterations. On each round, the model can learn not just what it heard, but also an analysis of the veracity, validity, and consistency.
Notably, something akin to this was done for training Deepseek, but only in a limited fashion.
P.S. - clarification: I mean I'm not referring to talent at OpenAI. And yes, I have very little doubt that talent at DeepSeek is a lot cheaper than the things I listed above for OpenAI. I would be interested in a breakdown of OpenAI's costs, and in seeing whether even their technical talent costs more than the things I mentioned.
---
No benchmarks involved, just user preference.
Rank* (UB) | Rank (StyleCtrl) | Model                               | Arena Score | 95% CI | Votes | Organization | License
1          | 1                | chocolate (Early Grok-3)            | 1402        | +7/-6  | 7829  | xAI          | Proprietary
2          | 4                | Gemini-2.0-Flash-Thinking-Exp-01-21 | 1385        | +5/-5  | 13336 | Google       | Proprietary
2          | 2                | Gemini-2.0-Pro-Exp-02-05            | 1379        | +5/-6  | 11197 | Google       | Proprietary
I see this statement thrown around a lot and I don't understand why. We don't process information like computers do. We don't learn like they do, either. We have huge portions of our brains dedicated to communication and problem solving. Clearly we're not stochastic parrots.
> if we develop the technology to engineer animal-style nerves and form them into big lumps called 'brains'
I think y'all vastly underestimate how complex and difficult a task this is.
It's not even "draw a circle, draw the rest of the owl", it's "draw a circle, build the rest of the Dyson sphere".
It's easy to _say_ it, it's easy to picture it, but actually doing it? We're basically at zero.
With that said, I assume that every 'next' version of each engine is using my 'prompts' to train, so each new version has the benefit of having already processed my initial v1.0 and then v1.1 and then v1.2. So it is somewhat 'unfair' because for "ChatGPT v2024" my v1.0 is brand new, while for "ChatGPT v2027" my v1.0, v1.1, and v1.2 are already in the training dataset.
I haven't used Grok yet, perhaps it's time to pause that OpenAI payment and give Elon some $$$ and see how it works 'for me'.
She really needs to stop commenting on topics outside of theoretical physics.
Even in physics she does not represent the scientific consensus but has some very questionable fringe beliefs like labeling whole sub-fields as "scams to get funding".
She regularly speaks with "scientific authority" about topics she barely knows anything about.
Her video on autism is considered super harmful and misleading by actual autistic people. She also thinks she is an expert on trans issues and climate change. And I doubt she knows enough about artificial intelligence and computer science to comment on LLMs.
She doesn't say she is an expert on trans issues at all! She analyzed the studies, looked at the data, and stated that there is no real trans epidemic, but highlighted a statistical increase in numbers among young women, without stating a clear opinion on this finding.
The climate change videos do the same thing. She evaluates these studies and discusses them to clarify that, for her, certain numbers are unspecific, and she does not come to a clear conclusion in the sense of climate change yes or no, bad or good.
She is for sure not an expert in all fields, but her way of discussing these topics is based on studies and numbers, and offers a good viewpoint.
The funding scam you mention is a reference to "these people get billions for particle research but the outcome for us as a society is way too small".
Nonetheless, I do not use Grok and will not try it out, because it belongs to Musk.
I'm also not aware that Grok 2 was communicated as the top model in any relevant timespan at all. Perhaps it just didn't deliver? Or a lot more people are not aware of how to use it, or boycott Musk.
After all, he clearly doesn't care about any rules or laws, so it is probably a very high risk to send anything to Grok.
* "I know how this works and it's just numbers all the way down" is not an argument of any validity, just to be clear- everything eventually is just physics, blind mechanics.
Crypto mining is an embarrassingly parallel problem, requiring little to no communication between GPUs. To a first approximation, in crypto, 10x-ing the number of "cores" per GPU, 10x-ing the number of GPUs per rig and 10x-ing the number of rigs you own are basically equivalent. An infinite number of extremely slow GPUs would do just as well as one infinitely fast GPU. This is why consumer GPUs are great for crypto.
AI is the opposite. In AI, you need extremely fast communication between GPUs. This means getting as much memory per GPU as possible (to make communication less necessary), and putting all the GPUs all in one datacenter.
Consumer GPUs, which were used for crypto, don't support the fast communication technologies needed for AI training, and they don't come in the 80 GB memory versions that AI labs need. This is Nvidia's price-differentiation strategy.
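A rough sketch of that asymmetry, with assumed numbers (a 70B-parameter model, fp16 gradients, 1024 GPUs, plain data parallelism with ring all-reduce); the point is only the order of magnitude:

```python
# Why AI training is communication-bound while crypto mining is not (rough sketch).
params = 70e9          # assumed model size: 70B parameters
bytes_per_grad = 2     # fp16/bf16 gradients
n_gpus = 1024          # assumed data-parallel cluster size

# Ring all-reduce moves about 2*(N-1)/N of the gradient bytes per GPU per step.
allreduce_gb = 2 * (n_gpus - 1) / n_gpus * params * bytes_per_grad / 1e9
print(f"training: ~{allreduce_gb:.0f} GB exchanged per GPU per step")  # ~280 GB

# Crypto mining: each GPU hashes independently; submitting a share is a few
# hundred bytes, so communication is effectively zero.
print("mining: ~0 GB exchanged per GPU per unit of work")
```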
I don't even think it makes sense to use the closed, API-access models for core business functions. Seems like an existential risk to be handing over that kind of information to a third party in the age of AI.
But handing it over to the guy who is simultaneously running for-profit companies that contract with the government and seizing control of the Treasury? To a guy who has a long history of SEC issues, and who was caught cheating at a video game just to convince everyone he's a genius? That just seems incredibly short-sighted.
Do you have any data to support 1. That grok is not more intelligent than previous models (you gave one anecdotal datapoint), and 2. That it was trained on more data than other models like o1 and Claude-3.5 Sonnet?
All datapoints I have support the opposite: scaling actually increases intelligence of models. (agreed, calling this "intelligence" might be a stretch, but alternative definitions like "scope, maybe, or flexibility, or coverage, or something" seem to me like beating around the bush to avoid saying that machines have intelligence)
Check out the technical report of Llama 3 for instance, with nice figures on how scaling up model training does increase performance on intelligence tests (might as well call that intelligence): https://arxiv.org/abs/2407.21783
Especially not in such politically-charged fields that require deeper knowledge about the historical context, the different interest groups and their biases and so on.
Her video on trans-issues labels people that advocate for the rights of trans-people as "extremists" and presents transphobic talking points as valid part of the scientific discussion.
Her trying to appear "neutral" and "just presenting the science" is exactly the issue. Using her authority as a scientist when talking about topics she has no expertise in.
Here is a debunking of her video on trans-issues: https://www.youtube.com/watch?v=r6Kau7bO3Fw
Here is a longer criticism of her video on autism: https://www.youtube.com/watch?v=vaZZiX0veFY
Yes, I'd say so. Bear in mind that, outside of the Terminally Online, very few people would deliberately hobble their business by deliberately choosing an inferior product.
But a point was made to make it less parallel. For example, Ethereum used a DAG, adding a requirement of over 1 GB of memory, so GPU compute alone was not enough.
https://ethereum.stackexchange.com/questions/1993/what-actua...
Also, any such GPUs are now several generations old, so on FLOPS/watt they're likely irrelevant.
On Internet comment sections, that's not clear to me. Memes are incredibly infectious, and we can see this by looking at, say, a thread about Nvidia: it's inevitable that someone is going to ask about a moat. In a thread about LLMs, the likelihood of stochastic parrots getting a mention approaches one as the thread gets longer. What does it all mean?
If Musk really can get 1 million GPUs and they incorporate some algorithmic improvements, it'll be exciting to see what comes out.
You're not even using your real name here. Nobody knows if you have any scientific qualifications, or a university degree at all.
There are "leaderboards" there that provide more anecdotal data points than my two.
Climate change is settled science. To claim that "certain numbers are unspecific, so I can't say whether climate change is real or not, or whether it's good or bad" (which, based on your paraphrasing, is what it sounds like she said) is an unacceptable position. It's muddying the waters.
I'm not going to go watch her content about trans people, but it sounds like the same thing: Muddying the waters by Just Asking Questions about anti-trans "social contagion" talking points.
---
EDIT: Okay I went back and watched some clips of her anti-trans video. She takes a pseudoscientific theory based on an opinion poll of parents active on an anti-trans web forum and suggests we take it seriously because "there is no conclusive evidence for or against it," as if the burden of proof weren't on the party making the positive claim, and as if the preponderance of evidence and academic consensus didn't overwhelmingly weigh against it. It's textbook laundering of pseudoscience. You've significantly misrepresented her position.
I have not noticed companies or colleagues 10x'ing (hell, or even 1.5x'ing) their productivity from these tools. What am I missing?
Maybe LLMs can't work on huge code bases yet, but for writing bespoke software for individuals who need a computer to do xyz but can't speak the language, it already is working wonders.
Being dismissive of LLMs while sitting above their current scope of capabilities gives strong Microsoft 2007 vibes; "The iPhone is a laughable device that presents no threat to windows mobile".
Obviously more computing power makes the computer better. That is a completely banal observation. The rest of this 2000-word article is groping around for a way to take an insight based on the difference between '70s symbolic AI and the neural networks of the 2010s and apply it to the difference between GPT-4 and Grok 3 off the back of a single set of benchmarks. It's a bad article.
I think it even quietly eliminates the input without telling you. Nobody is putting serious work tasks in 2000 tokens on Arena.
The lesson you should have learned is Arena is a dumb metric, not that people have unfounded biases against Grok 2. (Which I've used on Perplexity and found to be unimpressive.)
The other thing is dumb, low quality statements are all over reddit and Twitter about any "hype" topic, including mysterious new models on arena. So it isn't surprising you encountered that for Grok 2, but you could have said the same thing for Gemini models.
If reddit can be believed, Wizard LM 2 was so much better than OpenAI models that Microsoft had to cancel it so OpenAI wouldn't be driven out of business.
People say all sorts of dumb stuff on social media.
> It's unproductive to treat science as anything more than an ongoing, constantly improving process.
It's unproductive to constantly re-litigate questions like "is germ theory true" or "is global warming real" in the absence of any experimental results that seriously challenge those theories. Instead, we should put our effort into advancing medicine and fixing climate change, predicated on the settled science which makes both those fields possible.
I’m autistic and I just watched her video. I found it to be one of the best primers on autism I’ve seen. Not complete, of course, and there’s a lot more nuance to it, but very even handed. She doesn’t make any judgements. She just gives the history and talks about the controversies without choosing sides, except to say that the idea of neurodiversity seems reasonable to her. When compared to most of the discourse about autism, it stands up pretty well. Of course, there’s a lot more I want people to know about autism, but it’s an ok start.
Actually, many autistic people (myself included) would find your statement far more harmful. You assume that all autistic people think alike and believe the same thing. This is false. You try to defend us without understanding us.
Don’t do that.
I suppose there’s a possibility that you’re autistic and found it harmful to you. If so, don’t speak for me.
And she was commenting on an AI’s knowledge of Bell’s Inequality, which is PHYSICS. If she can’t comment on that, who can?
I'm not sure if it's close to 100x more. xAI had 100K Nvidia H100s, while this is what SemiAnalysis writes about DeepSeek:
> We believe they have access to around 50,000 Hopper GPUs, which is not the same as 50,000 H100, as some have claimed. There are different variations of the H100 that Nvidia made in compliance to different regulations (H800, H20), with only the H20 being currently available to Chinese model providers today. Note that H800s have the same computational power as H100s, but lower network bandwidth.
> We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months. These GPUs are shared between High-Flyer and DeepSeek and geographically distributed to an extent. They are used for trading, inference, training, and research. For more specific detailed analysis, please refer to our Accelerator Model.
> Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters. Similarly, all AI Labs and Hyperscalers have many more GPUs for various tasks including research and training than they commit to an individual training run due to centralization of resources being a challenge. X.AI is unique as an AI lab with all their GPUs in 1 location.
https://semianalysis.com/2025/01/31/deepseek-debates/
I don't know how much slower are these GPUs that they have, but if they have 50K of them, that doesn't sound like 100x less compute to me. Also, a company that has N GPUs and trains AI on them for 2 months can achieve the same results as a company that has 2N GPUs and trains for 1 month. So DeepSeek could spend a longer time training to offset the fact that have less GPUs than competitors.
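The GPUs-versus-time trade-off is easy to see in an idealized calculation (this assumes perfect scaling and ignores per-chip speed differences like H800 vs H100, as well as communication overhead):

```python
# Idealized compute budget: GPU count and training time are interchangeable.
def gpu_hours(n_gpus: int, months: float) -> float:
    return n_gpus * months * 30 * 24

print(gpu_hours(100_000, 1))  # 100K GPUs for 1 month  -> 72,000,000 GPU-hours
print(gpu_hours(50_000, 2))   #  50K GPUs for 2 months -> 72,000,000 GPU-hours
```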
I can’t wait for others to call her further out for being herself the biggest grifter of them all, bigger than most she tries to take down.
Can you really describe a process like EUV lithography as an optimisation? I mean, it requires matter to be controlled and manipulated in a way that would have been regarded as pure science fiction 20 years ago. Also the materials science that provided Gallium Nitride electronics in our communication systems is rather amazing. There are other things as well - I have an electric car with a battery that lets me travel for hundreds of kilometers; if I trusted it, it could take me there without me operating it (much). I know where I am in London and where I am going because of satellites in orbit and calculations that use relativity. Last year I got an mRNA vaccine, that's pretty new... and pretty pervasive. I've seen rockets that fly up into space and then turn around and come back down to land, and I spend my days talking face to face with people on the other side of the world. I've never met half of them, but they greet me as an old friend.
How is it that you can't see any of these wonders that have sprung into being in our life times?
If every single human on earth was an identical clone with the same cultural upbringing and similar language, conversation choices, opinions and feelings, they still wouldn't work like an LLM and still wouldn't be stochastic parrots.
She makes arguments, forcefully. That’s good. That’s what science is supposed to be. I don’t agree with her on everything, but I find her arguments engaging, and sometimes convincing, sometimes not. But her process is not dogmatic, as you’re trying to make it out to be.
You need to understand that every single theory will be improved upon in the future. That means they will change and it's impossible to predict if these improvements will have consequences in different contexts where people incorrectly claim the science is settled.
> It's unproductive to constantly re-litigate questions like "is germ theory true" or "is global warming real"
Can you think of any cases where the science had nearly full consensus and it was useful to re-litigate? Galileo isn't the only example. I can think of many.
I agree! Before one may touch the pink sceptre, they must be permitted through the gate, and kissed by the doddling sheep, Harry, who will endow them with permission to pass and comment on many a great manor of thing which are simply out of reach of the natural human mind without these great blessings which we bestow. And, amen.
I am neurodivergent myself but (probably) not autistic. The first time I watched the video I actually didn't think that it was that bad and had a similar reaction to you. But once I started to think about it and educate myself more on the topic I realized how bad it is.
Sure it is not the worst video on autism but it still promotes some really bad ableist views.
Autism Speaks is a horrible hate organization. I don't think there is a spoiler tag here, so please skip this paragraph if ableism is triggering to you, but there is a video of the Autism Speaks founder where she talks about how she at one point wanted to kill herself and her autistic child because she couldn't cope with having an autistic child, and only didn't do it because of her other, non-autistic child. She says that while her autistic child is in the background.
I also didn't know about "aspie supremacy" and why people still use the term "Asperger" despite it being outdated. Hans Asperger was a Nazi scientist who is responsible for the killing of thousands of children. He thought some autistic children might be useful as future scientists for the Nazi regime, so he assigned them the diagnosis "Asperger syndrome" while the other autistic children were to be murdered.
I recommend you watch Ember Green on this topic: https://www.youtube.com/watch?v=vaZZiX0veFY
There's a lot of attention being paid to metrics that often don't align all that well with actual production use-cases, and frankly the metrics are good but hardly breath-taking.
They have an absolutely insane outlay of additional compute, which appears to have given them a relatively paltry increase in capabilities.
15 times the compute for 5-15% better performance is basically the exact opposite of the bitter lesson.
Hell - it genuinely seems like the author didn't even read the actual bitter lesson.
The lesson is not "scale always wins" the lesson was "We have to learn the bitter lesson that building in how we think we think does not work in the long run."
And somewhat ironically - the latest advances seem to genuinely undermine the lesson. It turns out that building in reasoning/thinking (a heuristic that copies human behavior) is the biggest performance jump we've seen in the last year.
Does that mean we won't scale out of the current capabilities? No, we definitely might. But we also definitely might not.
The diminishing returns we're seeing for scale hint strongly that just throwing more compute at the problem is not enough by itself. Possibly still required, but definitely not sufficient.
The risk is mitigated by competition in the space and the commoditization of LLMs over time. This isn't like building an app solely on the Facebook platform or something. It's very easy to swap models. There are platforms that let you use multiple models with the same interface, etc.
I had specifically mentioned her by name in my criticisms, and I was given a written warning that doing so went against HN's policy on "targeted attacks" or "targeted harassment" or something similar.
Why is it okay for this user to suggest that the act of this woman presenting her work publicly "is the problem", while it is a HN AUP offense for me to criticize the quality of the source code contributions written by another woman presenting her work publicly?
I'm not requesting enforcement against this user or a retroactive removal of my warning, I'm just trying to understand the difference better to improve the conformance of my own discourse to the intent of HN's AUP.
"Math"? Fields Medal level? Tenure? Ph.D.? ... high school plane geometry???
As in
'Grok 3 AI and Some Plane Geometry'
at
https://news.ycombinator.com/item?id=43113949
Grok 3 failed at a plane geometry exercise.
While you didn't say ALL, you didn't clarify, so your wording says that autistic people categorically think her video is super harmful and misleading. That's simply not true.
I'm in a bit of a minority in that I see autism through the neurodiversity lens, but I also think this tribal us-vs-them mentality is doing us more harm than good. Tarring and feathering people for not getting everything exactly "right" by my standards isn't helping anyone. It just causes people to throw up their hands and vote Trump.
So, while I disagree with Autism Speaks on pretty much every point and I think they're extremely harmful, labeling them as a hate group is self-defeating. Parents with high-needs autistic children look to Autism Speaks. These parents love their children, but are being misled. When you push people, they push back. If we shout "You're a hate group!" then all dialogue stops, and we can't help their children. And helping those children is the important thing.
Ironically, I think the problem for Sabine is simply miscommunication, which is due to her probably being autistic and not communicating according to neurotypical standards. She ended the video by showing how she scored high on an online test (which isn't definitive, of course), but then she dismisses it out of hand.
I dismissed those tests too when I first took them, but that's because I still didn't really know what autism is, even after I did tons of research. I couldn't really know that by reading studies or talking to a psychologist. I didn't really know until I met other autistic people and realized that they actually "got" me in a way that no one else ever has.
Her behavior is VERY typical of autistic people. Monotone voice, hyperlogical, hypermoral. She quit a successful career in physics for moral reasons. No neurotypical person would do that. An autistic person would. She wears the same velour shirt in every video. Sensory issues(?), repetitive behavior.
It's up to her to make the call as to whether she's actually autistic or not, but I see her as one of us.
And... I'm going to need a TLDR on that Ember Green video. It's a 2 hour commentary on a 25 minute video. Instead of asking me to unpack the arguments, make them.
I see people getting replaced by AI left and right.
Translators, illustrators, voice over artists, data researchers, photographers, models, writers, personal assistants, drivers, programmers ...
If you want to contact Dan, email HN and make your case. The contact information is at the bottom of the page.
I think the idea that the United States "owns" Grok 3 would be news to Musk and the idea it "owns" ChatGPT would be news to Altman.
You don't seem to understand how scientific models and theories work.
In fact, the germ theory of disease is much the same way. Germ theory does not explain or predict or account for ALL disease, for example PTSD, and if you build a useful theory for mental illnesses that aren't caused by little creatures of some sort, that doesn't overturn germ theory, it complements it. A person creating a new theory of how Long Covid hurts people, for example, may not stick strictly to germ theory, but that would STILL not overturn germ theory.
>Can you think of any cases where the science had nearly full consensus and it was useful to re-litigate? Galileo isn't the only example.
Galileo isn't an example of settled science being radically overturned. Nobody believed in geocentrism due to "Science", which is also why Galileo had so much difficulty; it was literally a religious matter. Kepler was about as close as we had to any sort of consistent theory of how the heavenly bodies moved, and it was not at all settled, and yet he was still basically right.
In actuality, there are remarkably few times when a theory was entirely overturned, especially by a new theory. When we know little enough about a field that we could get something so wrong, we usually don't have much in the way of "theory" and are still spitballing, and that's not considered settled science. If you want a good feeling for what this looks like, go read up on the debates science had when we first started looking at statistical mechanics and the basics of thermodynamics. There were heated (lol) debates about the very philosophy of science, and whether we should really rely on theories that don't seem like they are physical, and that mostly went away as the field continued to bear high-quality predictions. The problems and places where theories are not great are usually well understood by the very scientists who work with a theory, because understanding the parameter space and confidence intervals of a theory is a requirement of using that theory successfully.
"Human CO2 and other pollutants are the near totality of the cause of the globe warming" is settled science.
"The globe is warming" is settled science
"Global warming will cause changes in micro and macro climates all over" is settled science
"A hotter globe will result in more energetic, chaotic, and potentially destructive weather" is settled science and obvious
"Global warming is going to kill us all in a decade" is NOT settled science. There is no settled science for how bad climate change will make things for us, who will be worst affected, who might benefit, etc. There is comprehensive agreement among climate scientists that global warming is harmful to our future, and something we have to try and reduce the effect of, prepare for the outcomes of, and adapt to the consequences of, and something that, whether we do anything to combat it, will be immensely costly to handle.
You've proved my entire point in your very first sentence and then go on to say I don't seem to understand how scientific models and theories work.
> Nobody believed in geocentrism due to "Science"...
This isn't a serious argument. Feel free to look up the works of Aristotle, who is sometimes called the first scientist.
I don't have the energy to address the rest of your incorrect conjecture.
More specifically, even particle physicists admit that a 2x or even a 10x bigger accelerator is not expected to find anything fundamentally new.
The core criticism is that it has become a self-licking ice cream cone that serves no real purpose other than keeping physicists employed.
She’s saying something much more specific about the earliest moments (milliseconds!!) of the history of the universe, and before that time, of which we have practically zero observational data, and can’t ever visit even in principle.
She’s arguing against woo, against science fiction, against unsupported what-if musings that are fun to talk about — but are not science.
My favorite example I can never share in polite company is that the (still sota) best image segmentation algorithm I ever saw was done by a guy labeling parts of the vagina for his stable diffusion fine tune pipeline. I used what he'd done as the basis for a (also sota 2 years later) document segmentation model.
Found him on a subreddit about stable diffusion that's now completely overrun by shitesters and he's been banned (of course).
You need a ton of specialized knowledge to use compute effectively.
If we had infinite memory and infinite compute we'd just throw every problem of length n to a tensor of size R^(n^n).
The issue is that we don't have enough memory in the world to store that tensor for something as trivial as MNIST (and won't until the 2100s). And as you can imagine, the exponentiated exponential grows a bit faster than the exponential, so we never will.
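For a sense of scale (my own arithmetic, taking "length n" to mean MNIST's 28x28 = 784 input pixels, which is an assumption about what the parent means):

```python
# How many entries an n^n-sized tensor would need at MNIST scale.
import math

n = 28 * 28                                # 784 input pixels
log10_entries = n * math.log10(n)          # log10(n^n) = n * log10(n)
print(f"~10^{log10_entries:.0f} entries")  # ~10^2269
# For comparison, the observable universe has roughly 10^80 atoms.
```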
I don't know what that something is, so I think I should go listen to Sabine Hossenfelder.
Just because you can tie a person to work they have performed using public records does not seem like it should put them on the same level as someone who communicates with and creates work directly to the public, or a public figure. Not even if some of the actual work itself is performed in some open and observable space -- For example I don't think one has any more or less moral right to commentate on and publicly critique the work of a carpenter working on a building scaffold that's easily observable from the public street, than one does about a programmer working on their own idea from their own home in private. That seems like the immediate obvious difference between the two situations you describe. They don't sound equivalent at all, so I don't think you can win your case on that angle.
But work by "non-public-figures" is frequently posted about and commented on at Hackernews. Obviously open source work is a significant source of such discussion simply because it is accessible. Therefore, clearly it's not entirely verboten to talk about that. Is it permitted to criticize? I don't have a particular example at hand but I'm quite certain that I've seen negative opinions about people's work on this site from time to time. I think this is the angle you could argue your case. Was it fair and consistent that yours was called an attack or harassment? Are similar criticisms of work by non-public-figures permitted on here? Without the full context we can't answer that.
Just based on the comparisons linked in the article, it's not "co-state-of-the-art", it's the clear leader. You might argue those numbers are wrong or not representative, but you can't accept them then claim it's not outperforming existing models.
They start by attacking a physicist for being "neutral" and "just presenting the science" - exactly the kind of delegitimization of objectivity we see in the early stages of information control. Notice how they frame staying neutral as actively harmful: it's not just "wrong," but presented as dangerous because it doesn't take a strong enough stance against what they view as "extremist" positions. Most tellingly, they're not arguing that her analysis is incorrect. Their complaint is that she's even allowing certain viewpoints to be examined objectively at all.
This maps directly to historical patterns where:
1. First you attack individuals for being neutral
2. Then you establish that certain topics are "beyond" neutral analysis
3. Finally you create an environment where examining data objectively becomes seen as suspicious or harmful
This HN comment is a perfect micro-example of this - it's not even sophisticated gatekeeping, it's raw "how dare you look at this objectively when you should be taking my side." This kind of thinking, multiplied across society and amplified by modern media, is exactly how larger patterns of information control take hold.
The amount of “work” done there is staggering, and yet adoption appears abysmal; using such solutions with success only happens as part of a really “well-oiled” machine.
And what about the simple difficulty of going from 99% to 99.9%? What percentage are we even talking about today? We don’t know, but very rich people think it is cool and blindly keep investing more billions.