Other than that, it seems to have sunk without a trace.
IIRC Transmeta's technology came out of HP (?) research into dynamic inlining of compiled code, giving performance comparable to profile-guided optimization without the upfront work. It worked similarly to an inlining JIT compiler, except it was working with already compiled code. Very interesting approach and one I think could be generally useful. Imagine if, say, your machine's bootup process was optimized for the hardware you actually have. I'm going off decades old memories here, so the details might be incorrect.
Similar technology was developed later by Nvidia, which had licensed Transmeta's IP, for the Denver CPU cores used in the HTC Nexus 9 and the Carmel CPU cores in the Magic Leap One. Denver was originally intended to target both ARM and x86 but they had to abandon the x86 support due to patent issues.
It turned out to be wrong, but it was genuinely controversial among experts at the time.
I’m glad that they tried it even though it turned out to be wrong. Many of the lessons learned are documented in systems conferences and incorporated into modern designs, e.g. GPUs.
To me Transmeta is a great example of a venture investment. If it had beaten Intel at SPEC by a meaningful margin, it would have dominated the market. Sometimes the only way to get to the bottom of a complex system is to build it.
The same could be said of scaling laws and LLMs. It was theory before Dario, Ilya, OpenAI, et al trained it.
However, if you add it onto a better CPU it’s a fine technique to bet on - case in point: Apple’s move away from Intel to homegrown CPUs.
This page is not here yet.
The product hype and the lack of knowledge about what it actually was meant that nobody knew what to expect. Amid those hyped expectations, and with Torvalds on board, everyone expected that everything would be different. But it wasn't. A similar product launch was the Segway, where we went from an incredible vision of everyone on Segways to nobody wanting one.
The hype was part of the problem with Transmeta. Even in its delivered form it could have found a niche. For example, the network computer was in vogue at the time, thanks to Oracle. A different type of device, like a Chromebook, might have worked.
With Torvalds connected to Transmeta and the stealthy development, we never did get to hear about who was really behind Transmeta and why.
Then, around Feb 2000, content appeared: https://web.archive.org/web/20000229173916/http://www.transm...
Product launch PDF from Jan 19, 2000: https://web.archive.org/web/20000815231116/http://www.transm...
x86 ISA had the funny advantage of being way closer to RISC than "beloved" CISC architectures of old like m68k or VAX. Many common instructions translate to a single "RISCy" instruction for the internal microarchitecture (something AMD noted, IIRC, in the original K5 with its AMD29050-derived core: "most instructions translate to 1 internal microinstruction, some between 2 to 4"). x86 prefixes are also way simpler than the complicated decode logic of m68k or VAX; an instruction with multiple prefixes will quite probably still decode to a single microinstruction.
That said, there's a funny thing in that Transmeta tech survived quite a long way, to the point that there were Android tablets, in fact flagship Google ones like the Nexus 9, whose CPU was based on it, because Nvidia's "Denver" architecture used the same technology (AFAIK licensed from Transmeta, but don't cite me on this).
And then there are read-modify-write instructions, which on modern CPUs need two address-generation μops in addition to the load one, the store one, and the ALU one. So the underlying load-store architecture is very visible.
There’s also the part where we’ve trained ourselves out of using the more CISCy parts of x86 like ENTER, BOUND, or even LOOP, because they’ve been slow for ages, and thus they stay slow.
But, for example, REP MOVS is now expanded internally into the equivalent of SSE load/stores (16 bytes) or even AVX-512 load/stores (64 bytes).
And of course the equivalent of LEA via ModRM/SIB addressing is pretty much free, with the address generation AFAIK handled as a pipeline step.
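To make the REP MOVS point concrete, here's a minimal sketch (x86-64, GCC/Clang inline asm, illustration only, not a benchmark) of a copy routine built on a single rep movsb; the point is that the core's microcode widens that one instruction into wide load/store uops internally:

    #include <stddef.h>

    /* One x86 instruction; recent cores with "fast string" / ERMSB support
     * expand it internally into cache-line-sized moves, so it can compete
     * with hand-written SSE/AVX copy loops. */
    static void copy_rep_movsb(void *dst, const void *src, size_t n)
    {
        __asm__ volatile("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory");
    }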
I always felt Transmeta could have carved out a small but sustained niche by offering even less-efficient "morphing" for other architectures, especially discontinued ones. 680x0, SPARC, MIPS, Alpha, PA-RISC... anything the vendors stopped developing hardware (or competitive hardware) for.
I remember that fondly.
If you did view source there was a comment that said something like:
No, there are no hidden messages in the source code, either.
Somehow I managed to tolerate running Gentoo on it. Compiling X, OpenOffice, or Firefox was a multi-day affair. One thing that annoyed me was that I could never get the graphics card (an ATI Rage 128 with 4 MB RAM, IIRC) working with acceleration under Linux, and that was when compositing window managers were gaining prevalence; I kept trying to get it working in the hope that it would take a bit of the load off the struggling CPU.
Despite the bad performance, it worked really well for a college student: it was great for taking notes, and the batteries (extended main and optical drive bay) would easily last a full day of classes. It wouldn't run Eclipse very well, but most of my CS assignments were done using a text editor, anyways.
They did push the envelope on efficiency. My Crusoe-equipped laptop could go six hours on the stock battery (12+ on the extended batteries) back when most laptops struggled to get three.
The problem with the Segway in Germany was rather the certification for road use. Because of the insane red tape involved, its introduction was delayed, and for the same reason nobody wanted one.
I'm curious: Is there a consensus on which startup companies achieved success using IBM as a fab? Or, failing a consensus, I'd settle for anecdotes too.
My own company (which built 40G optical transponders) used them back in that era. While the tech was first rate, the pricing was something to behold.
Just a small nitpick: I've seen the K5/29050 connection mentioned in a number of places, but the K5 was actually based upon an un-released superscalar 29K project called "Jaguar", not the 29050, which was a single-issue, in-order design.
I had it on my Mac LC II in 1992. It barely ran well enough to run older DOS IDEs for college. Later I bought an accelerator (40 MHz 68030) and it ran better.
The key difference is: what is an instruction set? Is it a Turing-complete thing with branches, calls, etc? Or is it just data flow instructions (math, compares, loads and stores, etc)?
X86 CPUs handle branching in the frontend using speculation. They predict where the branch will go, issue data flow instructions from that branch destination, along with a special "verify that I branched to the right place" instruction, which is basically just the compare portion of the branch. ARM CPUs do the same thing. In both X86 and ARM CPUs, the data flow instructions that the CPU actually executes look different (are lower level, have more registers) than the original instruction set.
This means that there is no need to translate branch destinations. There's never a place in the CPU that has to take a branch destination (an integer address in virtual memory) in your X86 instruction stream and work out what the corresponding branch destination is in the lower-level data flow stream. This is because the data flow stream doesn't branch; it only speculates.
On the other hand, a DBT has to have a story for translating branch destinations, and it does have to target a full instruction set that does have branching.
That said, I don't know what the Transmeta CPUs did. Maybe they had a low-level instruction set that had all sorts of hacks to help the translation layer avoid the problems of branch destination translation.
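For what it's worth, here's a minimal sketch of the kind of structure a DBT needs for exactly that problem: a cache mapping guest branch targets to translated host code, consulted on indirect guest branches and falling back to the translator on a miss. All names are hypothetical; this is not a claim about what Transmeta's CMS actually did.

    #include <stdint.h>
    #include <stddef.h>

    #define TCACHE_SIZE 4096             /* direct-mapped, power of two */

    typedef void (*host_block_fn)(void); /* translated native code block */

    struct tcache_entry {
        uint64_t      guest_pc;          /* guest virtual address of block entry */
        host_block_fn host_code;         /* host code translated from it */
    };

    static struct tcache_entry tcache[TCACHE_SIZE];

    /* Invoked when translated code reaches an indirect guest branch:
     * map the guest target to host code, falling back to the (slow)
     * translator on a miss. */
    host_block_fn lookup_or_translate(uint64_t guest_pc,
                                      host_block_fn (*translate)(uint64_t))
    {
        struct tcache_entry *e = &tcache[guest_pc % TCACHE_SIZE];
        if (e->host_code == NULL || e->guest_pc != guest_pc) {
            e->guest_pc  = guest_pc;
            e->host_code = translate(guest_pc);
        }
        return e->host_code;
    }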
It would have been an extremely difficult time to enter the market though, because at the time Intel was successfully paying server manufacturers to not offer superior competing products.
Electric unicycles and Onewheels!
And they're really fun!
https://dl.acm.org/doi/10.1145/358438.349303
(this is about PA-RISC rather than x86, but the conclusions would likely be very similar...)
https://homepage.divms.uiowa.edu/~ghosh/4-18-06.pdf
I think it's correct to say Transmeta did partial software emulation, though lines get blurry here.
I don't think Apple is a good example here. Arm was extremely well-established when Apple began its own phone/tablet CPU designs. By the time Macs began to transition, much of their developer ecosystem was already familiar with it.
Apple's CPUs are actually notably conservative when compared to the truly wild variety of Arm implementations; no special vector instructions (e.g. SVE), no online translation (e.g. Nvidia Denver), no crazy little/big/bigger core complexes.
https://dougallj.wordpress.com/2022/11/09/why-is-rosetta-2-f...
One time I had to unravel a race condition, and he seemed pissed that it took a few days; when I tried to explain the complexity, he told me his name was on a patent for a system that let several VAXes share a single disk and that he didn't need a lecture.
He was then excited after the interview because the individual had been working at Transmeta with Linus, and his resume was accurate. He didn't end up working with us, but I wasn't privy to any additional information.
Their fundamental idea was that by having simpler CPUs, they could iterate on Moore's law more quickly. And eventually they would win on performance. Not just on a few speculative edge cases, but overall. The dynamic compilation was needed to be able to run existing software on it.
The first iterations, of course, would be slower. So their initial market, the one that had to fund those early generations, would be low-power use cases, because the complexity of a CISC chip made low power a weak point for Intel.
They ran into a number of problems.
The first is that the team building that dynamic compilation layer was more familiar with the demands of Linux than Windows, with the result that the compilation worked better for Linux than Windows.
The second problem was that the "simple iterates faster" also turns out to be true for ARM chips. And the most profitable segments of that low power market turned out to be willing to rewrite their software for that use case.
And the third problem is that Intel proved to be able to address their architectural shortcomings by throwing enough engineers at the problem to iterate faster.
If Transmeta had won its bet, they would have completely dominated. But they didn't.
It is worth noting that Apple pursued a somewhat similar idea with Rosetta, both in the move to Intel and later in the move to ARM64. With the crucial difference that they also controlled the operating system, meaning that instead of constantly dynamically compiling, they could rely on the operating system to decide what needed to be compiled, and when, and to call the result correctly. And they also better understood what to optimize for.
Ha, "We stand on the shoulders of giants"...
Wut - SVE and SME are literally Apple designs (AMX) which have been "back ported".
In a similar vein:
One of the dads at school runs a company that does a nanotech waterproof coating for electronics (backed by patents). I told him that it would be very useful for personal electric vehicles, like electric unicycles. He replied that they looked at that, but decided not to license the tech for that use, because there wasn't enough money in it.
Sad.
In any case, due to the unfortunate timing of the dot-com implosion, it never really went anywhere (I wish I had managed to keep one; they used to appear on eBay occasionally).
The one thing I remember is that it was memory-limited: it had 64 MB, but I think the code-morphing software really wanted 16 MB of it, which really cut into the available system memory.
Literally no Apple CPUs meaningfully support SVE or SVE2. Apple added a relatively "conventional" set of matrix instructions (AMX) of their own, and now implements SME and SME2, but those are not equivalent to SVE. (I call AMX "conventional" in the sense that a fixed-size grid of matrix compute elements is not a particularly new idea, versus variable-sized SIMD, which is still quite rare.) Really, the only arm64 design with "full fat" SVE support is Fujitsu's A64FX (512-bit vector size); everything else on the very short list of hardware supporting SVE is still stuck with 128-bit vectors.
Fixed guest branches just get turned into host branches and work like normal.
Indirect guest branches would get translated through a hardware jump address cache that was structured kind of like TLB tag lookups are.
It's not too uncommon for each pipeline stage or so to have its own uop format, as each stage computes what it was designed to and culls what later stages don't need.
Because of this, it's not that weird to see a single RMW uop at, say, the initial decode and microcode layer, which then gets cracked into different uops for the different functional units later on.
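A toy illustration of that cracking (purely a data-model sketch, not how any real decoder is written): one read-modify-write x86 instruction such as `add dword [rax], ebx` entering as a single macro-op and leaving as the per-unit uops described a few comments up.

    #include <stdio.h>

    enum unit { AGU_LOAD, LOAD, ALU, AGU_STORE, STORE };

    struct uop { enum unit unit; const char *desc; };

    int main(void)
    {
        /* one guest instruction in, five unit-level uops out */
        struct uop cracked[] = {
            { AGU_LOAD,  "compute load address from [rax]"    },
            { LOAD,      "load 32-bit value"                  },
            { ALU,       "add ebx to the loaded value"        },
            { AGU_STORE, "compute store address (same [rax])" },
            { STORE,     "write the result back"              },
        };
        for (size_t i = 0; i < sizeof cracked / sizeof cracked[0]; i++)
            printf("%s\n", cracked[i].desc);
        return 0;
    }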
> Fixed guest branches just get turned into host branches and work like normal.
How does that work in case of self-modifying code, or skewed execution (where the same x86 instruction stream has two totally different interpretations based on what offset you start at)?
One key factor against them, though, is that they were facing a company whose long-term CEO had written Only The Paranoid Survive. At that point he had moved from being the CEO to the chairman of the board. But Intel had paranoia about possible existential threats baked into its DNA.
There is no question that Intel recognized Transmeta as a potential existential threat, and aggressively went after the very low-power market that Transmeta was targeting. Intel quickly created SpeedStep, allowing power consumption to dynamically scale when not under peak demand. This improved battery life on laptops using the Pentium III, without sacrificing peak performance. They went on to produce low power chips like the Pentium M that did even better on power.
Granted, Intel never managed to match the low power that Transmeta had. But they managed to limit Transmeta enough to cut off their air supply - they couldn't generate the revenue needed to invest enough to iterate as quickly as they needed to. This isn't just a story of Transmeta stumbling. This is also a story of Intel recognizing and heading off a potential threat.
I also looked at the TM specific flags that they documented, and was surprised to find some that hadn’t been enabled on Linux despite Linus still working there at the time. They looked to be useful for low power mode, and at that time I was looking for a carry-everywhere laptop with decent run time so I invested in those flags.
Turns out they didn’t do anything observable to the system. Power draw was unchanged by flipping these toggles. I don’t believe those changes ever got merged.
But it was the Linux fuckery that convinced me I wanted a bash shell and a Unix CLI and to just get shit done without having to fiddle all the time. I had better things to do. So I’ve been on Apple since, except for Pi, Docker, and work.