This can’t be the first pass someone has made at something like this, right? There must be literal dozens of SIMD thirsty Gophers around. Would a more common pattern be to use CGO?
This was to allow for easy cross-compilation.
You should probably clarify that you are talking about 'golang plan9 assembly format', and not about the plan9 OS. This confused me a lot. (https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs)
I didn't know you could easily hand write 'golang assembly' for your functions. This seems really useful in some cases.
But I do wonder why the std lib doesn't offer this?
EDIT: I found the answer: https://github.com/golang/go/issues/53171
Here is a good explanation provided by Ian Lance Taylor:
> This proposal is focused on code generation rather than language, and the details of how it changes the language are difficult to understand. Go is intended to be a simple language that it is easy to understand. Introducing complex semantics for performance reasons is not a direction that the language is going to take.
That's certainly an ... interesting choice.
Rob Pike spoke on the design of Go's assembler at a talk in 2016 [1][2]. I think it basically came down to the observation that most assembly language is roughly the same, so why not build a common assembly language that "lets you talk to the machine at the lowest level and yet not have to learn a new syntax." It also enables them to automatically generate a working assembler given an instruction manual PDF for a new architecture as input.
[1]: https://www.youtube.com/watch?v=KINIAgRpkDA [2]: https://go.dev/talks/2016/asm.slide#1
> The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail elsewhere. If you plan to write assembly language, you should read that document although much of it is Plan 9-specific.
That first sentence links to https://9p.io/sys/doc/asm.html. So yes, definitely not a coincidence.
https://plan9.io/sys/doc/comp.html
IIRC before Go was self compiling, it was compiled using 9c, and its architecture inherits from 9c.
EDIT: some nearby threads got the details better!
Back in those days I actually found that, with a few small tweaks, I could compile the Plan 9 operating system using Go's C suite. We didn't pursue it further but this was one of the options we looked into for cross-compiling Plan 9 from other operating systems.
>Go uses its own internal assembly language called Plan9.
Plan9 is the name of the OS. You wouldn't name a programming language "Linux", even if Linus created it and it was super related or not at all related.
I've been doing this for 35 years and cross compiling anything nontrivial was always a toolchain nightmare. Discovering a world where all I had to do was set GOARCH=mips64 (and possibly GOOS=darwin if I wanted mac binaries) before invoking the compiler is so magical I was extremely skeptical when I first read about it.
But the more I look into it, the more I think this is just an LLM hallucination.
The doc about the 'assembly' format doesn't give it a proper name. It just calls it the 'Go assembler'.
And I think the source of this hallucination was this first paragraph:
> The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail elsewhere. If you plan to write assembly language, you should read that document although much of it is Plan 9-specific. The current document provides a summary of the syntax and the differences with what is explained in that document, and describes the peculiarities that apply when writing assembly code to interact with Go.
For those interested, here's Thompson's paper about the Plan9 C compilers: https://9p.io/sys/doc/compiler.html and https://doc.cat-v.org/bell_labs/new_c_compilers/
How is this actually used though? I searched the whole article, and nowhere are any operations being performed on slices.
Edit: I found what I was looking for in the linked docs: https://pkg.go.dev/github.com/pehringer/simd#pkg-index
So basically if you have 2 slices you want to add together, instead of a for loop you could do it in parallel with simd using:
simd.AddInt32(slice1, slice2, result)
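For anyone wondering what that call replaces, here is a plain-Go sketch of the equivalent loop. addInt32 is a hypothetical stand-in written for this comment, not the library's code; clamping to the shortest slice and returning the element count are assumptions of the sketch, so check the package docs for the real behaviour.

```go
package main

import "fmt"

// addInt32 is a hypothetical scalar stand-in for an element-wise add such as
// simd.AddInt32. It is NOT the library's implementation. It adds a[i] + b[i]
// into dst[i] for the first n elements, where n is the shortest of the three
// slice lengths, and returns n (an assumption made for this sketch).
func addInt32(a, b, dst []int32) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	if len(dst) < n {
		n = len(dst)
	}
	for i := 0; i < n; i++ {
		dst[i] = a[i] + b[i]
	}
	return n
}

func main() {
	slice1 := []int32{1, 2, 3}
	slice2 := []int32{10, 20, 30}
	result := make([]int32, len(slice1))
	addInt32(slice1, slice2, result)
	fmt.Println(result) // [11 22 33]
}
```

A SIMD version does the same work but handles several elements per instruction, which is where the speedup comes from.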
The article simply misspoke by saying that the assembly language is "called plan9".
1. On amd64 those ints are actually 64-bit. If you used int32 then they would be word aligned in the parameter list. However, there is a gotcha with that. The return values will always start at a dword-aligned offset on a 64-bit system.
2. NOSPLIT is defined in "textflag.h" which Go's compiler automatically provides. However, NOSPLIT is, from everything I've read, only respected on runtime.XX functions, so it's not doing anything there, and it's also not necessary. NOSPLIT tells the compiler not to insert code to check if the stack needs to split because it's going to overflow, which is technically unnecessary if the function doesn't need any stack space. It's basically only there on the function that checks for stack splits, to prevent that code from being injected into itself.
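To make point 1 concrete, here is a rough Go sketch that computes where arguments and results would sit relative to FP under the stack-based convention. It is a simplified model, not the compiler's algorithm: it only handles plain integer values, assumes each value is aligned to its own size, and assumes results start at an offset aligned to the machine word. The layout and alignUp names are made up for the example.

```go
package main

import "fmt"

// alignUp rounds off up to the next multiple of align.
// align must be a power of two.
func alignUp(off, align int) int {
	return (off + align - 1) &^ (align - 1)
}

// layout is a simplified, hypothetical model of the stack-based argument
// layout seen by hand-written assembly. params and results hold the byte
// sizes of plain integer values. Assumptions: every value is aligned to its
// own size, and the first result starts at a wordSize-aligned offset.
func layout(params, results []int, wordSize int) (paramOffs, resultOffs []int, argSize int) {
	off := 0
	for _, size := range params {
		off = alignUp(off, size)
		paramOffs = append(paramOffs, off)
		off += size
	}
	if len(results) > 0 {
		off = alignUp(off, wordSize)
	}
	for _, size := range results {
		off = alignUp(off, size)
		resultOffs = append(resultOffs, off)
		off += size
	}
	return paramOffs, resultOffs, off
}

func main() {
	// func add(a, b int32) int32 on a 64-bit target.
	p, r, n := layout([]int{4, 4}, []int{4}, 8)
	fmt.Println(p, r, n) // [0 4] [8] 12

	// func add(a, b int) int on a 64-bit target.
	p, r, n = layout([]int{8, 8}, []int{8}, 8)
	fmt.Println(p, r, n) // [0 8] [16] 24
}
```

Under this model the int32 version would be addressed as a+0(FP), b+4(FP), ret+8(FP). go vet checks the real offsets against the Go prototype, so trust it over this sketch.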
https://pkg.go.dev/github.com/grailbio/base/simd has some work I’ve done in this vein.
One that always sticks out to me personally is the use in Go of the term "dial" instead of "connect" for network connection establishment. This is, AFAICT, another Pike+Thompson-ism, as it can be seen previously in the form of the Plan9 dial(3) syscall — https://9fans.github.io/plan9port/man/man3/dial.html .
---
A tangent: I have wondered before whether Pike and Thompson drafted the design for the language that would become Golang long before working at Google, initially to replace C specifically in the context of being the lingua franca for systems programming on Plan 9. And that, therefore (at least in the designers' minds), Golang would have always had Plan9 as its secret "flagship target" that it should have a 1:1 zero-impedance abstraction mapping onto. Even if they never bothered to actually make a Plan9 Golang runtime.
You could test this hypothesis by implementing an actual Golang runtime for Plan9†, and then comparing it to the Golang runtimes for other OSes — if Plan9 were the "intended home" for Golang programs, then you'd expect the Golang runtime to be very "thin" on Plan9.
(To put that another way: imagine the Golang runtime as something like WINE — a virtualization layer that implements things that could be syscalls / OS library code, in the form of client-side runtime shim code. A "WINE implementation for Windows" would be an extremely thin shim, as every shim call would just point to a 1:1-API-matched piece of OS-provided code. My hypothesis here is that "Golang for Plan9" is the same kind of thing as "WINE for Windows.")
† I was saying this as a thought experiment, not thinking there would actually be a Plan9 implementation of the Golang runtime... but there is! (https://go.dev/wiki/Plan9) So someone can actually check this :)
If it has, then it's most likely available on https://cat-v.org/. Even if it hasn't, cat-v.org is a great starting point.
Besides, close to your line of thought, and assuming you didn't know about this already, Pike et al. previously worked on Limbo[0], a "predecessor" of Go, used to write Inferno[1], a Plan9-like OS, which could be hosted on arbitrary OSes via a bespoke virtual machine called "Dis".
So there were indeed a few previous "drafts" for Go. I'd doubt that Go has been designed "for" Plan9 though.
[0]: https://en.wikipedia.org/wiki/Limbo_(programming_language)
[1]: https://en.wikipedia.org/wiki/Inferno_(operating_system)
The most complete library I've seen (though admittedly never used) uses CGO _partially_, with a neat hack to avoid the overhead that it comes with [0].
[0]: https://github.com/slimsag/rand/blob/f1e8d464c0021a391d5cd64...
Considering there is no introduction at all to this beyond "I discovered it's called Plan9", I'm assuming the author really thinks this is a language widely named "Plan9".
For those who are curious: "In the general case, the frame size [the parameter after NOSPLIT] is followed by an argument size, separated by a minus sign. (It's not a subtraction, just idiosyncratic syntax.) The frame size $24-8 states that the function has a 24-byte frame and is called with 8 bytes of argument, which live on the caller's frame. If NOSPLIT is not specified for the TEXT, the argument size must be provided. For assembly functions with Go prototypes, go vet will check that the argument size is correct."
Source: https://go.dev/doc/asm
Like previous you, I have to admit I'm skeptical but would be happy to be wrong.
"EdgeOS" is based on Linux, and people run vanilla Linux distributions on those boxes, as well as OpenBSD and NetBSD.
I wonder how long Marvell will continue selling those Octeon MIPS64 chips, though. Marvell (then Cavium) switched to ARM nearly a decade ago (2016) for newer chips in the Octeon series. I think Loongson sells more modern MIPS64 (or at least MIPS64-like) chips, but they don't seem to be commercially available outside China.
Hundreds of thousands of linux-based smartnic cards, actually. Fun stuff. Those particular ones were EOLd and have been replaced with ARM but the MIPS based ones will live on in the datacenters until they die, I'm sure.
> Like previous you, I have to admit I'm skeptical but would be happy to be wrong
Seriously, you are going to be delighted to be wrong. On your linux machine, go write a go program and write "GOOS=darwin GOARCH=arm64 go build ..." and you will have yourself an ARM mac binary. Or for going the other way, use GOOS=linux GOARCH=amd64. It really is that simple.
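If you want something to try it with, here is a tiny program that reports what it was built for. The target helper is just a name made up for this example; runtime.GOOS and runtime.GOARCH are standard library constants fixed at compile time.

```go
package main

import (
	"fmt"
	"runtime"
)

// target reports the OS/architecture pair this binary was compiled for.
// The values are baked in at build time from GOOS and GOARCH.
func target() string {
	return runtime.GOOS + "/" + runtime.GOARCH
}

func main() {
	fmt.Println("built for", target())
}
```

Built with GOOS=darwin GOARCH=arm64 and copied to an ARM Mac, this should report darwin/arm64 regardless of the machine that compiled it.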
(1) https://go.dev/wiki/MinimumRequirements#amd64
That's fair, but what we call a monstrosity by modern standards is much simpler than porting the assembly
There were cross-platform languages before C, but they never really took off for systems development the way C did (OSs, for example, were commonly written in pure assembly).
But for the particular case of SIMD operations, wouldn't it make more sense to use the GPU instead of the CPU? GPUs excel at parallelism and matrix operations, so the performance difference would be even greater. I suppose the lack of well maintained GPU packages and community around it don't make Go particularly well suited for this.
I install QEMU (I have the same setup locally), then it's one line each to run unit tests for: Linux 386, arm64, riscv64, ppc64le and s390x.
With QEMU installed, all you have to do is:
GOARCH=bla go test ./...
- SIMD: up to 400% speed boost, most likely on the same VMs you were already using
- GPU: orders of magnitude faster, but now you need to make sure your cloud servers have compatible GPUs attached
If you really do need crazy performance then it's worth it. If you're already stable and right-scaled and SIMD allows you to reduce your VM spend by 25%, then you probably just take the savings and move on.
If this is true, I kindly ask the author not to feel embarrassed or "exposed", but to be honest, so we can learn from this. I'd like to gain confidence in spotting this type of "LLM exposed" thing, but people never seem to admit it, no matter how obvious. And of course, here it is not obvious; this is a wild, very judgemental guess.
I mean, the fact that Go has its own Plan9-derived format for its assembler has absolutely nothing to do with the task the author aims to solve.
Go's calling convention uses registers, except for hand-written assembly functions, where it is stack-based; the latter is also how all Go code worked in the past. See https://go.dev/s/regabi and https://go.dev/doc/asm
Build tags have used the "//go:build" form rather than "// +build" since Go 1.17, so for a few years already.
More about build tags: using both build tags and filename suffix for arch-based conditional compilation is redundant. Just use one of them, not both.
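To see that the two build tag syntaxes mean the same thing, here is a small sketch using the standard go/build/constraint package, which parses both forms. The matches helper and the tag names (amd64, purego) are only illustrative choices, not something taken from the article's code.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches parses one build-constraint line (either syntax) and evaluates it
// against the set of enabled tags. It is a helper invented for this sketch.
func matches(line string, enabled map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool { return enabled[tag] }), nil
}

func main() {
	tags := map[string]bool{"linux": true, "amd64": true}
	lines := []string{
		"//go:build amd64 && !purego", // current form
		"// +build amd64,!purego",     // legacy form
	}
	for _, line := range lines {
		ok, err := matches(line, tags)
		fmt.Println(line, "=>", ok, err)
	}
}
```

Both lines should evaluate the same way for any tag set. A file named like add_amd64.s already gets the amd64 constraint from its name, which is why adding the tag as well is redundant.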
I recently ported all of the amd64 assembly in Go's crypto libraries over to Avo. Very useful library for this sort of work!
The Go build system runs under your current architecture, cross-compiling tests to your target architecture.
Then, the Go test runner also runs under your current architecture, orchestrating running your cross compiled test binaries.
Since you registered to run cross-compiled binaries under QEMU, those test binaries magically run through QEMU.
The Go test runner collects test results, and reports back to you.
The first run might be slowish, as the Go compiler needs to cross compile the standard library and all your dependencies to your target platform. But once that's done and cached, and if your tests are fast, the edit-test cycle becomes pretty quick.
No, the user space is (mostly) busybox, in both cases.
The user interface is different sure.
The fact that termux (a Debian userspace) is able to run on android at all is enough to dispel your claims.
Stop with the mental gymnastics, you're ill prepared. If you wish, I can buy you a dictionary.
I suggest to attend one, or search related material freely available.
Termux runs on Android with hacks, and only to the point Google doesn't allow it anymore, unless it is side loaded.
I don't know where you did your degree or when. But my friend you are objectively wrong.
Termux no longer runs because it no longer allows (possibly using Linux capabilities?) subprocesses from around Android 10. Android 12 if memory serves actually starts killing background processes.
No hacks. Unless your degree says using the POSIX fork()/exec() API as "hacks".
Please don't embarrass yourself further. It was quite painful reading your prior response.
Also, they aren't listed as official NDK APIs, because POSIX is part of neither ISO C nor ISO C++, hence hack.
https://developer.android.com/ndk/guides/stable_apis
A hack that termux folks now suffer from, because it fails Play Store API validation for forking processes, which sideloading works around, until Google decides to forbid that as well.
Coding since the 1980s, and only fools are afraid to be embarrassed.
Apple does this too for its more locked down devices.
I've been coding since the 80s too. I had assumed from your hubris and ignorance that you were young. My mistake, it's clear that you're merely an idiot.
Enjoy the weekend, happy in the knowledge that I shall no longer be engaging with you.