←back to thread

199 points ashvardanian | 5 comments | | HN request time: 5.999s | source
Show context
ashvardanian ◴[] No.46288320[source]
This article is about the ugliest — but arguably the most important — piece of open-source software I’ve written this year. The write-up ended up long and dense, so here’s a short TL;DR:

I grouped all Unicode 17 case-folding rules and built ~3K lines of AVX-512 kernels around them to enable fully standards-compliant, case-insensitive substring search across the entire 1M+ Unicode range, operating directly on UTF-8 bytes. In practice, this is often ~50× faster than ICU, and also less wrong than most tools people rely on today—from grep-style utilities to products like Google Docs, Microsoft Excel, and VS Code.

StringZilla v4.5 is available for C99, C++11, Python 3, Rust, Swift, Go, and JavaScript. The article covers the algorithmic tradeoffs, benchmarks across 20+ Wikipedia dumps in different languages, and quick starts for each binding.

Thanks to everyone for feature requests and bug reports. I'll do my best to port this to Arm as well — but first, I'm trying to ship one more thing before year's end.

replies(4): >>46288545 #>>46288790 #>>46291556 #>>46291741 #
fatty_patty89 ◴[] No.46288545[source]
Thank you

do the go bindings require cgo?

replies(1): >>46288586 #
ashvardanian ◴[] No.46288586[source]
The GoLang bindings – yes, they are based on cGo. I realize it's suboptimal, but seems like the only practical option at this point.
replies(1): >>46288610 #
fatty_patty89 ◴[] No.46288610[source]
In a normal world the Go C FFI wouldn't have insane overhead but what can we do, the language is perfect and it will stay that way until morale improves.

Thanks for the work you do

replies(2): >>46289084 #>>46289964 #
1. kbolino ◴[] No.46289964[source]
There are undoubtedly still some optimizations lying around, but the biggest source of Go's FFI overhead is goroutines.

There's only two "easy" solutions I can see: switch to N:N threading model or make the C code goroutine-aware. The former would speed up C calls at the expense of slowing down lots of ordinary Go code. Personally, I can still see some scenarios where that's beneficial, but it's pretty niche. The latter would greatly complicate the use of cgo, and defeat one of its core purposes, namely having access to large hard-to-translate C codebases without requiring extensive modifications of them.

A lot of people compare Go's FFI overhead to that of other natively compiled languages, like Zig or Rust, or to managed runtime languages like Java (JVM) or C# (.NET), but those alternatives don't use green threads (the general concept behind goroutines) as extensively. If you really want to compare apples-to-apples, you should compare against Erlang (BEAM). As far as I can tell, Erlang NIFs [1] are broadly similar to purego [2] calls, and their runtime performance [3] has more or less the same issues as CGo [4].

[1]: https://www.erlang.org/doc/system/nif.html

[2]: https://pkg.go.dev/github.com/ebitengine/purego

[3]: https://erlang.org/documentation/doc-10.1/doc/efficiency_gui...

[4]: https://www.reddit.com/r/golang/comments/12nt2le/when_dealin...

replies(1): >>46290541 #
2. fatty_patty89 ◴[] No.46290541[source]
Java has green threads and c#/.net has logical threads
replies(2): >>46290697 #>>46291700 #
3. kbolino ◴[] No.46290697[source]
Yes, I have cleaned up the wording a bit. Also, the common implementation of Rust's async is comparable to green threads, and I think Zig is adopting something like it too.

However, the "normal" execution model on all of them is using heavyweight native threads, not green threads. As far as I can tell, FFI is either unsupported entirely or has the same kind of overhead as Go and Erlang do, when used from those languages' green threads.

replies(1): >>46290946 #
4. fatty_patty89 ◴[] No.46290946{3}[source]
Genuine question, you make it seem as this is a limitation and they're all in the same bucket but how was Java for example able to scale all the enterprises while having multi threading and good ffi, same with .net.

My impression is that the go ffi is with big overhead because of the specific choices made to not care about ffi because it would benefit the go code more?

My point was that there's other gc languages/envorionments that have good ffi and were somehow able all these decades to create scalable multithreaded applications.

replies(1): >>46291799 #
5. kbolino ◴[] No.46291799{4}[source]
I would suggest gaining a better understanding of the M:N threading model versus the N:N threading model. I do not know that I can do it justice here.

Both Java and Rust flirted with green threads in their early days. Java abandoned them because the hardware wasn't ready yet, and Rust abandoned them because they require a heavyweight runtime that wasn't appropriate for many applications Rust was targeting. And yet, both languages (and others besides) ended up adding something like them in later anyway, albeit sitting beside, rather than replacing, the traditional N:N threading they primarily support.

Your question might just be misdirected; one could view it as operating systems, and not programming languages per se, that screwed it all up. Their threads, which were conservatively designed to be as compatible as possible with existing code, have too much overhead for many tasks. They were good enough for awhile, especially as multicore systems started to enter the scene, but their limitations became apparent after e.g. nginx could handle 10x the requests of Apache httpd on the same hardware. This gap would eventually be narrowed, to some extent, but it required a significant amount of rework in Apache.

If you can answer the question of why ThreadPoolExecutor exists in Java, then you are about halfway to answering the question about why M:N threading exists. The other half is mostly ergonomics; ThreadPoolExecutor is great for fanning out pieces of a single, subdividable task, but it isn't great for handling a perpetual stream of unrelated tasks that ebb and flow over time. EDIT: See the Project Loom proposal for green threads in Java today, which also brings up the ForkJoinPool, another approach to M:N threading: https://cr.openjdk.org/~rpressler/loom/Loom-Proposal.html