311 points melodyogonna | 22 comments
MontyCarloHall ◴[] No.45138920[source]
The reason why Python dominates is that modern ML applications don't exist in a vacuum. They aren't the standalone C/FORTRAN/MATLAB scripts of yore that load in some simple, homogeneous data, crunch some numbers, and spit out a single result. Rather, they are complex applications with functionality extending far beyond the number crunching, which requires a robust preexisting software ecosystem.

For example, a modern ML application might need an ETL pipeline to load and harmonize data of various types (text, images, video, etc., all in different formats) from various sources (local filesystem, cloud storage, HTTP, etc.). The actual computation then must leverage many different high-level functionalities, e.g. signal/image processing, optimization, statistics, etc. All of this computation might be too big for one machine, and so the application must dispatch jobs to a compute cluster or cloud. Finally, the end results might require sophisticated visualization and organization, with a GUI and database.

There is no single language besides Python with an ecosystem rich enough to provide literally all of the aforementioned functionality. Python's numerical computing libraries (NumPy/PyTorch/JAX, etc.) all call out to C/C++/FORTRAN under the hood and are thus extremely high-performance. For functionality they don't implement, Python's C/C++ FFIs (e.g. Python.h, NumPy's C integration, PyTorch's/Boost's C++ integration) aren't perfect, but they are good enough that implementing the performance-critical portions of the code in C/C++ is much easier than re-implementing entire ecosystems of packages in another language like Julia.
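
A minimal sketch of that FFI pattern, using ctypes with a hypothetical libscale.so built from a one-line C loop (the library name, path, and function are purely illustrative):

    import ctypes
    import numpy as np

    # Hypothetical shared library holding the performance-critical loop, e.g.
    #   void scale(double *x, long n, double a) { for (long i = 0; i < n; i++) x[i] *= a; }
    lib = ctypes.CDLL("./libscale.so")
    lib.scale.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_long, ctypes.c_double]
    lib.scale.restype = None

    x = np.ones(1_000_000)  # float64, contiguous
    # Python stays the glue; the hot loop runs in C, in place, on the NumPy buffer.
    lib.scale(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), x.size, 2.0)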

replies(8): >>45139364 #>>45140601 #>>45141802 #>>45143317 #>>45144664 #>>45146179 #>>45146608 #>>45146905 #
Hizonner ◴[] No.45139364[source]
This guy is worried about GPU kernels, which are never, ever written in Python. As you point out, Python is a glue language for ML.

> There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python.

That may be true, but some of us are still bitter that all that grew up around an at-least-averagely-annoying language rather than something nicer.

replies(5): >>45139454 #>>45140625 #>>45141909 #>>45142782 #>>45147478 #
1. ModernMech ◴[] No.45140625[source]
> This guy is worried about GPU kernels, which are never, ever written in Python. As you point out, Python is a glue language for ML.

That's kind of the point of Mojo: they're trying to solve the so-called "two language problem" in this space. Why should you need two languages to write your glue code and your kernel code? Why can't there be a language that is as easy to write as Python but can still express GPU kernels for ML applications? That's what Mojo is trying to be, through clever use of LLVM's MLIR.

replies(5): >>45141705 #>>45143663 #>>45144593 #>>45145100 #>>45145290 #
2. nostrademons ◴[] No.45141705[source]
It's interesting, people have been trying to solve the "two language problem" since before I started professionally programming 25 years ago, and in that time period two-language solutions have just gotten even more common. Back in the 90s they were usually spoken about only in reference to games and shell programming; now the pattern of "scripting language calls out to highly-optimized C or CUDA for compute-intensive tasks" is common for webapps, ML, cryptocurrency, drones, embedded, robotics, etc.

I think this is because many, many problem domains have a structure that lends themselves well to two-language solutions. They have a small homogenous computation structure on lots of data that needs to run extremely fast. And they also have a lot of configuration and data-munging that is basically quick one-time setup but has to be specified somewhere, and the more concisely you can specify it, the less human time development takes. The requirements on a language designed to run extremely fast are going to be very different from one that is designed to be as flexible and easy to write as possible. You usually achieve quick execution by eschewing flexibility and picking a programming model that is fairly close to the machine model, but you achieve flexibility by having lots of convenience features built into the language, most of which will have some cost in memory or indirections.

There've been a number of attempts at "one language to rule them all", notably PL/I, C++, Julia (in the mathematical programming subdomain), and Common Lisp, but it often feels like the "flexible" subset is shoehorned in to fit the need for zero-cost abstractions, and/or the "compute-optimized" subset is almost a whole separate language that is bolted on with similar but more verbose syntax.

replies(3): >>45142735 #>>45144777 #>>45147626 #
3. soVeryTired ◴[] No.45142735[source]
There's a very interesting video about the "1.5 language problem" in Julia [0]. The point being that when you write high-performance Julia it ends up looking nothing like "standard" Julia.

It seems like it's just extremely difficult to give fine-grained control over the metal while having an easy, ergonomic language that lets you just get on with your tasks.

[0] https://www.youtube.com/watch?v=RUJFd-rEa0k

4. bobajeff ◴[] No.45143663[source]
I don't think Mojo can solve the two-language problem. Maybe if it were going to be a superset of Python? Anyway, I think that was actually Julia's goal, not Mojo's.
replies(1): >>45147057 #
5. ◴[] No.45144593[source]
6. Karrot_Kream ◴[] No.45144777[source]
From what I can tell, gaming has mostly just embraced two language solutions. The big engines Unity, Unreal, and Godot have tight cores written in C/C++ and then scripting languages that are written on top. Hobby engines like Love2D often also have a tight, small core and are extensible with languages like Lua or Fennel.

Modern Common Lisp also seems to have given up its "one language to rule them all" mindset and is pretty okay with just dropping into CFFI to call into C libraries as needed. Over the years I've come to see that mindset as mostly a dead-end. Python, web browsers, game engines, emacs, these are all prominent living examples of two-language solutions that have come to dominate in their problem spaces.

One aspect of the "two language problem" that I find troubling, though, is that modern environments often ossify around the exact solution. For example, it's very difficult to have something like PyTorch in, say, Common Lisp, even though libcuda and libcudnn should be fairly straightforward to wrap in Common Lisp (see [1] for Common Lisp CUDA bindings). JS/TS/WASM that runs in the browser is often dependent on Chrome's behavior. Emacs continues to be tied to its ancient, tech-debt-ridden C runtime. There seems to be a lot of value tied up in the glue between the two chosen languages, and it's hard to recreate that value with other HLLs even if the "metal" language/runtime stays the same.

[1]: https://github.com/takagi/cl-cuda

replies(1): >>45144991 #
7. nostrademons ◴[] No.45144991{3}[source]
This may be because while the computational core is small, much of the code and the value of the overall solution are actually in the HLL. That's the reason for the use of a HLL in the first place.

PyTorch is actually quite illustrative as a counterexample that proves the rule. It was based on Torch, which had very similar if not identical BLAS routines but used Lua as the scripting language. But now everybody uses PyTorch because development of the Lua-based Torch stopped in 2017, so all the extra goodies that people rely on now are in the Python wrapper.

The only exception seems to be when multiple scripting languages are supported, and at roughly equal points of development. So for example - SQLite continues to have most of its value in the C substrate, and is relatively easy to port to other languages, because it has so many language bindings that there's a strong incentive to write new functionality in C and keep the API simple. Ditto client libraries for things like MySQL, Postgres, MongoDB, Redis, etc. ZeroMQ has a bunch of bindings that are largely dumb passthroughs to the underlying C++ substrate.

But even a small imbalance can lead to that one language being preferenced heavily in supporting tooling and documentation. Pola.rs is a Rust substrate and ships with bindings for Python, R, and Node.js, but all the examples on the website are in Python or Rust, and I rarely hear of a non-Python user picking it up.

replies(1): >>45145207 #
8. adsharma ◴[] No.45145100[source]
Python -> Mojo -> MLIR ... <target hardware>

Yes, you can write Mojo with Python syntax and transpile. You'd end up with something similar to Julia's 1.5-language problem.

Since the Mojo language is not fully specified, it's hard to know which language constructs can't be efficiently expressed in the Python syntax.

Love MLIR and Mojo as debuggable/performant intermediate languages.

9. Karrot_Kream ◴[] No.45145207{4}[source]
Very interesting observation on SQLite vs Pola.rs. Also, how could I forget that Torch was originally a Lua library when I used it forever ago.

I also wonder how much of the ossification comes from the logic embodied in the HLL. SQLite wrappers tend to be very simple and let the C core do most of the work. Something like PyTorch, on the other hand, layers a lot of logic on top of the underlying CUDA/BLAS, and that logic is essential complexity living solely in Python, the HLL. This is also probably why libcurl has so many great wrappers in HLLs: libcurl does the heavy lifting.

The pain point I see repeatedly in putting most of the logic into the performant core is asynchrony. Every HLL seems to have its own way to do async execution (Python with asyncio, Node with its async runtime, Go with lightweight green threads (goroutines), Common Lisp with native threads, etc.). This means that the C core needs to be careful about what it exposes and how it accommodates the various asynchrony patterns.
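
A rough sketch of the usual Python-side accommodation (the blocking C call here is a hypothetical stand-in): push the synchronous core onto a thread so asyncio can schedule around it.

    import asyncio

    def blocking_c_call(payload: bytes) -> bytes:
        """Stand-in for a synchronous call into a C extension (ctypes/cffi/etc.)."""
        return payload[::-1]

    async def main():
        loop = asyncio.get_running_loop()
        # The C core stays synchronous; each HLL adapts it to its own async model.
        result = await loop.run_in_executor(None, blocking_c_call, b"data")
        print(result)

    asyncio.run(main())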

10. bjourne ◴[] No.45145290[source]
> Why can't there be a language which is both as easy to write as Python, but can still express GPU kernels for ML applications? That's what Mojo is trying to be through clever use of LLVM MLIR.

It already exists. It is called PyTorch/JAX/TensorFlow. These frameworks already contain sophisticated compilers for turning computational graphs into optimized GPU code. I dare say that they don't leave enough performance on the table for a completely new language to be viable.
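
For example, a minimal sketch of that framework-level compilation with torch.compile (assumes PyTorch 2.x and a CUDA device; the function is just an illustrative composite of elementwise ops):

    import torch

    def gelu_residual(x, y):
        # A composite of elementwise ops; the framework's compiler (TorchInductor)
        # can fuse these into a single generated GPU kernel.
        return y + torch.nn.functional.gelu(x)

    compiled = torch.compile(gelu_residual)
    x = torch.randn(4096, 4096, device="cuda")
    y = torch.randn(4096, 4096, device="cuda")
    out = compiled(x, y)  # first call triggers compilation; later calls reuse the kernel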

replies(2): >>45147094 #>>45147780 #
11. davidatbu ◴[] No.45147057[source]
Being a Python superset is literally a goal of Mojo mentioned in the podcast.

Edit: from other posts on this page, I've realized that being a superset of Python is now regarded as a nice-to-have by Modular, not a must-have. They realized it's harder than they initially thought, basically.

12. davidatbu ◴[] No.45147094[source]
Last I checked, all of PyTorch, TensorFlow, and JAX sit at a layer of abstraction above GPU kernels. They expose GPU kernels (basically as nodes in the computational graph you mention), but they don't let you write GPU kernels.

Triton, CUDA, etc., let one write GPU kernels.
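
For a sense of what that kernel-level code looks like, here is roughly Triton's standard vector-add example, defined and launched from Python (a sketch; assumes the triton package and a CUDA device):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    out = torch.empty_like(x)
    grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)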

replies(2): >>45148597 #>>45148632 #
13. imtringued ◴[] No.45147626[source]
You say this as if it were some ideal outcome, but I want to get as far away from Python and C++ as possible.

Also, no. I can't use Python for inference because it is too slow, so I have to export to TensorFlow Lite and run the model in C++, which essentially required me to rewrite half the code in C++.

14. saagarjha ◴[] No.45147780[source]
There's plenty of performance on the table but I don't think it will be captured by a new language.
15. bjourne ◴[] No.45148597{3}[source]
Yes, they kinda do. The computational graph you specify is completely different from the execution schedule it is compiled into. Whether it's 1, 2, or N kernels is irrelevant as long as it runs fast. Mojo being an HLL is conceptually no different from Python. Whether it will become better for DNNs in the future, time will tell.
replies(1): >>45148792 #
16. boroboro4 ◴[] No.45148632{3}[source]
torch.compile sits at both the computation-graph level and the GPU-kernel level and can fuse your operations using the Triton compiler. I think something similar applies to JAX and TensorFlow by way of XLA, but I'm not 100% sure.
replies(1): >>45148807 #
17. davidatbu ◴[] No.45148792{4}[source]
I assume HLL = high-level language? Mojo definitely exposes lower-level facilities than Python. Chris has even described Mojo as "syntactic sugar over MLIR". (For example, the native integer type is defined in library code as a struct.)

> Whether it's 1, 2, or N kernels is irrelevant.

Not sure what you mean here. But new kernels are written all the time (flash-attn is a great example), and one can't do that in plain Python. E.g., flash-attn was originally written in CUDA C++ and is now written in Triton.

replies(1): >>45152799 #
18. davidatbu ◴[] No.45148807{4}[source]
Good point. But the overall point about Mojo exposing a different level of abstraction than Python still stands: I imagine that no amount of magic/operator fusion/etc. in `torch.compile()` would let one get reasonable performance for an implementation of, say, flash-attn. One would have to use CUDA/Triton/Mojo/etc.
replies(1): >>45149868 #
19. boroboro4 ◴[] No.45149868{5}[source]
But Python is already operating fully at a different level of abstraction - you mention Triton yourself, and there is a new Python CUDA API too (one similar to Triton). On top of this - flash attention 4 is actually written in Python.

Somehow Python has managed to be both the high-level and the low-level language for GPUs…

replies(1): >>45150630 #
20. davidatbu ◴[] No.45150630{6}[source]
IIUC, Triton uses Python syntax, but it has a separate compiler (which is kinda what Mojo is doing, except Mojo's syntax is a superset of Python's instead of a subset, like Triton's). I think it's fair to describe it as a different language (otherwise, we'd also have to describe Mojo as "Python"). Triton's website and repo describe it as "the Triton language and compiler" (as opposed to, I dunno, "write GPU kernels in Python").

Also, flash attention is at v3-beta right now? [0] And it requires one of CUDA/Triton/ROCm?

But maybe I'm out of the loop? Where do you see that flash attention 4 is written in Python?

[0] https://github.com/Dao-AILab/flash-attention

replies(1): >>45150751 #
21. boroboro4 ◴[] No.45150751{7}[source]
From this perspective, PyTorch is a separate language, at least as soon as you start using torch.compile (only a subset of PyTorch Python will be compilable). That's the strength of Python - it's great for describing things and later analyzing them (and compiling them, for example).

Just to be clear here - you use Triton from plain Python; it runs the compilation internally.

Likewise, I'm pretty sure not all of Mojo can be used to write kernels? I might be wrong here, but it would be very hard to fit general-purpose code into kernels (and, to be frank, pointless - constraints are what bring the speed).

As for flash attention there was a leak: https://www.reddit.com/r/LocalLLaMA/comments/1mt9htu/flashat...

22. bjourne ◴[] No.45152799{5}[source]
Well, Mojo hasn't been released, so we can't say precisely what it can and can't do. If it can emit CUDA code, then it does so by transpiling Mojo into CUDA. And there is no reason why Python can't also be transpiled to CUDA.

What I mean here is that DNN code is written at a much higher level than kernels; kernels are just building blocks you use to instantiate your dataflow.