
311 points by melodyogonna | 2 comments
MontyCarloHall No.45138920
The reason Python dominates is that modern ML applications don't exist in a vacuum. They aren't the standalone C/FORTRAN/MATLAB scripts of yore that load some simple, homogeneous data, crunch some numbers, and spit out a single result. Rather, they are complex applications whose functionality extends far beyond the number crunching, and that functionality requires a robust preexisting software ecosystem.

For example, a modern ML application might need an ETL pipeline to load and harmonize data of various types (text, images, video, etc., all in different formats) from various sources (local filesystem, cloud storage, HTTP, etc.). The actual computation then must leverage many different high-level functionalities, e.g. signal/image processing, optimization, statistics, etc. All of this computation might be too big for one machine, so the application must dispatch jobs to a compute cluster or cloud. Finally, the end results might require sophisticated visualization and organization, with a GUI and database.
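To make that concrete, here is a minimal Python sketch of the kind of pipeline described above. The file paths, bucket, URL, scheduler address, and "value" column are hypothetical placeholders, and the particular libraries (pandas, boto3, requests, NumPy, Dask) are just one plausible set of choices, not something the comment prescribes:

    # Rough sketch of a heterogeneous ETL + compute pipeline.
    # All names (paths, bucket, URL, scheduler address, column) are hypothetical.
    import io

    import boto3                          # cloud storage
    import numpy as np                    # number crunching
    import pandas as pd                   # tabular ETL
    import requests                       # HTTP sources
    from dask.distributed import Client   # fan work out to a cluster

    def load_sources() -> pd.DataFrame:
        """Pull data from local disk, S3, and an HTTP API, then combine it."""
        local = pd.read_csv("data/local_events.csv")

        s3 = boto3.client("s3")
        obj = s3.get_object(Bucket="example-bucket", Key="events.parquet")
        remote = pd.read_parquet(io.BytesIO(obj["Body"].read()))

        resp = requests.get("https://example.com/api/events.json", timeout=30)
        api = pd.DataFrame(resp.json())

        # Real harmonization would reconcile schemas and types here.
        return pd.concat([local, remote, api], ignore_index=True)

    def crunch(chunk: pd.DataFrame) -> float:
        """Stand-in for the signal processing / statistics step."""
        return float(np.mean(chunk["value"].to_numpy()))

    if __name__ == "__main__":
        df = load_sources()
        client = Client("scheduler.example.com:8786")  # dispatch to a Dask cluster
        futures = client.map(crunch, np.array_split(df, 16))
        print(sum(client.gather(futures)) / len(futures))

The point isn't any one of these libraries; it's that every step (ingest, compute, distribution) has a mature off-the-shelf option in the same language.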

Besides Python, there is no single language with an ecosystem rich enough to provide literally all of the aforementioned functionality. Python's numerical computing libraries (NumPy/PyTorch/JAX etc.) all call out to C/C++/FORTRAN under the hood and are thus extremely high-performance. For functionality they don't implement, Python's C/C++ FFIs (e.g. Python.h, the NumPy C API, PyTorch/Boost C++ integration) are not perfect, but they are good enough that implementing the performance-critical portions of code in C/C++ is far easier than re-implementing entire ecosystems of packages in another language like Julia.
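As a tiny illustration of the FFI point, here is a hedged Python-side sketch using ctypes (a simpler route than the Python.h / NumPy C API integrations named above, chosen only because it fits in a few lines). It calls erf() from the C math library directly, which is essentially what the numerical libraries do at much larger scale:

    # Minimal ctypes sketch: call a C library function from Python.
    # find_library("m") may return None on some platforms; pass the
    # .so/.dylib path explicitly in that case.
    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.erf.argtypes = [ctypes.c_double]
    libm.erf.restype = ctypes.c_double

    print(libm.erf(1.0))  # ~0.8427, computed in compiled C, called from Python

For truly performance-critical code, the heavier routes the comment names (CPython extension modules via Python.h, the NumPy C API, Boost-style C++ bindings) give finer control, but the division of labor is the same: Python orchestrates, compiled code runs the hot loop.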

replies(8): >>45139364 #>>45140601 #>>45141802 #>>45143317 #>>45144664 #>>45146179 #>>45146608 #>>45146905 #
benzible No.45140601
Python's ecosystem is hard to beat, but Elixir/Nx already does a lot of what Mojo promises. EXLA gives you GPU/TPU compilation through XLA with performance comparable to Mojo's demos, Explorer handles dataframes via Polars, and now Pythonx lets you embed Python when you need those specialized libraries.

The real difference is that Elixir was built for distributed systems from day one. OTP/BEAM gives you the ability to handle millions of concurrent requests and to coordinate work across GPU nodes. If you're building actual ML services (not just optimizing kernels), having everything from Phoenix/LiveView to Nx in one stack built for extreme fault tolerance might matter more than squeezing the last bit of performance out of your hardware.

replies(2): >>45140671 #>>45148164 #
1. melodyogonna No.45140671
Who uses EXLA in production?
replies(1): >>45144825 #
2. benzible No.45144825
These guys, for one: https://www.amplified.ai

See: https://www.youtube.com/watch?v=5FlZHkc4Mq4