The strong AI focus seems to be a sign of the times, and not actually something that makes sense imo.
https://www.modular.com/blog/a-new-simpler-license-for-max-a...
Are you sure about that? I think Mojo was always talked about as "The language for ML/AI", but I'm unsure whether Mojo was announced before the current hype cycle; it must be 2-3 years old at this point, right?
Swift has some nice features. However, the super slow compilation times and cryptic error messages really erase any gains in productivity for me.
- "The compiler is unable to type-check this expression in reasonable time?" On an M3 Pro? What the hell!?
- To find an error in SwiftUI code I sometimes need to comment everything out block by block to narrow it down and find the culprit. We're getting laughs from Kotlin devs.
Also, it appears to be more robust. Julia is notoriously fickle in both semantics and performance, making it unsuitable for the kind of foundational software Mojo strives to be.
1: https://enzymead.github.io/Reactant.jl/dev/
They say they'll open source in 2026 [1]. But until that has happened I'm operating under the assumption that it won't happen.
[0]: https://www.modular.com/legal/community
[1]: https://docs.modular.com/mojo/faq/#will-mojo-be-open-sourced
It has been Mojo's explicit goal from the start. It has its roots in the time that Chris Lattner spent at Google working on the compiler stack for TPUs.
It was explicitly designed to be Python-like because that is where (almost) all the ML/AI is happening.
> write state of the art kernels
You don't write kernels in Julia.
[1] https://juliagpu.github.io/KernelAbstractions.jl/stable/
https://cuda.juliagpu.org/stable/tutorials/introduction/#Wri...
With KernelAbstractions.jl you can actually target CUDA and ROCm:
https://juliagpu.github.io/KernelAbstractions.jl/stable/kern...
For python (or rather python-like), there is also triton (and probably others):
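To make that concrete, here is a minimal sketch along the lines of Triton's own vector-add tutorial (this assumes torch and triton are installed and a CUDA GPU is available; the tensors must live on the GPU):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x, y):
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    a = torch.randn(4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    print(torch.allclose(add(a, b), a + b))

The kernel body is Python syntax, but it is JIT-compiled down to GPU code rather than executed by the CPython interpreter, which is the point being made here.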
Mojo is effectively an internal tool that Modular have released publicly.
I'd be surprised to see any serious adoption until a 1.0 state is reached.
But as the other commenter said, it's not really competing with PyTorch, it's competing with CUDA.
Although I have my doubts that Julia is actually willing to make the compromises which would allow Julia to go that low level. I.e. semantic guarantees about allocations and inference, guarantees about certain optimizations, and more.
First of all, some people really like Julia. Regardless of how it gets discussed on HN, its commercial use has been steadily growing, and it has GPGPU support.
On the other hand, regardless of the sorry state of JIT compilers on the CPU side for Python, at least NVIDIA and Intel are quite serious about Python DSLs for GPGPU programming on CUDA and oneAPI, so one gets close enough to C++ performance while staying in Python.
So Mojo isn't that appealing in the end.
People have used the same infrastructure to allow you to compile Julia code (with restrictions) into GPU kernels
Not sure how that organization compares to Mojo.
And "correlation is not causality," but the occupation with the most vibrant job market until recently was also the one that used free tools. Non-developers like myself looked to that trend and jumped on the bandwagon when we could. I'm doing things with Python that I can't do with Matlab because Python is free.
Interestingly, we may be going back to proprietary tools, if our IDE's become a "terminal" for the AI coding agents, paid for by our employers.
Like, Rust could not be a C++ library, that does not make sense. Zig could not be a C library. Julia could not be a Python library.
There is some superficial level of abstraction where all programming languages do is interchangeable computation and therefore everything can be achieved in every language. But that superficial sameness doesn't correspond to the reality of programming.
The package https://github.com/JuliaGPU/KernelAbstractions.jl was specifically designed so that julia can be compiled down to kernels.
Julia is high level, yes, but Julia's semantics allow it to be compiled down to machine code without a "runtime interpreter". This is a core differentiating feature from Python. Julia can be used to write GPU kernels.
First-class support for AoT compilation.
https://docs.modular.com/mojo/cli/build
Yes, Julia has a few options for making executables but they feel like an afterthought.
Just compare, for example, C++, C#, and TypeScript. These are all C-like and have heavy MS influence, and despite that they all have deeply different fundamentals, concepts, use cases, and goals.
If you want the full C# experience, you will still be getting Windows, Visual Studio, or Rider.
VSCode C# support is under the same license as Visual Studio Community, and lacks several tools, like the advanced graphical debugging for parallel code and code profiling.
The great Microsoft has not open sourced that debugger, nor many other tools in the .NET ecosystem. They can also afford to subsidise C# development as a gateway into Azure, being valued at 4 trillion, the 2nd biggest company in the world.
1. Easy packaging into one executable. Then, making sure that can be reproducible across versions. Getting code from prior AI papers to run can be hard.
2. Predictability vs Python runtime. Think concurrent, low-latency GC's or low/zero-overhead abstractions.
3. Metaprogramming. There have been macro proposals for Python. Mojo could borrow from D or Rust here.
4. Extensibility in a way where extensions don't get too tied into the internal state of Mojo like they do Python. I've considered Python to C++, Rust, or parallelized Python schemes many times. The extension interplay is harder to deal with than either Python or C++ itself.
5. Write once, run anywhere, to effortlessly move code across different accelerators. Several frameworks are doing this.
6. Heterogeneous, hot-swappable, vendor-neutral acceleration. That's what I'm calling it when you can use the same code in a cluster with a combination of Nvidia GPUs, AMD GPUs, Gaudi3s, NPUs, SIMD chips, etc.
As per the roadmap[1], I expect to start seeing more adoption once phase 1 is completed.
For example, a modern ML application might need an ETL pipeline to load and harmonize data of various types (text, images, video, etc., all in different formats) from various sources (local filesystem, cloud storage, HTTP, etc.) The actual computation then must leverage many different high-level functionalities, e.g. signal/image processing, optimization, statistics, etc. All of this computation might be too big for one machine, and so the application must dispatch jobs to a compute cluster or cloud. Finally, the end results might require sophisticated visualization and organization, with a GUI and database.
There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python. Python's numerical computing libraries (NumPy/PyTorch/JAX etc.) all call out to C/C++/FORTRAN under the hood and are thus extremely high-performance, and for functionality they don't implement, Python's C/C++ FFIs (e.g. Python.h, NumPy C integration, PyTorch/Boost C++ integration) are not perfect, but are good enough that implementing the performance-critical portions of code in C/C++ is much easier compared to re-implementing entire ecosystems of packages in another language like Julia.
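To illustrate how low that FFI barrier can be, here is a minimal ctypes sketch calling straight into the C math library (the library name "libm.so.6" assumes Linux/glibc; it differs on macOS and Windows):

    import ctypes

    # Load the C math library (platform-specific name; glibc/Linux assumed here).
    libm = ctypes.CDLL("libm.so.6")

    # Declare the C signature: double cos(double)
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))  # 1.0, computed by the C library, not by Python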
Sorry, couldn't resist.
but all the normal marketing words: in my opinion it is fast, expressive, and has particularly good APIs for array manipulation
more realistic examples of compiling a Julia package into .so: https://indico.cern.ch/event/1515852/contributions/6599313/a...
1.9 and 1.10 made huge gains in package precompilation and native code caching. then attention shifted and there were some regressions in compile times due to unrelated things in 1.11 and the upcoming 1.12. but at the same time, 1.12 will contain an experimental new feature `--trim` as well as some further standardization around entry points to run packages as programs, which is a big step towards generating self-contained small binaries. also nearly all efforts in improving tooling are focused on providing static analysis and helping developers make their programs more easily compilable.
it's also important to distinguish a bit between a few similar but related needs. most of what I just described applies to generating binaries for arbitrary programs. but for the example you stated, "time to first plot" of existing packages, this is already much improved in 1.10, and users (aka non-package-developers) should see sub-second TTFP, and TTFX for most packages they use that have been updated to use the precompilation goodies in recent versions
    @time begin
        using Plots
        display(plot(rand(8)))
    end
  1.074321 seconds
On Julia 1.12 (currently at release candidate stage), a <1 MB hello world is possible with juliac (although juliac in 1.12 is still marked experimental).

I'll warn you that Julia's ML ecosystem has the most competitive advantage on "weird" types of ML, involving lots of custom gradients and kernels, integration with other pieces of a simulation or diffeq, etc.
if you just want to throw some tensors around and train an MLP, you'll certainly end up finding more rough edges than you might in PyTorch
If you're interested, they think the language will be ready for open source after completing phase 1 of the roadmap[2].
What Julia needs though: wayyyy more thorough tooling to support auto generated docs, well integrated with package management tooling and into the web package management ecosystem. Julia attracts really cutting edge research and researchers writing code. They often don't have time to write docs and that shouldn't really matter.
Julia could definitely use some work in the areas discussed in this podcast, not so much the high level interfaces but the low level ones. That's really hard though!
Or, arguably worse: my expectation is that they'll open source it, wait for it to get a lot of adoption, possibly some contribution, certainly a lot of mindshare, and then change the license to some text no one has ever heard of that forbids use on nvidia hardware without paying the piper or whatever
If it ships with a CLA, I hope we never stop talking about that risk
> There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python.
That may be true, but some of us are still bitter that all that grew up around an at-least-averagely-annoying language rather than something nicer.
I don't believe the first two are true, and as a point of reference Rider is part of their new offerings that are free for non-commercial use https://www.jetbrains.com/rider/#:~:text=free%20for%20non-co...
I also gravely, gravely doubt the .NET ecosystem has anything in the world to do with Azure
Then again, I am also open to the fact that I'm jammed up by the production use of dynamically typed languages, and maybe the "for ML" part means "I code in Jupyter notebooks" and thus give no shits about whether person #2 can understand what's happening
Combine that with all the cutting edge applied math packages often being automatically compatible with the autodiff and GPU array backends, even if the library authors didn't think about that... it's a recipe for a lot of interesting possibilities.
Then the title should be "why GPU kernel programming needs a new programming language." I can get behind that; I've written CUDA C and it was not fun (though this was over a decade ago and things may have since improved, not to mention that the code I wrote then could today be replaced by a couple lines of PyTorch). That said, GPU kernel programming is fairly niche: for the vast majority of ML applications, the high-level API functions in PyTorch/TensorFlow/JAX/etc. provide optimal GPU performance. It's pretty rare that one would need to implement custom kernels.
>which are never, ever written in Python.
Not true! Triton is a Python API for writing kernels, which are JIT compiled.
With Mojo, on the other hand, I think a library (or improvements to an existing library) would have been a better approach. A new language needlessly forks the developer community and duplicates work. But I can see the monetary incentives that made the Mojo developers choose this path, so good for them.
explicit and strict types on arguments to functions is one way, but certainly not the only way, nor probably the best way to effect that
I readily admit that I am biased in that I believe that having a computer check that every reference to every relationship does what it promises, all the time, is worth it
more generally, the most important bits of a particular function to understand are
* what should it be called with
* what should it return
* what side effects might it have
and the "what" here refers to properties in a general sense. types are a good shortcut to signify certain named collections of properties (e.g., the `Int` type has arithmetic properties). but there are other ways to express traits, preconditions, postconditions, etc. besides types
But I’m much more interested in how MLIR opens the door to “JAX in <x>”. I think Julia is moving in that direction with Reactant.jl, and I think there’s a Rust project doing something similar (I think burn.dev may be using ONNX as an even higher-level IR). In my ideal world, I would be able to write an ML model and training loop in some highly verified language and call it from Python/Rust for training.
Machine Learning is shortened to ML, too.
This posting is about "Why ML needed Mojo", but does not tell us why the license of Mojo is garbage.
Machine learning, as an example of compute-intensive tasks, could have been Mojo's Rails-for-Ruby moment here, but it seems like Mojo is dead on arrival — it was trending here at Hacker News when announced, but no one seems to talk about it now.
---------------
(I like em-dashes, but this was not written with any AI, except for a language tool's spellchecker)
they sure can...
The real difference is that Elixir was built for distributed systems from day one. OTP/BEAM gives the ability to handle millions of concurrent requests as well as coordinating across GPU nodes. If you're building actual ML services (not just optimizing kernels), having everything from Phoenix / LiveView to Nx in one stack built for extreme fault-tolerance might matter more than getting the last bit of performance out of your hardware.
That's kind of the point of Mojo, they're trying to solve the so-called "two language problem" in this space. Why should you need two languages to write your glue code and kernel code? Why can't there be a language which is both as easy to write as Python, but can still express GPU kernels for ML applications? That's what Mojo is trying to be through clever use of LLVM MLIR.
As far as the executable size, it was only 85kb in my test, a bouncing balls simulation. However, it required 300MB of Julia libraries to be shipped with it. About 2/3 of that is in libjulia-codegen.dll, libLLVM-16jl.dll. So you're shipping this chunky runtime and their LLVM backend. If you're willing to pay for that, you can ship a Julia executable. It's a better story than what Python offers, but it's not great if you want small, self-contained executables.
Mojo also has a bunch of documentation https://docs.modular.com/mojo/ as well as hundreds of thousands of lines of open source code you can check out: https://github.com/modular/modular
The Mojo community is really great, please consider joining, either our discourse forum: https://forum.modular.com/ or discord https://discord.com/invite/modular chat.
-Chris Lattner
Azure pays for .NET, and projects like Aspire.
I think this is because many, many problem domains have a structure that lends themselves well to two-language solutions. They have a small homogenous computation structure on lots of data that needs to run extremely fast. And they also have a lot of configuration and data-munging that is basically quick one-time setup but has to be specified somewhere, and the more concisely you can specify it, the less human time development takes. The requirements on a language designed to run extremely fast are going to be very different from one that is designed to be as flexible and easy to write as possible. You usually achieve quick execution by eschewing flexibility and picking a programming model that is fairly close to the machine model, but you achieve flexibility by having lots of convenience features built into the language, most of which will have some cost in memory or indirections.
There've been a number of attempts at "one language to rule them all", notably PL/1, C++, Julia (in the mathematical programming subdomain), and Common Lisp, but it often feels like the "flexible" subset is shoehorned in to fit the need for zero-cost abstractions, and/or the "compute-optimized" subset is almost a whole separate language that is bolted on with similar but more verbose syntax.
Most people that know this kind of thing don't get much value out of using a high level language to do it, and it's a huge risk because if the language fails to generate something that you want, you're stuck until a compiler team fixes and ships a patch which could take weeks or months. Even extremely fast bug fixes are still extremely slow on the timescales people want to work on.
I've spent a lot of my career trying to make high level languages for performance work well, and I've basically decided that the sweet spot for me is C++ templates: I can get the compiler to generate a lot of good code concisely, and when it fails the escape hatch of just writing some architecture specific intrinsics is right there whenever it is needed.
Have a hard time believing C++ and Java don't have rich enough ecosystems. Not saying they make for good glue languages, but everything was being written in those languages before Python became this popular.
Got any sources on that? I've been interested in learning Julia for a while but don't because it feels useless compared to Python, especially now with 3.13
Don't worry. If you stick around this industry long enough you'll see this happen several more times.
It's not about "richness," it's about giving a language ecosystem for people who don't really want to do the messy, low-level parts of software, and which can encapsulate the performance-critical parts with easy glue
https://techcrunch.com/2023/08/24/modular-raises-100m-for-ai...
You don’t raise $130M at a $600M valuation to make boring old dev infrastructure that is sorely needed but won’t generate any revenue because no one is willing to pay for general purpose programming languages in 2025.
You raise $130M to be the programming foundation of next Gen AI. VCs wrote some big friggen checks for that pitch.
In practice, the Julia package ecosystem is weak and generally correctness is not a high priority. But the language is great, if you're willing to do a lot of the work yourself.
https://www.cocoawithlove.com/blog/2016/07/12/type-checker-i...
let a: Double = -(1 + 2) + -(3 + 4) + -(5)
Still fails on a very recent version of Swift, Swift 6.1.2, if my test works.
Chris Lattner mentions something about being more of an engineer than a mathematician, but a responsible and competent computer science engineer who is designing programming languages with complex type systems absolutely has to at least be proficient in university-level mathematics and relevant theory, or delegate and get computer scientists to find and triple-check any relevant aspects of the budding programming language.
I haven't seen new languages that market themselves for specific features that couldn't be done just as easily through straight classes with operator overloading.
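For what it's worth, this is roughly how such features get embedded in Python today without new syntax: a toy sketch (all names illustrative) of operator overloading building a lazy expression tree, the same trick many array/autograd DSLs use:

    class Expr:
        # A tiny lazy-expression node: overloaded operators build a tree
        # instead of computing immediately.
        def __init__(self, op, *args):
            self.op, self.args = op, args

        def __add__(self, other):
            return Expr("+", self, other)

        def __mul__(self, other):
            return Expr("*", self, other)

        def __repr__(self):
            if self.op == "leaf":
                return repr(self.args[0])
            return f"({self.args[0]} {self.op} {self.args[1]})"

    def leaf(v):
        return Expr("leaf", v)

    print(leaf(2) + leaf(3) * leaf(4))  # (2 + (3 * 4))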
- https://www.youtube.com/watch?v=pdJQ8iVTwj8
- https://open.spotify.com/episode/6flH0XxwdIbayoXTHOgAfI
- https://podcasts.apple.com/us/podcast/381-chris-lattner-futu...
He has two other episodes on the show:
- https://www.youtube.com/watch?v=nWTvXbQHwWs (Episode 131, October 18th, 2020)
- https://www.youtube.com/watch?v=yCd3CzGSte8 (Episode 21, May 13th, 2019)
It seems like it's just extremely difficult to give fine-grained control over the metal while having an easy, ergonomic language that lets you just get on with your tasks.
nah never ever ever ever ever ... except
https://github.com/FlagOpen/FlagGems
https://github.com/linkedin/Liger-Kernel
https://github.com/meta-pytorch/applied-ai
https://github.com/gpu-mode/triton-index
https://github.com/AlibabaPAI/FLASHNN
https://github.com/unslothai/unsloth
the number of people commenting on this stuff that don't know what they're actually talking about grows by leaps and bounds every day...
I've been involved in a few programming language projects, so I'm sympathetic as to how much work goes into one and how long they can take.
At the same time, it makes me wary of Julia, because it highlights that progress is very slow. I think Julia is trying to be too much at once. It's hard enough to be a dynamic, interactive language, but they also want to claim to be performant and compiled. That's a lot of complexity for a small team to handle and deliver on.
I watched a lot of your talks about Mojo, where you mention how Mojo benefits from very advanced compiler technology. But I've never seen you give a concrete example of this advanced technology. Please, can you write a blog post about that, going as deep and hardcore tech as you can? As I'm not a compiler dev I'll probably understand 20% of it, but hopefully I'll start to get a sense of how advanced the whole thing is.
C++ just seems like a safer bet but I'd love something better and more ergonomic.
Optimizing Julia is much harder than optimizing Fortran or C.
They have some fancy tech around compile-time interpreter, mojopkg, and their code generator. They also have a graph compiler (not mentioned in the talk) that can fuse kernels written in Mojo.
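Not Modular's stack, but for readers unfamiliar with what "fusing kernels" buys you, PyTorch 2.x exposes the same general idea via torch.compile; a rough sketch:

    import torch

    def gelu_ish(x):
        # Three elementwise ops that a graph compiler can fuse into a single
        # kernel, avoiding extra round trips through device memory.
        return 0.5 * x * (1.0 + torch.tanh(x))

    fused = torch.compile(gelu_ish)  # traces the graph and fuses the ops
    x = torch.randn(1 << 20, device="cuda" if torch.cuda.is_available() else "cpu")
    print(torch.allclose(fused(x), gelu_ish(x)))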
If OpenJDK did not or could not exist, we would all be mega fucked. Luckily alternative Java implementations exist, are performant, and are largely indistinguishable from the non-free stuff. Well, except whatever android has going on... That stuff is quirky.
But if you look at dotnet, prior to opensourcing it, it was kind of a fucked up choice to go with. You were basically locked into Windows Server for your backend and your application would basically slowly rot over time as you relied on legacy windows subsystems that even Microsoft barely cared to support, like COM+.
There were always alternative dotnet runtimes like Mono, but they were very not feature complete. Which, ironically, is why we saw so much Java. The CLR is arguably a better designed VM and C# a better language, but it doesn't matter.
God, I hate exceptions so much. I have never seen anyone use exceptions correctly in either Java (at FAANG) or in any regular Python application.
I'm much more in favor of explicit error handling like in Go, or the syntax sugar Rust provides.
> but there are other ways to express traits, preconditions, postconditions, etc. besides types
You can also put that in the type system, and expressive languages do. It's just a compiler limitation when we can't. I mean, even in C++ with concepts we can do most of that, and C++ doesn't have the most expressive type system.
If I make the assumption that future ML code will be written by ML algorithms (or at least 'transpiled' from natural language), and Lisp S-Expressions are basically the AST, it would make sense that it would be the most natural language for a machine to write in. As a side benefit, it's already the standard to have a very feature complete REPL, so as a replacement for Python it seems it would fit well
It's not just that OS tooling is "free", it's also better and works for way longer. If you relied on proprietary Delphi-compatible tooling, well... you fucked up!
Like, python says to throw an exception when an iterator reaches the end so if my custom C iterator does that is it wrong? I do kind of want to be able to do the whole 'for i in whatever: do(stuff(i))' thing so...
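For reference, that exception is the protocol itself: an iterator signals exhaustion by raising StopIteration, and the for statement catches it. A minimal pure-Python sketch of what a custom (C or otherwise) iterator has to mimic:

    class Countdown:
        def __init__(self, n):
            self.n = n

        def __iter__(self):
            return self

        def __next__(self):
            if self.n <= 0:
                # Exhaustion is signalled with an exception, not a sentinel;
                # the for loop catches StopIteration for you.
                raise StopIteration
            self.n -= 1
            return self.n + 1

    for i in Countdown(3):
        print(i)  # 3, 2, 1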
As a FAQ answer to "is this available?" pointing me to a thing which says it will be removed at some unspecified point sort of misses the point. I guess maybe the answer is "Not for long" ?
I've got these crazy tuned up CUDA kernels that are relatively straightforward to build in isolation and really where all the magic happens, and there's this new CUTLASS 3 stuff and modern C++ can call it all trivially.
And then there's this increasingly thin film of torch crap that's just this side of unbuildable and drags in this reference counting and broken setup.py and it's a bunch of up and down projections to the real hacker shit.
I'm thinking I'm about one squeeze of the toothpaste tube from just squeezing that junk out and having a nice, clean, well-groomed C++ program that can link anything and link into anything.
Modern Common Lisp also seems to have given up its "one language to rule them all" mindset and is pretty okay with just dropping into CFFI to call into C libraries as needed. Over the years I've come to see that mindset as mostly a dead-end. Python, web browsers, game engines, emacs, these are all prominent living examples of two-language solutions that have come to dominate in their problem spaces.
One aspect of the "two language problem" that I find troubling though is that modern environments often ossify around the exact solution. For example, it's very difficult to have something like PyTorch in say Common Lisp even though libcuda and libdnn should be fairly straightforward to wrap in Common Lisp (see [1] for Common Lisp CUDA bindings.) JS/TS/WASM that runs in the browser often is dependent on Chrome's behavior. Emacs continues to be tied to its ancient, tech-debt ridden C runtime. There seems to be a lot of value tied into the glue between the two chosen languages and it's hard to recreate that value with other HLLs even if the "metal" language/runtime stays the same.
PyTorch is actually quite illustrative as being a counterexample that proves the rule. It was based on Torch, which had very similar if not identical BLAS routines but used Lua as the scripting language. But now everybody uses PyTorch because Lua development stopped in 2017, so all the extra goodies that people rely on now are in the Python wrapper.
The only exception seems to be when multiple scripting languages are supported, and at roughly equal points of development. So for example - SQLite continues to have most of its value in the C substrate, and is relatively easy to port to other languages, because it has so many language bindings that there's a strong incentive to write new functionality in C and keep the API simple. Ditto client libraries for things like MySQL, PostGres, MongoDB, Redis, etc. ZeroMQ has a bunch of bindings that are largely dumb passthroughs to the underlying C++ substrate.
But even a small imbalance can lead to that one language being preferenced heavily in supporting tooling and documentation. Pola.rs is a Rust substrate and ships with bindings for Python, R, and Node.js, but all the examples on the website are in Python or Rust, and I rarely hear of a non-Python user picking it up.
That being said, I've always found the argument that types can be overly restrictive and prevent otherwise valid code from running unconvincing. I've yet to see dynamic code that benefits from this alleged advantage.
Nearly universally the properly typed code for the same thing is better, more reliable and easier for new people to understand and modify. So sure, you can avoid all of this if the types are really what bother you, but it feels a bit like saying "there are stunts I can pull off if I'm not wearing a seatbelt that I just can't physically manage if I am."
If doing stunts is your thing, knock yourself out, but I'd rather wear a seatbelt and be more confident I'm going to get to my destination in one piece.
Yes, you can write mojo with python syntax and transpile. You'd end up with something similar to Julia's 1.5 language problem.
Since the mojo language is not fully specified, it's hard to understand what language constructs can't be efficiently expressed in the python syntax.
Love MLIR and Mojo as debuggable/performant intermediate languages.
I also wonder how much of the ossification comes from the embodied logic in the HLL. SQLite wrappers tend to be very simple and let the C core do most of the work. Something like PyTorch on the other hand layers on a lot of logic onto underlying CUDA/BLAS that is essential complexity living solely in Python the HLL. This is also probably why libcurl has so many great wrappers in HLLs because libcurl does the heavy lifting.
The pain point I see repeatedly in putting most of the logic into the performant core is asynchrony. Every HLL seems to have its own way to do async execution (Python with asyncio, Node with its async runtime, Go with lightweight green threads (goroutines), Common Lisp with native threads, etc.) This means that the C core needs to be careful as to what to expose and how to accommodate various asynchrony patterns.
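One common pattern on the Python side, for what it's worth: keep the core synchronous and let the HLL adapt it to its own async model, e.g. by pushing the blocking call onto a worker thread (a sketch; hashlib stands in for any C-backed routine that releases the GIL):

    import asyncio
    import hashlib

    def blocking_core_call(data):
        # Stand-in for a synchronous, C-backed routine (hashlib releases the
        # GIL for large inputs, so threads genuinely run in parallel here).
        return hashlib.sha256(data).hexdigest()

    async def main():
        payloads = [bytes([i]) * 10_000_000 for i in range(4)]
        # The HLL adapts the synchronous core to its own async model by
        # dispatching each call to a thread and awaiting the results.
        digests = await asyncio.gather(
            *(asyncio.to_thread(blocking_core_call, p) for p in payloads)
        )
        print(digests)

    asyncio.run(main())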
It already exists. It is called PyTorch/JAX/TensorFlow. These frameworks already contain sophisticated compilers for turning computational graphs into optimized GPU code. I dare say that they don't leave enough performance on the table for a completely new language to be viable.
(for real though, this looks cool; what kinds of longevity should we expect from such a project - how can we be sure?)
For example Rust gives you a `Result<Thing, ErrorType>`, which might be a `Thing` or might be one of the possible error types given by `ErrorType`. So when you get a function's return value, you have to deal with the fact that it might have failed with this specific error.
1) we were "ambitiously optimistic" (different way of saying "ego-driven naïveté" perhaps :) ) and 2) the internet misread our long-term ambitions as being short-term goals.
We've learned that the world really really wants a better Python and the general attention spans of clickbait are very short - we've intentionally dialed back to very conservative claims to avoid the perception of us overselling.
We're still just as ambitious though!
-Chris
While I'm not a C++ expert, doing the same in Python is just one pip install away, so yeah both "richness" and "ease of use" of the ecosystem matters
Yet the roadmap says:
> As Mojo matures through phase 3, we believe Mojo will become increasingly compatible with Python code and deeply familiar to Python users, except more efficient, powerful, coherent, and safe. Mojo may or may not evolve into a full superset of Python, and it's okay if it doesn't.
This is incredibly confusing. If it's _not_ aiming for Python compatibility, why are we talking about Python at all?
Also, is anyone actually considering using an emoji as a file extension?
consider that at minimum both FB and OAI themselves definitely make heavy use of the Triton backend in PyTorch.
It was probably the best option during the 2000s, and only superseded when Theano (Python) and Torch (Lua) emerged.
A Lisp comeback in this niche would be great. Python has fantastic libraries, but the language leaves a lot to be desired.
That programming language designers have to be careful about a type system and its type checking's asymptotic time complexity was 100% widely known before Swift was first created. Some people like to diss on mathematics, but this stuff can have severe practical and widespread engineering consequences. I don't expect everyone to master everything, but budding programming language designers should then at least realize that there may be important issues, and for instance mitigate any issues with having one or more experts checking the relevant aspects.
It is literally on the order of tens of thousands of lines of code, instead of tens of millions, to do Vulkan ML, especially if you strip out the parts of the kernel you don't need.
https://info.juliahub.com/industries/case-studies-1/author/j...
Python dominates, because 25 years ago places like CERN started to adopt Python as their main scripting language, and eventually got used for more tasks than empowered shell scripts.
It is like arguing why C dominates and nothing else can ever replace it.
That is why they opted to open source in stages (ie, the stdlib is open source right now, and they expect to open source the compiler soon).
Edit: from other posts on this page, I've realized that being a superset of Python is now regarded a nice-to-have by Modular, not a must-have. They realized it's harder than they thought initially, basically.
Triton, CUDA, etc, let one write GPU kernels.
> really hard and long term software project
That’s kind of what I mean. He commits to projects and gets groups of people talking about them and interested.
Imagine how hard it was to convince everyone at Apple to use his language - and how many other smart engineers' projects were not chosen. It's not even clear the engineering merits were there for that one.
This is what differentiates python from other "morally equivalent" scripting languages.
The only other person I know of who has started and lead to maturity multiple massive and infrastructural software projects is Fabrice Bellard. I've never ran into him self promoting (podcasts, HN, etc), and yet his projects are widely used and foundational.
It seems to me like the evidence points to "if you tackle really hard, long term, and foundational software projects successfully, people will use it, regardless of your ability to self promote."
Python was the something nicer. A lot of the other options were so much worse.
Also, no. I can't use Python for inference, because it is too slow, so I have to export to tensorflow lite and run the model in C++, which essentially required me to rewrite half the code in C++ again.
First class support for Python JIT/DSLs across the whole ecosystem.
Change the way C++ is used and taught, more focused on standard C++ support and libraries, than low level CUDA extensions.
So in a way, I think you're kind of right.
At the time (2007-2009), Matlab was the application of choice for what would become "deep" learning research. Though it had its warts, and licensing issues. It was easy for students to get started with and to use, also as a lot of them were not from computer science backgrounds, but often from statistics, engineering or neuroscience.
When autograd came (this was even before GPUs), people needed something more powerful than Matlab, yet familiar. Numpy already existed, and python+numpy+matplotlib gives you an environment and a language very similar to Matlab. The biggest hurdle was that python is zero-indexed.
If things had gone slightly differently, I reckon we might have ended up using Octave or Lua. I reckon Octave was too restrictive and poorly documented for autograd. On the other hand, Lua was too dissimilar to Matlab. I think it was Theano, the first widely used python autograd, and then later PyTorch, that really sealed the deal for python.
We’re now using Python for training and forking Ortex (ONNX) for inference.
The ecosystem just isn’t there, especially for training. It’s a little better for inference but still has significant gaps. I will eventually have time to push contributions upstream but Python has so much momentum behind it.
Livebooks are amazing though and a better experience than anything Python offers, libraries aside.
> Whether it's 1, 2, or N kernels is irrelevant.
Not sure what you mean here. But new kernels are written all the time (flash-attn is a great example). One can't do that in plain Python. E.g., flash-attn was originally written in C++ CUDA, and now in Triton.
But having never written cuda, I have to rely on authority to some extent for this question. And it seems to me like few are in a better position to opine on whether there's a better story to be had for the software-hardware boundary in ML than the person who wrote MLIR, Swift-for-Tensorflow (alongside with making that work on TPUs and GPUs), ran ML at Tesla for some time, was VP at SiFive, ... etc.
Fabrice is one of my examples. Walter Bright (who does spend effort on promotion). Anyone who works on the V8 compiler at Google, or query engine on Postgres who we have never heard of.
> It seems to me like the evidence points to "if you tackle really hard, long term, and foundational software projects successfully, people will use it,
That’s a common belief among engineers. If you have worked at a large company you know that’s just not how big efforts like switching from Objective-C to Swift get done.
NVIDIA wants Python and C++ people, they want a new thing to moat up on, and they know it has to be legitimately good to defy economic gravity on chips a lot of companies can design and fab now.
Somehow python managed to be both high level and low level language for GPUs…
Out of a great many outcomes, a very topical one today is semiconductors, and the outcomes rival anything for corruption and incompetence, entirely consistent with our speedrun down the road to irrelevance.
Also, flash attention is at v3-beta right now? [0] And it requires one of CUDA/Triton/ROCm?
[0] https://github.com/Dao-AILab/flash-attention
But maybe I'm out of the loop? Where do you see that flash attention 4 is written in Python?
Just to be clear here - you use triton from plain python, it runs compilation inside.
Just like I’m pretty sure not all Mojo can be used to write kernels? I might be wrong here, but it would be very hard to fit general purpose code into kernels (and to be frank pointless, constraints bring speed).
As for flash attention there was a leak: https://www.reddit.com/r/LocalLLaMA/comments/1mt9htu/flashat...
> Still fails on a very recent version of Swift, Swift 6.1.2, if my test works.
FWIW, the situation with this expression (and others like it) has improved recently:
- 6.1 fails to type check in ~4 seconds
- 6.2 fails to type check in ~2 seconds (still bad obviously, but it's doing the same amount of work in less time)
- latest main successfully type checks in 7ms. That's still a bit too slow though, IMO. (edit: it's just first-time deserialization overhead; if you duplicate the expression multiple times, subsequent instances type check in <1ms).
These days though the type checker is not where compile time is mostly spent in Swift; usually it’s the various SIL and LLVM optimization passes. While the front end could take care to generate less redundant IR upfront, this seems like a generally unavoidable issue with “zero cost abstraction” languages, where the obvious implementation strategy is to spit out a ton of IR, inline everything, and then reduce it to nothing by transforming the IR.
That’s really only true if you have overloading though! Without overloading there are no disjunction choices to attempt, and if you also have principal typing it makes the problem of figuring out diagnostics easier, because each expression has a unique most general type in isolation (so your old CSDiag design would actually work in such a language ;-) )
But perhaps a language where you have to rely on generics for everything instead of just overloading a function to take either an Int or a String is a bridge too far for mainstream programmers.
It feels like a much better design point overall.
My original reply was just to point out that constraint solving, in the abstract, can be a very effective and elegant approach to these problems. There’s always a tradeoff, and it all depends on the combination of other features that go along with it. For example, without bidirectional inference, certain patterns involving closures become more awkward to express. You can have that, without overloading, and it doesn’t lead to intractability.
In my opinion, constraint solving would be a bad design point for Mojo, and I regret Swift using it. I'm not trying to say that constraint solving is bad for all languages and use-cases.
What I mean here is that DNN code is written on a much higher level than kernels. They are just building blocks you use to instantiate your dataflow.
I don't think Mojo is a big part of my future, but I've been wrong before and being in Compiler Explorer can't hurt you.
By blaming Khronos you're really just reiterating blame on Intel and AMD with the tacit inclusion of Apple. I suppose you could blame Nvidia for not giving away their IP, but they were a staunch OpenCL supporter from the start.
Even if the goal is to be just "close enough", It seems as pie-in-the-sky (Py-in-the-sky?) now as it did when Mojo was first announced. CPython has a huge surface area and it seems like if Mojo is going to succeed they are going to want to focus on differentiating features in the ML space and not going feature-for-feature with CPython. I don't know what "close enough" is, but closer than Py2 was to Py3, certainly.
Or NextSTEP. Or DX 9. Or whatever the fuck.
That shit sucked when it came out and it's only gotten worse. The cherry on top is the companies that promised they're the bees knees actually know that, which is why they left them to die. And, unfortunately, your applications along with them.
Dotnet was on a slow roll to complete obsolescence until MS saw the writing on the wall and open sourced it.
Chris mentioned pushing updates to MLIR while working on Mojo; I'd be curious to have a summary of what those changes are, because I believe this is where the hardcore innovations are happening.