
311 points melodyogonna | 3 comments
MontyCarloHall No.45138920
The reason Python dominates is that modern ML applications don't exist in a vacuum. They aren't the standalone C/FORTRAN/MATLAB scripts of yore that load in some simple, homogeneous data, crunch some numbers, and spit out a single result. Rather, they are complex applications whose functionality extends far beyond the number crunching, and that requires a robust preexisting software ecosystem.

For example, a modern ML application might need an ETL pipeline to load and harmonize data of various types (text, images, video, etc., all in different formats) from various sources (local filesystem, cloud storage, HTTP, etc.) The actual computation then must leverage many different high-level functionalities, e.g. signal/image processing, optimization, statistics, etc. All of this computation might be too big for one machine, and so the application must dispatch jobs to a compute cluster or cloud. Finally, the end results might require sophisticated visualization and organization, with a GUI and database.
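To make that concrete, here is a minimal sketch of the "extract and harmonize" step in Python. The file paths and URL are hypothetical, and the error handling a real pipeline would need is omitted:

    # Heterogeneous extract/transform step; all paths and URLs are hypothetical.
    import json
    import urllib.request

    import numpy as np
    import pandas as pd
    from PIL import Image

    def extract():
        table = pd.read_csv("data/measurements.csv")         # tabular, local filesystem
        image = np.asarray(Image.open("data/scan_001.png"))  # image, decoded to an array
        with urllib.request.urlopen("https://example.com/meta.json") as resp:
            meta = json.load(resp)                           # metadata over HTTP
        return table, image, meta

    def transform(table, image, meta):
        image = image.astype("float32") / 255.0              # normalize pixel values
        table = table.assign(source=meta.get("source", "unknown"))
        return table, image

Each of those three loads leans on a mature, battle-tested library (pandas, Pillow, the standard library), which is exactly the ecosystem argument.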

Besides Python, no single language has an ecosystem rich enough to provide literally all of the aforementioned functionality. Python's numerical computing libraries (NumPy/PyTorch/JAX etc.) all call out to C/C++/FORTRAN under the hood and are thus extremely high-performance, and for functionality they don't implement, Python's C/C++ FFIs (e.g. Python.h, NumPy's C integration, PyTorch/Boost C++ integration) are imperfect but good enough that implementing the performance-critical portions of code in C/C++ is far easier than re-implementing entire ecosystems of packages in another language like Julia.
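As a taste of how low the FFI barrier can be, here is a minimal ctypes sketch that calls cos() from the system C math library, with no extension module or build step (the library lookup assumes a standard Unix-like system):

    # Call into compiled C code directly from Python via ctypes.
    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))  # e.g. libm.so.6 on Linux
    libm.cos.argtypes = [ctypes.c_double]  # declare the C signature so that
    libm.cos.restype = ctypes.c_double     # arguments and results marshal correctly

    print(libm.cos(0.0))  # 1.0, computed entirely in C

For hot loops over arrays you would reach for the Python.h or NumPy C APIs instead, but the principle is the same: keep the orchestration in Python and drop into C only where it pays.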

replies(8): >>45139364 #>>45140601 #>>45141802 #>>45143317 #>>45144664 #>>45146179 #>>45146608 #>>45146905 #
nialv7 No.45146608
Your argument is circular. Python has all this ecosystem _because_ it has been the language of choice for ML for a decade. At this point it's difficult to beat, but that doesn't explain why it was chosen all those years ago.
replies(2): >>45146697 #>>45147882 #
1. 317070 No.45147882
I was there when it was chosen all those years ago.

At the time (2007-2009), Matlab was the application of choice for what would become "deep" learning research, though it had its warts and licensing issues. It was easy for students to get started with and to use, which mattered because many of them were not from computer science backgrounds but from statistics, engineering, or neuroscience.

When autograd came (this was even before GPUs), people needed something more powerful than Matlab, yet familiar. NumPy already existed, and Python + NumPy + Matplotlib gives you an environment and a language very similar to Matlab. The biggest hurdle was that Python is zero-indexed while Matlab is one-indexed.
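For anyone who wasn't there, the resemblance was striking; a small sketch, with rough Matlab equivalents in the comments:

    # NumPy/Matplotlib reads almost like Matlab.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 100)  # x = linspace(0, 2*pi, 100);
    y = np.sin(x)                       # y = sin(x);
    first = y[0]                        # first = y(1);  <- the zero-indexing hurdle
    plt.plot(x, y)                      # plot(x, y)
    plt.show()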

If things had gone slightly differently, we might have ended up using Octave or Lua. I reckon Octave was too restrictive and poorly documented for autograd, while Lua was too dissimilar to Matlab. I think it was Theano, the first widely used Python autograd, and then later PyTorch, that really sealed the deal for Python.
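For readers who weren't around: what autograd bought you, shown here in today's PyTorch terms (Theano's API differed, but the underlying idea is the same):

    # Reverse-mode automatic differentiation: no hand-derived gradients.
    import torch

    x = torch.tensor(2.0, requires_grad=True)
    y = x ** 3 + 3 * x  # the framework records the computation graph
    y.backward()        # backpropagate through it
    print(x.grad)       # dy/dx = 3*x**2 + 3 = 15.0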

replies(2): >>45148368 #>>45148960 #
2. nickpeterson No.45148368
You were there 30 years ago, when the strength of men failed?
3. breuleux No.45148960
We chose Python for Theano because Python was already the language of choice for our research lab. If it had been my choice, I would probably have picked Scheme (I was really into macros at the time) or Ruby (I think it's better designed than Python). But frankly, if we had done it in a language other than Python, I'm not sure it would have taken off in the first place. Python already had quite a bit of inertia, likely thanks to NumPy and Matplotlib.