
311 points melodyogonna | 6 comments
MontyCarloHall ◴[] No.45138920[source]
The reason why Python dominates is that modern ML applications don't exist in a vacuum. They aren't the standalone C/FORTRAN/MATLAB scripts of yore that load in some simple, homogeneous data, crunch some numbers, and spit out a single result. Rather, they are complex applications with functionality extending far beyond the number crunching, which requires a robust preexisting software ecosystem.

For example, a modern ML application might need an ETL pipeline to load and harmonize data of various types (text, images, video, etc., all in different formats) from various sources (local filesystem, cloud storage, HTTP, etc.). The actual computation then must leverage many different high-level functionalities, e.g. signal/image processing, optimization, statistics, etc. All of this computation might be too big for one machine, and so the application must dispatch jobs to a compute cluster or cloud. Finally, the end results might require sophisticated visualization and organization, with a GUI and database.

There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python. Python's numerical computing libraries (NumPy/PyTorch/JAX etc.) all call out to C/C++/FORTRAN under the hood and are thus extremely high-performance. For functionality they don't implement, Python's C/C++ FFIs (e.g. Python.h, NumPy C integration, PyTorch/Boost C++ integration) are not perfect, but they are good enough that implementing the performance-critical portions of code in C/C++ is much easier than re-implementing entire ecosystems of packages in another language like Julia.
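
As a rough illustration of the FFI point, here is a minimal sketch that uses the standard-library `ctypes` module (one FFI route among several, not one of those named above) to call `cos` from the C math library. The library lookup is an assumption about the platform: it tries the usual "m" library name and otherwise falls back to symbols already loaded into the process, which only works on POSIX systems. `c_cos` is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. The name differs across platforms, so fall back
# to the symbols already loaded into the current process (POSIX only).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature: double cos(double). Without this, ctypes would
# assume int arguments and an int return value.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Hypothetical wrapper: compute cos(x) by calling into C."""
    return libm.cos(x)


if __name__ == "__main__":
    # The C result should agree with Python's own math.cos.
    print(c_cos(1.0), math.cos(1.0))
```

The caller only ever sees an ordinary Python function; the C call is hidden behind it, which is the pattern the numerical libraries follow at much larger scale.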

replies(8): >>45139364 #>>45140601 #>>45141802 #>>45143317 #>>45144664 #>>45146179 #>>45146608 #>>45146905 #
1. goatlover ◴[] No.45141802[source]
> There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python.

Have a hard time believing C++ and Java don't have rich enough ecosystems. Not saying they make for good glue languages, but everything was being written in those languages before Python became this popular.

replies(2): >>45142107 #>>45144959 #
2. j2kun ◴[] No.45142107[source]
Yeah the OP here listed a bunch of Python stuff that all ends up shelling out to C++. C++ is rich enough, period, but people find it unpleasant to work in (which I agree with).

It's not about "richness," it's about providing a language ecosystem for people who don't really want to do the messy, low-level parts of software, one that can encapsulate the performance-critical parts behind easy glue.

replies(2): >>45143014 #>>45145614 #
3. FuckButtons ◴[] No.45143014[source]
I mean, you’ve basically described why people use Python: it’s a way to use C/C++ without having to write it.
replies(1): >>45143132 #
4. anakaine ◴[] No.45143132{3}[source]
And I'll take that reason every single day. I could spend days or more working out particular issues in C++, or I could use a much nicer glue language with a great ecosystem and a huge community driving it, and get the same task done in minutes to hours.
5. flourpower471 ◴[] No.45144959[source]
Ever tried to write a web scraper in C++?
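
For comparison, a minimal sketch of the parsing half of a scraper in Python, using only the standard library. It assumes the HTML has already been fetched; `LinkExtractor` and `extract_links` are hypothetical names, and a real scraper would add fetching, error handling, and rate limiting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links of interest here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every link found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/a">A</a> <a href="https://example.org/b">B</a></p>'
    print(extract_links(page, "https://example.com/"))
```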
6. lairv ◴[] No.45145614[source]
I tried to statically link DuckDB to one of my C++ projects earlier this year and it took me 3 days to have something working on Windows/Linux/MacOS (just to be able to use the dependency).

While I'm not a C++ expert, doing the same in Python is just one pip install away, so yeah, both the "richness" and the "ease of use" of the ecosystem matter.