e.g. avoid dplyr overriding stats::filter
use("dplyr", c("mutate", "summarize"))
For engineering stuff I want strong static analysis (type hints, pydantic, mypy), observability (logfire, structlog), and support (can I upload a package to my cloud package registry?).
For ML stuff, I want the libraries everyone else uses (PyTorch, Hugging Face), because popularity brings a lot of development, documentation, and obscure GitHub issues that the R clones lack.
Userbase matters. In R, hardly any users are doing any engineering; most R code only needs to run successfully one time. The ecosystem reflects that. The Python-based ML world has the same problem, but the broader sea of Python engineers helps counterbalance it.
There’s a ton more Python code out there, so LLMs are more reliable with Python code, which just makes my life easier. R was great and still is, but my world is now more than just data eng, model fitting, and viz. I have to deal with operationalizing and with working with people who aren’t just data scientists, and most orgs don’t have the luxury of an easy production R system. With Python, I can get my code over the line and trust that a good engineer will be okay meshing it into the production stack, which is likely heavy on Python (instead of saying “oh, we don’t work with R, we do Python and Java, so it will take 3-5x longer”).
Another sad truth is that the cool ML kids all want to do PyTorch deep ML training / post-training / RLHF / PPO / gdpr gtfo, so you are not real hardcore ML if you only do R. I know it’s stupid, but the world is kind of like that.
You want to hire people who want to build their careers on the cool stack. I know it’s not the cool stuff the hackers here like to play with, but for real-world applications I have a lot of other considerations.