Big Book of R

(www.bigbookofr.com)
288 points sebg | 7 comments
1. uptownfunk ◴[] No.43648229[source]
I will say, after 15 years of messing with this: with LLMs, I now just do it all in Python. But I still miss the elegance and simplicity of R for data manipulation and analysis, especially the dplyr semantics. They really nailed it. I think they got crushed by the namespace / import system. There’s something about R that makes you so fluid and intuitive. But with the engineering and the efficiency I get from Python now, I can’t go back.
replies(2): >>43650701 #>>43653052 #
2. dkga ◴[] No.43650701[source]
I agree with all of your comment… except the very last bit. Do you really find Python to be more efficient for engineering work than R? Especially on speed, which in my experience at least is broadly the same, if not faster, with R, because it integrates more easily with Rust and C++?
replies(3): >>43654080 #>>43661370 #>>43661670 #
3. tylermw ◴[] No.43653052[source]
Funny you mention namespacing: R 4.5.0 was just released today with the new `use()` function, which allows you to import just what you need instead of clobbering your global namespace, equivalent to Python’s `from x import y` syntax.

e.g. avoid dplyr overriding stats::filter

    use("dplyr", c("mutate", "summarize"))
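
For readers more familiar with the Python side of that comparison, here is a minimal sketch of the same clobbering problem and the selective-import fix. The choice of the `math` module and the scratch dictionaries `wild` and `selective` are assumptions made purely for illustration; `exec` is used only so each import lands in its own isolated namespace.

```python
import builtins

# Scratch namespaces (hypothetical names) so the imports do not touch this module.
wild = {}
selective = {}

# Wildcard import: binds every public name from math, including math.pow,
# which shadows the builtin pow (similar in spirit to dplyr masking filter).
exec("from math import *", wild)

# Selective import: only the names asked for are bound.
exec("from math import sqrt, floor", selective)

print(wild["pow"] is builtins.pow)  # False: the builtin has been shadowed
print("pow" in selective)           # False: nothing was clobbered
print(sorted(k for k in selective if not k.startswith("__")))  # ['floor', 'sqrt']
```

The selective form leaves every other name, builtin or otherwise, exactly as it was.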

replies(1): >>43658211 #
4. claytonjy ◴[] No.43654080[source]
Not OP, but I think Python is very far above R for engineering stuff. I built my early career on R and ran R user groups. R is great for one-off analyses, or low-volume controlled repetition like running the same report with new inputs.

For engineering stuff I want strong static analysis (type hints, pydantic, mypy), observability (logfire, structlog), and support (can I upload a package to my cloud package registry?).

For ML stuff, I want the libraries everyone else uses (PyTorch, Hugging Face), because popularity brings a lot of development, documentation, and obscure GitHub issues that the R clones lack.

Userbase matters. In R, hardly any users are doing any engineering; most R code only needs to run successfully one time. The ecosystem reflects that. The Python-based ML world has the same problem, but the broader sea of Python engineers helps counterbalance it.

5. kgwgk ◴[] No.43658211[source]
The release notes say:

    (Actually already available since R 4.4.0.)
6. uptownfunk ◴[] No.43661370[source]
Everything I need can get done in Python, so I don’t even need to deal with Rust and C++. Adding language interop between R and C++ is just another thing on my plate, so I stick to Python and pay the cost of less elegant code for data manipulation, which I am okay with because now I just need to read it and not write it.

There’s a ton more Python code out there, so LLM reliability with Python code just makes my life easier. R was great and still is, but my world is now more than just data eng, model fitting, and viz. I have to deal with operationalizing and working with people who aren’t just data scientists, and most orgs don’t have the luxury of an easy production R system. So I get my Python code over the line and trust that a good engineer will be okay meshing it into the production stack, which is likely heavy on Python. (Instead of them saying, oh, we don’t work with R, we do Python and Java, so it will take 3-5x longer.)

Another sad truth is that the cool ML kids all want to do PyTorch deep ML training / post-training / RLHF / PPO / gdpr gtfo, so you are not real hardcore ML if you only do R. I know it’s stupid, but the world is kind of like that.

You want to hire people who want to build their careers on the cool stack. I know this isn’t the kind of cool stuff the hackers here like to talk about, but for real-world applications I have a lot of other considerations.

7. uptownfunk ◴[] No.43661670[source]
On further reflection, I think the sweet spot for R for me has always been prototyping and exploration: where you don’t exactly know what the logic needs to be, or how the data needs to be cut to get at what you want, and where you can’t use a pivot table in Google Sheets or Excel to get at the cut you want, or the logic is too complex for a spreadsheet. That rapid type of exploration is what R is really, really good at. It feels closer to math for me than to software engineering. And if I had a job where I could just do that all day, I’d be pretty happy at this point in my life. So for that sweet spot, which is still a broad niche, R is excellent and shines.