
Big Book of R

(www.bigbookofr.com)
288 points sebg | 5 comments
cye131 ◴[] No.43649039[source]
R, especially dplyr/tidyverse, is so underrated. Working in ML engineering, I see a lot of my coworkers suffering through pandas (or occasionally polars, or even base Python without dataframes) to do basic analytics or debugging. It takes eons and gets complex so quickly that only the most rudimentary checks get done. Anyone doing data-adjacent engineering work would benefit from having R/dplyr in their toolkit.
replies(6): >>43649143 #>>43649208 #>>43649881 #>>43650319 #>>43650677 #>>43683325 #
1. aquafox ◴[] No.43650677[source]
Why not mix R and Python in interactive analysis workflows:
1) Download Positron: https://github.com/posit-dev/positron
2) Set up a Quarto (.qmd) notebook.
3) Set up R and Python code chunks in your Quarto document.
4a) Use reticulate to spawn a Python session inside R and exchange objects between both languages (https://github.com/posit-dev/positron/pull/4603).
4b) Write a few helper functions that pass objects between R and Python by reading/writing a temporary file.
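For illustration, the Python half of option 4b might look roughly like the sketch below. It assumes CSV as the exchange format and uses only the standard library; the helper names `write_exchange_file` and `read_exchange_file` are hypothetical, not part of any package mentioned here. The R side would read the same path with `read.csv()` and write back with `write.csv(..., row.names = FALSE)`.

```python
import csv
import os
import tempfile


def write_exchange_file(rows, fieldnames):
    """Write a list of dicts to a temporary CSV file and return its path.

    Hypothetical helper: the returned path is what you would hand to R.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_exchange_file(path):
    """Read a CSV file (for example, one written by R) into a list of dicts.

    CSV carries no type information, so every value comes back as a string.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Round trip on the Python side only, to show the shape of the data.
rows = [{"sample": "a", "score": 1.5}, {"sample": "b", "score": 2.0}]
path = write_exchange_file(rows, ["sample", "score"])
round_trip = read_exchange_file(path)
os.remove(path)  # clean up the temporary file once both sides are done
```

One caveat with this design: CSV drops types, so numbers arrive as strings and have to be converted again on the receiving side. A typed format would avoid that, at the cost of an extra dependency in both languages.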
replies(5): >>43650688 #>>43653111 #>>43656358 #>>43657369 #>>43690598 #
2. dkga ◴[] No.43650688[source]
This is exactly what I do for the vast majority of my academic papers. It combines Python with the power and flexibility of R for statistics, which I agree with the upstream poster is incredibly underrated (especially with tidyverse).
3. goosedragons ◴[] No.43656358[source]
Org mode in Emacs is even better at this IMO. The only downside is that there's no guarantee other people use Emacs too.
4. b-rodrigues ◴[] No.43657369[source]
I'm writing a package called rixpress that leverages Nix to build reproducible pipelines with targets in either R or Python.

Here's the GitHub repo for the package: https://github.com/b-rodrigues/rixpress/tree/master

and here's an example pipeline https://github.com/b-rodrigues/rixpress_demos/tree/master/py...

5. p00dles ◴[] No.43690598[source]
Is this what tools like Nextflow or Snakemake aim to do? I don't know, and I'm genuinely curious, because I'm starting to work in bioinformatics, and doing different parts of an analysis pipeline in R and Python seems common, and really necessary if you want to use certain packages.

I'm wondering if I should devote time to learning Nextflow/Snakemake, or whether the solution that you outlined is "sufficient" (I say "sufficient" in quotes because, of course, it depends on the use case).