Big Book of R

(www.bigbookofr.com)

288 points sebg | 5 comments | 10 Apr 25 17:34 UTC

cye131 ◴[10 Apr 25 23:51 UTC] No.43649039[source]▶

R, especially dplyr/tidyverse, is so underrated. Working in ML engineering, I see a lot of my coworkers suffering through pandas (or occasionally polars, or even base Python without dataframes) to do basic analytics or debugging. It takes eons and gets complex so quickly that only the most rudimentary checks get done. Anyone doing data-adjacent engineering work would benefit from having R/dplyr in their toolkit.

replies(6): >>43649143 #>>43649208 #>>43649881 #>>43650319 #>>43650677 #>>43683325 #

1. aquafox ◴[11 Apr 25 05:09 UTC] No.43650677[source]▶

>>43649039 #

Why not mix R and Python in interactive analysis workflows?

1) Download Positron: https://github.com/posit-dev/positron
2) Set up a Quarto (.qmd) notebook
3) Set up R and Python code chunks in your Quarto document
4a) Use reticulate to spawn a Python session inside R and exchange objects between both languages (https://github.com/posit-dev/positron/pull/4603)
4b) Write a few helper functions that pass objects between R and Python by reading/writing a temporary file.
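For illustration, the Python side of the temporary-file approach (the helper-function option) could look roughly like the sketch below. The function names `write_exchange_file` and `read_exchange_file` are hypothetical, and CSV is only an assumed interchange format, picked because both languages read it without extra packages.

```python
import csv
import os
import tempfile


def write_exchange_file(rows, path=None):
    """Write a list of dicts to a CSV file and return the file's path.

    Hypothetical helper: the R side could then load the file with
    read.csv(path).
    """
    if path is None:
        # mkstemp returns an open file descriptor; close it and reopen by name.
        fd, path = tempfile.mkstemp(suffix=".csv")
        os.close(fd)
    fieldnames = list(rows[0].keys()) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_exchange_file(path):
    """Read a CSV file (for example one written from R) into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

One caveat with this choice: CSV does not carry types, so every value comes back as a string and has to be converted on the receiving side. A typed format would avoid that, at the cost of extra dependencies in both languages.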

replies(5): >>43650688 #>>43653111 #>>43656358 #>>43657369 #>>43690598 #

2. dkga ◴[11 Apr 25 05:11 UTC] No.43650688[source]▶

>>43650677 (TP) #

This is exactly what I do for the vast majority of my academic papers. It combines Python with the power and flexibility of R for statistics, which, as the upstream poster says, is incredibly underrated (especially with tidyverse).

3. goosedragons ◴[11 Apr 25 17:36 UTC] No.43656358[source]▶

>>43650677 (TP) #

Org mode in Emacs is even better at this IMO. The only downside is that there's no guarantee other people use Emacs too.

4. b-rodrigues ◴[11 Apr 25 19:12 UTC] No.43657369[source]▶

>>43650677 (TP) #

I'm writing a package called rixpress that leverages Nix to build reproducible pipelines with targets in either R or Python.

Here's the GitHub repo for the package: https://github.com/b-rodrigues/rixpress/tree/master

and here's an example pipeline https://github.com/b-rodrigues/rixpress_demos/tree/master/py...

5. p00dles ◴[15 Apr 25 09:13 UTC] No.43690598[source]▶

>>43650677 (TP) #

Is this what tools like Nextflow or Snakemake aim to do? I don't know, and I'm genuinely curious, because I'm starting to work in bioinformatics, and doing different parts of an analysis pipeline in R and Python seems common, and really necessary if you want to use certain packages.

I'm wondering if I should devote time to learning Nextflow/Snakemake, or whether the solution that you outlined is "sufficient" (I say "sufficient" in quotes because, of course, it depends on the use case).