(www.bigbookofr.com)

288 points sebg | 1 comments | 10 Apr 25 17:34 UTC


cye131 ◴[10 Apr 25 23:51 UTC] No.43649039[source]▶

R, especially dplyr/tidyverse, is so underrated. Working in ML engineering, I see a lot of my coworkers suffering through pandas (or occasionally polars or even base Python without dataframes) to do basic analytics or debugging. It takes eons and gets complex so quickly that only the most rudimentary checks get done. Anyone working in data-adjacent engineering would benefit from having R/dplyr in their toolkit.

replies(6): >>43649143 #>>43649208 #>>43649881 #>>43650319 #>>43650677 #>>43683325 #

aquafox ◴[11 Apr 25 05:09 UTC] No.43650677[source]▶

>>43649039 #

Why not mix R and Python in interactive analysis workflows?
1) Download Positron: https://github.com/posit-dev/positron
2) Set up a Quarto (.qmd) notebook
3) Set up R and Python code chunks in your Quarto document
4a) Use reticulate to spawn a Python session inside R and exchange objects between both languages (https://github.com/posit-dev/positron/pull/4603)
4b) Write a few helper functions that pass objects between R and Python by reading/writing a temporary file.
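A minimal sketch of option 4b from the Python side, assuming the objects being exchanged are simple tables and that CSV is an acceptable exchange format. The helper names `write_exchange_file` and `read_exchange_file` are hypothetical and not part of reticulate, Quarto, or Positron:

```python
import csv
import os
import tempfile


def write_exchange_file(rows, fieldnames, directory=None):
    """Write a list of dicts to a temporary CSV file and return its path.

    Hypothetical helper: the returned path is what you would hand to the
    R side so it can read the same file.
    """
    # mkstemp creates the file and returns an open descriptor plus its path.
    fd, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_exchange_file(path):
    """Read a CSV exchange file back into a list of dicts.

    CSV carries no type information, so every value comes back as a string.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Made-up example rows, only to show the round trip.
    example_rows = [{"gene": "BRCA1", "count": 12}, {"gene": "TP53", "count": 7}]
    exchange_path = write_exchange_file(example_rows, ["gene", "count"])
    print(read_exchange_file(exchange_path))
    os.remove(exchange_path)  # clean up the temporary file
```

On the R side, base R's `read.csv(path)` could load a file written this way. Because CSV drops types, a format that preserves them may be a better fit for anything beyond small tables.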

replies(5): >>43650688 #>>43653111 #>>43656358 #>>43657369 #>>43690598 #

1. p00dles ◴[15 Apr 25 09:13 UTC] No.43690598[source]▶

>>43650677 #

Is this what tools like Nextflow or Snakemake aim to do? I don't know, and I'm genuinely curious, because I'm starting to work in bioinformatics, where doing different parts of an analysis pipeline in R and Python seems common, and really necessary if you want to use certain packages.

I'm wondering if I should devote time to learning Nextflow/Snakemake, or whether the solution that you outlined is "sufficient" (I put "sufficient" in quotes because, of course, it depends on the use case).


Big Book of R