I'm sure part of Python's success is sheer mindshare momentum from being a common computing denominator, but I'd guess the integration story explains part of the margin. Your back end may well already be in Python or have interop with it, reducing stack investment and systems tax.
I added the SQL query to the top of the R script to generate the input data.frame, and my Python code reads the output CSV for subsequent processing and storage into Django models.
I launch the R script from Python as a subprocess running Rscript.
It's not elegant but it is simple. This part of the system only has to run daily so efficiency isn't a big deal.
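In case a sketch helps, the R side of this pattern looks roughly like the following; the file name, database driver, table and output path are all made up for illustration, and the real script obviously does more work in the middle:

    # daily_job.R -- sketch of the R half of the pipeline described above
    library(DBI)

    # the SQL query at the top of the script builds the input data.frame
    con <- dbConnect(RPostgres::Postgres(), dbname = "analytics")
    input_df <- dbGetQuery(con, "SELECT id, amount, created_at FROM orders")
    dbDisconnect(con)

    # ... whatever statistical processing the script exists for ...
    result <- aggregate(amount ~ id, data = input_df, FUN = sum)

    # write the CSV that the Python/Django side picks up afterwards
    write.csv(result, "daily_output.csv", row.names = FALSE)

The Python side then just shells out to Rscript on that file via a subprocess and reads the CSV back into the Django models, exactly as described above.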
Well, better late than never I guess.
For capital-P Production use I would still rewrite it in Rust (polars) or Go (stats). But that's only if it's essential either to achieve high throughput with concurrency or to measure performance in nanoseconds rather than microseconds.
I guess you'll need to decide whether this is a big enough issue to warrant the new dependencies.
I tried Shiny a few years back and frankly it was not good enough to be considered. Maybe it's matured since then--I'll give it another look.
> Not having a Django-like (or other) web stack the way Python does says more about the users of R than about the language per se. Its background was as a replacement for S, a proprietary statistics language, not as a competitor to the Perl used for CGI and the early web.
I'm aware, but that doesn't address the problem I pointed out in any way.
> R is very powerful and is Lisp in disguise, coupled with the same infrastructure that lets you use C under the hood, as Python does, for most libraries/packages.
Things I don't want to ever do: use C to write a program that displays my R data to the web.
Thanks for posting!
The problem is pinning dependencies. An R analysis written in base R 20 or 30 years ago still works fine, but something using dplyr is probably really difficult to get up and running.
At my old work we took a copy of CRAN when we started a new project and added dependencies from that snapshot.
So instead of asking for dplyr version x.y, as you'd do ... anywhere, we added dplyr as it and its dependencies were stored on CRAN on that specific date.
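These days you can get much of that effect without keeping your own CRAN copy, either by pointing installs at a date-stamped snapshot or by recording a per-project lockfile with renv; roughly (the snapshot date here is arbitrary):

    # Install packages "as of" a date via Posit's public package manager snapshots
    options(repos = c(CRAN = "https://packagemanager.posit.co/cran/2023-06-01"))
    install.packages("dplyr")  # resolves dplyr and its dependencies as of that date

    # Or record exact versions per project with renv and restore them later:
    # renv::init(); renv::snapshot(); renv::restore()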
We also did a lot of systems programming in R, which I thought was weird, but it was for exactly the same reason you're describing for Python.
But R is really easy to install, so I don't see why you can't set up a step in your pipeline that runs R - or even both R and Python. They can read dataframes from each other's memory.
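To be a bit more concrete about "from each other's memory": the usual routes are reticulate when both run in one process, or an Arrow/Feather file between processes, along these lines:

    # In-process: reticulate converts between R data.frames and pandas DataFrames
    library(reticulate)
    py_df <- r_to_py(mtcars)            # R data.frame -> pandas DataFrame
    back  <- py_to_r(py_df$describe())  # pandas result -> back to an R data.frame

    # Between processes: a Feather file that pandas/pyarrow can read directly
    arrow::write_feather(mtcars, "shared.feather")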
My employer is using R to crunch numbers embedded in a large system based on microservices.
The only thing to keep in mind is that most people writing R are not programmers by trade so it is good to have one person on the project who can refactor their code from time to time.
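For anyone wondering how an R number-cruncher typically sits behind a microservice boundary, one common pattern is a small plumber API in front of the analysis code; this is purely illustrative, not necessarily our exact setup:

    # api.R -- minimal plumber endpoint (illustrative only)
    #* Summarise a numeric vector posted as JSON
    #* @post /summary
    function(values) {
      x <- as.numeric(values)
      list(mean = mean(x), sd = sd(x), n = length(x))
    }

    # serve it with: plumber::pr_run(plumber::pr("api.R"), port = 8000)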
Here's the GitHub repo for the package: https://github.com/b-rodrigues/rixpress/tree/master
and here's an example pipeline: https://github.com/b-rodrigues/rixpress_demos/tree/master/py...
I'm wondering if I should devote time to learning Nextflow/Snakemake, or whether the solution you outlined is "sufficient" (in quotes because, of course, it depends on the use case).