Big Book of R

(www.bigbookofr.com)
288 points sebg | 2 comments
gsf_emergency_2 No.43649466
Any Julians care to comment?

I've seen Julia proposed as the nemesis of R (not Python: that's too political, and non-Lispy).

>the creator of the R programming language, Ross Ihaka, who provided benchmarks demonstrating that Lisp’s optional type declaration and machine-code compiler allow for code that is 380 times faster than R and 150 times faster than Python

(Would especially love an overview of the controversies in graphics/rendering)

https://news.ycombinator.com/item?id=42785785

replies(4): >>43651746 #>>43652322 #>>43658146 #>>43683361 #
1. Hasnep No.43652322
In my opinion, Julia has the best alternative to dplyr in its DataFrames.jl package [1]. The syntax is slightly more verbose than dplyr because it's more explicit, but in exchange you get data transformations that you can leave for 6 months and still read and understand very quickly when you come back. When I used R, if I hadn't commented a pipeline properly I would have to focus for a few minutes to understand it.

In terms of performance, DF.jl seems to outperform dplyr in benchmarks, but in day-to-day use I haven't noticed much difference since switching to Julia.

There are also APIs built on top of DF.jl, but I prefer using the functions directly. The most promising seems to be Tidier.jl [2], which is a recreation of the Tidyverse in Julia.

In Python, Pandas is still the leader, but its API is a mess. I think most data scientists haven't used R, and so they don't know what they're missing out on. There was the Redframes project [3] to give Pandas a dplyr-esque API which I liked, but it's not being actively developed. I hope Polars can keep making progress in replacing Pandas, but it's still not quite as good as dplyr or even DF.jl.

For plotting, Julia's time to first plot has got a lot better in recent versions. From memory, it was something like 20 seconds a few years ago and is down to about 3 seconds now. It'll never be as fast as matplotlib, but if you leave your terminal window open you only pay that price once.

I actually think the best thing to come out of Julia recently is AlgebraOfGraphics.jl [4]. To me it's genuinely the biggest improvement to plotting since ggplot, which is a high bar. It takes the ggplot concept of layers applied with the + operator and turns it into an equation, where + adds one layer on top of another and the * operator has the distributive property, so you can write an expression like data * (layer_1 + layer_2) to visualise the same data with two visualisations. It's very powerful, but because it re-uses concepts from maths that you're already familiar with, it doesn't take a lot of brain space compared to other packages I've used.
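To make the distributive idea concrete, here is a toy model in Python. It is only a sketch of the algebra described above, not AlgebraOfGraphics.jl's actual API or implementation: the `Spec` class, the `spec` helper, the setting names and the "penguins" dataset name are all made up for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Spec:
    """Toy plot specification (hypothetical): a sum of layers.

    Each layer is a tuple of (setting, value) pairs.
    """
    layers: tuple = ()

    def __add__(self, other):
        # "+" stacks layers: the result contains every layer from both sides.
        return Spec(self.layers + other.layers)

    def __mul__(self, other):
        # "*" merges the settings of every left layer with every right layer,
        # which is what makes it distribute over "+".
        return Spec(tuple(a + b for a, b in product(self.layers, other.layers)))


def spec(**settings):
    """Build a single-layer Spec from keyword settings (hypothetical helper)."""
    return Spec((tuple(settings.items()),))


data = spec(data="penguins")      # made-up dataset name
scatter = spec(visual="scatter")
line = spec(visual="line")

combined = data * (scatter + line)
expanded = data * scatter + data * line

print(combined == expanded)
for layer in combined.layers:
    print(dict(layer))
```

In this model, `data * (scatter + line)` and `data * scatter + data * line` produce the same two layers, each carrying the shared data setting plus its own visual. That is the sense in which one dataset can be written once and drawn two ways.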

[1] https://dataframes.juliadata.org/
[2] https://github.com/TidierOrg/Tidier.jl
[3] https://github.com/maxhumber/redframes
[4] https://aog.makie.org/

replies(1): >>43654959 #
2. staplung No.43654959
Thanks for the links. FWIW, the link for [4] (aog) currently returns a 404, which is amusing because the site is still up. They just seem to have deleted their own top-level index.html file. Anyway, this works:

https://aog.makie.org/v0.10.3/