Scala 3 slowed us down?

1. spockz ◴[07 Dec 25 16:15 UTC] No.46182774[source]▶

For me the main takeaway is that you want automated performance tests in place, combined with flamegraph insights by default, especially for major changes like a language upgrade.

replies(2): >>46182923 #>>46185326 #

2. esafak ◴[07 Dec 25 16:32 UTC] No.46182923[source]▶

>>46182774 (TP) #

What are folks using for perf testing on JVM these days?

replies(5): >>46183086 #>>46183506 #>>46184574 #>>46185332 #>>46188235 #

3. noelwelsh ◴[07 Dec 25 16:53 UTC] No.46183086[source]▶

>>46182923 #

JMH is what I've always used for small benchmarks.

4. cogman10 ◴[07 Dec 25 17:46 UTC] No.46183506[source]▶

>>46182923 #

For production systems I use flight recordings (JFRs). To analyze them I use Java Mission Control.

For OOME problems I use a heap dump and the Eclipse Memory Analyzer (MAT).

For microbenchmarks I use JMH, though I try to avoid writing those.

5. gavinray ◴[07 Dec 25 19:56 UTC] No.46184574[source]▶

>>46182923 #

async-profiler

6. malkia ◴[07 Dec 25 21:27 UTC] No.46185326[source]▶

>>46182774 (TP) #

Benchmarking requires a somewhat different setup than the rest of testing, especially if you want timings down to the millisecond.

We have continuous benchmarking for one of our tools. It's written in C++, and to get the "same" results every time we launch it on the same machine. This is far from ideal, but otherwise there would be noisy neighbours, a pesky host (if it's a VM), etc.

One idea we had: run the same test on the same machine several times against the older and newer code (or ideally toggle between them through switches). This could work for some code paths, but not for really continuous check-ins.

Just wondering what folks do. I can guess, but there is always something hidden or not well known.

replies(2): >>46185438 #>>46197762 #

7. spockz ◴[07 Dec 25 21:28 UTC] No.46185332[source]▶

>>46182923 #

I use JMH for microbenchmarks on any code we know is sensitive, and to highlight performance differences between implementations. (We usually keep them around as an archive of what we tried, but don't run them on CI.)

Then we benchmark the whole Java app in its container, running async-profiler into Pyroscope. We created a test harness for this that spins up and mocks any dependencies based on API subscription data and contracts and simulates performance.

This whole mechanism is generalised and only requires that teams building individual apps use contract-driven testing for the test harness to function. During and after a benchmark we also verify that other non-functional requirements still hold, e.g. that tracing is still linked to the right requests. This works for almost any language that we use.

8. spockz ◴[07 Dec 25 21:39 UTC] No.46185438[source]▶

>>46185326 #

I agree that for measuring latency differences you want similar setups. However, if you run two versions of the app concurrently on the same machine, both are impacted more or less equally by noisy neighbours. Moreover, by inspecting the flamegraphs manually you can quickly spot large shifts in where time is spent. For automatic comparison you can of course use the raw data.
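A minimal sketch of the "run both versions concurrently" idea, under some loud assumptions: each version can be started as a single command that runs to completion, and wall-clock time is the metric of interest. The names `timed_run` and `compare_concurrently` and the placeholder commands are hypothetical, not part of any harness mentioned in this thread.

```python
import statistics
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion and return its wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_concurrently(baseline_cmd, candidate_cmd, rounds=5):
    """Start both commands at roughly the same moment in every round, so
    background noise on the host hits both in the same time window."""
    baseline, candidate = [], []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(rounds):
            b = pool.submit(timed_run, baseline_cmd)
            c = pool.submit(timed_run, candidate_cmd)
            baseline.append(b.result())
            candidate.append(c.result())
    b_med = statistics.median(baseline)
    c_med = statistics.median(candidate)
    # A ratio above 1.0 means the candidate was slower than the baseline.
    return {"baseline_s": b_med, "candidate_s": c_med, "ratio": c_med / b_med}


if __name__ == "__main__":
    # Placeholder workloads; swap in the real launch command for each build.
    old = [sys.executable, "-c", "sum(range(200_000))"]
    new = [sys.executable, "-c", "sum(range(400_000))"]
    print(compare_concurrently(old, new))
```

The two processes also compete with each other for CPU and cache, so the ratio between the medians is more meaningful than either absolute number.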

In addition you can look at total CPU seconds used, memory allocation at the kernel level, and, specifically for the JVM, the GC metrics and allocation rate. If these numbers change significantly then you know you need to have a look.
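As a rough illustration of "change significantly", here is a sketch that flags any metric whose relative change between two runs exceeds a threshold. The metric names and the 10% threshold are made up for the example; a real setup would have to pick thresholds from the observed run-to-run noise.

```python
def significant_changes(baseline, candidate, threshold=0.10):
    """Return {metric: signed relative change} for every metric whose
    relative change versus the baseline exceeds `threshold`."""
    flagged = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is None or old == 0:
            continue  # metric missing, or no meaningful baseline to compare against
        change = (new - old) / old
        if abs(change) > threshold:
            flagged[name] = change
    return flagged


if __name__ == "__main__":
    # Hypothetical numbers from two benchmark runs.
    before = {"cpu_seconds": 100.0, "alloc_rate_mb_s": 200.0, "gc_pause_ms": 50.0}
    after = {"cpu_seconds": 104.0, "alloc_rate_mb_s": 260.0, "gc_pause_ms": 50.0}
    for metric, change in significant_changes(before, after).items():
        print(f"{metric}: {change:+.0%}")
```

With these numbers only the allocation rate is flagged: CPU time moved by 4%, which is under the threshold, and GC pause time did not change.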

We do run this benchmark comparison in most nightly builds and find regressions this way.

replies(1): >>46187256 #

9. malkia ◴[08 Dec 25 01:23 UTC] No.46187256{3}[source]▶

>>46185438 #

Good points there - Thanks @spockz!

10. ◴[08 Dec 25 04:17 UTC] No.46188235[source]▶

>>46182923 #

11. esafak ◴[08 Dec 25 21:16 UTC] No.46197762[source]▶

>>46185326 #

https://en.wikipedia.org/wiki/Hardware_performance_counter can help with noisy neighbors. I am still getting into this.

replies(1): >>46202031 #

12. spockz ◴[09 Dec 25 06:56 UTC] No.46202031{3}[source]▶

>>46197762 #

Yes, that can help with detecting how much CPU was actually used during the run. But it doesn’t influence benchmark results. I'm not sure exactly how to use it across subsequent runs to compare final performance. That would then also need to be extrapolated to final performance in production.

replies(1): >>46202420 #

13. malkia ◴[09 Dec 25 07:58 UTC] No.46202420{4}[source]▶

>>46202031 #

Yeah, what you want to know is which change caused the slowdown (or maybe improved performance), with a reasonable metric behind it (for example frame rate for a game, or something like that).
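One common way to pin down the offending change is a binary search over the commit range, which is the idea behind `git bisect`. The sketch below is not anyone's actual tooling from this thread and rests on these assumptions: the commits are ordered oldest to newest, the first one is known to be fast, the last one is known to be slow, and the slowdown persists once introduced. `is_slow` is a hypothetical callback that builds and benchmarks one commit against the chosen metric.

```python
def first_slow_commit(commits, is_slow):
    """Binary-search for the first commit where `is_slow` returns True.

    Assumes commits[0] is fast, commits[-1] is slow, and that every commit
    after the first slow one is slow too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is fast, commits[hi] is slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Toy history: pretend the regression landed in commit "e".
    history = ["a", "b", "c", "d", "e", "f", "g"]
    print(first_slow_commit(history, lambda commit: commit >= "e"))
```

This needs about log2(n) benchmark runs for n commits, but a noisy `is_slow` can send the search down the wrong half, so each probe should itself be a repeated measurement.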