The rmem example seems to allay fears that bpftune will make changes one can't reverse.
You notice that your aggregate error rate has been drifting upwards since you started using bpftune. It turns out there is some complex interaction between the tuning and your routers, or your ToR switches, or whatever - there is feedback that causes a tuned value to oscillate, swinging between too high and too low.
Can you see how this is not a matter of simple deduction and rollbacks?
This scenario is plausible. Autotuning generally has trouble with feedback, since the overall system lacks control-theoretic structure. And the premise here is that you use this to tune a large number of machines where individual administration is infeasible.
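To make the feedback point concrete, here's a toy simulation (a sketch, not bpftune's actual algorithm) of a tuner that acts on a measurement one step stale. With enough gain it never converges on the target; it swings above and below it with growing amplitude:

```python
# Toy illustration of oscillation from delayed feedback. The tuner adjusts a
# value toward a target, but acts on an error measurement that is one step
# old, so it keeps overcorrecting around the sweet spot.

def simulate(steps=20, target=100.0, gain=1.5):
    value, stale_error = 50.0, 0.0
    history = []
    for _ in range(steps):
        current_error = target - value   # true error right now
        value += gain * stale_error      # but the tuner acts on stale data
        stale_error = current_error
        history.append(round(value, 1))
    return history

print(simulate())  # values swing above and below 100, growing each cycle
```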
Does sound like a potential way to implement literal chaos.
Surely it's like anything else: you do pre-release testing and balance the benefits to you against the risks?
Alternatively: if you have a fleet of thousands of machines, you can very easily do a binary search with them to a) establish that the auto-tuner is the problem and then b) identify which of the changes it settled on are causing your problems.
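A minimal sketch of that bisection, assuming a single static culprit among the tuned sysctls. apply_to_group() and measured_error_rate() are hypothetical stand-ins for your config management and monitoring, and the objection above still applies if the problem is dynamic feedback rather than one bad settled value:

```python
# Bisect the set of tuned sysctls: apply half of the auto-tuner's changes on
# top of the known-good baseline, measure, and recurse into whichever half
# reproduces the errors. Assumes one bad value, not an interaction between
# the halves.

def bisect_bad_tunable(changed, baseline, apply_to_group,
                       measured_error_rate, threshold):
    keys = sorted(changed)
    if len(keys) == 1:
        return keys
    half = keys[:len(keys) // 2]
    trial = dict(baseline)
    trial.update({k: changed[k] for k in half})  # first half tuned, rest baseline
    apply_to_group("canary", trial)
    if measured_error_rate("canary") > threshold:
        subset = {k: changed[k] for k in half}
    else:
        subset = {k: changed[k] for k in keys[len(keys) // 2:]}
    return bisect_bad_tunable(subset, baseline, apply_to_group,
                              measured_error_rate, threshold)
```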
I get the impression you've never actually managed a "fleet" of systems, because otherwise these techniques would have immediately occurred to you.
Run the tune on one machine. Looks good? Put it on ten. Looks good? Put it on one hundred. Looks good? Put it on everyone.
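That staged rollout is easy to sketch. deploy(), healthy(), and rollback() are hypothetical hooks into whatever orchestration and monitoring you already run, and the one-hour soak is an arbitrary placeholder:

```python
import time

# Stage the new tune out 1 -> 10 -> 100 -> everyone, backing the whole
# rollout out at the first stage that regresses.

def staged_rollout(machines, deploy, healthy, rollback, soak_seconds=3600):
    stages = [1, 10, 100, len(machines)]
    done = 0
    for stage in stages:
        for m in machines[done:stage]:
            deploy(m)
        time.sleep(soak_seconds)        # let the tune settle before judging it
        if not all(healthy(m) for m in machines[:stage]):
            for m in machines[:stage]:  # roll back everything deployed so far
                rollback(m)
            return False
        done = stage
    return True
```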
Find an issue a week later, and want to dig into it? Roll 100 machines back to the old tune, and put 100 machines on values halfway between the old tune and the new one. See what happens.
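And that experiment, sketched under the same assumptions (numeric sysctls, a hypothetical apply_to() hook):

```python
# One group of 100 goes back to the old tune; another gets values halfway
# between the old and new tunes. Compare error rates across the two groups
# and the untouched remainder of the fleet.

def halfway(old, new):
    return {k: (old[k] + new[k]) // 2 for k in new}  # assumes old covers new's keys

def run_experiment(machines, old, new, apply_to):
    control, midpoint = machines[:100], machines[100:200]
    for m in control:
        apply_to(m, old)                # full rollback group
    for m in midpoint:
        apply_to(m, halfway(old, new))  # half-the-difference group
```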