
256 points BSDobelix | 1 comments
gausswho ◴[] No.42164371[source]
With this tool I am wary that I'll encounter system issues that are dramatically more difficult to diagnose and troubleshoot because I'll have drifted from a standard distro configuration. And in ways I'm unaware of. Is this a reasonable hesitation?
pbhjpbhj ◴[] No.42164481[source]
>"bpftune logs to syslog so /var/log/messages will contain details of any tuning carried out." (from OP GitHub readme)

The rmem example seems to allay fears that it will make changes one can't reverse.
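For anyone who wants to pull those entries out, here is a minimal sketch. It assumes only what the readme quote says (bpftune writes to syslog, so entries land in /var/log/messages) and that the relevant lines contain the string "bpftune". The exact line format is not something I know, so this matches on the substring alone. `bpftune_lines` is a made-up helper name.

```python
from typing import Iterable, List


def bpftune_lines(lines: Iterable[str], tag: str = "bpftune") -> List[str]:
    """Return the log lines that mention `tag`, compared case-insensitively.

    Assumption: tuning entries can be recognised by the substring alone;
    the real log format is not parsed here.
    """
    needle = tag.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Usage sketch (path taken from the readme quote above):
#
#   with open("/var/log/messages", errors="replace") as fh:
#       for line in bpftune_lines(fh):
#           print(line)
```

This only tells you what was changed and when. It does not tell you which change, if any, caused a problem.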

admax88qqq ◴[] No.42164502[source]
It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes was even the problem, and if so, which one.
nehal3m ◴[] No.42165004[source]
If they can be reversed individually, you can simply find the culprit by rolling back changes one by one, no?
spenczar5 ◴[] No.42165540[source]
Suppose you run a fleet of a thousand machines. They all autotune. They are, let's say, serving cached video, or something.

You notice that your aggregate error rate has been drifting upwards since you started using bpftune. It turns out, in reality, there is some complex interaction between the tuning and your routers, or your TOR switches, or whatever - there is feedback that causes oscillations in a tuned value, swinging between too high and too low.

Can you see how this is not a matter of simple deduction and rollbacks?

This scenario is plausible. Autotuning generally has issues with feedback, since the overall system lacks control-theoretic structure. And the premise here is that you use this to tune a large number of machines where individual admin is infeasible.
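To make the oscillation point concrete, here is a toy model. It is not bpftune's algorithm and not a model of any real network. It is just a proportional update `value += gain * (target - value)`, where `gain`, `target` and the function names are all invented for illustration. With a small gain the value settles; with too large a gain it swings above and below the target on every step.

```python
from typing import List


def simulate(gain: float, steps: int = 20,
             start: float = 0.0, target: float = 100.0) -> List[float]:
    """Toy tuner: each step moves the value by gain * (target - value)."""
    values = [start]
    for _ in range(steps):
        error = target - values[-1]
        values.append(values[-1] + gain * error)
    return values


def crossings(values: List[float], target: float = 100.0) -> int:
    """Count how often the series jumps from one side of the target to the other."""
    sides = [v > target for v in values if v != target]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


if __name__ == "__main__":
    for gain in (0.5, 1.5, 2.0):
        print(gain, crossings(simulate(gain)))
```

In this toy, a gain of 0.5 never overshoots, 1.5 overshoots but decays, and 2.0 flips between too low and too high forever. The hard part in a real fleet is that the effective gain depends on everything else in the loop, which you do not control.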

1. toast0 ◴[] No.42167792[source]
When you have a thousand machines, you can usually get feedback pretty quick, in my experience.

Run the tune on one machine. Looks good? Put it on ten. Looks good? Put it on one hundred. Looks good? Put it on everyone.
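Roughly, as a sketch. `apply_tune` and `looks_good` are placeholders for whatever your deploy tooling and monitoring actually provide; nothing here is a real API.

```python
from typing import Callable, List, Sequence


def staged_rollout(hosts: Sequence[str],
                   apply_tune: Callable[[str], None],
                   looks_good: Callable[[Sequence[str]], bool],
                   stages: Sequence[int] = (1, 10, 100)) -> List[str]:
    """Tune hosts in growing batches; stop as soon as a check fails.

    Returns the hosts that were tuned. If the list is shorter than
    `hosts`, the rollout stopped early.
    """
    tuned: List[str] = []
    # The last stage is always "everyone".
    for size in list(stages) + [len(hosts)]:
        size = min(size, len(hosts))
        for host in hosts[len(tuned):size]:
            apply_tune(host)
            tuned.append(host)
        if not looks_good(tuned):
            break  # something looks off: stop widening the rollout
        if len(tuned) == len(hosts):
            break  # everyone is tuned
    return tuned
```

How long you wait before calling `looks_good` at each stage is the real judgment call, and it is not captured here.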

Find an issue a week later, and want to dig into it? Run 100 machines back on the old tune, and 100 machines with half the difference. See what happens.
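A sketch of that split, assuming the tune is a single numeric value so that "half the difference" is just the midpoint. The names, the group size default and the random sampling are my choices for illustration. Sampling is there so the test groups are not all from the same rack.

```python
import random
from typing import Dict, Sequence


def half_difference(old: float, new: float) -> float:
    """Value halfway between the old tune and the new tune."""
    return old + (new - old) / 2


def assign_groups(hosts: Sequence[str], old: float, new: float,
                  group_size: int = 100, seed: int = 0) -> Dict[str, float]:
    """Map each host to the value it should run with.

    A random `group_size` hosts go back to `old`, another `group_size`
    get the midpoint, and the rest stay on `new`.
    """
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    shuffled = rng.sample(list(hosts), len(hosts))
    mid = half_difference(old, new)
    plan: Dict[str, float] = {}
    for i, host in enumerate(shuffled):
        if i < group_size:
            plan[host] = old
        elif i < 2 * group_size:
            plan[host] = mid
        else:
            plan[host] = new
    return plan
```

Then compare error rates across the three groups. If the midpoint group sits between the other two, the tune is a likely suspect; if all three look the same, look elsewhere.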