The tool seems to mostly tweak various networking settings. You could set up a test instance with monitoring, throw load at it, and change the parameters the tool modifies (one at a time!) to see how the system reacts.
Then humans would decide whether to implement each change as-is, partially, in modified form, or not at all.
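A minimal sketch of that "one parameter at a time" approach, assuming the tunables are sysctls; my_load_test.sh is a placeholder for whatever benchmark you trust, and the candidate values are purely illustrative:

    import subprocess

    # Illustrative candidate values for one tunable the tool touches.
    CANDIDATES = {
        "net.ipv4.tcp_rmem": ["4096 131072 7864320", "4096 131072 9830400"],
    }

    def read_sysctl(name):
        return subprocess.run(["sysctl", "-n", name], capture_output=True,
                              text=True, check=True).stdout.strip()

    def write_sysctl(name, value):
        subprocess.run(["sysctl", "-w", f"{name}={value}"], check=True)

    def run_load_test():
        # Placeholder: run your benchmark and return one throughput/latency number.
        out = subprocess.run(["./my_load_test.sh"], capture_output=True,
                             text=True, check=True)
        return float(out.stdout.strip())

    for name, values in CANDIDATES.items():
        original = read_sysctl(name)
        try:
            for value in values:
                write_sysctl(name, value)   # change exactly one knob
                print(name, value, run_load_test())
        finally:
            write_sysctl(name, original)    # always restore the baseline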
I've been running this for a while on my laptop. So far I've yet to see any particular weirdness, but I also can't state with any confidence that it has had a positive impact. I haven't run benchmarks in either direction.
It logs all the changes it's going to make, including what the values were before. Here's an example from my logs:
bpftune[1852994]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
bpftune[1852994]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 7864320) -> (4096 131072 9830400)
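Those lines are regular enough to scrape the before/after values out of, e.g. for auditing what the tool has changed over time. A small sketch, written against just the example line above rather than bpftune's full set of messages:

    import re

    LINE = ("bpftune[1852994]: Due to need to increase max buffer size to "
            "maximize throughput change net.ipv4.tcp_rmem(min default max) "
            "from (4096 131072 7864320) -> (4096 131072 9830400)")

    # Pull out the tunable name and the old/new value tuples.
    PATTERN = re.compile(r"change (?P<tunable>[\w.]+)\([^)]*\) "
                         r"from \((?P<old>[^)]*)\) -> \((?P<new>[^)]*)\)")

    m = PATTERN.search(LINE)
    if m:
        print(m.group("tunable"))         # net.ipv4.tcp_rmem
        print(m.group("old").split())     # ['4096', '131072', '7864320']
        print(m.group("new").split())     # ['4096', '131072', '9830400']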
You notice that your aggregate error rate has been drifting upwards since you started using bpftune. It turns out that, in reality, there is some complex interaction between the tuning and your routers, or your ToR switches, or whatever - there is feedback that causes oscillations in a tuned value, swinging between too high and too low.
Can you see how this is not a matter of simple deduction and rollbacks?
This scenario is plausible. Autotuning generally has issues with feedback, since the overall system lacks control-theoretic structure. And the premise here is that you use this to tune a large number of machines where individual administration is infeasible.
Does sound like a potential way to implement literal chaos.
Surely it's like anything else: you do pre-release testing and balance the benefits for you against the risks?
Alternatively: if you have a fleet of thousands of machines you can very easily do a binary search with them to a) establish that the problem lies with the auto-tuner and then b) determine which of the changes it settled on is causing your problems.
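A sketch of step b), assuming you can apply an arbitrary subset of the tuner's settled changes to a canary group and that a single change is to blame; shows_problem is a placeholder for "push these changes to canaries, wait, check the error rate" and is not anything bpftune provides:

    from typing import Callable, Sequence

    Change = tuple[str, str]   # e.g. ("net.ipv4.tcp_rmem", "4096 131072 9830400")

    def bisect_changes(changes: Sequence[Change],
                       shows_problem: Callable[[Sequence[Change]], bool]) -> Change:
        # Precondition: the full set of changes reproduces the problem.
        assert shows_problem(changes)
        candidates = list(changes)
        while len(candidates) > 1:
            half = candidates[: len(candidates) // 2]
            # Keep whichever half still reproduces the problem.
            candidates = half if shows_problem(half) else candidates[len(half):]
        return candidates[0]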
I get the impression you've never actually managed a "fleet" of systems, because if you had, these techniques would have immediately occurred to you.
usage, main() https://github.com/oracle/bpftune/blob/6a50f5ff619caeea6f04d...
- [ ] CLI opts: --pretend-allow <tuner> or --log-only-allow <tuner> or [...]
Probably relevant function headers in libbpftune.c:
bpftune_sysctl_write(
bpftuner_tunable_sysctl_write(
bpftune_module_load(
static void bpftuner_scenario_log(struct bpftuner *tuner, unsigned int tunable, ...
https://github.com/oracle/bpftune/blob/6a50f5ff619caeea6f04d... https://github.com/oracle/bpftune/blob/6a50f5ff619caeea6f04d...
Run the tune on one machine. Looks good? Put it on ten. Looks good? Put it on one hundred. Looks good? Put it on everyone.
Find an issue a week later, and want to dig into it? Roll 100 machines back to the old tune, and put 100 machines on half the difference. See what happens.
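As a sketch of that staged rollout, with apply_tune() and healthy() standing in for your own config management and monitoring queries (neither is part of bpftune):

    import random

    def staged_rollout(hosts, apply_tune, healthy, stages=(1, 10, 100)):
        remaining = list(hosts)
        random.shuffle(remaining)
        done = []
        for target in list(stages) + [len(hosts)]:   # 1, 10, 100, then everyone
            batch = remaining[: max(0, target - len(done))]
            remaining = remaining[len(batch):]
            for host in batch:
                apply_tune(host)
            done.extend(batch)
            if not healthy(done):    # the "Looks good?" gate before going wider
                return False         # halt the rollout and investigate
            if not remaining:
                break
        return True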
You will obviously have a change management system which describes all the changes you have made to your putative standard distro configs. You will also be monitoring those changes.
This tool logs all the changes it makes via the standard logging system, which can be easily captured, shipped and aggregated and then queried and reported on.
This is not a tool from Clown Cars R US, it's from a reasonably reputable source - Oracle (lol etc). Even better, you can read the code and learn or critique.
Not being funny but I'd rather this sort of thing by far than any amount of wooo handwavy wankery. Would you prefer openly described and documented or "take our word for it"?
Which is now a list you will have to check for every issue. I don't think they are complaining that they don't trust the writers of the code, just that it adds confounding variables to your system.
This is an expert system/advice run by real people (at a reasonably well respected firm), not an AI wankery thingie. It is literally expert advice, and it is being given away in code form which you can read.
What on earth is wrong with that?
Only if you don't know what you're doing, which, with no judgement whatsoever, might be true for OP. Reading the source, it affects some networking-related flags. If the local audio craps out, it's not related. If the Bluetooth keyboard craps out, it's not related. If the hard drive crashes, it's not related.
I get that this is just adding more variables to the system, but this isn't Windows, where the changes under the hood are some mystery hotfix that got applied, we have no idea what it did, the vendor notes raise more questions than they answer, and your computer working feels like a house of cards that's gonna fall over if you look at it funny. If the system is acting funny, just disable this, reset the settings back to default, possibly by rebooting, and see if the problem persists. If you're technical enough to install this, I don't think disabling it and rebooting is beyond your abilities.
The world is now very highly interconnected. When I was a child, I would have rubbish conversations over the blower to an aunt in Australia - that latency was well over one second - satellite links. Nowadays we have direct fibre connections.
So, does you does?
Without change capture, solid regression testing, or observability, it seems difficult to manage these changes. I'd like to hear how others are managing these kinds of changes so they can readily troubleshoot them, without lots of regression testing or observability, if anyone has successes to share.
Oracle exists for one sole purpose, which is to make Larry money. Anything they “give away for free” almost always includes a non-obvious catch which you only discover during some future audit.
In this case it appears to be GPL and thus most likely harmless. But I'd expect either the license to change once people are hooked, or some glaring lack of functionality that's not immediately obvious, which can only be remediated by purchasing a license of some sort.
It's a great idea.
So if this tool makes a well reasoned and ostensibly sensible tweak which happens to expose some flaw in your system and takes it down, being able to say "those experts Oracle made the mistake, not me" might get you out of the hot seat. But it's better to never be in the hot seat.
That's an unusual way to spell "git". But that's not the point. The point is that change management is useless unless you know why you are doing it. That's why all commit messages should contain the "why".
> You will also be monitoring those changes
What you should do is monitor and evaluate the findings, then, once you've decided this is actually what you want, commit it and stage it through the testing environments.
Automatically tuning parameters means diagnosing problems will be even harder than today. There is also the risk of diverging test and prod unless you are being careful. You really have to know what you are doing when deploying these tools.
The worst catastrophes I've seen involve automatically scaling RAM/disk/pods. A problem that should have been trivial in the first place can quickly set off feedback loops.
Anything that Oracle makes available for community contributions should be assumed to be at risk of dramatic license restrictions once Oracle figures out how to monetize it.
Ideally this tool could passively monitor and recommend instead of changing settings in production, where feedback failures could lead to loss of availability; -R / --rollback still actively changes settings. The proposed changes could instead be queued or logged as, I don't know, JSON or JSON-LD messages.
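A sketch of what a recommend-only message might look like, assuming you parse the tuner's scenario logs and emit a record for a human or a change-management pipeline to act on; the field names here are made up for illustration, not a bpftune or JSON-LD schema:

    import json, time

    def recommendation(host, tunable, current, proposed, reason):
        return json.dumps({
            "host": host,
            "tunable": tunable,
            "current": current,
            "recommended": proposed,
            "reason": reason,
            "observed_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "action": "pending-review",   # a human decides whether to apply it
        })

    print(recommendation("web-042", "net.ipv4.tcp_rmem",
                         "4096 131072 7864320", "4096 131072 9830400",
                         "need to increase TCP buffer size(s) to maximize throughput"))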
I think most times when a project from a big company goes closed, the features added afterwards only benefit other big companies anyway.
Right now I prefer to be happy they ever bothered at all (to make open source things), rather than prematurely discount it entirely.
Maybe you weren't implying that it should be discounted entirely, but I bet a lot of people were thinking that.