Most active commenters
  • sgarland(5)
  • gerdesj(5)
  • pbhjpbhj(3)
  • fragmede(3)

256 points BSDobelix | 76 comments
1. bloopernova ◴[] No.42163717[source]
Fascinating!

I'd like to hear from people who are running this. Is it effective? Worth the setup time?

2. usr1106 ◴[] No.42163756[source]
Interesting. But if tuning parameters to their best values were easy, shouldn't the kernel just do that in the first place?
replies(5): >>42163778 #>>42163824 #>>42163880 #>>42164567 #>>42167444 #
3. nitinreddy88 ◴[] No.42163778[source]
It depends on workload. This tool generates recommended config for that specific machine's workload. App Nodes can have completely different recommendations vs Database Nodes, and it will be completely different again for a Workstation.
replies(1): >>42163814 #
4. usr1106 ◴[] No.42163814{3}[source]
Sure, but the kernel could just do the same. Of course the kernel is already too big. Is BPF the right level to make it more modular? Just thinking, I don't think I have the answer.
5. robinhoodexe ◴[] No.42163820[source]
Is tuning the TCP buffer size for instance worth it?
replies(6): >>42163899 #>>42166594 #>>42167580 #>>42167891 #>>42168594 #>>42169545 #
6. onetoo ◴[] No.42163824[source]
This doesn't necessarily find the best parameters, and it doesn't necessarily do it easily. From my reading, it will converge on a local optimum, and it may take some time to do that.

In theory, I don't see why the kernel couldn't have a parameter-auto-tune similar to this. In practice, I think the kernel has to work in so many different domains, it'd be impossible to land on a "globally good enough" set of tuning heuristics.

I'm far from a kernel developer, so I'm ready to be corrected here.

IMO if we ever see something like this deployed widely, it will be because a popular distribution decided to install it by default.

7. RandomThoughts3 ◴[] No.42163880[source]
I would reverse the question: if it can be done by a BPF module, why should it be in the kernel?

Distributions turning it on by default is another story. Maybe it deserves to be shipped enabled by default, but that's not the same thing as being part of the kernel.

replies(1): >>42164008 #
8. viraptor ◴[] No.42163899[source]
It depends. At home - probably not. On a fleet of 2000 machines where you want to keep network utilisation close to 100% with maximal throughput, and where non-optimal settings translate to a non-trivial dollar cost - yes.
replies(1): >>42164286 #
9. BSDobelix ◴[] No.42163953[source]
BTW one can use it out of the box with CachyOS.

After installation -> CachyOS Hello -> Apps/Tweaks

10. jiehong ◴[] No.42164008{3}[source]
Indeed!

The kernel might already be too monolithic.

This kernel parameters optimisation reminds me of PGO compilation in programs.

Yet, perhaps the kernel could come with multiple defaults config files, each being a good base for different workloads: server, embedded, laptop, mobile, database, router, etc.

replies(1): >>42166453 #
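
The per-workload defaults idea above could be sketched as data; everything here is purely hypothetical illustration - these are not proposed kernel defaults, and the profile names and values are made up:

```python
# Hypothetical per-workload sysctl baselines -- illustrative only,
# not actual or proposed kernel defaults.
PROFILES = {
    "server": {"net.core.somaxconn": 4096, "vm.swappiness": 10},
    "laptop": {"vm.swappiness": 60, "vm.dirty_writeback_centisecs": 1500},
    "router": {"net.ipv4.ip_forward": 1, "net.core.netdev_max_backlog": 5000},
}

def profile_as_sysctl_conf(name):
    """Render one profile as /etc/sysctl.d-style 'key = value' lines."""
    return "\n".join(f"{k} = {v}" for k, v in sorted(PROFILES[name].items()))
```

A distribution could then ship these as alternative sysctl.d files rather than baking them into the kernel itself.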
11. bastloing ◴[] No.42164013[source]
It's great how BPF grew out of simple packet filtering into tracing and monitoring. It's one of those great tools most should know. Been using it for years.
replies(1): >>42166435 #
12. gmuslera ◴[] No.42164065[source]
Two words: “feedback loop”.

That was the first idea that jumped out when thinking about what could go wrong - not because of the Linux kernel, or BPF, or this program, just because of how it is intended to work. There might be no risk of that happening, there may be controls around it, or if they happen they might only converge to a stable state, but it is still something to have on the map.

replies(1): >>42164266 #
13. nevon ◴[] No.42164079[source]
I wonder how effective this would be in multi-tenant environments like shared k8s clusters. On the one hand, each application running will have a different purpose and will move around between nodes over time, but on the other hand there are likely broad similarities between most applications.
14. marcosdumay ◴[] No.42164266[source]
> or if they happen they might only converge to a stable state

That one will always be dependent on the usage patterns. So the auto-tuner can't guarantee it.

Also, I imagine the risk of the feedback turning positive is related to the system load (not CPU, but the usage of the resources you are optimizing). If so, it will make your computer less able to manage load. But this can still be useful for optimizing for latency.

15. londons_explore ◴[] No.42164286{3}[source]
TCP parameters are a classic example of where an autotuner might bite you in the ass...

Imagine your tuner keeps making the congestion control more aggressive, filling network links up to 99.99% to get more data through...

But then any other users of the network see super high latency and packet loss and fail because the tuner isn't aware of anything it isn't specifically measuring - and it's just been told to make this one application run as fast as possible.

replies(1): >>42167701 #
16. mrbluecoat ◴[] No.42164334[source]
> bpftune is designed to be zero configuration; there are no options

On behalf of every junior administrator, overworked IT admin, and security-concerned "cattle" wrangler, thank you.

Having to learn a thousand+ knobs & dials means most will never be touched. I for one welcome automated assistance in this area, even if the results are imperfect.

replies(2): >>42164571 #>>42171394 #
17. gausswho ◴[] No.42164371[source]
With this tool I am wary that I'll encounter system issues that are dramatically more difficult to diagnose and troubleshoot because I'll have drifted from a standard distro configuration. And in ways I'm unaware of. Is this a reasonable hesitation?
replies(6): >>42164481 #>>42164533 #>>42164535 #>>42164760 #>>42164990 #>>42168400 #
18. pbhjpbhj ◴[] No.42164481[source]
>"bpftune logs to syslog so /var/log/messages will contain details of any tuning carried out." (from OP GitHub readme)

The rmem example seems to allay fears that it will make changes one can't reverse.

replies(1): >>42164502 #
19. admax88qqq ◴[] No.42164502{3}[source]
It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes was even the problem and, if so, which one.
replies(3): >>42165004 #>>42166373 #>>42168709 #
20. sgarland ◴[] No.42164533[source]
Yes, it is. IMO, except for learning (which should not be done in prod), you shouldn’t make changes that you don’t understand.

The tool seems to mostly tweak various networking settings. You could set up a test instance with monitoring, throw load at it, and change the parameters the tool modifies (one at a time!) to see how it reacts.

replies(2): >>42164601 #>>42170649 #
21. ◴[] No.42164535[source]
22. sgarland ◴[] No.42164567[source]
I’d rather the kernel present a good-enough but extremely stable set of configs. If I’m using a distro like Arch or Gentoo, then sure, maybe run wild (though both of those would probably assume I’m tuning them anyway), but CentOS, Debian, et al.? Stable and boring. If you change something, you’d better know what it is, and why you’re doing it.
replies(1): >>42170448 #
23. sgarland ◴[] No.42164571[source]
I think it’s still important to know what those dials and knobs do, otherwise (as the currently top-voted comment says) when things break, you’ll be lost.
replies(1): >>42167575 #
24. nine_k ◴[] No.42164601{3}[source]
I'd run such a tool on prod in "advice mode". It should suggest the tweaks, explaining the reasoning behind them, and listing the actions necessary to implement them.

Then humans would decide if they want to implement that as is, partly, modified, or not at all.

replies(2): >>42164659 #>>42166301 #
25. sgarland ◴[] No.42164659{4}[source]
Fair point, though I didn’t see any such option with this tool.
replies(1): >>42164688 #
26. nine_k ◴[] No.42164688{5}[source]
It's developed in the open; we can create a GitHub issue.

Actually https://github.com/oracle/bpftune/issues/99

replies(1): >>42167417 #
27. trelliscoded ◴[] No.42164760[source]
If your staging doesn’t do capacity checks in excess of what production sees, yes.
28. Twirrim ◴[] No.42164990[source]
Disclaimer: I work for Oracle, who publish this tool, though I have nothing to do with the org or engineers that created it

I've been running this for a while on my laptop. So far yet to see any particular weirdness, but also I don't know that I can state with any confidence it has a positive impact either. I've not carried out any benchmarks in either direction.

It logs all changes that it's going to make including what they were on before. Here's an example from my logs:

    bpftune[1852994]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
    bpftune[1852994]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 7864320) -> (4096 131072 9830400)
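
Since the old values are in the message, they can be recovered for a manual rollback. A quick sketch, assuming the log format shown above (the message layout is inferred from this one example, not from a documented bpftune format):

```python
import re

# Example message body, copied from the log excerpt above.
LINE = ("Due to need to increase max buffer size to maximize throughput "
        "change net.ipv4.tcp_rmem(min default max) "
        "from (4096 131072 7864320) -> (4096 131072 9830400)")

def previous_values(line):
    """Extract (tunable name, pre-change values) from a change message."""
    m = re.search(r"change (\S+?)\(.*?\) from \(([\d ]+)\) -> \(([\d ]+)\)",
                  line)
    if not m:
        return None
    tunable, old, _new = m.groups()
    return tunable, [int(v) for v in old.split()]

tunable, old = previous_values(LINE)
# revert by hand with e.g.: sysctl -w net.ipv4.tcp_rmem="4096 131072 7864320"
```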
29. nehal3m ◴[] No.42165004{4}[source]
If they can be reversed individually you can simply deduce by rolling back changes one by one, no?
replies(2): >>42165128 #>>42165540 #
30. jstanley ◴[] No.42165128{5}[source]
Only if you already suspect that this tool caused the problem.
31. klysm ◴[] No.42165315[source]
This seems to step into control theory, which I think is somewhat underapplied in software engineering.
32. spenczar5 ◴[] No.42165540{5}[source]
Suppose you run a fleet of a thousand machines. They all autotune. They are, let's say, serving cached video, or something.

You notice that your aggregate error rate has been drifting upwards since using bpftune. It turns out, in reality, there is some complex interaction between the tuning and your routers, or your TOR switches, or whatever - there is feedback that causes oscillations in a tuned value, swinging between too high and too low.

Can you see how this is not a matter of simple deduction and rollbacks?

This scenario is plausible. Autotuning generally has issues with feedback, since the overall system lacks control theoretic structure. And the premise here is that you use this to tune a large number of machines where individual admin is infeasible.

replies(5): >>42166437 #>>42166446 #>>42166449 #>>42167131 #>>42167792 #
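
The oscillation risk described above is easy to demonstrate with a toy model. This is purely illustrative - a made-up plant and controller, nothing to do with bpftune's actual algorithm - but it shows how acting on stale measurements with too much gain produces sustained swings instead of convergence:

```python
# Toy model: utilisation is inversely proportional to a tuned buffer
# size, and the controller observes utilisation with a one-step lag.
# Below target -> shrink the buffer (raising utilisation); above
# target -> grow it. With a small gain this settles; with a large
# gain the delayed feedback oscillates between extremes.

def simulate(gain, steps=50, target=0.9):
    """Return the utilisation trajectory under a lagged controller."""
    buffer_size = 100.0
    history = [0.5]                      # stale initial observation
    for _ in range(steps):
        observed = history[-1]           # one-step-old measurement
        buffer_size = max(1.0,
                          buffer_size * (1 - gain * (target - observed)))
        history.append(min(1.0, 90.0 / buffer_size))
    return history

calm = simulate(gain=0.2)   # settles near the 0.9 target
wild = simulate(gain=3.0)   # keeps swinging, never converges
```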
33. 6SixTy ◴[] No.42165865[source]
As an insane Gentoo user, sign me the F*** up
replies(1): >>42168446 #
34. bink ◴[] No.42166301{4}[source]
I agree. There's very little need to have a tool automatically changing complex kernel settings on the fly unless your infrastructure is undergoing dramatic changes in load and behavior on a daily basis, which seems unlikely for a modern server.
35. pbhjpbhj ◴[] No.42166373{4}[source]
I focused primarily on gausswho's "in ways I am unaware of".

Your issue appears to be true for any system change. Although, risk will of course vary.

36. orbital-decay ◴[] No.42166435[source]
The entire modern ML and most of the HPC grew out of very simple programmable shaders in GeForce 3.
37. pbhjpbhj ◴[] No.42166437{6}[source]
>not only can we observe the system and tune appropriately, we can also observe the effect of that tuning and re-tune if necessary. //

Does sound like a potential way to implement literal chaos.

Surely it's like anything else, you do pre-release testing and balance the benefits for you against the risks?

38. Modified3019 ◴[] No.42166446{6}[source]
Sounds like you have your answer of “don’t use it” then.
39. pstuart ◴[] No.42166449{6}[source]
In that scenario you could run it on a couple servers, compare and contrast, and then apply globally via whatever management tool you use.
40. 6SixTy ◴[] No.42166453{4}[source]
I second the different profiles for server, laptop, and so on. Though I know the kernel already comes with default configs, so I think there could be room for specialized kernel config options in addition to what's already there.

Though in my opinion, there's already kind of too much segmentation between the different use cases. A server is just a role for a computer, and embedded could literally mean anything. Quite a few WiFi access points come with a USB port on them so you can plug in a USB drive and start a SMB server.

41. crest ◴[] No.42166594[source]
It depends mostly on the bandwidth-delay product and the packet loss you expect on each connection. There is a vast difference between a local interactive SSH session and downloading a large VM image from across an ocean.
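
The bandwidth-delay product is simple to compute; the link figures below are assumed for illustration:

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return int(bandwidth_bits_per_s / 8 * rtt_s)

# Local interactive session: 1 Gbit/s, 1 ms RTT -> a tiny window suffices.
local = bdp_bytes(1e9, 0.001)      # 125 KB

# Transatlantic bulk transfer: 1 Gbit/s, 80 ms RTT -> needs ~10 MB
# of buffer to keep the link full.
faraway = bdp_bytes(1e9, 0.080)
```

If the TCP buffer is smaller than the BDP, throughput on the long path is capped at buffer/RTT regardless of link speed.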
42. KennyBlanken ◴[] No.42167131{6}[source]
Presumably one would use autotune to find optimized parameters, and then roll those out via change control, either one parameter at a time, or a mix of parameters across the systems.

Alternatively: if you have a fleet of thousands of machines you can very easily do a binary search with them to a)establish the problem with the auto-tuner and then b)which of the changes it settled on are causing your problems.

I get the impression you've never actually managed a "fleet" of systems, because these techniques would otherwise have immediately occurred to you.

replies(1): >>42167272 #
43. spenczar5 ◴[] No.42167272{7}[source]
Certainly when we managed Twitch’s ~10,000 boxes of video servers, neither of the tasks you describe would have been simple. We underinvested in tools, for sure. Even so, I don’t think you can really argue that dynamically changing configs like this are going to make life easier!
44. westurner ◴[] No.42167417{6}[source]
In the existing issue, we can link to the code and docs that would need to be understood and changed:

usage, main() https://github.com/oracle/bpftune/blob/6a50f5ff619caeea6f04d...

- [ ] CLI opts: --pretend-allow <tuner> or --log-only-allow <tuner> or [...]

Probably relevant function headers in libbpftune.c:

bpftune_sysctl_write(

bpftuner_tunable_sysctl_write(

bpftune_module_load(

static void bpftuner_scenario_log(struct bpftuner *tuner, unsigned int tunable, ; https://github.com/oracle/bpftune/blob/6a50f5ff619caeea6f04d... https://github.com/oracle/bpftune/blob/6a50f5ff619caeea6f04d...

replies(1): >>42174416 #
45. LinuxBender ◴[] No.42167444[source]
HP-UX and to a lesser degree AIX had some of this. Instead of just static tunable values there were some tunable formulas exposed to the admin. If one knew what they were doing they could change the formula as needed without having to recompile the kernel. I would not mind something like this being exposed from the Linux kernel as tunable options as opposed to just static values. Such options could reduce the number of companies that need to recompile the kernel.
46. zymhan ◴[] No.42167575{3}[source]
That's exactly why it's such a burden.
replies(1): >>42167967 #
47. zymhan ◴[] No.42167580[source]
There's not much cost to doing it, so yes.
48. withinboredom ◴[] No.42167701{4}[source]
It literally covers this exact scenario in the readme and explains how it prevents that.
49. toast0 ◴[] No.42167792{6}[source]
When you have a thousand machines, you can usually get feedback pretty quick, in my experience.

Run the tune on one machine. Looks good? Put it on ten. Looks good? Put it on one hundred. Looks good? Put it on everyone.

Find an issue a week later, and want to dig into it? Run 100 machines back on the old tune, and 100 machines with half the difference. See what happens.

50. toast0 ◴[] No.42167891[source]
In my experience running big servers, tuning TCP buffers is definitely worth it, because different kinds of servers have different needs. It doesn't often work miracles, but tuning buffers is low cost, so the potential for a small positive impact is often worth the time to try.

If your servers communicate at high datarates with a handful of other servers, some of which are far away, but all of which have large potential throughput, you want big buffers. Big buffers allow you to have a large amount of data in flight to remote systems, which lets you maintain throughput regardless of where your servers are. You'd know to look at making buffers bigger if your throughput to far away servers is poor.

If you're providing large numbers of large downloads to public clients that are worldwide from servers in the US only, you probably want smaller buffers. Larger buffers would help with throughput to far away clients, but slow, far away clients will use a lot of buffer space and limit your concurrency. Clients that disappear mid download will tie up buffers until the connection is torn down and it's nice if that's less memory for each instance. You'd know to look at making buffers smaller if you're using more memory than you think is appropriate for network buffers... a prereq is monitoring memory use by type.

If you're serving dynamic web pages, you want your tcp buffers to be at least as big as your largest page, so that your dynamic generation never has to block for a slow client. You'd know to look at this if you see a lot of servers blocked on sending to clients, and/or if you see divergent server measured response times for things that should be consistent. This is one case where getting buffer sizes right can enable miracles; Apache pre-fork+mod_PHP can scale amazingly well or amazingly poorly; it scales well when you can use an accept filter so apache doesn't get a socket until the request is ready to be read, and PHP/apache can send the whole response to the tcp buffer without waiting, then closes the socket; letting the kernel deal with it from there. Keep-alive and TLS make this a bit harder, but the basic idea of having enough room to buffer the whole page still fits.
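
The concurrency/memory trade-off described above can be put in rough numbers. The figures here are assumptions for illustration, not measurements:

```python
def buffer_memory_gib(connections, per_socket_bytes):
    """Worst-case kernel memory pinned in per-socket buffers."""
    return connections * per_socket_bytes / 2**30

# 100k slow public clients each holding a 16 MiB send buffer: untenable.
big = buffer_memory_gib(100_000, 16 * 2**20)     # 1562.5 GiB

# The same fleet capped at 256 KiB per socket: manageable.
small = buffer_memory_gib(100_000, 256 * 2**10)  # ~24.4 GiB
```

This is why the right answer differs between a handful of fat server-to-server flows (big buffers) and a sea of slow public clients (small ones).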

51. sgarland ◴[] No.42167967{4}[source]
I continue to be amazed and frustrated that people will simultaneously believe that deeply understanding a programming language is a noble pursuit, but that deeply understanding the software that allows their code to run is somehow burdensome.

Ops remains extremely important and extremely real. You can abstract it away in exchange for higher costs and less control, but ultimately someone at some level has studied these parameters, and decided what is best for your code.

52. gerdesj ◴[] No.42168400[source]
"because I'll have drifted from a standard distro configuration"

You will obviously have a change management system which describes all the changes you have made to your putative standard distro configs. You will also be monitoring those changes.

This tool logs all the changes it makes via the standard logging system, which can be easily captured, shipped and aggregated and then queried and reported on.

This is not a tool from Clown Cars R US, it's from a reasonably reputable source - Oracle (lol etc). Even better, you can read the code and learn or critique.

Not being funny but I'd rather this sort of thing by far than any amount of wooo handwavy wankery. Would you prefer openly described and documented or "take our word for it"?

replies(2): >>42168447 #>>42171423 #
53. gerdesj ◴[] No.42168446[source]
You are not insane. You are running a distro that enables you to patch both the distro package definitions and the source code that is compiled by those definitions. Then there is the day to day stuff such as USE ...

I have a VM that was rather unloved; I ended up using git to go back in time and then gradually move it forwards and update. It took quite a while. It started off life in the noughties and now sports a 6.10 kernel. I won't bore you further, but the data is still there and accessible on a modern platform.

What is mad about that?

replies(1): >>42168619 #
54. cortesoft ◴[] No.42168447{3}[source]
> You will obviously have a change management system which describes all the changes you have made to your putative standard distro configs. You will also be monitoring those changes.

Which is now a list you will have to check for every issue. I don't think they are complaining they don't trust the writers of the code, just that it adds confounding variables to your system

replies(2): >>42168508 #>>42168577 #
55. gerdesj ◴[] No.42168508{4}[source]
We (in IT security) are expected to abdicate responsibility to funky AI or whatevs anti virus and other stuff. Buy and install a security package from ... whoever ... and all will be well.

This is an expert system/advice run by real people (at a reasonably well respected firm), not an AI wankery thingie. It is literally expert advice, and it is being given away in code form which you can read.

What on earth is wrong with that?

replies(3): >>42168546 #>>42170490 #>>42171201 #
56. cortesoft ◴[] No.42168546{5}[source]
If the alternative is those proprietary anti virus products, sure this is better.

The original comment was comparing to doing nothing and just using the standard distro, I believe.

replies(1): >>42168654 #
57. fragmede ◴[] No.42168577{4}[source]
> for every issue.

Only if you don't know what you're doing, which, with no judgement whatsoever, might be true for OP. Reading the source, it affects some networking related flags. If the local audio craps out, it's not related. If the Bluetooth keyboard craps out, it's not related. If the hard drive crashes, it's not related.

I get that it's just adding more variables to the system, but this isn't Windows, where the changes under the hood are some mystery hotfix that got applied, we have no idea what it did, the vendor notes raise more questions than they answer, and your computer working feels like a house of cards that's gonna fall over if you look at it funny. If the system is acting funny, just disable this, reset the settings back to default, possibly by rebooting, and see if the problem persists. If you're technical enough to install this, I don't think disabling it and rebooting is beyond your abilities.

58. fragmede ◴[] No.42168594[source]
If you're on 10 or 100 gig, it's almost required to get close to line speed performance.
59. fragmede ◴[] No.42168619{3}[source]
Back in the day when Gentoo was in favor, USE flags and recompiling the world could make a huge difference in performance on much slower computers with far more limited RAM, and you could really see the difference. After waiting a week for the world to recompile, that is. These days, unless you're a hyperscaler and need to eke that last 10% of performance out of your hardware, ain't nobody got time for that.

    emerge --update --deep --with-bdeps=y --newuse @world
replies(1): >>42168699 #
60. hrdwdmrbl ◴[] No.42168645[source]
What is the magnitude of the effect size? The readme gives no indication of what kind of results should be expected.
61. gerdesj ◴[] No.42168654{6}[source]
I do hope I haven't offended anyone but I also hope I will leave no one in any doubt that IT security is important.

The world is now very highly interconnected. When I was a child, I would have rubbish conversations over the blower to an aunt in Australia - that latency was well over one second - satellite links. Nowadays we have direct fibre connections.

So, does you does ?

62. gerdesj ◴[] No.42168699{4}[source]
I was a Gentoo aficionado for quite a while (20+ years). I'm going back too (rather slowly)

There is a lot of very decent technology in Gentoo - it literally describes what all other distros need to do, in some cases.

You mention a week of compilation - I remember the gcc 4 thing too!

63. yourapostasy ◴[] No.42168709{4}[source]
Record changes in git and then git bisect issues, maybe?

Without change capture, solid regression testing, or observability, it seems difficult to manage these changes. If anyone has successes to share, I'd like to hear how others are readily troubleshooting these kinds of changes without lots of regression testing or observability.
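
The bisection idea comes up a few times in this thread; a generic sketch of it, where the health check is hypothetical (in practice it would be your monitoring or a load test, and this assumes the system is healthy with no changes and broken with all of them):

```python
# Bisect an ordered list of applied changes to find the first one that
# breaks the system, in O(log n) trials instead of n rollbacks.
# `is_healthy_with` is a hypothetical predicate: given the prefix of
# changes applied, report whether the system is healthy.

def first_bad_change(changes, is_healthy_with):
    """Return the index of the first change that breaks the system."""
    lo, hi = 0, len(changes)   # invariant: lo changes healthy, hi broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_healthy_with(changes[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1
```

This is the same shape as `git bisect`, just over applied sysctl changes rather than commits.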

64. arminiusreturns ◴[] No.42169545[source]
Worked with Weka to maximize NFSv4/mellanox throughput, and it was absolutely required to get targets.
65. withinboredom ◴[] No.42170448{3}[source]
These parameters affect things at the hardware level, and in the kernel. There’s “good enough” defaults already, but if you want to take full advantage of your specific hardware, you need to tune it. This is especially important the more different your hardware is from kernel developers’ computers (like 10g nics).
66. tw04 ◴[] No.42170490{5}[source]
Well, two points: Oracle isn't really respected, and hasn't been for several decades.

Oracle exists for one sole purpose, which is to make Larry money. Anything they “give away for free” almost always includes a non-obvious catch which you only discover during some future audit.

In this case it appears to be gpl and thus most likely harmless. But I’d expect either the license to change once people are hooked, or some glaring lack of functionality that’s not immediately obvious, that can only be remediated by purchasing a license of some sort.

replies(1): >>42171582 #
67. zorked ◴[] No.42170649{3}[source]
Learning and self-tuning happens everywhere. Cache sizes that adjust to load, pools that grow and shrink, etc. This is just adding autotuning to something that doesn't have it. With presumably better algorithms than the rules-of-thumb approach that is more common.

It's a great idea.

68. lupusreal ◴[] No.42171201{5}[source]
Being able to dodge responsibility for something going wrong is great, but it's always better for you if you aren't in a position where you have to dodge responsibility in the first place.

So if this tool makes a well reasoned and ostensibly sensible tweak which happens to expose some flaw in your system and takes it down, being able to say "those experts Oracle made the mistake, not me" might get you out of the hot seat. But it's better to never be in the hot seat.

69. xorcist ◴[] No.42171394[source]
Knobs exist for a reason. If there was no reason for it it shouldn't exist. Turning knobs automatically is one of two things: It's either an awfully bad idea, or should be turned into an upstream patch. Speaking from practice, it's usually the former.
70. xorcist ◴[] No.42171423{3}[source]
> change management system

That's an unusual way to spell "git". But that's not the point. The point is that change management is useless unless you know why you are doing it. That's why all commit messages should contain the "why".

> You will also be monitoring those changes

What you should do is monitor and evaluate the findings, then on taking the decision this is actually what you want, commit it and stage it through the testing environments.

Automatically tuning parameters means diagnosing problems will be even harder than today. There is also the risk of diverging test and prod unless you are being careful. You really have to know what you are doing when deploying these tools.

The worst catastrophes I've seen involves automatically scaling ram/disk/pods. A problem that should have been trivial in the first place can quickly set off feedback loops.

replies(1): >>42171746 #
71. ◴[] No.42171561[source]
72. efitz ◴[] No.42171582{6}[source]
Anything that Oracle gives away for free today should be assumed will be converted to be monetized as soon as there is sufficient uptake in usage.

Anything that Oracle makes available for community contributions should be assumed will be dramatically restricted via license when Oracle figures out how to monetize it.

replies(1): >>42178647 #
73. notpushkin ◴[] No.42171746{4}[source]
This could be a nice starting point for such a system though. Is there a logging-only mode?
74. westurner ◴[] No.42174416{7}[source]
Here's that, though this is downvoted to -1? https://github.com/oracle/bpftune/issues/99#issuecomment-248...

Ideally this tool could passively monitor and recommend instead of changing settings in production which could lead to loss of availability by feedback failure; -R / --rollback actively changes settings, which could be queued or logged as idk json or json-ld messages.

75. ranger_danger ◴[] No.42178647{7}[source]
FOSS can always be forked and progressed from there; see MariaDB.

I think most times when a project from a big company goes closed, the features added afterwards usually only benefit other big companies anyways.

Right now I prefer to be happy they ever bothered at all (to make open source things), rather than prematurely discount it entirely.

Maybe you weren't implying that it should be discounted entirely, but I bet a lot of people were thinking that.