
688 points samwho | 49 comments
1. IceDane ◴[] No.45016742[source]
Every time I see an introduction to Big-O notation, I immediately go look for a section about the worst case to find the most common, fundamental misconception that is present in nearly every such text.

Lo and behold:

> Because it's common for an algorithm's performance to depend not just on the size of the input, but also its arrangement, big O notation always describes the worst-case scenario.

This is not true. I'm on my phone, so I can't be bothered to explain this in more detail, but there is plenty of information about this online.

replies(4): >>45016873 #>>45016884 #>>45016889 #>>45017375 #
2. lordleft ◴[] No.45016873[source]
Are you referring to the fact that Big O has notation and concepts revolving around best-case/average case scenarios, as well as worst-case scenarios?
replies(1): >>45017715 #
3. bonoboTP ◴[] No.45016884[source]
Yes. The notation can be used to describe how fast any function grows. That function may be the worst-case runtime wrt. input length, the average-case runtime wrt. input length, or anything else, really.

But the most common usage is indeed worst-case analysis, especially in intro courses.

This is also wrong in the article:

> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).

It conflates two different, orthogonal concepts: upper vs lower bounds on the one hand, and best vs worst case analysis on the other.

The phrase "the best case is O(n)" already contradicts "big O notation always describes the worst-case scenario". They clearly use it for best case right there in the article.

replies(2): >>45016968 #>>45016973 #
4. samwho ◴[] No.45016889[source]
It’s okay, you’re not the first to point it out to me!

I’d be interested to hear how you would explain it when you’re at a more appropriate device.

replies(4): >>45017003 #>>45017060 #>>45017631 #>>45018258 #
5. samwho ◴[] No.45016968[source]
I’d love to hear how those two things differ, and how I could include it in the post in a way that won’t be too scary for a beginner to the topic.

That section in particular has already seen several revisions based on other peoples’ comments.

replies(1): >>45017154 #
6. ayhanfuat ◴[] No.45016973[source]
> But the most common usage is indeed worst-case analysis, especially in intro courses.

It is also the most sensible one. The best-case version is basically useless: if you are lucky you may end up with O(1), but that doesn't tell you anything. Average case can be good, but it requires you to estimate the input distribution.

replies(2): >>45020678 #>>45026494 #
7. IceDane ◴[] No.45017003[source]
TLDR: Different contexts, different analyses. You can do a best-case, average-case and worst-case analysis. A linear search is O(1) in the best case, for example.

Maybe this comes across as nitpicky to some, but that doesn't make the text any less incorrect, unfortunately. It's an extremely common misconception, however, and I would say that a huge proportion of students that go through an introductory course on this subject come out thinking the same thing.
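
To make the split concrete, here's a toy Python sketch (mine, not from the article) of linear search with a comparison counter:

```python
def linear_search_steps(items, target):
    """Linear search that returns (index, comparisons made)."""
    steps = 0
    for i, item in enumerate(items):
        steps += 1
        if item == target:
            return i, steps
    return -1, steps

data = list(range(100))
assert linear_search_steps(data, 0) == (0, 1)      # best case: one comparison, O(1)
assert linear_search_steps(data, 99) == (99, 100)  # worst case: n comparisons, O(n)
```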

> It’s okay, you’re not the first to point it out to me!

It would be nice if you corrected it - there are already too many people making this mistake online.

replies(1): >>45017053 #
8. samwho ◴[] No.45017053{3}[source]
So your contention is with the word “always”? It doesn’t always mean worst case? I got told off by someone else for _not_ saying this.

I really just want to find the way of describing this that won’t net me comments like yours. It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.

replies(4): >>45017196 #>>45017256 #>>45017266 #>>45017302 #
9. ndriscoll ◴[] No.45017060[source]
In order to write f(n) is O(g(n)), you already had to smuggle in the idea that you were talking about the worst case before even talking about O(*). What is f? Typically it is the max of "steps taken" over all input problems of size n, i.e. the worst case.

The O(g(n)) part says f is asymptotically bounded by g(n) up to some constant factor.

replies(1): >>45017124 #
10. samwho ◴[] No.45017124{3}[source]
This is how you would explain it?
replies(1): >>45017313 #
11. bonoboTP ◴[] No.45017154{3}[source]
Basically, for each input length, you can have different runtimes depending on the contents of that input. If you always pick the highest runtime for each length, that gives you a specific function. You can then derive either a lower or an upper bound for that function. Same for the best case.

You can state four combinations in their common sense intuition as

- even in the worst case, the runtime grows only at most as fast as n^2 (modulo multiplication and addition)

- in the worst case, the runtime can grow at least as fast as n^2

- in the best case, the runtime can grow only at most as fast as n^2

- even in the best case, the runtime grows at least as fast as n^2

One can imagine it as not just a single curve on a function plot, but a band, a range. Then the top line of this range can be bounded from above or below, and the bottom can also be bounded from above or below. The common case is when you give a bound for the top curve, from above.

How to work this into the article, I don't know, sorry.
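
If it helps, the band picture can be reproduced empirically with a toy step counter (illustrative Python, not from the article):

```python
def bubble_sort_steps(a):
    """Bubble sort with early exit; returns the number of comparisons made."""
    a, steps = list(a), 0
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            steps += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:  # no swaps means already sorted, stop early
            break
    return steps

n = 50
best = bubble_sort_steps(range(n))           # sorted input: one pass, grows like n
worst = bubble_sort_steps(range(n, 0, -1))   # reverse sorted: grows like n^2
assert best == n - 1
assert worst == n * (n - 1) // 2
```

The top of the band is the worst curve, the bottom the best; each can then be bounded from above (O) or below (Ω).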

replies(1): >>45017216 #
12. jibal ◴[] No.45017196{4}[source]
> I got told off by someone else for _not_ saying this.

And what made you think they were right?

> I really just want to find the way of describing this that won’t net me comments like yours.

Do research. There's a large literature on algorithmic complexity ... maybe start with https://en.wikipedia.org/wiki/Big_O_notation

> It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.

non sequitur

replies(1): >>45017233 #
13. samwho ◴[] No.45017216{4}[source]
I appreciate you taking the time to explain this to me, thank you.

I feel like the way I have it is a reasonable, if strictly incorrect, simplification for someone new to the material.

When I write these, I have the newly-hired bootcamp grad as my target audience. Someone who has been taught the basics for how to write code for the web and work as a junior within a company, but has no computer science background. I want them to understand enough to make sense of what they see.

replies(1): >>45017465 #
14. samwho ◴[] No.45017233{5}[source]
Thank you.
15. blt ◴[] No.45017256{4}[source]
The definition of big O notation is pure math - there is nothing specific to analysis of algorithms.

For example: "the function x^2 is O(x^3)" is a valid sentence in big-O notation, and is true.

Big O is commonly used in other places besides analysis of algorithms, such as when truncating the higher-order terms in a Taylor series approximation.

Another example is in statistics and learning theory, where we see claims like "if we fit the model with N samples from the population, then the expected error is O(1/sqrt(N))." Notice the word expected - this is an average-case, not worst-case, analysis.

replies(1): >>45017680 #
16. IceDane ◴[] No.45017266{4}[source]
> So your contention is with the word “always”? It doesn’t always mean worst case? I got told off by someone else for _not_ saying this.

Using "always" is definitely wrong, but that isn't really what makes this wrong. BonoboTP caught another thing that I missed which tells me that you misunderstand some of these concepts on a somewhat fundamental level. Quote:

> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).

Big O, Ω, and Θ are ways to describe asymptotic bounds on runtime. You can apply them in best, average, or worst case analyses. For example, linear search is O(1), Ω(1), and Θ(1) in the best case but it is O(n), Ω(n), and Θ(n) in the worst case.

> I really just want to find the way of describing this that won’t net me comments like yours. It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.

I'm sorry... but what? You posted some learning material on the web which is wrong. I am just correcting you, and in a civil manner, mind you. I'm not exactly sure what you want me to do. Should we all just pretend that there aren't errors in what you posted so we don't hurt your feelings?

replies(1): >>45017356 #
17. bonoboTP ◴[] No.45017302{4}[source]
> It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.

HN (and forums) can be blunt in this way (cf. Cunningham's Law). When you put something up, most of the reaction will be about how it's wrong. Things being correct is just the default; people won't comment on how correct something is. Try to take it not personally but as a helpful piece of info. Ideally you'd want all statements to be correct. Textbook authors often incentivize hardcore nitpicking by mentioning submitters in the acknowledgments or even paying cash. Try to take it in that spirit.

replies(1): >>45017366 #
18. ndriscoll ◴[] No.45017313{4}[source]
If you're just trying to explain what f~O(g) means, then it makes sense to talk in terms of functions and asymptotes like you would in an intro to calculus, sure. Then, separately, computer scientists are interested in f(n) = max_{|S|=n} steps_required(program, S) where S is some input/problem type with some notion of size.

The fact that f looks innocuous but is secretly actually this complicated thing is perhaps the stumbling block for people. The actual O(*) part is just "line stays at or below other line once you go far enough out" (with a slight tweak to allow you to pre-stretch the bounding line).
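
A brute-force way to see that hidden max (toy Python; feasible only for tiny n):

```python
from itertools import permutations

def steps(a):
    """Comparison count for bubble sort with early exit (an example cost measure)."""
    a, count = list(a), 0
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            count += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return count

def f(n):
    """f(n) = max steps over ALL inputs of size n -- the worst case, built in."""
    return max(steps(p) for p in permutations(range(n)))

assert f(4) == 4 * 3 // 2  # reverse-sorted input forces every pass
```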

replies(1): >>45017400 #
19. samwho ◴[] No.45017356{5}[source]
No, I'm grateful for the time you're spending helping me understand. I'm struggling to come up with wording that satisfies everyone and venting some frustration at you. I'm sorry, that wasn't fair of me.

It's not clear to me based on what you've written what I should change. With the "always" wording changed, what else do I need to do for this to be correct? Ideally I want to avoid talking about asymptotic bounds.

20. samwho ◴[] No.45017366{5}[source]
That framing really helps, thank you. <3
21. tromp ◴[] No.45017375[source]
O() doesn't necessarily describe worst-case behaviour. It just provides an asymptotic upper bound. So a quadratic sorting algorithm can still be said to be O(n^3), although that might be a little misleading.
replies(2): >>45017466 #>>45022188 #
22. samwho ◴[] No.45017400{5}[source]
I'm aiming this at junior software engineers without math or computer science backgrounds. My own computer science background is very old at this point, and I never did have the math. So I'm not surprised I've made mistakes that are being pointed out by folks that live and breathe this stuff. I want to impart enough useful (if not strictly correct) information to folks to make a difference in their day jobs.
23. samwho ◴[] No.45017466[source]
I'd love to hear more about how a quadratic sorting algorithm could be said to be O(n^3). That isn't intuitive to me.
replies(2): >>45017698 #>>45017700 #
24. bonoboTP ◴[] No.45017465{5}[source]
Maybe like this:

"It's common for an algorithm's performance to depend not just on the size of the input, but also its arrangement. In such cases we typically use big O notation to describe the worst-case scenario. However, note that we may perform similar analysis on the best case or the average case as well."

But here:

> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).

This is plain false, the worst-case runtime of bubble sort is indeed Θ(n^2). Either delete it, or write about the upper bound / lower bound thing.

replies(1): >>45017565 #
25. samwho ◴[] No.45017565{6}[source]
I made some wording changes based on what you wrote, I'd appreciate your feedback.

I'm struggling to understand how the part about big theta is false. Is the best case for bubble sort not O(n)? I've deleted the section for the time being.

replies(1): >>45017728 #
26. umanwizard ◴[] No.45017631[source]
Big O notation can be used to describe any function, whether worst case runtime of an algorithm, average case runtime of an algorithm, the price of eggs in China as a function of time, or anything else.
27. samwho ◴[] No.45017680{5}[source]
I should have probably been a lot more clear about who the target audience of my post was.
28. ndriscoll ◴[] No.45017698{3}[source]
Because n^2 ≤ n^3 for n ≥ 1: if |f| ≤ A·n^2 as n→∞, then |f| ≤ A·n^3 as n→∞.
29. mfsch ◴[] No.45017700{3}[source]
Technically the big O notation denotes an upper bound, i.e. it doesn’t mean “grows as fast as” but “grows at most as fast as”. This means that any algorithm that’s O(n²) is also O(n³) and O(n⁴) etc., but we usually try to give the smallest power since that’s the most useful information. The letter used for “grows as fast as” is big Theta: https://en.wikipedia.org/wiki/Big_O_notation#Use_in_computer...
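
Behind this is the definition: f is O(g) if there exist constants C and n0 with f(n) ≤ C·g(n) for all n ≥ n0. A finite numeric check (toy Python; evidence for intuition, not a proof) makes the asymmetry visible:

```python
def witnesses_big_o(f, g, C, n0, limit=10_000):
    """Check f(n) <= C * g(n) for n0 <= n < limit.

    A finite range can't prove an asymptotic claim, but it can falsify
    a bad witness pair (C, n0) and illustrate a good one.
    """
    return all(f(n) <= C * g(n) for n in range(n0, limit))

assert witnesses_big_o(lambda n: n**2, lambda n: n**3, C=1, n0=1)        # n^2 is O(n^3)
assert not witnesses_big_o(lambda n: n**3, lambda n: n**2, C=100, n0=1)  # fails once n > 100
```
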
replies(1): >>45017800 #
30. umanwizard ◴[] No.45017715[source]
It is the same notation in either case. The worst-case runtime of quicksort is O(n^2). The average runtime is O(n * log(n)).
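
One way to watch both cases at once (a toy quicksort with a naive first-element pivot, purely illustrative; real library sorts pick pivots more carefully):

```python
import random

def qs_comparisons(a):
    """Comparison count (one per element partitioned) for naive quicksort
    that always uses the first element as the pivot."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    return len(rest) + qs_comparisons(less) + qs_comparisons(more)

n = 300
worst = qs_comparisons(list(range(n)))  # already-sorted input: worst case for this pivot
random.seed(0)
data = list(range(n))
random.shuffle(data)
typical = qs_comparisons(data)          # shuffled input: roughly n*log(n)
assert worst == n * (n - 1) // 2        # quadratic growth
assert typical < worst // 5             # far cheaper on typical input
```
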
31. umanwizard ◴[] No.45017728{7}[source]
The difference between big theta and big O has nothing to do with worst vs. average case. They are two completely orthogonal axes.

You can talk about the big-O of the average case, the big-theta of the average case, the big-O of the worst case, or the big-theta of the worst case.

32. samwho ◴[] No.45017800{4}[source]
Ahh I see, thank you!
33. mminer237 ◴[] No.45018258[source]
I would just say "almost always" to cut off the pedants. You do use the same notation to describe the best-case and average-case as well; people just don't care about those as often.
replies(2): >>45018497 #>>45019948 #
34. samwho ◴[] No.45018497{3}[source]
I did push up a rewording that hopefully lands better. Fingers crossed.
35. jibal ◴[] No.45019948{3}[source]
Pejoratively referring to people as "pedants" simply because they know what they are talking about is quite the ad hominem fallacy.

As for what people care about, it depends on what they're doing and what they know about the input distribution. If you're concerned about DOS attacks or response time in a game, then worst case is of primary importance. But if you're paying for wall time then you measure a hash table lookup by its O(1) average time, not its O(n) worst case time.

36. tialaramex ◴[] No.45020678{3}[source]
You'd think, but one of the world's most popular C++ stdlib implementations (Clang's libc++) shipped actual Quicksort, which has an O(n^2) worst case, as its default sort until just a few years ago. Because the typical case for Quicksort is fast, most C++ programmers didn't notice.

And I really do mean just a few years. Like, Joe Biden was President.

37. didibus ◴[] No.45022188[source]
In math that’s true, but a practicing software engineer uses it to mean the common "tightest" upper bound we can easily prove for the case we care about (worst, amortized, average, etc.), and not the actual tightest possible bound.
replies(2): >>45022710 #>>45038283 #
38. monkeyelite ◴[] No.45022710{3}[source]
It has a meaning already - and that is the set of functions which satisfy the condition. So O(n^2) is a subset of O(n^3).
replies(3): >>45025082 #>>45029524 #>>45029629 #
39. arethuza ◴[] No.45025082{4}[source]
And everything is O(BB(n)).
40. namibj ◴[] No.45026494{3}[source]
Or, instead of "estimate the distribution" look at what conceptual properties of the instance the big-O cares about.

And then derive big-O in terms of such properties.

See for example Tetris-LoadBalanced and Tetris-Reordered (on top of Tetris-LoadBalanced) for such works: https://arxiv.org/abs/1909.12102

41. didibus ◴[] No.45029524{4}[source]
> It has a meaning already

Natural language tends to work like that, many words take on different meaning in different context, and the dictionary often needs to add alternate definitions as language evolves.

If you come to a software engineering interview and take the mathematical meaning and answer O(n^3), you won't get the job. You need to know the lingua franca of the trade, not just the textbooks.

I know this can be frustrating, I just wouldn't really recommend staking your ground on this mountain.

The other mistake, is to assume the interviewer doesn't know better, and you do, and try to explain their mistake. A lot of them know very well, you simply didn't understand correctly the ask because you got confused with the context of the problem statement, and it shows you didn't adapt to the working environment successfully.

And if you think that's bad, wait till you talk to product ;)

replies(1): >>45031477 #
42. didibus ◴[] No.45029629{4}[source]
And to add to my other comment, I don't think it came to mean that out of stupidity either.

As far as I know, and correct me if I'm wrong, there isn't a mathematical term for an approximate tight upper bound that's based off vague criteria of "fast enough".

In practice, people needed a way to say something like, let's not waste time getting all precise here, what's the approximate close-ish tight bound that's obvious, alright, that will work, let's use it.

Big Theta implies proving the lower bound, and generally specifying the exact upper bound. So it was a worse term to use to describe the above. I think people settled on BigO because you end up with some upper bound, not necessarily the tightest one, but you land on a tight-enough upper bound that serves your practical purpose of choosing an algorithm to use in your application.

And so it became BigO, all for a lack of a better word, and eventually that became common parlance in those circles.

replies(1): >>45030944 #
43. monkeyelite ◴[] No.45030944{5}[source]
> there isn't a mathematical term for an approximate tight upper bound that's based off vague criteria of "fast enough"

In every piece of mathematical writing they come up with definitions to precisely describe this kind of thing - that’s what big oh/theta aim to do: formalize what we mean by “bounded by in long-term growth”.

replies(1): >>45042602 #
44. monkeyelite ◴[] No.45031477{5}[source]
You’re describing a situation of poor communication - not a fault for using the precise definition of Big oh.
replies(1): >>45042720 #
45. account42 ◴[] No.45038283{3}[source]
No, that would be \Omega(g(x))
46. didibus ◴[] No.45042602{6}[source]
I don’t think it’s really possible to give a precise mathematical definition of "good enough for my use case."

In practical software engineering, what is generally cared about is having a reasonable worst case upper bound. They don’t necessarily need the absolutely tightest bound, so you don’t need to spend extra effort chasing that. At the same time, they want to avoid bounds that are so loose they don’t give meaningful information.

That means the "upper bound" being asked for isn’t just about pure math. It requires applying practical judgment to the context we’re working in and the concerns we’re trying to address.

So yes, I’m asking for BigO, but with some flexibility. If you can provide the tightest bound, that’s great. If you can provide a tight enough bound that isn’t misleadingly loose, that also works.

A mathematician might cringe at words like "tight enough" and that’s fair. There’s no rigorous definition for it. But in the practical world where engineers are solving problems and users are paying for solutions, sometimes we need to rely on best judgment to land on a definition that serves the use case.

Because that's the prevalent need, BigO almost always implies the above in those contexts.

replies(1): >>45053282 #
47. didibus ◴[] No.45042720{6}[source]
You know the correct definition of BigO, which is great. Where things fell short was recognizing how industry jargon often stretches textbook terms. The skill is being able to adapt, realize when people are talking about a related but different concept, and engage on that level. The strongest candidates handle both the theory and the communication.

So I'd ding the candidate for poor communication.

replies(1): >>45047577 #
48. monkeyelite ◴[] No.45047577{7}[source]
You're not engaging with the content of my comment and larping as my hiring manager - dinged for poor communication.
49. monkeyelite ◴[] No.45053282{7}[source]
> I don’t think it’s really possible to give a precise mathematical definition

Have you studied math in an academic setting? The book “Art of Computer Programming” is all about formal definitions for understanding performance behavior.

> They don’t necessarily need the absolutely tightest bound,

Which is why I mention that big oh represents sets. But you dismissed that.