105 points mathgenius | 24 comments
1. antics ◴[] No.43624766[source]
This is only tangentially related, but one thing that I am still kind of hopeful about is that analytic combinatorics will free us from the tyranny of asymptotic analysis.

I suppose this is kind of a hot take, and in some ways less relevant than it was (say) 10 years ago, but I believe the theoretical CS community has spent a lot of time over the years analyzing the performance of various algorithms, and has kind of pointedly not invented algorithms that accomplish some result at all, even when the result is only known to be computable in a pathologically inefficient way. This means a lot of breakthroughs that could have been TCS breakthroughs have instead been invented, usually in worse form, by other communities.

I am not the only one who thinks so, here[1] is an obscure talk where Michael Mitzenmacher (a well-known CS theory prof at Harvard) makes more or less this specific complaint.

One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least it seems to me, in theory) precisely enumerating how many "steps" an algorithm would take given some input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take. And that this in turn would free TCS to focus on things that are more germane to the field as a whole.
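
To give a flavor of what I mean, here is a toy sketch (my own, not from the book) of the kind of exact enumeration the symbolic method buys you: the specification "a binary tree is empty, or a root plus two subtrees" becomes the functional equation B(z) = 1 + z*B(z)^2, and the coefficients of B(z) are the exact counts (the Catalan numbers), from which asymptotics can be recovered later if you want them.

    from math import comb

    def series_mul(a, b, n):
        """Multiply two truncated power series (coefficient lists) mod z^n."""
        out = [0] * n
        for i, ai in enumerate(a):
            if ai == 0:
                continue
            for j, bj in enumerate(b):
                if i + j >= n:
                    break
                out[i + j] += ai * bj
        return out

    def binary_tree_counts(n):
        """Coefficients of B(z) = 1 + z*B(z)^2 up to z^(n-1), by fixed-point iteration."""
        b = [1] + [0] * (n - 1)
        for _ in range(n):
            sq = series_mul(b, b, n)
            b = [1] + sq[: n - 1]   # 1 + z * B(z)^2
        return b

    counts = binary_tree_counts(10)
    print(counts)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
    assert counts == [comb(2 * k, k) // (k + 1) for k in range(10)]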

With that all said I'm open to the argument that TCS has fundamentally changed and this is no longer such a big issue. But perusing the various TCS conferences I sense it isn't.

[1]: https://www.youtube.com/watch?v=p3anjSAnKBo

[2]: https://algo.inria.fr/flajolet/Publications/book.pdf

replies(8): >>43625074 #>>43625202 #>>43625358 #>>43625382 #>>43626260 #>>43627410 #>>43628547 #>>43637318 #
2. krick ◴[] No.43625074[source]
> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.

My understanding is that it's pretty much how it started, but proved to be impractical. Meaning, we absolutely can quantify the work any algorithm will take, but for that we need to define the architecture first, i.e., to specify what is the unit of work, what are operations available to us. And, sure, we can do that for any given task (and we always could), but it's usually too messy to include that in general discussions about algorithms, not specifically concerned with the architecture.

replies(1): >>43627153 #
3. qsort ◴[] No.43625202[source]
I'm not a researcher, so start with the default assumption that I don't know what I'm talking about, but I had the same impression when studying those topics: asymptotic analysis is kind of broken.

For example, we assume that arbitrary arithmetic takes constant time. Good. Except that if you take that seriously, then P=PSPACE. So we might say that bit operations on fixed registers are constant time. But nobody does that, because then arithmetic is O(lg n) and it's too much trouble to keep track of everything (and if you do, then hashtables aren't O(1) any longer!). If we are more sophisticated we might adopt a transdichotomous model, but that's really weird and we get results like sorting in less than n lg n time. Not to mention that the mathematical foundations get a bit shaky with more than one variable.
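
To see concretely why taking unit-cost arithmetic seriously goes wrong, here is a toy sketch (mine): n "constant-time" multiplications can manufacture a number with about 2^n bits, so an exponential amount of bit-level work hides inside them, which is roughly the mechanism behind that P=PSPACE observation.

    x = 2
    for i in range(1, 11):
        x = x * x  # one "O(1)" multiplication per iteration
        print(f"after {i} squarings: {x.bit_length()} bits")
    # after 10 squarings: 1025 bits, i.e. 2^10 + 1 bits from just 10 multiplications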

I think the hard problem underneath is that it's hard to define a computational model that (a) can be reasoned about and isn't extremely weird, (b) has solid mathematical foundations, (c) is reasonably predictive of real world performance, and (d) is not linked to a specific architecture. Asymptotic analysis is kind of a compromise. It can be formulated in very elementary terms, it makes mathematical sense and if you plot the performance of a sorting algorithm on a chart it kind of looks like the graph of n log n if you squint a bit.

Knuth does something similar to what you're suggesting IIRC, albeit without the analytic combinatorics machinery. For some programs he derives a precise expression, as a function of the input size, describing how many operations of each kind the underlying machine will perform. In numerical analysis there is a similar concept: e.g., evaluating a polynomial with Horner's rule has the same asymptotic complexity as the naive method, but we say it's better because it takes exactly the optimal number of floating point operations (and, in practice, because it uses fused multiply-adds, but that's another story...)
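
To make the Horner point concrete, here is a minimal sketch (my own counting, nothing deep) that tallies the exact operation counts of Horner's rule against the running-power "naive" method; both are O(n), but the exact counts differ by roughly a factor of two in multiplications.

    def horner(coeffs, x):
        """Evaluate sum(coeffs[i] * x**i) with Horner's rule; return (value, mults, adds)."""
        acc, mults, adds = 0.0, 0, 0
        for c in reversed(coeffs):
            acc = acc * x + c
            mults += 1
            adds += 1
        return acc, mults, adds

    def naive(coeffs, x):
        """Evaluate with a running power of x; same O(n), more multiplications."""
        acc, power, mults, adds = 0.0, 1.0, 0, 0
        for c in coeffs:
            acc += c * power  # 1 mult, 1 add
            power *= x        # 1 mult
            mults += 2
            adds += 1
        return acc, mults, adds

    coeffs = [3.0, 0.0, 2.0, 5.0]   # 5x^3 + 2x^2 + 3
    print(horner(coeffs, 2.0))      # (51.0, 4, 4)
    print(naive(coeffs, 2.0))       # (51.0, 8, 4): about twice the multiplications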

A research program along those lines would be certainly fascinating, but the combination of "needs to be reasonably predictive" and "needs to be portable" is a tough nut to crack.

replies(2): >>43627164 #>>43636712 #
4. PollardsRho ◴[] No.43625358[source]
I don't think it's controversial to say that asymptotic analysis has flaws: the conclusions you draw from it only hold in the limit of larger inputs, and sometimes "larger" means "larger than anything you'd be able to run it on." Perhaps as Moore's law dies we'll be increasingly able to talk about specific problem sizes in a way that won't become obsolete immediately.

I suppose my question is why you think TCS people would do this analysis and development better than non-TCS people. Once you leave the warm cocoon of big-O, the actual practical value of an algorithm depends hugely on specific hardware details. Similarly, once you stop dealing with worst-case or naive average-case complexity, you have to try to define a data distribution relevant to specific real-world problems. My (relatively uninformed) sense is that the skill set required to, say, implement transformer attention customized to the specific memory hierarchy of NVIDIA datacenter GPUs, or evaluate evolutionary optimization algorithms on a specific real-world problem domain, isn't necessarily something you gain in TCS itself.

When you can connect theory to the real world, it's fantastic, but my sense is that such connections are often desired and rarely found. At the very least, I'd expect that to often be a response to applied CS and not coming first from TCS: it's observed empirically that the simplex algorithm works well in practice, and then that encourages people to revisit the asymptotic analysis and refine it. I'd worry that TCS work trying to project onto applications from the blackboard would lead to less rigorous presentations and a lot of work that's only good on paper.

replies(2): >>43625873 #>>43627220 #
5. openasocket ◴[] No.43625382[source]
> One of the nice things about Flajolet/Sedgewick[2] (mentioned in this post) is that it gives a nice set of combinators for (at least it seems to me, in theory) precisely enumerating how many "steps" an algorithm would take given some input. I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take. And that this in turn would free TCS to focus on things that are more germane to the field as a whole.

Check out the Art of Computer Programming, by Knuth. He writes all of the algorithms in a hypothetical machine code called MIX (MMIX in later editions). He does go through algorithmic analysis to the degree you describe, showing exactly how many cycles a particular routine may take to complete, based on different variables. This is actually fairly complex even for simple programs. Consider the simple case of computing the maximum in a list:

    let current_max = 0;
    for x in array_of_numbers {
        if x > current_max {
            current_max = x;
        }
    }
    return current_max;

Anyone can tell you this runs in O(N) time. But exactly how many cycles would this take? Ignoring the architectural details, you would need to know how often the condition "x > current_max" is true. If the list of numbers is in ascending order, this will be true every time, if it is in descending order it will only be true once. The difference in the number of cycles between those two cases is significant. I believe Knuth works through this example, and shows that, for a randomly-ordered array, this would be true around log_2(n) times.
replies(2): >>43627140 #>>43628868 #
6. LegionMammal978 ◴[] No.43625873[source]
Average-case complexity can be a fickle beast. As you say, the simplex algorithm for LP is great in practice, so it's rarely problematic to use. But meanwhile, people also say, "Modern SAT/SMT solvers are great, they can solve huge problems!" Yet when I try using one, I usually run into exponential runtimes very quickly, especially for unsatisfiable instances.

Overall, it's annoying to tell whether an NP-hard problem is always really hard, or if ~all practical instances can be solved with a clever heuristic. It doesn't help that most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.
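
A concrete family where the unsatisfiable-instance blowup shows up is the pigeonhole principle ("n+1 pigeons don't fit in n holes"), which resolution-based solvers are known to need exponential time on. Here's a small sketch (mine) that emits such an instance in DIMACS format, so you can feed it to any SAT solver and watch the runtime climb as n grows:

    def php_dimacs(n):
        """CNF saying n+1 pigeons each sit in one of n holes, no hole shared (unsatisfiable)."""
        var = lambda p, h: p * n + h + 1  # variable for "pigeon p sits in hole h"
        clauses = []
        for p in range(n + 1):            # every pigeon sits in some hole
            clauses.append([var(p, h) for h in range(n)])
        for h in range(n):                # no two pigeons share a hole
            for p1 in range(n + 1):
                for p2 in range(p1 + 1, n + 1):
                    clauses.append([-var(p1, h), -var(p2, h)])
        lines = [f"p cnf {(n + 1) * n} {len(clauses)}"]
        lines += [" ".join(map(str, c)) + " 0" for c in clauses]
        return "\n".join(lines)

    print(php_dimacs(8))  # pipe into any DIMACS solver; try n = 8, 10, 12, ...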

replies(2): >>43626152 #>>43628849 #
7. aleph_minus_one ◴[] No.43626152{3}[source]
> most problems receive little attention (e.g., to find solvable special cases) after being labeled NP-hard.

Since I know quite a few people who work on exactly this kind of problem (finding special classes that can be solved in polynomial time), my impression is of course different.

But I think it can be said that if some NP-hard problem is very important in practice and there is no easy way to get around it (i.e. it will still be practically relevant in, say, 15 years), the problem will for sure get a lot more attention.

This is also the reason why some NP-hard problems are much more researched than others.

replies(1): >>43626541 #
8. thomasahle ◴[] No.43626260[source]
> I optimistically imagine a world where we replace the relatively non-specific asymptotic bounds of an algorithm with a very specific quantification of the work an algorithm will take.

I think you're getting this backwards. The (good) point you and Mitzenmacher are making is that our analysis standards are _too strict_, and hence we avoid good algorithms that are just too hard to analyze.

If we instead require exact combinatorics, e.g., using Flajolet/Sedgewick, that will be _even harder_ than if we _just_ try to do asymptotic analysis. Remember that asymptotics are a way to _approximate_ the exact number of steps. It's supposed to make it _easier_ for you.

replies(1): >>43627176 #
9. LegionMammal978 ◴[] No.43626541{4}[source]
Yeah, perhaps I was a bit unfair, it's just that the problems that have gotten good results never seem to be the ones I need! C'est la vie, I suppose. (In particular, I've been working on recognizing small formal languages from samples, which is usually NP-hard, but has a surprising number of solvable cases. But my impression is that most modern work has gone into various forms of probabilistic grammars, which aren't really what I'm looking for.)

Sometimes it's also helpful to look into approximation algorithms; e.g., a good LLL implementation can work wonders for certain problems. But heaven forbid someone obtains an inapproximability result for your problem; then you're really in trouble.

replies(1): >>43626692 #
10. aleph_minus_one ◴[] No.43626692{5}[source]
> But heaven forbid someone obtains an inapproximability result for your problem, then you're really in trouble.

This is not (necessarily) true:

For example, there exists a great approximation algorithm (Goemans-Williamson algorithm) for MAXCUT in graphs with non-negative edge weights.

On the other hand, when negative weights do occur, one can show (unless P=NP) that there exists no polynomial-time approximation algorithm for which the approximation guarantee is a constant factor times the optimal solution.

But since the Goemans-Williamson algorithm is a great algorithm (if the Unique Games Conjecture holds, and P != NP, it has the best approximation guarantee that any polynomial-time approximation algorithm for MAXCUT with non-negative weights can achieve), nobody "forbids" you from using it in situations where negative edge weights can also occur. You will lose the approximation guarantee, but in practice the algorithm nevertheless gives good results in this situation, just not certified good results.

11. antics ◴[] No.43627140[source]
I actually have read a chunk of the relevant TAOCP, but didn't mention it because the comment was already kind of long. I remarked to a friend the other day that it's kind of funny that we've come a bit full circle on asymptotic analysis, in that we are now counting FLOPs, and that is more than a little like what Knuth does here. :)
replies(1): >>43663155 #
12. antics ◴[] No.43627153[source]
Yep, that's exactly why we moved to asymptotic analysis. I was just hoping analytic combinatorics would help us move back to something like this.
13. antics ◴[] No.43627164[source]
Yep, I mentioned this elsewhere in the thread: Knuth does do this quite a bit in TAOCP. I think in a lot of ways we've come completely full circle, in that we are now counting FLOPs. This time, though, I think we have very little hope for something as nice and pretty as asymptotics. At least ones that are useful.
14. antics ◴[] No.43627176[source]
Oh, I think there is no real doubt that this is true; it's why analytic combinatorics did not succeed in replacing asymptotic analysis. I just wish it had. If it were tractable in a similar set of areas, I think a lot of things would be quite a bit better.
15. antics ◴[] No.43627220[source]
I just meant that there are lots of problems where I think TCS could have contributed but didn't, because the relevant analysis was not acceptable at FOCS or SODA (or whatever). A good example is the original transformers paper, which is vastly over-complicated relative to (say) a later treatment[1] by Ruslan Salakhutdinov's lab, which shows that attention is "just" plain-old Nadaraya-Watson kernel smoothing, which (if you don't know it) is a weighted average of points, covered in elementary grad stats.

I'm not saying that TCS people would be "better" at discovering this or anything like that, I'm just saying that the opportunity for contribution from TCS people is very high, and because the fields are not well-integrated you often have to wait years to uncover what ML concepts "really" are. I think working together would benefit both ML and TCS!

[1]: https://arxiv.org/pdf/1908.11775
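
If it helps, here is a minimal numpy sketch of the correspondence (my own reading of [1], not code from either paper): softmax attention is Nadaraya-Watson smoothing where the kernel is k(q, x) = exp(q.x / sqrt(d)).

    import numpy as np

    def nadaraya_watson(queries, keys, values, kernel):
        """Weighted average of values; weights are normalized kernel(query, key) scores."""
        w = np.array([[kernel(q, k) for k in keys] for q in queries])
        w = w / w.sum(axis=1, keepdims=True)
        return w @ values

    def softmax_attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        w = np.exp(scores)
        w = w / w.sum(axis=1, keepdims=True)
        return w @ V

    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 3))
    exp_kernel = lambda q, k: np.exp(q @ k / np.sqrt(len(q)))
    assert np.allclose(nadaraya_watson(Q, K, V, exp_kernel), softmax_attention(Q, K, V))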

16. ww520 ◴[] No.43627410[source]
Coincidentally, I just did an asymptotic complexity analysis yesterday (something I rarely do). It seemed right since it was by the book, but I wasn't sure. After running extensive benchmarks on the algorithm, I was so happy to find the results matched the asymptotic prediction. The lesson: verify, verify, and always verify.

[1] Algorithm, https://github.com/williamw520/toposort/blob/master/Algorith...

[2] Benchmarks, https://github.com/williamw520/toposort/blob/master/README.m...

17. elchananHaas ◴[] No.43628547[source]
In cryptography it is common to get exact estimates for the number of bit operations required to break a cipher. Asymptotic analysis is great in some circumstances. It explains, for example, why quicksort is faster than bubble sort for large inputs. Beyond that you want some empirical results.
18. pfdietz ◴[] No.43628849{3}[source]
I found SMT solvers were stymied by merely running into cubic runtimes.
19. pfdietz ◴[] No.43628868[source]
Natural log of n, actually. That's because the expected number of times a new max is found is H_n, the nth harmonic number, where n is the size of the array. And H_n = ln n + gamma + o(1).
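
A quick empirical check of that (a sketch, using the same start-at-zero convention as the snippet above):

    import math, random

    def count_updates(xs):
        current_max, updates = 0, 0
        for x in xs:
            if x > current_max:
                current_max, updates = x, updates + 1
        return updates

    n, trials = 1000, 2000
    avg = sum(count_updates(random.sample(range(1, n + 1), n)) for _ in range(trials)) / trials
    print(avg)                                   # ~7.5
    print(sum(1 / k for k in range(1, n + 1)))   # H_1000 ~ 7.485
    print(math.log(n) + 0.5772)                  # ln(1000) + gamma ~ 7.485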
20. Tainnor ◴[] No.43636712[source]
> For example, we assume that arbitrary arithmetic takes constant time.

Arithmetic on fixed-width number types is constant time. It's very much not constant on arbitrary-precision integers, something that underpins much of cryptography. Indeed, if you try to e.g. use Gaussian elimination (which uses O(n^3) arithmetic operations) to compute the determinant of a large integer matrix, you'll run into trouble with intermediate values blowing up, which is why there is another algorithm for this use case: https://en.m.wikipedia.org/wiki/Bareiss_algorithm
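
For what it's worth, here is a minimal sketch of the fraction-free (Bareiss) idea for integer determinants; a toy illustration, not a production implementation. The point is that every intermediate entry is itself a minor of the original matrix, so entries grow polynomially in bit length rather than exploding, and every division is exact.

    def bareiss_det(m):
        """Exact determinant of a square integer matrix via fraction-free elimination."""
        a = [row[:] for row in m]
        n = len(a)
        sign, prev = 1, 1
        for k in range(n - 1):
            if a[k][k] == 0:                      # pivot: find a nonzero entry below
                for i in range(k + 1, n):
                    if a[i][k] != 0:
                        a[k], a[i] = a[i], a[k]
                        sign = -sign
                        break
                else:
                    return 0                      # no usable pivot: determinant is zero
            for i in range(k + 1, n):
                for j in range(k + 1, n):
                    # division by the previous pivot is always exact over the integers
                    a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
            prev = a[k][k]
        return sign * a[n - 1][n - 1]

    print(bareiss_det([[2, 7, 1], [9, 4, 3], [6, 1, 5]]))  # -170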

(Disclaimer: Also not a researcher)

replies(1): >>43637784 #
21. svat ◴[] No.43637318[source]
In fact in these slides from a 2011/2012 talk, Sedgewick talks about exactly this, even going as far as to draw a contrast between "TCS" and "analysis of algorithms" (the "scientific method"): https://sedgewick.io/talks/#algorithms-for-the-masses = https://sedgewick.io/wp-content/uploads/2022/03/2011-12AlgsM... onwards.

(Also touched upon briefly in https://sedgewick.io/talks/#lasting-legacy-of-flajolet = https://sedgewick.io/wp-content/uploads/2022/03/2013-14Flajo...)

His point is exactly that: often researchers care only about worst-case analysis and asymptotics, which is not necessarily a useful scientific analysis of how long an algorithm is expected to take on a machine as a function of the input size (which was the original motivation for Knuth et al. to develop the field).

The same authors (but ordered as Sedgewick and Flajolet!) also have a book called An Introduction to the Analysis of Algorithms; there's a course website at https://aofa.cs.princeton.edu/home/ and videos on Coursera (https://www.coursera.org/learn/analysis-of-algorithms) — it is advanced mathematics but less advanced than the Analytic Combinatorics book.

22. qsort ◴[] No.43637784{3}[source]
My sentence there is a bit unclear -- I meant it as an implication.

Saying that arithmetic on fixed-width registers is O(1) is equivalent to saying that bit operations are O(1), which intuitively makes the most sense; after all, adding two uint64s is not the same as adding two arbitrary-precision integers. But that's not always the model we pick: often we just assume that arithmetic is O(1) and sweep length considerations under the rug.

Also, what does fixed mean? Is it like the word size in an actual computer architecture, where registers are 64 bits and that's all you'll ever get, or does it depend on the size of the input? The latter is the transdichotomous model, yet another option. Unsurprisingly, you can get different results with different computational models.

This is not to say that asymptotic analysis is wrong, obviously, but it's a bit of a mess and it's important to remember what computational model the analysis was performed under.

replies(1): >>43641096 #
23. Tainnor ◴[] No.43641096{4}[source]
> But that's not always the model we pick, often we just assume that arithmetic is O(1) and sweep length considerations under the rug.

So, I only have rudimentary undergraduate knowledge of complexity theory, but my materials (e.g. in theoretical CS, cryptography and computational algebra) were always incredibly clear about the fact that arbitrary-precision integer arithmetic isn't constant time - this is why complexity classes such as weakly vs. strongly NP-complete exist. Do you have any examples of where people would treat arbitrary-length integer arithmetic as constant? (I think most people just never have to use arbitrary-precision arithmetic, so that's why they'd not talk about it explicitly.)

You're right that computational complexity depends on the model and that this makes it nontrivial to use in practice. The theoreticians would sweep this under the rug with "these models are all polynomially equivalent to Turing machines" (which lets us formulate a conjecture such as P=NP precisely), but of course that doesn't in practice help you distinguish between a constant, linear or quadratic algorithm.

24. ogogmad ◴[] No.43663155{3}[source]
I think this needs the use of a CAS (Computer Algebra System, like Mathematica) to calculate the runtime of some code relative to different choices of computer architecture. Then it looks practical. Could even be added to IDEs. But without the help of a CAS, few people would attempt this, because it would be too tedious to do by hand.