
287 points imadr | 3 comments
godelski ◴[] No.45108523[source]
I'm not a fan of how people talk about "first principles" as I think it just leads to lots of confusion. It's a phrase common in computer science that makes many other scientific communities cringe. First principles are things that cannot be reduced, and you have to have very good justifications for these axioms. The reason those other communities cringe is that either (most likely case) it's being used improperly and someone is about to demonstrate their naivety, or they know they're about to dive into a pedantic nightmare of nuances and they might never escape the rabbit holes that are about to follow.

In fact, I'd like to argue that you shouldn't learn things from first principles, at least in the beginning. Despite the article not being from first principles, it does illustrate some of the problems of first principles: they are pedantic. Everything stems from first principles so they have to be overly pedantic and precise. Errors compound so a small error in one's first principles becomes enormous by the time you look at what you're actually interested in. Worst of all, it is usually subtle, making it difficult to find and catch. This makes them a terrible place to begin, even when one already has expertise and is discussing with another expert. But it definitely should not be the starting place for an expert to teach a non-expert.

What makes it clear that the author isn't a physicist is that they don't appear to understand the underlying emergent phenomena[0]. It's probably a big part of why this post feels so disordered. All the phenomena they discussed are the same, but you need to keep digging deeper to find that (there are points where even physicists know they are the same but not how or why). It just feels like they are showing off their physics knowledge, but it is well below that which is found in an undergraduate physics degree[1]. This is why you shouldn't start at first principles: its simplicity is too complex. You'd need to start with subjects more complicated than QED. The rest derives from whatever a grand unified theory turns out to be.

But as someone who's done a fair amount of physically based rendering, I'm just uncertain what this post has to do with it. I would highly recommend the book "Physically Based Rendering: From Theory To Implementation" by Pharr, Jakob, and Humphreys that the author says the post is based on. It does a much better job at introducing the goals and focusing on getting the reader up to speed. In particular, they define how the goal of PBR is to make things indistinguishable from a real photograph, which is a subtle but important distinction from generating a real photograph.

That said, I still think there are nice things about this post and the author shouldn't feel ashamed. It looks like they put a lot of hard work in and there are some really nice animations. It's clear they learned a lot, and many of the animations there are not as easy as they might appear. I'm being critical, but I want them to know to keep it up; I just think it needs refinement. Finding the voice of a series of posts can be quite hard; don't let stumbles at the beginning prevent you from continuing.

[0] Well that and a lack of discussion of higher order interference patterns because physicists love to show off {Hermite,Laguerre}-Gaussian mode simulations https://en.wikipedia.org/wiki/Gaussian_beam#Higher-order_mod...

[1] In a degree you end up "learning physics" multiple times. Each time a bit deeper. By the end of an undergraduate degree every physicist should end up feeling like they know nothing about physics.

replies(10): >>45108693 #>>45108784 #>>45108817 #>>45109028 #>>45109031 #>>45109152 #>>45111038 #>>45112922 #>>45113311 #>>45113895 #
imadr ◴[] No.45108817[source]
Thanks for the constructive criticism! A few points I'd like to discuss:

Let's suppose the aim of the article was indeed to learn PBR from first principles; what would it look like? Quantum electrodynamics?

I think there is merit in exploring different physical models for fun and scientific curiosity (like I mentioned in the first chapter). I (personally) feel that it's boring to just dump equations like Snell's law without exploring the deeper meaning behind it. I also feel that it's easier to grasp if you have some surface knowledge about more complex physical models.

I agree however that I probably made many mistakes since I didn't study physics, I'd appreciate any feedback to improve that.

I dislike "Physically Based Rendering: From Theory To Implementation", I personally think that the literate programming approach of the book is way too confusing and disorganized. I prefer the SIGGRAPH talk by Naty Hoffman[0]

[0] https://www.youtube.com/watch?v=j-A0mwsJRmk

replies(5): >>45109092 #>>45109452 #>>45109819 #>>45109847 #>>45115087 #
godelski ◴[] No.45109847[source]
Sure! And I appreciate the response. I hope I didn't come off as too mean, it can be hard to find that balance in text, especially while criticizing. I really do not want to discourage you, and I think you should keep going. Don't let mistakes stop you.

  > Let's suppose the aim of the article was indeed to learn PBR from first principles, what would it look like?
I think you shouldn't go that route, but the most honest answer I can give is that such a beginning doesn't exist in physics knowledge. You could start with something like String Theory, Supergravity, Loop Quantum Gravity, or some other proposition for a TOE. Physicists are still on the search for first principles.

All this is well beyond my expertise btw, despite having worked in optics. If you want to see some of this complexity, but at a higher level, I'd highly recommend picking up Jackson's Electrodynamics book. That's the canonical E&M book for graduate-level physics; Griffiths is the canonical version for undergraduates (junior/senior year). Both are very well written. I also really like Fowles's "Introduction to Modern Optics", which is probably somewhere in between (I read it after Griffiths).

I am in full agreement with you that having deep knowledge makes a lot of shallower topics (and even many other deep topics) far easier to grasp. But depth takes time, and it is tricky to get people to follow deep dives. I'm not trying to discourage you here, I actually do encourage going deep, but just noting how this is a tricky line and that's why it is often avoided. Don't just jump into the deep end. Either wade people in or, the best option, lead them in so they don't even recognize they're going deep until they're already there.

  > I dislike <PBR Book>, I personally think that the literate programming approach of the book is way too confusing and disorganized
This is very understandable and I think something you should home in on, and likely where you can make something very successful. But an important thing to note about his SIGGRAPH talk is his audience. His talk is aimed at people who are experts in computer graphics, but likely computer scientists and not physicists. So his audience knows a fair amount of rendering to begin with and can already turn much of what's being said into code. But if you listen to it again I think you'll pick up on where he mentions they'll ignore a bunch of things[0]. There's no shame in ignoring some things and working your way forward. I actually like what Hoffman said at 22:25 "and we know that's an error. But we'll live with it for now." That's the mark of good scientific and engineering thinking: acknowledge errors and assumptions, triage, but move forward. A common mistake looks similar, dismissing those errors as inconsequential. That's putting them in the trash rather than tabling them for later. Everything is flawed, so the most important thing is keeping track of those flaws, lest we have to do extra work to rediscover them.

So, who is your audience?

This is just my opinion, so you have to be the real judge; but I think you should leverage your non-expertise. One of the hard things about teaching is that once you understand something you quickly forget how difficult it was to learn. We quickly go from "what the fuck does any of this mean" to "well that's trivial" lol. You referenced Feynman in your blog post, and the most important thing I learned from him is that one of the best tools for learning is teaching (I've given way too many lectures to my poor cat lol). It forces you to answer a lot more questions, ones you would normally table and eventually forget about. But at your stage it means you have an advantage: the topics you have struggled with and overcome are much fresher. When learning things we often learn from multiple sources (you yourself shared overlapping material), and that's because multiple perspectives give us lots of benefits. But at this point, don't try to be a physicist. If you want to work towards that direction, great! If you don't, that's okay too. But let your voice speak from where you are now.

Reading your blog post and poking through others, there's a "you" that's clear in there. Lean into it, because it is good. I like your attention to detail. Like in your Ray Marching post how you just color code everything. Not everyone is going to like that, but I appreciate it and find it very intuitive. I'm a huge fan of color coding equations myself and make heavy use of LaTeX's annotate-equations package when I make slides.

But I think, looking at this post in isolation, the biggest part to me is that it is unclear where you're going. This is a problem I suffer from a lot in early drafts. An advisor once gave me some great advice that works really well for any formal communication. First, tell "them" what you're going to tell them, then tell them, then tell them what you told them. It's dumb, but it helps. This is your intro, it is your hook. I think there are places for these ideas, but early on they almost feel disconnected.

This is really hard to get right and way too easy to overthink. I think I can help with a question: "What is your thesis?"/"What is your main goal?" Is it "learn how our human eyes capture light and how our brains interpret it as visual information"? Or is it "physically based rendering from first principles"? Or even "learn how to create physically realistic renderings of various materials"? These are not exactly the same thing.

When I'm struggling with this problem it is because I have too much to say. So my process is to create a "vomit draft" where I just get all the things out; it's going to be a mess and not in the right order. But once they're out of my head, they are easier to put together in the right order. After your vomit draft, check your alignment. What is most important and what can be saved for later? What's the most bare-bones version of what you need to communicate? Build out of that.

I do really think there's a good blog post in here and I can see a lot of elements that suggest a good series may come. So I do really encourage you to keep going. Look at what people are saying they like and what they dislike. But also make sure to not take them too literally. Sometimes when we complain about one thing we don't know our issue is something else. What I'm saying is don't write someone else's perfect post, write your post, but find best how to communicate what you want. I know I've said a lot, and I haven't exactly answered all your questions, but I hope this helps.

[0] There's a side note here that I think is actually more important than it appears. But the thing is that there's a weird relationship between computation and accuracy. I like to explain this looking at a Taylor Series as an example. Our first order approximation is usually easy to calculate and can usually get us a pretty good approximation (not always true btw). Usually much more than 50% accurate. Second order is much more computationally intensive and it'll significantly increase your accuracy but not as much as before. The thing is accuracy converges much like a log-like curve (or S-curve) while computation increases exponentially. So you need to make these trade-offs between computational feasibility and accuracy. The most important part is keeping track of your error. Now, the universe itself is simple and its computational requirements are lower than what it takes us to simulate it, but there's a much deeper conversation here that revolves around emergence. (The "way too short" version is that there are islands of computational reducibility.) But the main point is that this is why you should typically avoid going too low too quickly: you end up introducing too much complexity all at once, and the simplicity of it all is masked by that complexity.
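
To make this concrete, here's a tiny toy sketch (plain Python, nothing renderer-specific): partial sums of the Taylor series of e^x at x = 1. Each extra order costs another term's worth of work, but the error shrinks by less and less.

    # Toy illustration of the order-vs-accuracy tradeoff: Taylor series of
    # exp(x) around 0, evaluated at x = 1. Each extra order adds one more
    # term (more work), but the error improves by a smaller amount each time.
    import math

    x = 1.0
    true_value = math.exp(x)

    partial = 0.0
    prev_err = None
    for order in range(8):
        partial += x**order / math.factorial(order)  # add the next term
        err = abs(true_value - partial)
        gained = "" if prev_err is None else f"  (error shrank by {prev_err - err:.4f})"
        print(f"order {order}: approx = {partial:.6f}, error = {err:.6f}{gained}")
        prev_err = err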

replies(2): >>45110180 #>>45117945 #
howardyou ◴[] No.45117945{3}[source]

    > But the thing is that there's a weird relationship between computation and accuracy. I like to explain this looking at a Taylor Series as an example. Our first order approximation is usually easy to calculate and can usually get us a pretty good approximation (not always true btw). Usually much more than 50% accurate. Second order is much more computationally intensive and it'll significantly increase your accuracy but not as much as before. The thing is accuracy converges much like a log-like curve (or S-curve) while computation increases exponentially.
This is something I've been thinking about a lot lately that I'd like to better understand. Are there any examples in physics or machine learning that you can think of that have more specific figures?
replies(1): >>45119328 #
godelski ◴[] No.45119328{4}[source]
I'm not sure exactly what you mean. But if you are interested in the problem in general, I think any book on computational physics will make you face this constraint quickly. There's a reason people love first-order methods like Euler, and a reason second- or higher-order methods are needed in other situations. Or you could look at second-order gradient descent methods as they apply to machine learning (add "Hessian" to your search). You'll see there are some tradeoffs involved. And let's just note that with first-order methods alone you may not even be able to reach the same optima that second-order methods can. Or you could dig into approximation theory.
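
If you want a really rough picture of that first-order vs second-order (Hessian) tradeoff, here's a toy numpy sketch on a contrived quadratic (not any real ML setup): gradient descent takes many cheap steps, while a Newton step needs a Hessian solve, which is expensive in high dimensions but lands on the answer immediately for a quadratic.

    # Toy first-order vs second-order comparison on f(x) = 0.5*x^T A x - b^T x.
    # Gradient descent: cheap per step, but needs many steps on an
    # ill-conditioned problem. Newton: one expensive step (Hessian solve)
    # gets the exact answer here because the function is quadratic.
    import numpy as np

    A = np.diag([1.0, 10.0, 100.0])      # the Hessian; deliberately ill-conditioned
    b = np.array([1.0, 1.0, 1.0])
    x_star = np.linalg.solve(A, b)       # true minimizer

    def grad(x):
        return A @ x - b

    x = np.zeros(3)
    for _ in range(200):                 # gradient descent, step ~ 1/largest eigenvalue
        x = x - 0.01 * grad(x)
    print("gradient descent, 200 steps, error:", np.linalg.norm(x - x_star))

    x0 = np.zeros(3)
    x = x0 - np.linalg.solve(A, grad(x0))  # one Newton step: solve against the Hessian
    print("Newton, 1 step, error:          ", np.linalg.norm(x - x_star))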

But I think first I'd just do some Taylor or Fourier expansions of some basic functions. This can help you get a feel of what's going on and why this relationship holds. The Taylor expansion one should be really easy. Clearly the second derivative is more computationally intensive than the first, because in order to calculate the second derivative you have to also calculate the first, right?

Mind you there are functions where higher order derivatives are easier to calculate. For example, the 100th derivative of x is just as easy to calculate as the second. But these are not the classes of functions we're usually trying to approximate...
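
If you want to see both cases concretely, here's a quick sympy sketch (just an illustration, counting symbolic operations as a crude stand-in for "work"): the derivatives of tan(x) keep getting messier, while those of a simple polynomial collapse.

    # Crude measure of how much "work" each successive derivative involves,
    # using sympy's operation count. tan(x) gets messier with each derivative;
    # a polynomial like x**3 collapses to a constant and then to zero.
    import sympy as sp

    x = sp.symbols("x")
    for f in (sp.tan(x), x**3):
        for n in range(1, 5):
            d = sp.diff(f, x, n)
            print(f"d^{n}/dx^{n} of {f}: {sp.count_ops(d)} ops -> {d}")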

replies(1): >>45119716 #
howardyou ◴[] No.45119716{5}[source]
Touching on what you were saying about accuracy converging like a log-like curve while computation increases exponentially, do you have an example where increasing computational resources by ten times leads to, say, only a 20% improvement in accuracy?
replies(1): >>45121426 #
godelski ◴[] No.45121426[source]
What I said before is a bit handwavy so I want to clarify this first. If we make the assumption that there is something that's 100% accurate, I'm saying that your curve will typically make the most gains at the start and much smaller ones at the end. There can be additional nuances when discussing the limitations of metrics, but I'd table that for your current stage (it is an incredibly important topic, so make sure you come back to it; you just need some pre-reqs to get a lot out of it[0]).

So maybe a classic example of this is the infamous 80/20 rule. You can read about the Pareto Principle[1], which really stems from the Pareto distribution, a form of power-law distribution. If you look at the wiki page for the Pareto distribution (or power law), you'll see the shapes I'm talking about.
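
If you want a quick sanity check of that shape, here's a little numpy sketch (heavy-tailed sampling is noisy, so treat the output as approximate): a Pareto distribution with shape parameter around 1.16 is the one that gives the classic 80/20 split.

    # Draw from a classical Pareto distribution (x_m = 1) with shape ~1.16,
    # the value that gives the 80/20 split, and check what fraction of the
    # total the top 20% of samples hold. The heavy tail makes this noisy,
    # so expect "roughly 0.8" rather than exactly 0.8.
    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.pareto(1.16, 1_000_000) + 1.0   # shift Lomax samples to classical Pareto
    samples.sort()
    top_20_percent = samples[int(0.8 * samples.size):]
    print("share held by the top 20%:", top_20_percent.sum() / samples.sum())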

A real-life example of this is when you train a machine learning model. Let's take accuracy, just for simplicity. Look at PyTorch's example on using TensorBoard, since that includes a plot at the very end[2]. Their metric is loss, which here we can treat as the inverse of accuracy: accuracy runs from 0 to 1 (higher is better), loss is just 1 - accuracy, so 0 means perfectly accurate. From 0 to 2k iterations, they went from 1 to 0.6 (a 0.4 gain). Then at 4k iterations they are at a 0.4 loss (a 0.2 gain over 2k iterations). You see how this continues? It is converging towards a loss of 0.2 (accuracy = 80%). This is exactly what I'm talking about: look at your improvements over some delta (in our case, loss per 2k iterations). It's a second-order effect here, meaning it's non-linear.
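
Here's that same pattern as a toy script (a made-up curve with roughly the shape of that plot, not the tutorial's actual data): each additional 2k iterations buys a smaller improvement than the last.

    # Made-up loss curve decaying toward an asymptote of 0.2, roughly the
    # shape described above (not the tutorial's real numbers). Print the
    # improvement bought by each additional 2k iterations.
    import numpy as np

    iters = np.arange(0, 16_001, 2_000)
    loss = 0.2 + 0.8 * np.exp(-iters / 3_000)

    for i in range(1, len(iters)):
        gain = loss[i - 1] - loss[i]
        print(f"{iters[i]:>6} iterations: loss {loss[i]:.3f} (gained {gain:.3f} over the last 2k)")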

This nonlinearity shows up everywhere. Going back to the 80/20 rule, it is often applied to coding: 80% of the code is written in 20% of the time, while the remaining 20% of the code takes 80% of the time. This should make sense, as there are different bottlenecks. We'd be naive to just measure by lines of code (see [0]). A lot of time is spent on debugging, right? And mostly debugging just a few key areas. The reason this is true derives from a simple fact: not all lines of code are equally important.

So the other example I mentioned in the previous comment is Fourier series[3]. That wiki page has some nice visualizations and you'll be able to grasp what I'm talking about from them. Pay close attention to the first figure, the middle plot (image 2/17). These are different-order approximations to a square wave. It might be hard to see, but the higher the order (the more complex the wave), the better the approximation to that square wave. Pay close attention to the calculations. Do a few yourself! How much work goes into calculating each term? Or rather, each order of approximation? I think you'll get the sense pretty quickly that every higher-order calculation requires you to also do the lower-order ones.
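
And here's that exercise as a short script if it helps (just a sketch; the square wave's Fourier series only has odd sine terms): more terms means more work, but the error improvement per term keeps shrinking.

    # Partial sums of the Fourier series of a square wave on [-pi, pi]:
    #   f(x) ~ (4/pi) * sum over odd k of sin(k*x) / k
    # Each extra term is more computation, but the mean absolute error
    # improves by less and less.
    import numpy as np

    x = np.linspace(-np.pi, np.pi, 2001)
    square = np.sign(np.sin(x))               # target square wave

    approx = np.zeros_like(x)
    for n_terms, k in enumerate(range(1, 16, 2), start=1):
        approx += (4 / np.pi) * np.sin(k * x) / k
        err = np.mean(np.abs(square - approx))
        print(f"{n_terms} term(s), up to k={k}: mean abs error = {err:.4f}")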

As a more realistic example, I am the creator of a state-of-the-art image generator (I won't say which one, to maintain some anonymity). When training my model, the score improves quickly and in really only a small amount of time. This training run took approximately 2 weeks wall time (what the clock says, not GPU time). Most of the improvement (via the metric) took place in the first 6 hours. I was >90% of the way to my final score within the first day. If you look at the loss curve in full, almost everything looks flat. But if you window it to exclude the first 24 hours, the shape reappears! There's a fractal nature to this (power law!). To put numbers to this, my whole run took 1M iterations and my final score was ~4.0. My first measurement was at 5k iterations and came in at 180. My next was at 25k and came in at 26. Then 15@50k, 9@100k, 6.8@200k, 5@500k, and so on. This is very normal and expected. (Then there's the complexity of [0]: visually the images improved too. At 5k they were meaningless blobs. By 100k they had the general desired shape and even some detail appeared. By 500k most images resembled my target. At 800k I had SOTA but could tell things were off. By 1M I thought there was a huge visual improvement over 800k, but this is all down to subtle details and there are no measurements that can accurately reflect this.)
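
For what it's worth, those numbers trace out a rough power law themselves. A quick log-log fit over exactly the figures I quoted (purely illustrative):

    # Log-log fit of the scores quoted above (lower is better) against
    # iteration count. A roughly straight line in log-log space is the
    # power-law behavior I'm describing: each 10x of compute buys a smaller
    # and smaller absolute improvement.
    import numpy as np

    iterations = np.array([5e3, 25e3, 50e3, 1e5, 2e5, 5e5, 1e6])
    scores     = np.array([180.0, 26.0, 15.0, 9.0, 6.8, 5.0, 4.0])

    slope, intercept = np.polyfit(np.log(iterations), np.log(scores), 1)
    print(f"fitted exponent: {slope:.2f}")    # negative: score ~ iterations**slope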

I am happy to answer more but you're also asking about a complex topic with a lot of depth. One I absolutely love, but just giving you a warning :)

[0] The super short version is no matter what measurement you take you are using a proxy. Even a ruler is a proxy for a meter. It isn't exact. When measuring you approximate the measurement of the ruler which is an approximation of the measurement of a meter. This case is typically very well aligned so the fact that it is a proxy doesn't matter much (if you include your uncertainties). This isn't so simple when you move to more complex metrics like every single one you see in ML. Even something like "accuracy" is not super well defined. Go through a simple dataset like CIFAR-10 and you'll find some errors in labels. You'll also find some more things to think about ;) Table this for now but keep it in the back of your head and let it mature.

[1] https://en.wikipedia.org/wiki/Pareto_principle

[2] https://docs.pytorch.org/tutorials/intermediate/tensorboard_...

[3] https://en.wikipedia.org/wiki/Fourier_series

replies(1): >>45128842 #
howardyou ◴[] No.45128842[source]
Thanks for all of that!

If you don't mind, could I talk about it with you more over email? My email address is listed in my profile.