1087 points smartmic | 25 comments
titanomachy ◴[] No.44305194[source]
“Good debugger worth weight in shiny rocks, in fact also more”

I’ve spent time at small startups and on “elite” big tech teams, and I’m usually the only one on my team using a debugger. Almost everyone in the real world (at least in web tech) seems to do print statement debugging. I have tried and failed to get others interested in using my workflow.

I generally agree that it’s the best way to start understanding a system. Breaking on an interesting line of code during a test run and studying the call stack that got me there is infinitely easier than trying to run the code forwards in my head.

Young grugs: learning this skill is a minor superpower. Take the time to get it working on your codebase, if you can.

replies(48): >>44305342 #>>44305375 #>>44305388 #>>44305397 #>>44305400 #>>44305414 #>>44305437 #>>44305534 #>>44305552 #>>44305628 #>>44305806 #>>44306019 #>>44306034 #>>44306065 #>>44306133 #>>44306145 #>>44306181 #>>44306196 #>>44306403 #>>44306413 #>>44306490 #>>44306654 #>>44306671 #>>44306799 #>>44307053 #>>44307204 #>>44307278 #>>44307864 #>>44307933 #>>44308158 #>>44308299 #>>44308373 #>>44308540 #>>44308675 #>>44309088 #>>44309822 #>>44309825 #>>44309836 #>>44310156 #>>44310430 #>>44310742 #>>44311403 #>>44311432 #>>44311683 #>>44312050 #>>44312132 #>>44313580 #>>44315651 #
geophile ◴[] No.44305628[source]
I am also in the camp that has very little use for debuggers.

A point that may be pedantic: I don't add (and then remove) "print" statements. I add logging code, that stays forever. For a major interface, I'll usually start with INFO level debugging, to document function entry/exit, with param values. I add more detailed logging as I start to use the system and find out what needs extra scrutiny. This approach is very easy to get started with and maintain, and provides powerful insight into problems as they arise.

I also put a lot of work into formatting log statements. I once worked on a distributed system, and getting the prefix of each log statement exactly right was very useful -- node id, pid, timestamp, all of it fixed width. I could download logs from across the cluster, sort, and have a single file that interleaved actions from across the cluster.
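
A rough sketch of the kind of prefix I mean (TypeScript-flavoured; the exact fields and widths here are illustrative, not what that system actually used):

  function logLine(nodeId: number, level: string, msg: string): string {
    const node = String(nodeId).padStart(4, "0");       // fixed-width node id
    const pid = String(process.pid).padStart(7, " ");   // fixed-width pid
    const ts = new Date().toISOString();                // fixed-width ISO-8601 timestamp
    return `${node} ${pid} ${ts} ${level.padEnd(5)} ${msg}`;
  }
Because every field is fixed width, concatenating the files from all nodes and sorting on the timestamp column yields a single interleaved timeline of the whole cluster.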

replies(4): >>44305698 #>>44306106 #>>44306184 #>>44308522 #
1. AdieuToLogic ◴[] No.44306184[source]
> A point that may be pedantic: I don't add (and then remove) "print" statements. I add logging code, that stays forever. For a major interface, I'll usually start with INFO level debugging, to document function entry/exit, with param values.

This is an anti-pattern which results in voluminous log "noise" when the system operates as expected; I have personally seen it produce gigabytes per day. It can also litter the solution with transient concerns that were once thought important but are no longer relevant.

If detailed method invocation history is a requirement, consider using the Writer Monad[0] and only emitting log entries when either an error is detected or in an "unconditionally emit trace logs" environment (such as local unit/integration tests).

0 - https://williamyaoh.com/posts/2020-07-26-deriving-writer-mon...

replies(2): >>44306322 #>>44307569 #
2. strken ◴[] No.44306322[source]
It's absolutely not an anti-pattern if you have appropriate tools to handle different levels of logging, and especially not if you can filter debug output by area. You touch on this, but it's a bit strange to me that the default case is assumed to be "all logs all the time".

I usually roll my own wrapper around an existing logging package, but https://www.npmjs.com/package/debug is a good example of what life can be like if you're using JS. Want to debug your rate limiter? Write `DEBUG=app:middleware:rate-limiter npm start` and off you go.
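
For anyone who hasn't used it, the gist is roughly this (a sketch; the namespace is just the one from the example above):

  import createDebug from "debug";

  const log = createDebug("app:middleware:rate-limiter");

  export function rateLimit(ip: string, remaining: number): void {
    // Printed only when the DEBUG env var matches this namespace,
    // e.g. DEBUG=app:middleware:rate-limiter or DEBUG=app:*
    log("request from %s, %d tokens remaining", ip, remaining);
  }
The namespaces nest, so you can turn on a whole subsystem with a wildcard and leave everything else silent.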

replies(3): >>44306540 #>>44307369 #>>44309715 #
3. AdieuToLogic ◴[] No.44306540[source]
> It's absolutely not an anti-pattern if you have appropriate tools to handle different levels of logging, and especially not if you can filter debug output by area.

It is an anti-pattern due to what was originally espoused:

  I add logging code, that stays forever. For a major 
  interface, I'll usually start with INFO level debugging, to 
  document function entry/exit, with param values.
There is no value for logging "function entry/exit, with param values" when all collaborations succeed and the system operates as intended. Note that service request/response logging is its own concern and is out of scope for this discussion.

Also, you did not address the non-trivial cost implications of voluminous log output.

> You touch on this, but it's a bit strange to me that the default case is assumed to be "all logs all the time".

Regarding the above, production-ready logging libraries such as Logback[0], log4net[1], log4cpp[2], et al, allow for run-time configuration to determine what "areas" will have their entries emitted. So "all logs all the time" is a non sequitur in this context.

What is relevant is the technique identified of emitting execution context when it matters and not when it doesn't. As to your `npm` example, I believe this falls under the scenario I explicitly identified thusly:

  ... an "unconditionally emit trace logs" environment
  (such as local unit/integration tests).
0 - https://logback.qos.ch/

1 - https://logging.apache.org/log4net/index.html

2 - https://log4cpp.sourceforge.net/

replies(2): >>44306621 #>>44318706 #
4. strken ◴[] No.44306621{3}[source]
I understand that you explained some exceptions to the rule, but I disagree with two things: the assumption that geophile is too incompetent to make logging conditional in some way, and adding the label of "anti-pattern" to something that evidently has so much nuance to it.

> the non-trivial cost implications of voluminous log output

If log output is conditional at compile time there are no non-trivial cost implications, and even at runtime the costs are often trivial.

replies(2): >>44306829 #>>44308529 #
5. AdieuToLogic ◴[] No.44306829{4}[source]
> ... I disagree with two things: the assumption of incompetence on the part of geophile to not make logging conditional in some way ...

I assumed nothing of the sort. What I did was identify an anti-pattern and describe an alternative which experience has shown to be a better approach.

"Incompetence" is your word, not mine.

> ... and adding the label of "anti-pattern" to something that's evidently got so much nuance to it.

I fail to see the nuance you apparently can see.

>> the non-trivial cost implications of voluminous log output

> If log output is conditional at compile time there are no non-trivial cost implications, and even at runtime the costs are often trivial.

Cloud deployments require log entries to be transmitted to one or more log aggregators before they can be observed at all.

By definition, this involves network I/O.

Network communication is orders of magnitude slower than local I/O.

Useless logging of "function entry/exit, with param values" increases pressure on network I/O.

Unless logging is allowed to be lossy, which it never is, transmission must be completed when log buffers are near full capacity.

Provisioning production systems that log excessively can often require more resources than provisioning those which do not.

Thus, when considering the implications of voluminous log output in a production environment, this disproves:

> ... even at runtime the costs are often trivial.

replies(3): >>44307087 #>>44318745 #>>44318757 #
6. strken ◴[] No.44307087{5}[source]
As I said, conditional. As in, you add logging to your code but you either remove it at compile time or you check your config at run time. By definition, work you don't do is not done.
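
For the JS/TS case upthread, "remove it at compile time" usually looks something like this (a sketch; it relies on the bundler being told to replace the constant with a literal so the dead branch gets eliminated):

  // e.g. esbuild --define:__DEBUG__=false, or webpack's DefinePlugin
  declare const __DEBUG__: boolean;

  export function handle(method: string, url: string): void {
    if (__DEBUG__) {
      // This branch is stripped from production builds entirely.
      console.debug("handle()", method, url);
    }
    // ... actual work ...
  }
The run-time variant is the same shape with a config flag instead of a build-time constant: you pay for a branch, but not for the logging itself.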
replies(1): >>44307428 #
7. TeMPOraL ◴[] No.44307369[source]
I know a lot of people do that in all kinds of software (especially enterprise); still, I can't help but notice this is getting close to Greenspunning[0] territory.

What you describe is leaving around hand-rolled instrumentation code that conditionally executes expensive reporting actions, which you can toggle on demand between executions. Thing is, this is already all done automatically for you[1] - all you need is the right build flag to prevent optimizing away information about function boundaries, and then you can easily add and remove such instrumentation code on the fly with a debugger.

I mean, tracing function entry and exit with params is pretty much the main task of a debugger. In some way, it's silly that we end up duplicating this by hand in our own projects. But it goes beyond that; a lot of logging and tracing I see is basically hand-rolling an ad hoc, informally-specified, bug-ridden, slow implementation of 5% of GDB.

Why not accept you need instrumentation in production too, and run everything in a lightweight, non-interactive debugging session? It's literally the same thing, just done better, and a couple of layers of abstraction below your own code, so it's more efficient too.

--

[0] - https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

[1] - Well, at least in most languages used on the backend, it is. I'm not sure how debugging works in Node at the JS level, if it exists at all.

replies(2): >>44315207 #>>44322889 #
8. TeMPOraL ◴[] No.44307428{6}[source]
Conditionals aren't free either, and conditionals - especially compile-time - on logging code are considered by some a bugprone anti-pattern as well.

The code that computes data for and assembles your log message may end up executing logic that affects the system elsewhere. If you put that code under a conditional, your program will behave differently depending on the logging configuration; if you put it outside, you end up wasting a potentially substantial amount of work building log messages that never get used.
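
The usual mitigation for the wasted-work half is to defer message construction behind a closure; a minimal sketch (expensiveSnapshot is a hypothetical stand-in for real work):

  function logIf(enabled: boolean, make: () => string): void {
    if (enabled) console.log(make());   // message is only built when the level is on
  }

  declare function expensiveSnapshot(): object;   // hypothetical
  const debugEnabled = process.env.LOG_DEBUG === "1";

  logIf(debugEnabled, () => `cart=${JSON.stringify(expensiveSnapshot())}`);
Which, of course, only sharpens the first problem: anything with side effects inside that closure now runs or doesn't run depending on the logging configuration.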

replies(1): >>44308300 #
9. panstromek ◴[] No.44307569[source]
I don't quite like littering the code with logs, but I understand there's a value to it.

The problem is that if you only log problems or "important" things, then you have a selection bias in the log and don't have a reference for what the log looks like when the system operates normally.

This is useful when you encounter an unknown problem and need to find unusual stuff in the logs. This unusual stuff is not always an error state; it might be some aggregate problem (something is called too many times, something is happening in a problematic order, etc.)

replies(2): >>44318779 #>>44324494 #
10. strken ◴[] No.44308300{7}[source]
This is getting a bit far into the weeds, but I've found that debug output which is disabled by default in all environments is quite safe. I agree that it would be a problem to leave it turned on in development, testing, or staging environments.
11. foldr ◴[] No.44308529{4}[source]
The whole concept of an “anti-pattern” is a discussion ender. It’s basically a signal that one party isn’t willing to consider the specific advantages and disadvantages of a particular approach in a given context.
12. williamdclt ◴[] No.44309715[source]
FWIW, this seems like a poor man's tracing. You'd get all of that and a lot more by having OpenTelemetry set up (using Jaeger as the local UI).
13. strken ◴[] No.44315207{3}[source]
I agree that logging all functions is reinventing the wheel.

I think there's still value in adding toggleable debug output to major interfaces. It tells you exactly what and where the important events are happening, so that you don't need to work out where to stick your breakpoints.

14. geophile ◴[] No.44318706{3}[source]
> There is no value for logging "function entry/exit, with param values" when all collaborations succeed and the system operates as intended.

Well, I agree completely, but those conditions are a tall order. The whole point of debugging (by whatever means you prefer) is for those situations in which things don't succeed or operate as intended. If I have a failure, and suspect a major subsystem, I sure do want to see all calls and param values leading up to a failure.

In addition to this point, you have constructed a strawman in which logging is on all the time. Have you ever looked at syslog? On my desktop Linux system, output there counts as voluminous, yet it doesn't take up so much space, nor is it so CPU-intensive, that I would consider disabling syslog output (even if I could).

The large distributed system I worked on would produce a few GB per day, and the logs were rotated. A complete non-issue. And for the rare times that something did fail, we could turn up logging with precision and get useful information.

replies(1): >>44324026 #
15. geophile ◴[] No.44318745{5}[source]
You are very attached to this "voluminous" point. What do you mean by it?

As I said, responding to another comment of yours, a distributed system I worked on produced a few GB a day. The logs were rotated daily. They were never transmitted anywhere, during normal operation. When things go wrong, sure, we look at them, and generate even more logging. But that was rare. I cannot stress enough how much of a non-issue log volume was in practice.

So I ask you to quantify: What counts (to you) as voluminous, as in daily log file sizes, and how many times they are sent over the network?

replies(1): >>44323710 #
16. geophile ◴[] No.44318757{5}[source]
BTW, your username is a bit too on-the-nose, given the way you are arguing, using "anti-pattern" as a way to end all discussion.
replies(1): >>44324311 #
17. geophile ◴[] No.44318779[source]
I don't know what's "important" at the beginning. In my work, logging grows as I work on the system. More logging in more complex or fragile parts. Sometimes I remove logging where it provides no value.
18. geophile ◴[] No.44322889{3}[source]
A logging library is very, very far from a Turing complete language, so no Greenspunning. (Yes, I know about that Java logger fiasco from a few years ago. Not my idea.)

I don't want logging done automatically for me, what I want is too idiosyncratic. While I will log every call on major interfaces, I do want to control exactly what is printed. Maybe some parameter values are not of interest. Maybe I want special formatting. Maybe I want the same log line to include something computed inside the function. Also, most of my logging is not on entry/exit. It's deeper down, to look at very specific things.

Look, I do not want a debugger, except for tiny programs, or debugging unit tests. In a system with lots of processes, running on lots of nodes, if a debugger is even possible to use, it is just too much of a PITA, and provides far too minuscule a view of things. I don't want to deal with running to just before the failure, repeatedly, resetting the environment on each attempt, blah, blah, blah. It's a ridiculous way to debug a large and complex system.

What a debugger can do, that is harder with logging, is to explore arbitrary code. If I chase a problem into a part of my system that doesn't have logging, okay, I add some logging, and keep it there. That's a good investment in the future. (This new logging is probably at a detailed level like DEBUG, and therefore only used on demand. Obvious, but it seems like a necessary thing to point out in this conversation.)

19. AdieuToLogic ◴[] No.44323710{6}[source]
> You are very attached to this "voluminous" point. What do you mean by it?

I mean "a lot" or more specifically; "a whole lot."

Here is an exercise which illustrates this. For the purposes here, assume ASCII characters are used for log entries to make the math a bit easier.

Suppose the following:

  Each log statement is 100 characters.
  Each service invocation emits 50 log statements.
  Average transactions per second during high usage is 200tps.
  High usage is on average 2 hours per day.

  100 x 50 x 200 x 60 x 60 x 2 = 7_200_000_000 = 7.2GB / day
> So I ask you to quantify: What counts (to you) as voluminous, as in daily log file sizes, and how many times they are sent over the network?

The quantification is above. Regarding log entries being sent over a network: in many production systems, log entries are unconditionally sent to a log aggregator and never stored in a local file system.

replies(1): >>44328512 #
20. AdieuToLogic ◴[] No.44324026{4}[source]
>> There is no value for logging "function entry/exit, with param values" when all collaborations succeed and the system operates as intended.

> Well, I agree completely, but those conditions are a tall order.

Every successful service invocation satisfies "all collaborations succeed and the system operates as intended." Another way to state this is every HTTP `1xx`, `2xx`, and `3xx` response code produced by an HTTP service qualifies as such.

> The whole point of debugging (by whatever means you prefer) is for those situations in which things don't succeed or operate as intended.

Providing sufficient context in the presence of errors, or "situations in which things don't succeed or operate as intended", was addressed thusly:

  If detailed method invocation history is a requirement, 
  consider using the Writer Monad and only emitting log 
  entries when either an error is detected or in an 
  "unconditionally emit trace logs" environment (such as 
  local unit/integration tests).
> If I have a failure, and suspect a major subsystem, I sure do want to see all calls and param values leading up to a failure.

See above.

> In addition to this point, you have constructed a strawman in which logging is on all the time.

No, I addressed your original premise in the context of a production web application, where logging is configured during deployment.

See also your own contradiction by previously asserting, "I sure do want to see all calls and param values leading up to a failure."

So which is it?

Did I construct a strawman "in which logging is on all the time"?

Or do you "want to see all calls and param values leading up to a failure", which requires "logging is on all the time"?

> Have you ever looked at syslog?

This is a strawman. Syslog is a component for logging and has nothing to do with the programs which use it.

> The large distributed system I worked on would produce a few GB per day, and the logs were rotated. A complete non-issue.

If this is the same system you described in a different comment also in this thread, I identified a standard industry practice of log entries produced by application nodes in a production environment being unconditionally sent to a log aggregator and not stored in a local file system. The reasons for this are well documented.

replies(1): >>44329492 #
21. AdieuToLogic ◴[] No.44324311{6}[source]
> BTW, your username is a bit too on-the-nose ...

Resorting to an ad hominem and non sequiturs are we? Really?

C'mon, you are better than that.

22. AdieuToLogic ◴[] No.44324494[source]
> The problem is that if you only log problems or "important" things, then you have a selection bias in the log and don't have a reference of how the log looks like when the system operates normally.

A case can be made for only logging the steps performed up to and including an error. This even excludes logging "important things" other than those satisfying system/functional requirements (such as request/response audit logs).

It is reminiscent of "the Unix philosophy", but different in important ways, and is essentially:

  Capture what would be log entries if there is a
  future unrecoverable error.

  If an error is encountered, emit *all* log entries starting
  from the earliest point (such as receiving an external
  event or a REST endpoint request) up to and including the
  information detailing the unrecoverable error.

  If the workflow succeeds, including producing an expected
  failed workflow response (such as a validation error),
  discard the deferred log entries.
What constitutes the deferred log entries accumulated along the way is specific to the workflow and/or domain model.

While using a functional programming language and employing referentially transparent[0] abstractions (such as the Writer Monad) usually makes implementing this pattern much simpler than when using an imperative language, it can be successfully done with the latter given sufficient discipline and the workflow implementation being referentially transparent[0].
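
A minimal imperative sketch of the control flow described above (names are illustrative; runWorkflow stands in for whatever the service actually does):

  class DeferredLog {
    private entries: string[] = [];
    add(entry: string): void { this.entries.push(entry); }
    flush(): void { this.entries.forEach(e => console.error(e)); }  // emit everything
    discard(): void { this.entries = []; }                          // happy path: drop it all
  }

  declare function runWorkflow(request: string): string;  // hypothetical workflow step

  function handleRequest(request: string): string {
    const log = new DeferredLog();
    log.add(`received: ${request}`);
    try {
      const result = runWorkflow(request);
      log.discard();   // success (including expected failures such as validation errors)
      return result;
    } catch (err) {
      log.add(`unrecoverable error: ${err}`);
      log.flush();     // only now do the accumulated entries leave memory
      throw err;
    }
  }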

An important complement to the above is to employ other industry standard verification activities, such as unit/feature/integration tests.

0 - https://en.wikipedia.org/wiki/Referential_transparency

23. geophile ◴[] No.44328512{7}[source]
7.2G/day doesn't sound terrible. And I'd reduce it by a factor of 25, since in normal operation (i.e., not looking into a problem) I would have either 2 log statements per call (entry, exit), or none at all. It might be more than 2, if I needed detailed logging.

But in normal usage, even in the scenario you describe, your argument about log volume is not convincing.

24. geophile ◴[] No.44329492{5}[source]
I'm not sure what point you are making with your scenario involving HTTP response codes. What if the HTTP server crashes, and doesn't send a response at all?

I don't know from Writer Monads. But if you're only emitting log entries on some future failure or request, then that's potentially a lot of logging to keep somewhere. Where? Log aggregator? Local files? Memory? What about log volume? Does this writer monad implement log rotation? It sounds like you are sweeping a lot of the things you object to under this writer monad rug.

Let me be real clear about all calls and param values leading up to a failure.

- In normal operation, turn logging off completely, or turn on some level that produces tolerable log volume (it seems like your threshold is much lower than mine).

- When a failure occurs: restart the service with more logging enabled (hence all the calls and param values), so that you have logging when the failure occurs again.

About local logs vs a log aggregator: The system I worked on was a shared nothing archive. To add storage and metadata capacity, you add nodes. Each node also stored its own log files. I get that this may not be the answer all the time, and that a log aggregator is useful in some scenarios. However, even in that case, your concerns about log volume seem overblown to me.

replies(1): >>44333995 #
25. AdieuToLogic ◴[] No.44333995{6}[source]
> I'm not sure what point you are making with your scenario involving HTTP response codes.

My point was to identify how common the "happy path" scenario is and was in direct response to:

>> There is no value for logging "function entry/exit, with param values" when all collaborations succeed and the system operates as intended.

> Well, I agree completely, but those conditions are a tall order.

Regarding your question:

> What if the HTTP server crashes, and doesn't send a response at all?

Again, the HTTP status codes were used to illustrate the frequency of successful invocations. But to your point, if an HTTP server crashes then log entries for in-flight workflows would likely not be emitted. A similar possibility also holds for local file system logging, as buffering exists (sometimes at multiple levels).

> I don't know from Writer Monads.

No worries. All it is is a formal definition of a type with specific operations that must satisfy certain properties, called "laws" in the functional programming world.

> But if you only emitting log entries on some future failure or request, then that's potentially a lot of logging to keep somewhere. Where? Log aggregator? Local files? Memory?

What is meant by "future failure" is a failure potentially encountered during the evaluation of a single invocation. In the context of a HTTP server, this would be the handling of each submission to a specific HTTP endpoint and verb. This is often defined within an IO Monad[0], but does not have to be, and is out of scope for this discussion.

The answer to the rest of your questions is that the deferred log entry definitions are held in memory for the duration of the single service invocation, with any log emissions produced being transmitted to a network-accessible log aggregator via the logging component used.

> Let me be real clear about all calls and param values leading up to a failure.

The scenario you kindly shared is understandable, yet is one which has been unacceptable in teams I have worked with. Losing the "first error" is not an option in those environments.

0 - https://en.wikibooks.org/wiki/Haskell/Understanding_monads/I...