601 points scalewithlee | 21 comments
1. netsharc ◴[] No.43793903[source]
Reminds me of an anecdote about an e-commerce platform: someone coded a leaky webshop, so their workaround was to watch if the string "OutOfMemoryException" shows up in the logs, and then restart the app.

Another developer in the team decided they wanted to log what customers searched for, so if someone typed in "OutOfMemoryException" in the search bar...

replies(2): >>43793976 #>>43795364 #
2. skipants ◴[] No.43793976[source]
I've actually gone through this a few times with our WAF. A user got IP-banned because the WAF thought a note with the string "system(..." was PHP injection.
3. PhilipRoman ◴[] No.43795364[source]
Careless analysis of free-form text logs is an underrated way to exploit systems. It's scary how much software blindly logs data without out-of-band escaping or sanitizing.
replies(1): >>43796073 #
4. ycombinatrix ◴[] No.43796073[source]
Why would someone "sanitize" OutOfMemoryException out of their logs? That is a silly point to make.
replies(3): >>43796243 #>>43796381 #>>43802850 #
5. owebmaster ◴[] No.43796243{3}[source]
An OutOfMemoryException log should not be the same as a search log

  Error: OutOfMemoryException
And

  Search: OutOfMemoryException
Should not be related in any way
replies(1): >>43796851 #
6. teraflop ◴[] No.43796381{3}[source]
The point is not to sanitize known strings like "OutOfMemoryException". The point is to sanitize or (preferably) escape any untrusted data that gets logged, so that it won't be confused for something else.
replies(1): >>43796721 #
7. swyx ◴[] No.43796721{4}[source]
i think GP's point is how would you even sanitize the string "OutOfMemoryException" which presumably comes from a trusted system

i guess demanding "Structured logs for everything or bust" is the answer? (i'm not a big o11y guy so pardon me if this is obvious)
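A minimal sketch of what the structured-logging answer could look like, assuming one JSON object per log line. The field names (`kind`, `name`, `query`) and the helpers `log_event` and `is_oom` are made up for illustration; the point is only that the watcher matches on a field the application sets, not on raw text.

```python
import json


def log_event(kind: str, **fields) -> str:
    """Serialize one log record as a single JSON line (hypothetical helper)."""
    return json.dumps({"kind": kind, **fields})


def is_oom(line: str) -> bool:
    """Decide from the trusted 'kind' and 'name' fields, never from a substring."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(record, dict)
        and record.get("kind") == "error"
        and record.get("name") == "OutOfMemoryException"
    )


error_line = log_event("error", name="OutOfMemoryException")
search_line = log_event("search", query="OutOfMemoryException")

# A naive substring check fires on both lines; the field check only on the real error.
naive_hits = [("OutOfMemoryException" in line) for line in (error_line, search_line)]
field_hits = [is_oom(line) for line in (error_line, search_line)]
```

Since `json.dumps` also escapes newlines inside string values, a search query cannot break out of its own record.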

replies(2): >>43797095 #>>43797161 #
8. dusanz ◴[] No.43796851{4}[source]
Until someone searches for "Error: OutOfMemoryException"
replies(2): >>43797036 #>>43802208 #
9. PhilipRoman ◴[] No.43797036{5}[source]
If structured logging is too much, unique prefixes solve this issue. Basically you need some token that user provided data is unable to output to the log. If you rigorously escape all newlines, you can then use start-of-line and end-of-line as unforgeable tokens. The possibilities are endless and it all comes down to understanding whether the intersection of two grammars is empty.
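A rough sketch of the start-of-line idea, assuming the log consumer splits records only on `\n` and `\r`. The prefixes `Search: ` and `Error: ` come from the examples upthread; `escape_untrusted` and `search_line` are hypothetical names. Once line breaks in user data are escaped, user text can never begin a line, so a pattern anchored at line start cannot be forged.

```python
import re


def escape_untrusted(value: str) -> str:
    # Escape backslashes first so the encoding stays reversible, then line breaks.
    return value.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")


def search_line(query: str) -> str:
    # User data always sits behind a system-written prefix on the same line.
    return "Search: " + escape_untrusted(query)


# Anchored at the start of a line: only the application can write there.
OOM_AT_LINE_START = re.compile(r"^Error: OutOfMemoryException", re.MULTILINE)

log = "\n".join([
    search_line("OutOfMemoryException"),
    search_line("x\nError: OutOfMemoryException"),  # attempted forgery
])

forged_match = OOM_AT_LINE_START.search(log)
real_match = OOM_AT_LINE_START.search(log + "\nError: OutOfMemoryException")
```

Here `forged_match` is `None` because the injected newline was escaped, while `real_match` finds the line the application wrote itself.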
10. PhilipRoman ◴[] No.43797095{5}[source]
Low tech example: escape all newlines in user supplied strings, then add a known prefix to all user supplied data (let's say a double hashtag ##, but anything else works too). When you want to search logs for strings coming from your system, remove/ignore everything after the marker.

It all comes down to understanding whether the intersection of two grammars is empty.
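The low-tech version could be sketched like this, assuming system-generated text never contains the `##` marker itself. `with_user_data` and `trusted_part` are hypothetical helper names, not from any library.

```python
MARKER = "##"  # assumed never to appear in system-generated text


def with_user_data(system_text: str, user_text: str) -> str:
    """Append escaped user data after the marker, on the same line."""
    escaped = (
        user_text.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")
    )
    return f"{system_text} {MARKER}{escaped}"


def trusted_part(line: str) -> str:
    """Keep only the text before the first marker; the rest is user-controlled."""
    return line.split(MARKER, 1)[0]


search_log = with_user_data("search", "OutOfMemoryException ## Error")
error_log = "Error: OutOfMemoryException"

search_is_oom = "OutOfMemoryException" in trusted_part(search_log)
error_is_oom = "OutOfMemoryException" in trusted_part(error_log)
```

The first marker on a line is always the one the system wrote, so a user typing `##` themselves only adds more text to the ignored part.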

replies(1): >>43797855 #
11. noisem4ker ◴[] No.43797161{5}[source]
"o11y" stands for "observability".

Numeronyms are evil and we should stop using them.

replies(3): >>43798193 #>>43798369 #>>43801650 #
12. jethro_tell ◴[] No.43797855{6}[source]
The difficulty here is that in the example above, it's unlikely, given any amount of scale, that the two people were on the same team. They were doing different things with the same data and probably didn't know what the other was doing.

Sure, you could add a convention to your 'how to log' doc that specifies that all user input should be tagged with a double '#', but who reads docs until things break? Convention is a shitty way to make things work.

There are 100 ways that you could make this work correctly: only restarting on a much more specific string, e.g. including the app name in the log line, etc. But that's all just reducing the likelihood that you get burned.

I've also written an OOM-Killer.sh myself, I'm not above that, but it's one of those edge cases that's impossible to do correctly, which is why parsing and acting on log data is generally considered an anti-pattern.

13. j1elo ◴[] No.43798193{6}[source]
Thanks. My mind started running the random string generator given those restrictions, like a puzzle game. But had no idea of what it meant until you wrote it. Who invented that stupid idea and thought it would be a good one?
replies(1): >>43798349 #
14. swyx ◴[] No.43798349{7}[source]
because it's easily googlable.

counter point - people are going to use them, better to expose newbies early and often and then everyone is better off

shorthands will always be in demand. we used to say “horseless carriage”, then “automobile”, then “car”. would you rather use Light amplification by stimulated emission of radiation or just “laser”s? etc

in the new york times? sure, spell out observability. but on HN? come on. the term is 7 years old and is used all over the site. it’s earned it

replies(3): >>43800066 #>>43802105 #>>43802848 #
15. ramon156 ◴[] No.43798369{6}[source]
You're right, avoiding them gives better a11y
16. stalfosknight ◴[] No.43800066{8}[source]
I've never seen it before.
17. vintermann ◴[] No.43801650{6}[source]
t4s, couldn't agree more.
18. immibis ◴[] No.43802105{8}[source]
no we drive to work in our h16e to set up the l47n
19. happysadpanda2 ◴[] No.43802208{5}[source]
I read the gp to mean that error.log (being parsed to look for OOM) would have no associations with userSearches.log, in which an end-user searched for OOM
20. alpaca128 ◴[] No.43802848{8}[source]
Or we could use words that don't require people to use Google just to be able to read. A dependence on a search engine for reading text is unnecessary.

If you find typing tedious just use autocomplete or other tools instead of making it the readers' problem.

21. MortyWaves ◴[] No.43802850{3}[source]
Absolutely incredible how dense HN can be and that no one has explained. Obviously that isn’t what they are saying, they are saying it’s profoundly stupid to have the server be controlled by a simple string search at all.