171 points irke882 | 27 comments
1. sugarpimpdorsey ◴[] No.44507048[source]
If we're being honest, YAML is one of the dumbest ideas of the last 20 years to have proliferated. How we got from XML to here I cannot comprehend.

This is not the first RCE involving YAML and it won't be the last.

replies(8): >>44507063 #>>44507118 #>>44507128 #>>44507156 #>>44507406 #>>44507812 #>>44507872 #>>44509145 #
2. ChocolateGod ◴[] No.44507063[source]
Why we settled on a file format that relies on invisible characters I'll never know.
replies(3): >>44507183 #>>44507280 #>>44515549 #
3. fmbb ◴[] No.44507118[source]
A search for XML on cve.org gives

> Showing 1 - 25 of 6,749 results for XML

Searching for YAML:

> Showing 1 - 25 of 288 results for YAML

replies(1): >>44507124 #
4. baq ◴[] No.44507124[source]
Is that from the past two years?
5. szszrk ◴[] No.44507128[source]
That was not RCE. It's not in yaml, it's in Helm's logic.

But glad you vented, I guess.

6. tsimionescu ◴[] No.44507156[source]
While YAML has all sorts of issues and disadvantages compared to XML, security is certainly not one of them. XML is a crazy source of security issues by design, especially with the crazy idea of adding built-in support for URLs that parsers are expected to follow.
7. imiric ◴[] No.44507183[source]
You use invisible characters whenever you press Enter or Space. If you're referring to Tab, many of the most popular programming languages like Go and Python use them as part of their syntax.

YAML became popular as a response to XML, which isn't user-friendly to write. It's unfortunate that the spec got so convoluted and relies on a lot of implicit behavior, but I'd rather write YAML than XML, JSON or TOML for things like configuration files. Nowadays there might be better alternatives, but YAML is the de facto standard.

It's also unfortunate that YAML got abused by people who wanted to turn it into a DSL, so we ended up with thousands of lines of Ansible playbooks, CI workflows, and Helm charts, but here we are.

replies(3): >>44507315 #>>44507341 #>>44508467 #
8. qsort ◴[] No.44507280[source]
The gyrations people will go through to avoid using S-expressions...
9. drysart ◴[] No.44507315{3}[source]
It's unfortunate, but inevitable. Every structured text data format that sees widespread use, given enough time, will eventually be turned into a DSL.
replies(1): >>44508395 #
10. mrheosuper ◴[] No.44507341{3}[source]
I always enjoy writing JSON more. I feel it's easier to translate/integrate JSON into the code.
replies(2): >>44508370 #>>44508415 #
11. quotemstr ◴[] No.44507406[source]
In what way is this vulnerability YAML-specific?
12. javcasas ◴[] No.44507812[source]
Are we going to blame the next RCE we find in some application on XML just because that application uses XML somewhere?

If so, then I agree on blaming this on YAML.

13. immibis ◴[] No.44507872[source]
NIH syndrome and "inverse second system effect". In the real second system effect, the second system is more complicated because it includes everything that could possibly be perceived as missing in the first system. In the inverse second system effect the first system was perceived as too complicated, not too simple, so the second system is much simpler and doesn't do its job well.

Also this vuln has nothing to do with YAML

replies(1): >>44509110 #
14. cluckindan ◴[] No.44508395{4}[source]
In fact, once a structured text format is used as a data source for any process, it has already become a DSL.
15. cluckindan ◴[] No.44508415{4}[source]
YAML is a superset of JSON, so go right ahead and write your .yml files in JSON.
replies(2): >>44508967 #>>44510583 #
16. sofixa ◴[] No.44508467{3}[source]
> many of the most popular programming languages like Go and Python use them as part of their syntax

Go doesn't use tabs or whitespace as a part of its syntax. It's a part of the formatting, but not the syntax of the language.

In Python, on the other hand, one extra tab or space can cause havoc.
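
To make the Python half of that concrete, here is a minimal sketch (the function names are made up for illustration). The two functions differ only in the indentation of one line, and the last part feeds the compiler a block that mixes a tab with spaces:

```python
# Two functions whose source differs only in the indentation of one line.
def total_inside(values):
    total = 0
    for v in values:
        total += v
        total *= 2   # indented: part of the loop body, runs every iteration
    return total


def total_outside(values):
    total = 0
    for v in values:
        total += v
    total *= 2       # dedented: runs once, after the loop
    return total


inside = total_inside([1, 2, 3])
outside = total_outside([1, 2, 3])

# A block indented with a tab on one line and spaces on the next.
mixed_source = "if True:\n\tx = 1\n        y = 2\n"
try:
    compile(mixed_source, "<mixed>", "exec")
    mixed_rejected = False
except IndentationError:  # TabError is a subclass of IndentationError
    mixed_rejected = True
```

Same tokens, different program: the whitespace alone decides whether the doubling happens once or on every iteration, and Python 3 refuses the mixed tab/space block outright.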

17. galangalalgol ◴[] No.44508967{5}[source]
Sometimes what makes something great is what it lacks. An automatic transmission, operator overloading, schema extensions, batteries etc.
18. galangalalgol ◴[] No.44509110[source]
It is tangentially related in that yaml became normal to use as a DSL within the devops world. As another post said, everything becomes a DSL eventually because people want to be "fully configurable", not realizing that is roughly the same thing as not being complete yet. But in this case, what leads us to the present is the lack of direct acknowledgement that yaml is being used as an interpreted language, with an interpreter that doesn't think of itself as such and hence doesn't have a real sandbox. People didn't use xml as a DSL as often because it was so flexible. That would be like using c++ as the DSL itself instead of using it to write the interpreter for one.
replies(1): >>44531353 #
19. fapjacks ◴[] No.44509145[source]
I have no horse in that race but just to see people talking about XML like this a quarter of a century after the first time I saw similar comments is just funny, I don't care who you are.
20. baobun ◴[] No.44510583{5}[source]
YAML is actually not a superset of JSON.

https://john-millikin.com/json-is-not-a-yaml-subset

https://news.ycombinator.com/item?id=30052633

replies(1): >>44513268 #
21. cluckindan ◴[] No.44513268{6}[source]
The NO case is not valid JSON.

So that leaves scientific notation.

replies(1): >>44515196 #
22. baobun ◴[] No.44515196{7}[source]
The point is that "going right ahead and writing your .yml files in JSON" is not valid. You'd have to restrict yourself to a subset of JSON to not get different semantics.
replies(1): >>44515809 #
23. kubectl_h ◴[] No.44515549[source]
Exactly how I feel about Python!
24. joombaga ◴[] No.44515809{8}[source]
If you configure the parser to treat it as YAML 1.2 then you don't need to restrict yourself to a subset.
replies(1): >>44516079 #
25. deathanatos ◴[] No.44516079{9}[source]
This is a valid JSON value:

  "\ud83d\udca9"

Python's "PyYAML" package will not decode this to the same result as a JSON decoder would.

Rust's `serde_yaml` will fail on this.

I don't know about other parsers, but I'd be curious.

The standard itself isn't well written here, IMO.

> The content of a scalar node is an opaque datum that can be presented as a series of zero or more Unicode characters.

The example here is a "quoted scalar", which can contain the escapes you see. Those escapes represent "Unicode characters", specifically,

> Escaped 16-bit Unicode character.

But "Unicode characters" is never defined by YAML.

Most implementations seem to treat them as Unicode code points, so the resulting string type in almost all cases is something like [UnicodeCodePoint]; in Rust, that means no unpaired surrogates, or we can't convert it to a Rust `String`, which is roughly speaking `[USV]`. In Python, that's workable, since that's Python's `str` datatype, but that means no surrogate decoding occurs.

The grammar also further implies that it's [UnicodeCodePoint] and not [USV], and the prose never restricts unpaired surrogates. (The JSON standard strongly implies the UTF-16 decoding should happen on escaped values, though it too waffles around unpaired surrogates. Whether unpaired surrogates are accepted is variable in JSON.)

But compare with a JSON string: a JSON string decodes to something like a [USV], so surrogate pairs are decoded to their corresponding USV.
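
A rough sketch of the two readings, using only Python's standard library. It does not call PyYAML or `serde_yaml`; the "unpaired" string is built by hand to imitate the one-code-point-per-escape interpretation, so treat it as an illustration of the difference rather than of any particular YAML parser's output:

```python
import json

raw = r'"\ud83d\udca9"'  # the JSON text of the value shown earlier

# JSON reading: the json module combines the escaped surrogate pair
# into a single code point, U+1F4A9.
decoded = json.loads(raw)

# Code-point reading, imitated by hand: each \uXXXX escape becomes its
# own code point, leaving two lone surrogates in the string.
unpaired = chr(0xD83D) + chr(0xDCA9)

# Lone surrogates are legal inside a Python str but are not Unicode
# scalar values, so they cannot be encoded as UTF-8.
try:
    unpaired.encode("utf-8")
    unpaired_encodable = True
except UnicodeEncodeError:
    unpaired_encodable = False
```

The first string has length 1 and encodes to four UTF-8 bytes; the second has length 2 and cannot be encoded at all, which is the same reason a Rust `String` can't hold it.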

26. moondev ◴[] No.44531353{3}[source]
This is like blaming python problems on yaml because someone embedded a python script in a multiline string.
replies(1): >>44533823 #
27. galangalalgol ◴[] No.44533823{4}[source]
I wasn't blaming yaml at all. Our mistake is thinking we are using it as a configuration file when we are actually using it as an interpreted language. It's not yaml's fault that people are unknowingly writing DSL interpreters. It's just related because people who make that mistake are picking yaml. I nearly made the mistake with toml a few years ago. You could even make the mistake with complicated environment variable usage. Whenever your configuration source is flexible enough to create executable primitives, it needs to be sanitized. And really that is whenever a configurable value gets used in a conditional, which is often, especially considering that even numeric values become conditional when they are used in operations that can result in UB or even just exceptions/panics/unhandled errors. This isn't exclusive to yaml.
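
As a small sketch of what I mean by sanitizing even a plain number: the `batch_size` key, its bounds, and both function names below are hypothetical, and the config is assumed to have already been parsed into a dict by whatever format you picked.

```python
def load_batch_size(config):
    """Validate a hypothetical 'batch_size' setting before it reaches arithmetic."""
    raw = config.get("batch_size", 100)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"batch_size must be an integer, got {raw!r}")
    if not 1 <= raw <= 10_000:
        raise ValueError(f"batch_size out of range: {raw}")
    return raw


def batches_needed(total_items, config):
    size = load_batch_size(config)
    # Ceiling division; safe because size is known to be >= 1.
    return -(-total_items // size)
```

Without the check, `batch_size: 0` in the file turns a "config value" into a ZeroDivisionError somewhere far from where it was read, which is exactly the hidden conditional I'm talking about.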