Fixing JSON

(www.tbray.org)
139 points robin_reala | 58 comments
1. outsidetheparty ◴[] No.12327880[source]
Shameful confession: when I was first introduced to JSON, I was convinced it would go nowhere. "XML already does everything JSON does! And there's no way to differentiate between nodes and attributes! And there are no namespaces! And no schemas! What's the point of JSON?" And a couple of years later I looked up from the tangled mess of XSLT I was working on to discover that the rest of the world had moved on.

JSON is just javascript, and it leaves out everything else. That's its entire reason for existing, and it caught on because that's all that 99.44% of anyone needed.

Timestamps you can add today, without changing the protocol; just put them in your data if you need them. So I'm not sure what he's even proposing there.

Schemas: OK, he doesn't like JSON Schema's anyOf. Fair enough. There's no real proposal here for how to fix it, so not much to say here.

Replacing commas with whitespace sounds to me like replacing a very minor irritant with a constant full-body rash. Stuff like his example of "IDs": [ 116 943 234 38793 ] would lead to far more confusion and errors than the occasional stray trailing comma.

So I guess I pretty much vote no on this one thanks for asking

replies(9): >>12327976 #>>12328071 #>>12328074 #>>12328283 #>>12329722 #>>12329776 #>>12330073 #>>12330517 #>>12334062 #
2. rkrzr ◴[] No.12327976[source]
"Timestamps you can add today, without changing the protocol; just put them in your data if you need them. So I'm not sure what he's even proposing there."

He is proposing to add a timestamp type to the grammar. This would have the advantage that there would be one canonical way to have timestamps in your JSON. It would also mean that the parser would already validate them for you and you would not have to do that yourself every time.

I definitely see value in that.

replies(2): >>12328116 #>>12329979 #
3. ubernostrum ◴[] No.12328071[source]
> So I guess I pretty much vote no on this one thanks for asking

Yeah, this is precisely the kind of stuff people said would cause JSON to "lose" to XML -- how could you ever build anything without schemas and types and bunches of metadata to ensure you were using it correctly?

And this debate is not new; it's been around a long time, and I wrote about it at length literally ten years ago: http://www.b-list.org/weblog/2006/dec/21/i-cant-believe-its-...

4. Normal_gaussian ◴[] No.12328074[source]
> Replacing commas with whitespace sounds to me like replacing a very minor irritant with a constant full-body rash

Both vivid and accurate.

In order to combat trailing commas I normally place the comma on the following line, e.g.

    { "time": "3 minutes past four"
    , "age": 229
    , "sex": "monoecious species"
    , "appearance": "Tree-like.  It's a tree."
    }

    uint8_t *data    // Yada
          , *buffer  // Ya
          ;

    var javascript
      , variable = 6
      , declarations = [ "This is taking too long", "Yep" ]
      , mix = [ "Flour", "Sugar", "Water" ]
      , types = { 'old'     : (d) => { return d < new Date }
                , 'new'     : (d) => { return d > new Date }
                , 'borrowed': (o) => { return false }
                , 'blue'    : (c) => { return true }
                }
      , regularly = new Date().toISOString()
      ;
    
With such a format there is only ever a problem deleting the first line, which I find is much much harder to do without also noticing what you've done to the larger statement.
replies(4): >>12328715 #>>12329576 #>>12330611 #>>12330704 #
5. geezerjay ◴[] No.12328116[source]
> He is proposing to add a timestamp type to the grammar. This would have the advantage that there would be one canonical way to have timestamps in your JSON.

There is already a canonical way to have timestamps: encode them as defined in RFC 3339.

Done.

The proposals add nothing to JSON and don't make sense. I mean, screwing up with the language just to add a very specific data type that is used in only a specific corner case that isn't even supported in the use case that's provided as an example? Nonsense.

replies(3): >>12329113 #>>12330315 #>>12332303 #
6. xvolter ◴[] No.12328283[source]
I was thinking along very similar lines when reading the article by Tim Bray. JSON is designed for JavaScript compatibility, I almost stopped reading when he said:

> They’re inessential to the grammar, just there for JavaScript compatibility.

The entire article goes on to suggest extremely terrible solutions to the problems he is pointing out.

In order:

Commas: If commas are an annoyance, is that to mean there is no annoyance with XML markup? You can forget a closing tag just as easily, probably more easily. Just get a JavaScript or JSON linter. Fixed. If you are really annoyed by commas, move to YAML or another format.

Datetimes: Just store your dates in a standardized timezone and format, how about [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). You can also use a [JSON-Schema](http://json-schema.org/) if you really want.

Schemas: There's lots of potential here. However, again, if you're going to change JSON so much, why not just pick another data format that has better schema support?

replies(5): >>12329120 #>>12329744 #>>12329802 #>>12330331 #>>12334990 #
7. yepperino ◴[] No.12328715[source]
> In order to combat trailing commas I normally place the comma on the following line

Ah okay, so you're that guy whose code I have to reformat in every new job.

replies(3): >>12328949 #>>12329567 #>>12330642 #
8. ketralnis ◴[] No.12328949{3}[source]
If you're having to do it on every job, maybe you're the unusual one
replies(1): >>12329325 #
9. sopooneo ◴[] No.12329113{3}[source]
Are you proposing that timestamps as described in RFC 3339 be put inside quotes and just be string values in JSON? That's the simplest way I can think to do it, but I've had adamant protest to that idea.

If you are not suggesting that timestamp values be wrapped in quotes, then wouldn't you have to worry about every existing parser out there tripping on them?

replies(1): >>12329928 #
10. sopooneo ◴[] No.12329120[source]
For datetimes, do you mean that they be set as strings in the standardized formats such as specified by ISO 8601? Like:

{"item": "lawn chair", "order-timestamp": "2016-08-21T00:58:47Z"}

replies(1): >>12329806 #
11. yepperino ◴[] No.12329325{4}[source]
Weirder still - why do I follow him from job to job?
replies(1): >>12331279 #
12. philovivero ◴[] No.12329567{3}[source]
Ah, so you're the guy whose code I have to re-reformat back to the proper format.
replies(1): >>12330860 #
13. stevewilhelm ◴[] No.12329576[source]
I have seen this approach also used when constructing complex SELECT FROM and WHERE clauses in MySQL
14. rat87 ◴[] No.12329722[source]
> JSON is just javascript,

Today JSON is not really tied to JavaScript that much; I deal with it all the time in Python code that doesn't include or interact with JavaScript. JSON is just dictionaries and lists (and strings and numbers and nulls); this is pretty easy to express in dynamically typed languages and not that hard in statically typed languages if you give up a bit of (static) typing. I don't see any reason why it should be limited to a subset of JavaScript; there are some good reasons for not changing it at all, though.

15. rat87 ◴[] No.12329744[source]
> JSON is designed for JavaScript compatibility,

JavaScript compatibility is the least important part of JSON. Lots of people use JSON with Python or Ruby or Rust or Java or C for things that will never interact with JavaScript, and anyway you don't want people parsing JSON with eval or confusing JSON with things that include string concatenation or function calls; adding non-JavaScript syntax helps make that clear.

There are good reasons not to change JSON (backwards compatibility, updating parsers, confusing formats, etc.), but since you're not planning on parsing it with eval, there is no good reason why any changes made in JSON 2.0 have to conform to a subset of JavaScript.

replies(1): >>12329818 #
16. aarreedd ◴[] No.12329776[source]
Why not just allow trailing commas?
replies(1): >>12329812 #
17. anon1385 ◴[] No.12329802[source]
>JSON is designed for JavaScript compatibility, I almost stopped reading when

Why does Javascript compatibility matter in 2016? The original idea for JSON might have been to parse untrusted data using js eval() but I'd hope that nobody is doing that anymore.

replies(1): >>12329829 #
18. xvolter ◴[] No.12329806{3}[source]
Yes, that would be a strongly recommended way. I could see using a unix timestamp as a numeric field as another way.
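
A minimal sketch of the two representations mentioned here, assuming the `order-timestamp` field from the example above and a hypothetical `order-epoch` field for the numeric form. In both cases the JSON parser only hands back a string or a number; turning it into a date is left to the application.

```javascript
// Two ways to carry the same instant in plain JSON.
// "order-epoch" is a hypothetical field name used only for this illustration.
const asString = JSON.parse('{"item": "lawn chair", "order-timestamp": "2016-08-21T00:58:47Z"}');
const asNumber = JSON.parse('{"item": "lawn chair", "order-epoch": 1471741127}');

// The application, not the JSON parser, converts each value to a Date.
const fromString = new Date(asString["order-timestamp"]);
const fromNumber = new Date(asNumber["order-epoch"] * 1000); // Unix seconds -> milliseconds

console.log(fromString.toISOString());                        // 2016-08-21T00:58:47.000Z
console.log(fromString.getTime() === fromNumber.getTime());   // true
```

The string form stays readable in logs, while the numeric form is compact but says nothing about its unit (seconds versus milliseconds), so that has to be documented separately.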
19. stevedonovan ◴[] No.12329812[source]
Well, totally. Lua and C don't care. A minor change, breaking the strict-subset-of-javascript which is no longer important.
replies(1): >>12329848 #
20. xvolter ◴[] No.12329818{3}[source]
I think that if breaking changes are made, there is no point in calling it JSON 2.0; it is an entirely new data format. When there are hundreds to thousands of alternatives, trying to "fix JSON" isn't the solution. JSON works because it is simple and easy to follow, it is robust and flexible, and has no restrictions. There is no good way to store dates in XML or many other formats without the use of schemas, and the syntax changes are minimally beneficial. It wouldn't save a lot of bandwidth with gzip, and the text data likely doesn't add up to a lot of disk space when compared to the data being stored; so removing commas, which make parsing JSON easy and increase its adoption and usage, just so some people can have fewer linting errors doesn't seem like a large benefit.
21. xvolter ◴[] No.12329829{3}[source]
I don't believe the original purpose of JSON was to eval untrusted objects, since that would always have posed a security risk. It did however offer a very easy standard way to share and parse data back in 1999. In 2016, I believe continued backwards compatibility is a necessity; otherwise you are just creating a new data storage format. So removing commas doesn't seem like an acceptable improvement, especially when the recommendation comes out of a necessity due to human error while hand-editing. JSON is incredibly easy to read and write by hand, in comparison with XML it is far easier to parse and traverse through code due to its significantly simplified structure. If, however, you need more functionality, switch to a different format; why bother trying to mangle or force JSON to work where it doesn't? There are other technologies and standards out there, use them.
replies(1): >>12330687 #
22. rossy ◴[] No.12329848{3}[source]
Well, not really. ES5 allows trailing commas in object literals too.
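
The difference between the two grammars is easy to check. A small sketch using only built-ins (`isValidJson` is a hypothetical helper name):

```javascript
// Hypothetical helper: true if the text is accepted by the strict JSON grammar.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// As JavaScript source (ES5 and later), trailing commas are accepted.
const literal = { "age": 229, "sex": "monoecious species", };
const list = [116, 943, 234, 38793,];

console.log(Object.keys(literal).length); // 2
console.log(list.length);                 // 4: a single trailing comma adds no element

// The same text fed to a JSON parser is rejected.
console.log(isValidJson('{"age": 229, "sex": "monoecious species",}')); // false
console.log(isValidJson('[116, 943, 234, 38793,]'));                   // false
console.log(isValidJson('[116, 943, 234, 38793]'));                    // true
```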
23. Ericson2314 ◴[] No.12329928{4}[source]
The former, I assume. It sounds fine to me. If one is worried about fragmentation, create a standard on top of JSON that provides rules for dealing with other types. Layering 101, people!
24. andy_ppp ◴[] No.12329979[source]
Meteor's EJSON format adds a consistent timestamp format without breaking current JSON parsers.
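
A rough sketch of the general idea, using only the standard `JSON.stringify` and `JSON.parse`. The `{"$date": <milliseconds>}` wrapper is modelled on the convention EJSON is understood to use, but treat the exact shape as an assumption; `encode` and `decode` are hypothetical helpers, not the EJSON API.

```javascript
// Hypothetical helpers; the {"$date": ms} shape is an assumed convention.
function encode(value) {
  return JSON.stringify(value, function (key, v) {
    // JSON.stringify calls Date.prototype.toJSON before the replacer runs,
    // so inspect the original property on the holder object (this[key]).
    const original = this[key];
    return original instanceof Date ? { $date: original.getTime() } : v;
  });
}

function decode(text) {
  return JSON.parse(text, (key, v) => {
    const isWrapper =
      v !== null && typeof v === "object" &&
      Object.keys(v).length === 1 && typeof v.$date === "number";
    return isWrapper ? new Date(v.$date) : v;
  });
}

const wire = encode({ item: "lawn chair", ordered: new Date(Date.UTC(2016, 7, 21, 0, 58, 47)) });
console.log(wire); // {"item":"lawn chair","ordered":{"$date":1471741127000}}

const back = decode(wire);
console.log(back.ordered instanceof Date); // true
```

The wire text is ordinary JSON, so a parser that knows nothing about the convention still reads it and simply sees a nested object.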
25. cs02rm0 ◴[] No.12330073[source]
I loved JSON immediately, having hated XML for so long.

However, I'd agree with your conclusions. The minor irritations outlined here are less bad than breaking javascript compatibility.

26. JimDabell ◴[] No.12330315{3}[source]
> There is already a canonical way to have timestamps: encode them as defined in RFC 3339.

That's not canonical at all. It's not even a de facto use of timestamps in JSON; most specifications I see call for ISO 8601.

> Done.

Well no, not done, because you then have to wrap it in a string, which essentially hides it from the JSON parser altogether and moves responsibility for parsing it out of the JSON parser into your application. As far as the JSON parser is concerned, it could be any old string, not a timestamp.

> a very specific data type that is used in only a specific corner case

I don't see how you can call timestamps a corner case – the article says "Among the RESTful APIs that I can think of, exactly zero don’t have timestamps." – and my experience is the same. Pretty much every API I've worked with has used timestamps in some form or another. They aren't a corner case at all – aside from links, they are probably the most common data type without direct support in JSON.

replies(1): >>12330522 #
27. JimDabell ◴[] No.12330331[source]
> If commas are an annoyance, is that to mean there is no annoyance with XML markup? You can forget a closing tag just as easily

That's not the same thing at all. The commas in JSON are equivalent to the whitespace between attributes in XML. A missing closing tag in XML would be equivalent to forgetting to close an object literal or array in JSON. So no, there's no equivalent XML annoyance to what he's asking for, XML is already fixed in the manner he suggests.

28. smoyer ◴[] No.12330517[source]
Except you can put comments in your Javascript ... I'd love to have comments in JSON. It's not so important for machine generated files but when you hand-craft an example it's nice to annotate it.
replies(1): >>12332787 #
29. geezerjay ◴[] No.12330522{4}[source]
> That's not canonical at all. It's not even a de facto use of timestamps in JSON; most specifications I see call for ISO 8601.

Nonsense. ISO 8601 is an ISO standard that defines representations of dates and times.

Date and time representations aren't timestamps.

RFC 3339 is based on ISO 8601 but is designed specifically to handle timestamps by including provisions that tackle interoperability problems that ISO 8601 does not handle.

> Well no, not done, because you then have to wrap it in a string

Nonsense.

JSON is just the language that is used as a superset for other domain-specific languages. How the other language is defined (and parsed) isn't handled by the JSON parser, obviously.

Considering Tim Bray's example, if a domain-specific language specifies that a "Capture time" string is followed by a string storing an RFC 3339 timestamp, that's precisely what the parser for the domain-specific language is expected to parse. The document remains valid JSON, and the domain-specific language remains valid.

> I don't see how you can call timestamps a corner case

Because it is. It isn't a primitive data type, nor is it required to implement generic data structures. Hell, some programming languages don't even support any standard date and time container or aggregate data type. Timestamps are a very specific data type whose application is limited to very specific corner cases that are already handled quite well by other means.

replies(1): >>12330580 #
30. JimDabell ◴[] No.12330580{5}[source]
> RFC 3339 is based on ISO 8601 but is designed specifically to handle timestamps by including provisions that tackle interoperability problems that ISO 8601 does not handle.

So in other words, what I'm saying isn't "nonsense" at all, and they aren't fundamentally different things as you claim – you just prefer a standard that you think is better?

In any case, you're getting away from the point there. The point was not which standard was better, the point was that RFC 3339 is not canonical.

> > Well no, not done, because you then have to wrap it in a string

> Nonsense.

It's not nonsense in the slightest. You can't put an RFC 3339 timestamp into JSON without wrapping it in a string, at which point it is no longer part of JSON. All JSON sees is a string, not a timestamp.

> The document remains valid JSON, and the domain-specific language remains valid.

I never said that it wasn't valid JSON, my point was that as far as a JSON parser is concerned, it's a string, not a timestamp, so parsing a timestamp has to be handled by your application not the JSON parser.

> It isn't a primitive data type

When the subject of discussion is whether or not it should become a primitive data type, that's circular logic.

> nor is it required to implement generic data structures.

No, but it is required to implement a vast number of JSON-based APIs.

> Timestamps are a very specific data type whose application is limited to very specific corner cases

The "very specific corner cases" being every single RESTful API the author can think of? This is an incredibly common use case, I don't see how you can argue that it's a corner case when it's practically ubiquitous.
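
The point that a JSON parser sees only a string can be illustrated with `JSON.parse`. This is a minimal sketch, assuming a hypothetical `parseWithTimestamps` helper and a deliberately simplified pattern that covers only UTC timestamps of the form `YYYY-MM-DDTHH:MM:SSZ`, not the whole of RFC 3339.

```javascript
// Simplified pattern: UTC "Z" timestamps only, not the full RFC 3339 grammar.
const UTC_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$/;

// Hypothetical helper: the application decides which keys hold timestamps.
function parseWithTimestamps(text, timestampKeys) {
  return JSON.parse(text, (key, value) => {
    if (!timestampKeys.includes(key)) return value;
    if (typeof value !== "string" || !UTC_TIMESTAMP.test(value)) {
      throw new TypeError(`"${key}" is not a UTC timestamp string`);
    }
    return new Date(value);
  });
}

const text = '{"Capture time": "2016-08-21T00:58:47Z"}';

// Plain JSON.parse: the value is just a string.
const plain = JSON.parse(text);
console.log(typeof plain["Capture time"]); // string

// With the reviver, validation and conversion happen in application code.
const revived = parseWithTimestamps(text, ["Capture time"]);
console.log(revived["Capture time"] instanceof Date); // true
```

Either way the document is valid JSON; the disagreement in this subthread is only about whether the validation step belongs in the parser or in a layer like this one.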

replies(1): >>12336581 #
31. truth_sentinell ◴[] No.12330611[source]
That is horrible from a human understanding perspective
replies(2): >>12330633 #>>12332050 #
32. Normal_gaussian ◴[] No.12330633{3}[source]
It makes perfect sense to me.

Which to be honest is what matters. However I would imagine that a lot of the problem grokking it is due to not being used to it.

33. Normal_gaussian ◴[] No.12330642{3}[source]
I highly recommend having a couple of reformatting scripts around - it's a great way to find logical errors quickly.

Of course don't commit reformats where it diverges from the codebase.

34. macspoofing ◴[] No.12330687{4}[source]
>It did however offer a very easy standard way to share and parse data back in 1999.

Not 1999. 2004. JSON wasn't on anybody's radar in 1999.

>JSON is incredibly easy to read and write by hand, in comparison with XML

Maybe for small shallow objects (and for those XML is also quite readable). Once size or complexity get a little higher, you're done.

replies(1): >>12330748 #
35. majewsky ◴[] No.12330704[source]
I came across this formatting pattern in Haskell, but I still prefer trailing commas for one reason: I can trivially apply line-wise operations (e.g. sort or align) on the key-value pairs without breaking the syntax. When I sort your first snippet line-by-line, it becomes

  , "age": 229
  , "appearance": "Tree-like.  It's a tree."
  , "sex": "monoecious species"
  { "time": "3 minutes past four"
  }
and the syntax is broken. With trailing commas, the syntax always stays valid:

  {
    "age": 229,
    "appearance": "Tree-like.  It's a tree.",
    "sex": "monoecious species",
    "time": "3 minutes past four",
  }
replies(2): >>12330716 #>>12332601 #
36. tome ◴[] No.12330716{3}[source]
With prefix commas it also always stays valid

    {
    , "age": 229
    , "appearance": "Tree-like.  It's a tree."
    , "sex": "monoecious species"
    , "time": "3 minutes past four"
    }
replies(1): >>12332171 #
37. hutzlibu ◴[] No.12330748{5}[source]
> Once size or complexity get a little higher, you're done.

And what is better than JSON, when it gets complex?

replies(1): >>12330844 #
38. macspoofing ◴[] No.12330844{6}[source]
I meant you're done with reading and writing it by hand.
39. yepperino ◴[] No.12330860{4}[source]
I use industry standard IOCCC formatting rules. Doesn't everyone?

http://www.ioccc.org/2015/endoh1/prog.c

40. ngrilly ◴[] No.12331279{5}[source]
I like where this is going :-)
41. paulddraper ◴[] No.12332050{3}[source]
Why?

It looks easier for me; I'm far more likely to notice a missing comma.

But I'm sure you must have a valid point, if you would elucidate?

42. azinman2 ◴[] No.12332171{4}[source]
Is the first comma legal in Json or JavaScript? I'd have thought that'd be an error.
replies(1): >>12335068 #
43. paulddraper ◴[] No.12332303{3}[source]
Your arguments are good though I wouldn't exactly call it a "corner case".
44. saurik ◴[] No.12332601{3}[source]
You are assuming that trailing commas are ignored as valid, which is true of JavaScript but not of JSON or of C (the other example).
replies(2): >>12334959 #>>12365642 #
45. sangnoir ◴[] No.12332787[source]
> I'd love to have comments in JSON. It's not so important for machine generated files but when you hand-craft an example it's nice to annotate it.

Luckily, Douglas Crockford has an explanation and an almost prophetic solution[1] for your use-case:

> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.

> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

1. https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...
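
The "strip comments, then parse" workflow can be sketched without JSMin itself. The `stripComments` function below is a hypothetical, simplified stand-in that removes `//` and `/* */` comments outside of string literals; it is not JSMin and does not minify.

```javascript
// Hypothetical stand-in for the "strip comments, then parse" step.
function stripComments(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (ch === '"') {
      // Copy a string literal verbatim, honouring backslash escapes.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === "\\" ? 2 : 1;
      }
      out += text.slice(i, j + 1);
      i = j + 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line, keeping the newline.
      while (i < text.length && text[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing "*/".
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const annotated = `{
  // how long the request may take, in seconds
  "timeout": 30,
  "url": "http://example.com/a" /* slashes inside strings are kept */
}`;

const config = JSON.parse(stripComments(annotated));
console.log(config.timeout, config.url); // 30 http://example.com/a
```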

46. jabits ◴[] No.12334062[source]
I mostly agree, except I still like and use XML in many cases. XML Schema validation is just too useful as a first pass validation step to verify the "shape" of a message/document.

And to me, with any decent editor with folding, it is easy to read. And very importantly, allows comments. I have never understood all the XML-hate out there...

replies(1): >>12335008 #
47. majewsky ◴[] No.12334959{4}[source]
Yes, I'm aware of this being an error in JSON (which is precisely the point of the discussion). I wasn't aware that it's a problem in C. Maybe gcc is more lenient here.
48. Mikhail_Edoshin ◴[] No.12334990[source]
The issue is not about forgetting something. XML elements are self-contained, you don't need to separate them, thus you don't need to know the context. If you generate XML programmatically, each function can just push its object into the stream and be done. With JSON you need to know if your object is last or first at this level of hierarchy, so you know if you need to insert a comma or not.

Commas were meant to be used in code that you actually type. Using them as separators in a format that is meant to be produced and consumed by computers is clear, simple, and wrong.
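
The bookkeeping described here can be sketched as follows, assuming a hypothetical `JsonArrayWriter` that streams array elements into a list of chunks. Each element has to know whether it is the first at its level, which is exactly the extra state a self-delimiting format avoids.

```javascript
// Hypothetical streaming writer for a JSON array.
class JsonArrayWriter {
  constructor() {
    this.chunks = ["["];
    this.first = true; // separator state the producer has to carry around
  }

  push(value) {
    if (!this.first) this.chunks.push(",");
    this.first = false;
    this.chunks.push(JSON.stringify(value));
  }

  end() {
    this.chunks.push("]");
    return this.chunks.join("");
  }
}

const writer = new JsonArrayWriter();
for (const id of [116, 943, 234, 38793]) writer.push(id);
const out = writer.end();
console.log(out); // [116,943,234,38793]
```

Emitting the comma before every element except the first keeps the state to a single flag; the alternative of emitting it after every element except the last requires knowing in advance where the stream ends.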

49. Mikhail_Edoshin ◴[] No.12335008[source]
> I have never understood all the XML-hate out there...

Me neither. I think it's truly irrational.

50. reaktivo ◴[] No.12335068{5}[source]
Valid but introduces undefined as the first item of the array
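
A quick check of the leading-comma case, as a sketch. In a JavaScript array literal the leading comma is an elision that leaves an empty slot; in an object literal it does not compile, and JSON rejects it in either position. `parsesAsJs` is a hypothetical helper that compiles source text without executing it.

```javascript
// Hypothetical helper: true if the expression compiles as JavaScript.
function parsesAsJs(expression) {
  try {
    new Function(`return (${expression});`);
    return true;
  } catch (err) {
    return false;
  }
}

// Array literal: the leading comma is an elision, leaving a hole at index 0.
const holed = [, "age", "sex"];
console.log(holed.length);           // 3
console.log(holed[0] === undefined); // true
console.log(0 in holed);             // false: the slot is empty, not assigned

// Object literal: a leading comma does not compile.
console.log(parsesAsJs('{ , "age": 229 }')); // false
console.log(parsesAsJs('[ , 229 ]'));        // true

// JSON accepts neither form.
let jsonAccepts = true;
try {
  JSON.parse('[, 229]');
} catch (err) {
  jsonAccepts = false;
}
console.log(jsonAccepts); // false
```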
51. geezerjay ◴[] No.12336581{6}[source]
> So in other words, what I'm saying isn't "nonsense" at all, and they aren't fundamentally different things as you claim – you just prefer a standard that you think is better?

It's nonsense, because a timestamp isn't a date representation. This was already demonstrated. I don't understand why you decided to ignore this.

> The point was not which standard was better, the point was that RFC 3339 is not canonical.

No, the point is that the ISO standard you've quoted doesn't define timestamps. Hence, the example you provided to refute what I've said was nonsense.

> You can't put an RFC 3339 timestamp into JSON without wrapping it in a string

...and you can't encode a date without representing the year as a number, the month as another number, the day as another number, etc etc etc.

You, somehow, miss the point that a primitive data type is not required to represent timestamps.

In fact, you can represent timestamps in JSON by defining an aggregate type.

Timestamps as primitive data types don't make any sense if it's possible to use the types that are already available to represent them.

> my point was that as far as a JSON parser is concerned, it's a string

Somehow, you don't understand that JSON is only the superset language, and that JSON-based domain-specific languages represent specific subsets of JSON obtained by imposing other parsing rules.

> When the subject of discussion is whether or not it should become a primitive data type, that's circular logic.

You somehow already forgot that this particular data type is already representable by using another primitive data type.

> No, but it is required to implement a vast number of JSON-based APIs.

No, it's not.

Wrap it in a string. Done.

If that's too hard to do, just specify an aggregate data type.

Is it that hard to understand?

> The "very specific corner cases" being every single RESTful API the author can think of?

Somehow, no RESTful API was barred from being implemented in JSON because of this corner case.

I suspect you are, somehow, confusing "convenience" with "necessity".

52. tptacek ◴[] No.12365642{4}[source]
Trailing commas are allowed in C!
replies(2): >>12367655 #>>12368010 #
53. mzs ◴[] No.12367655{5}[source]
C89 v. C99 difference IIRC

edit: I was wrong: http://www.lysator.liu.se/c/ANSI-C-grammar-y.html#initialize...

replies(1): >>12368022 #
54. saurik ◴[] No.12368010{5}[source]
The contextual example from C was "uint8_t * data, * buffer;" (spaces added to avoid italics), and no: a trailing comma is absolutely not allowed.
replies(1): >>12368092 #
55. saurik ◴[] No.12368022{6}[source]
No: tptacek did not bother paying attention to the context; as far as I know "uint8_t * data, * buffer, ;" (spaces added after * to avoid italics) would never be valid.
replies(1): >>12368084 #
56. tptacek ◴[] No.12368084{7}[source]
How is that the context? In the sense of the intersection between Javascript and JSON, trailing commas have pretty much always worked in C, which is something all of us who write code that generates C rely on, like a lot.
replies(1): >>12368525 #
57. tptacek ◴[] No.12368092{6}[source]
Replied here, to the first time I saw you write this:

https://news.ycombinator.com/item?id=12368022

58. mzs ◴[] No.12368525{8}[source]
I googled and trailing comma was allowed since C89, I was wrong. For code that generates code I always did something like this cause of that mistake:

  { a, b
  , c
  ...
  , z
  };