Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 12 comments
1. ckdot2 ◴[] No.35054435[source]
Please, don't write your own JSON parser/validator. There's JSON Schema https://json-schema.org which has implementations in most languages. You can validate your JSON against a given, standardized JSON Schema file - and you're basically done. After the validation, it's probably good practice to map the JSON to some DTO and maybe do some further validation which checks not the structure of the data but its meaning.
replies(2): >>35054491 #>>35055093 #
2. mirekrusin ◴[] No.35054491[source]
JSON Schema has no relation to the static type system, i.e. in TypeScript it's much better to use composable, functional combinators at I/O boundaries only, and not do any extra checks anywhere the type system already provides guarantees.
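
To make "composable combinators" concrete, here is a minimal TypeScript sketch of one shape they could take. It is not the API of any particular library; `Decoder`, `str`, `num`, `arrayOf`, `objectOf` and the `User` type are all names made up for illustration.

```typescript
// A decoder turns untrusted input into a typed value, or throws.
type Decoder<T> = (input: unknown) => T;

const str: Decoder<string> = (input) => {
  if (typeof input !== "string") throw new Error("expected string");
  return input;
};

const num: Decoder<number> = (input) => {
  if (typeof input !== "number") throw new Error("expected number");
  return input;
};

// Combinator: build an array decoder from an element decoder.
function arrayOf<T>(item: Decoder<T>): Decoder<T[]> {
  return (input) => {
    if (!Array.isArray(input)) throw new Error("expected array");
    return input.map((element) => item(element));
  };
}

// Combinator: build an object decoder from one decoder per field.
function objectOf<T>(fields: { [K in keyof T]: Decoder<T[K]> }): Decoder<T> {
  return (input) => {
    if (typeof input !== "object" || input === null) {
      throw new Error("expected object");
    }
    const source = input as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key in fields) {
      result[key] = fields[key](source[key]);
    }
    // Every field has been checked by its own decoder at this point.
    return result as unknown as T;
  };
}

// Hypothetical domain type, decoded once at the I/O boundary.
interface User {
  name: string;
  scores: number[];
}

const decodeUser = objectOf<User>({ name: str, scores: arrayOf(num) });
const user: User = decodeUser(JSON.parse('{"name":"ada","scores":[1,2]}'));
```

Past `decodeUser`, the rest of the code only ever sees a `User`, so no further runtime checks are needed there.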
replies(2): >>35054548 #>>35054555 #
3. ckdot2 ◴[] No.35054548[source]
I think it's good enough. Besides JSON Schema being a standard instead of a custom solution, you also get nice error messages in case there's a validation issue. If your JSON schema file is properly defined it should be safe enough to just map your JSON into some static type DTO afterwards and trust your data and its types to be valid. In JSON Schema you can validate for strings, numbers, integers, and custom objects. It's quite powerful and - personally - I wouldn't want to implement that kind of stuff on my own.
replies(2): >>35054745 #>>35058179 #
4. bertrand-caron ◴[] No.35054555[source]
For anyone using both TypeScript and JSON schemas, but wanting to use TypeScript as the source of truth, I highly recommend the following library: [ts-json-schema-generator](https://github.com/YousefED/typescript-json-schema).

It does exactly what it says in the box: turns your TypeScript `types` / `interface` into machine-readable JSON schemas.

The library has a few open issues (does not deal well with some edge cases of composing Omit<> on sum types, and does not support dynamic (const) keys), but compared to manually writing JSON schemas, it's been amazing!

EDIT: I should add that the library supports adding further type constraints that are supported by JSON Schema but not by TS by using JSDoc (for instance, pattern matching on strings, ranges on numbers, etc.).

replies(2): >>35054767 #>>35054787 #
5. mirekrusin ◴[] No.35054745{3}[source]
You don't need to implement it on your own, you can use a library.

Nice error messages exist there as well.

If you're casting untyped results, you can change one side and not the other and only find out about the problem in production. Or simply any mistake will go unnoticed.

Using a TypeScript-first library allows you to do much more: it supports opaque types, custom constructors and any imaginable validation that can't be expressed in JSON Schema.

6. mirekrusin ◴[] No.35054767{3}[source]
Adding an extra transpilation step doesn't sound like a great solution.

It also doesn't support inlined assertions, referring to existing classes, custom validations, opaque types etc.

7. kristiandupont ◴[] No.35054787{3}[source]
I generally prefer Zod, but in cases where for one reason or another I have to rely on JSON Schema, I use this package: https://www.npmjs.com/package/as-typed which infers TS types directly from such a schema. No extra build steps required. I then use AJV for runtime validation.
8. chriswarbo ◴[] No.35055093[source]
They're not saying you should write your own JSON parser/validator. They're saying that your existing parsing/validation/checking logic (using whatever libraries, standards, etc.) should not have a type signature like this:

  checkAgainstMySchema: JSON -> Boolean
Or this:

  checkedAgainstMySchema: JSON -> JSON
Instead, it's better to use a type signature like this:

  checkAgainstMySchema: JSON -> Either Error MyJSON
(Where MyJSON is some type which wraps-up your data; which could be the raw JSON, or perhaps some domain objects, or whatever)

The reason this is better, is that it's required for your program to work: if your processing functions take a 'MyJSON' as argument, then (a) your program must call the 'checkAgainstMySchema' function; and (b) you can only run your data processing in the successful branch (since that's the only way to get the 'MyJSON' argument you need).

In contrast, the functions which return 'Boolean' and 'JSON' are not required; in the sense that, we could completely forget to do any validation, and still end up with a runnable program. That's dangerous!
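
As a rough TypeScript sketch of that third signature: `Either`, `Authorisation`, `parseAuthorisation`, `describe` and `handle` are hypothetical names, and the hand-written checks stand in for whatever library you actually use. Note that the raw input is typed `unknown` rather than `any`, since `any` would be accepted everywhere and defeat the point.

```typescript
// Hypothetical stand-in for 'Either Error MyJSON'.
type Either<L, R> =
  | { tag: "left"; error: L }
  | { tag: "right"; value: R };

// Hypothetical stand-in for 'MyJSON': the shape the rest of the program wants.
interface Authorisation {
  user: string;
  level: number;
}

// checkAgainstMySchema: JSON -> Either Error MyJSON
function parseAuthorisation(json: unknown): Either<string, Authorisation> {
  if (typeof json !== "object" || json === null) {
    return { tag: "left", error: "expected an object" };
  }
  const { user, level } = json as Record<string, unknown>;
  if (typeof user !== "string") {
    return { tag: "left", error: "'user' must be a string" };
  }
  if (typeof level !== "number" || !Number.isInteger(level)) {
    return { tag: "left", error: "'level' must be an integer" };
  }
  return { tag: "right", value: { user, level } };
}

// Processing code asks for the parsed type, not for raw JSON.
function describe(auth: Authorisation): string {
  return `${auth.user} (level ${auth.level})`;
}

function handle(input: unknown): string {
  // describe(input) would not compile: 'unknown' is not an 'Authorisation'.
  const parsed = parseAuthorisation(input);
  if (parsed.tag === "left") {
    return `rejected: ${parsed.error}`;
  }
  // Only the successful branch has a value to pass on.
  return describe(parsed.value);
}
```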

replies(1): >>35095875 #
9. lexi-lambda ◴[] No.35058179{3}[source]
Amusingly, the tweet that inspired this blog post—which is linked in the second paragraph of the article—is specifically about how automatically generating a JSON parser from your datatypes means you don’t have to implement that kind of stuff on your own, and there is no possibility of some separate “schema” going out of sync with your application logic.

Of course, if you want to share the schema with downstream clients so that other programs can use it, that is a great use case for something like JSON Schema. It is a common interface that allows two different programs—quite possibly written in completely different languages—to communicate using the same format. That’s great! But it’s only half the story, because just having the schema doesn’t help you in any way to make sure the code actually respects that schema. That’s where integration with the language’s type system can help, perhaps by automatically generating types from the schema and then generating parsing/serialization functions that use those generated types.

replies(1): >>35095782 #
10. ckdot2 ◴[] No.35095782{4}[source]
I think there's a misunderstanding here. I'm not only saying "use JSON schema files to define your expected payloads", I'm also saying, "use one of the existing JSON schema validators for your programming language". For most programming languages there's a library for that. So, if you don't need to write any code anymore that "respects that schema", the whole discussion becomes kind of obsolete.
11. ckdot2 ◴[] No.35095875[source]
OK, there's a lot of discussions about if booleans, Eithers/Optionals or exceptions should be used. And I'd say it's probably best to use what's most common in the programming language to be used. What I'm saying here is, I don't want to implement that method "checkAgainstMySchema". Because I know there's already a library for that.
replies(1): >>35109184 #
12. chriswarbo ◴[] No.35109184{3}[source]
> I'd say it's probably best to use what's most common in the programming language to be used

Sure, I agree (or perhaps: what's considered "best practice"; or whatever our existing codebase is doing)

> there's a lot of discussions about if booleans, Eithers/Optionals or exceptions should be used

That's just an implementation detail, and misses the point. For example, all of those can be used to 'validate'; e.g.

- A function/method 'v1: JSON -> Boolean'

- A function/method 'v2: JSON -> JSON', which may throw exceptions

- A function/method 'v3: JSON -> Optional JSON'

- A function/method 'v4: JSON -> Either Error JSON'

The reason these are all bad has nothing to do with the language features or error-handling mechanisms employed. The reason they are bad is that they are all completely unnecessary.

For example, here are a bunch of programs which use the above validators. They're all essentially equivalent, and hence have the same fundamental flaw:

  function trigger1(userInput: JSON) {
    if (!v1(userInput)) {
      print "UNAUTHORISED, ABORTING"
      sys.exit(1)
    }
    else {
      launchMissiles(authorisation=userInput)
    }
  }

  function trigger2(userInput: JSON) {
    try {
      launchMissiles(authorisation=v2(userInput))
    }
    catch {
      print "UNAUTHORISED, ABORTING"
      sys.exit(1)
    }
  }

  function trigger3(userInput: JSON) {
    v3(userInput) match {
      case None => {
        print "UNAUTHORISED, ABORTING"
        sys.exit(1)
      }
      case Some(validated) => {
        launchMissiles(authorisation=validated)
      }
    }
  }

  function trigger4(userInput: JSON) {
    v4(userInput) match {
      case Left(error) => {
        print ("UNAUTHORISED, ABORTING: " + error)
        sys.exit(1)
      }
      case Right(validated) => {
        launchMissiles(authorisation=validated)
      }
    }
  }
The reason they're all flawed is that validation can be skipped. In other words, you can write any validation logic, implemented with any mechanism you like, in any language, but your colleague's code might never call it! All of the above 'trigger' functions could be replaced by this, and it will still work:

  function trigger(userInput: JSON) {
    launchMissiles(authorisation=userInput)
  }
In contrast, the 'parse' approach cannot be skipped. Here are some examples:

- A function/method 'p1: JSON -> Either Error MyJSON'

- A function/method 'p2: JSON -> Optional MyJSON'

- A function/method 'p3: JSON -> MyJSON', which may throw exceptions

Here are their corresponding 'trigger' functions:

  function trigger5(userInput: JSON) {
    p1(userInput) match {
      case Left(error) => {
        print ("UNAUTHORISED, ABORTING: " + error)
        sys.exit(1)
      }
      case Right(parsed) => {
        launchMissiles(authorisation=parsed)
      }
    }
  }

  function trigger6(userInput: JSON) {
    p2(userInput) match {
      case None => {
        print "UNAUTHORISED, ABORTING"
        sys.exit(1)
      }
      case Some(parsed) => {
        launchMissiles(authorisation=parsed)
      }
    }
  }

  function trigger7(userInput: JSON) {
    try {
      launchMissiles(authorisation=p3(userInput))
    }
    catch (error) {
      print ("UNAUTHORISED, ABORTING: " + error)
      sys.exit(1)
    }
  }
These alternatives are much safer, since the 'launchMissiles' function now takes a 'MyJSON' value as argument; so we can't do `launchMissiles(authorisation=userInput)` (since 'userInput' has the type JSON, which isn't a valid input). Our colleagues cannot skip or forget to call these p1/p2/p3 functions, since that's the only way they can turn the 'userInput' value they have into the 'MyJSON' value they need.
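
For a concrete version of the 'p2' variant, here is a small TypeScript sketch. Everything in it is an assumption for illustration: the `MyJSON` class, its `launchCode` field, and the made-up "exactly 8 digits" rule. TypeScript is structurally typed, so the sketch relies on a private field and a private constructor to make `MyJSON.parse` the only way to obtain a value.

```typescript
class MyJSON {
  // A private field makes TypeScript compare this class nominally, so a
  // look-alike object literal is not accepted where MyJSON is expected.
  private readonly launchCode: string;

  // Private constructor: code outside the class cannot write 'new MyJSON(...)'.
  private constructor(launchCode: string) {
    this.launchCode = launchCode;
  }

  // p2: JSON -> Optional MyJSON
  static parse(json: unknown): MyJSON | undefined {
    if (typeof json !== "object" || json === null) return undefined;
    const code = (json as Record<string, unknown>).launchCode;
    // Made-up rule for illustration: a launch code is exactly 8 digits.
    if (typeof code !== "string" || !/^\d{8}$/.test(code)) return undefined;
    return new MyJSON(code);
  }

  code(): string {
    return this.launchCode;
  }
}

function launchMissiles(authorisation: MyJSON): string {
  return `launched with ${authorisation.code()}`;
}

function trigger(userInput: unknown): string {
  // launchMissiles(userInput) does not type-check: 'unknown' is not 'MyJSON'.
  const parsed = MyJSON.parse(userInput);
  if (parsed === undefined) return "UNAUTHORISED, ABORTING";
  return launchMissiles(parsed);
}
```

A colleague can still defeat this with an explicit cast, but they can no longer skip the check by accident.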

> I don't want to implement that method "checkAgainstMySchema". Because I know there's already a library for that.

No, there isn't. I think you may be confused about how such 'parser' functions should be implemented. Nobody is saying to ignore existing libraries, or roll our own JSON grammars, or whatever. It's purely about how your project's datatypes are constructed. For example, something like this:

  function parseMyThing(json: JSON) {
    if (SomeExistingJSONSchemaLibrary.validate(json, SomeParticularSchemaMyApplicationIsUsing)) {
      return Right(SomeDatatypeIHaveWritten(...))
    }
    else {
      // Or use exceptions, or Optional, or whatever; it doesn't matter
      return Left("Invalid")
    }
  }
(If all of your project's datatypes, schemas, class, etc. were already provided by some existing library, then that project would be a bit pointless!)