
Parse, don't validate (2019)

(lexi-lambda.github.io)
398 points declanhaigh | 1 comments | | HN request time: 0s | source
ckdot2 ◴[] No.35054435[source]
Please, don't write your own JSON parser/validator. There's JSON Schema https://json-schema.org which has implementations in most languages. You can validate your JSON against a given, standardized JSON schema file - and you're basically done. After the validation, it's probably good practice to map the JSON to some DTO and maybe do some further validation which doesn't check the structure of the data but its meaning.
replies(2): >>35054491 #>>35055093 #
chriswarbo ◴[] No.35055093[source]
They're not saying you should write your own JSON parser/validator. They're saying that your existing parsing/validation/checking logic (using whatever libraries, standards, etc.) should not have a type signature like this:

  checkAgainstMySchema: JSON -> Boolean
Or this:

  checkedAgainstMySchema: JSON -> JSON
Instead, it's better to use a type signature like this:

  checkAgainstMySchema: JSON -> Either Error MyJSON
(Where MyJSON is some type which wraps-up your data; which could be the raw JSON, or perhaps some domain objects, or whatever)

The reason this is better, is that it's required for your program to work: if your processing functions take a 'MyJSON' as argument, then (a) your program must call the 'checkAgainstMySchema' function; and (b) you can only run your data processing in the successful branch (since that's the only way to get the 'MyJSON' argument you need).

In contrast, the functions which return 'Boolean' and 'JSON' are not required; in the sense that, we could completely forget to do any validation, and still end up with a runnable program. That's dangerous!

replies(1): >>35095875 #
ckdot2 ◴[] No.35095875[source]
OK, there's a lot of discussions about if booleans, Eithers/Optionals or exceptions should be used. And I'd say it's probably best to use what's most common in the programming language to be used. What I'm saying here is, I don't want to implement that method "checkAgainstMySchema". Because I know there's already a library for that.
replies(1): >>35109184 #
1. chriswarbo ◴[] No.35109184[source]
> I'd say it's probably best to use what's most common in the programming language to be used

Sure, I agree (or perhaps: what's considered "best practice"; or whatever our existing codebase is doing)

> there's a lot of discussions about if booleans, Eithers/Optionals or exceptions should be used

That's just an implementation detail, and misses the point. For example, all of those can be used to 'validate'; e.g.

- A function/method 'v1: JSON -> Boolean'

- A function/method 'v2: JSON -> JSON', which may throw exceptions

- A function/method 'v3: JSON -> Optional JSON'

- A function/method 'v4: JSON -> Either Error JSON'

The reason these are all bad has nothing to do with the language features or error-handling mechanisms employed. The reason they are bad is that they are all completely unnecessary.

For example, here are a bunch of programs which use the above validators. They're all essentially equivalent, and hence have the same fundamental flaw:

  function trigger1(userInput: JSON) {
    if (!v1(userInput)) {
      print "UNAUTHORISED, ABORTING"
      sys.exit(1)
    }
    else {
      launchMissiles(authorisation=userInput)
    }
  }

  function trigger2(userInput: JSON) {
    try {
      launchMissiles(authorisation=v2(userInput))
    }
    catch {
      print "UNAUTHORISED, ABORTING"
      sys.exit(1)
    }
  }

  function trigger3(userInput: JSON) {
    v3(userInput) match {
      case None => {
        print "UNAUTHORISED, ABORTING"
        sys.exit(1)
      }
      case Some(validated) => {
        launchMissiles(authorisation=validated)
      }
    }
  }

  function trigger4(userInput: JSON) {
    v4(userInput) match {
      case Left(error) => {
        print ("UNAUTHORISED, ABORTING: " + error)
        sys.exit(1)
      }
      case Right(validated) => {
        launchMissiles(authorisation=validated)
      }
    }
  }
The reason they're all flawed is that validation can be skipped. In other words, you can write any validation logic; implemented with any mechanism you like; in any language; but your colleague's code might never call it! All of the above 'trigger' functions could be replaced by this, and it will still work:

  function trigger(userInput: JSON) {
    launchMissiles(authorisation=userInput)
  }
In contrast, the 'parse' approach cannot be skipped. Here are some examples:

- A function/method 'p1: JSON -> Either Error MyJSON'

- A function/method 'p2: JSON -> Optional MyJSON'

- A function/method 'p3: JSON -> MyJSON', which may throw exceptions

Here are their corresponding 'trigger' functions:

  function trigger5(userInput: JSON) {
    p1(userInput) match {
      case Left(error) => {
        print ("UNAUTHORISED, ABORTING: " + error)
        sys.exit(1)
      }
      case Right(parsed) => {
        launchMissiles(authorisation=parsed)
      }
    }
  }

  function trigger6(userInput: JSON) {
    p2(userInput) match {
      case None => {
        print "UNAUTHORISED, ABORTING"
        sys.exit(1)
      }
      case Some(parsed) => {
        launchMissiles(authorisation=parsed)
      }
    }
  }

  function trigger7(userInput: JSON) {
    try {
      launchMissiles(authorisation=p3(userInput))
    }
    catch (error) {
      print ("UNAUTHORISED, ABORTING: " + error)
      sys.exit(1)
    }
  }
These alternatives are much safer, since the 'launchMissiles' function now takes a 'MyJSON' value as argument; so we can't do `launchMissiles(authorisation=userInput)` (since 'userInput' has the type JSON, which isn't a valid input). Our colleagues cannot skip or forget to call these p1/p2/p3 functions, since that's the only way they can turn the 'userInput' value they have into the 'MyJSON' value they need.
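
To make that concrete in a real language, here is a minimal TypeScript sketch of the 'p2' style. All the names ('JSONValue', 'Authorisation', the 'code' field) are made up for illustration, and 'undefined' stands in for 'None'. The private field is what stops a raw object from being passed off as an 'Authorisation', since TypeScript otherwise compares types structurally.

```typescript
// Hypothetical stand-in for the thread's 'JSON' type.
type JSONValue =
  | null
  | boolean
  | number
  | string
  | JSONValue[]
  | { [key: string]: JSONValue };

// Plays the role of 'MyJSON'. The constructor is private, so the only way
// to obtain an Authorisation is to go through Authorisation.parse.
class Authorisation {
  private constructor(private readonly value: string) {}

  get code(): string {
    return this.value;
  }

  // Roughly 'p2: JSON -> Optional MyJSON', with undefined playing None.
  static parse(input: JSONValue): Authorisation | undefined {
    if (typeof input === "object" && input !== null && !Array.isArray(input)) {
      const code = input["code"];
      if (typeof code === "string" && code.length > 0) {
        return new Authorisation(code);
      }
    }
    return undefined;
  }
}

// Only accepts the parsed type, never raw JSON.
function launchMissiles(authorisation: Authorisation): string {
  return "launched with " + authorisation.code;
}

function trigger(userInput: JSONValue): string {
  // Writing launchMissiles(userInput) here is rejected by the type checker.
  const parsed = Authorisation.parse(userInput);
  if (parsed === undefined) {
    return "UNAUTHORISED, ABORTING";
  }
  return launchMissiles(parsed);
}
```

The functions return strings rather than printing and exiting, purely so the sketch is easy to run and check.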

> I don't want to implement that method "checkAgainstMySchema". Because I know there's already a library for that.

No, there isn't. I think you may be confused about how such 'parser' functions should be implemented. Nobody is saying to ignore existing libraries, or roll our own JSON grammars, or whatever. It's purely about how your project's datatypes are constructed. For example, something like this:

  function parseMyThing(json: JSON) {
    if (SomeExistingJSONSchemaLibrary.validate(json, SomeParticularSchemaMyApplicationIsUsing)) {
      return Right(SomeDatatypeIHaveWritten(...))
    }
    else {
      // Or use exceptions, or Optional, or whatever; it doesn't matter
      return Left("Invalid")
    }
  }
(If all of your project's datatypes, schemas, classes, etc. were already provided by some existing library, then that project would be a bit pointless!)
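
Here is the same shape as a runnable TypeScript sketch. The 'SchemaValidator' interface and 'stubValidator' are assumptions standing in for whatever JSON Schema library the project already uses; they are not any real library's API, and the stub ignores the schema entirely. 'LaunchCode' and 'Result' are likewise hypothetical names.

```typescript
type JSONValue =
  | null
  | boolean
  | number
  | string
  | JSONValue[]
  | { [key: string]: JSONValue };

// A hand-rolled Either: failure carries an error, success carries a value.
type Result<E, A> = { ok: false; error: E } | { ok: true; value: A };

// Assumed interface for an existing validation library (hypothetical).
interface SchemaValidator {
  validate(json: JSONValue, schema: JSONValue): boolean;
}

// The schema this particular application happens to use.
const launchCodeSchema: JSONValue = {
  type: "object",
  required: ["code"],
  properties: { code: { type: "string" } },
};

// The datatype we wrote ourselves; it can only be built via parse.
class LaunchCode {
  private constructor(private readonly value: string) {}

  get code(): string {
    return this.value;
  }

  static parse(json: JSONValue, validator: SchemaValidator): Result<string, LaunchCode> {
    if (!validator.validate(json, launchCodeSchema)) {
      return { ok: false, error: "Invalid" };
    }
    // Validation passed, so we rely on the schema's guarantee that this
    // is an object with a string 'code'.
    const fields = json as { [key: string]: JSONValue };
    return { ok: true, value: new LaunchCode(String(fields["code"])) };
  }
}

// Toy validator that ignores the schema and hard-codes the check; a real
// project would delegate to its JSON Schema library here.
const stubValidator: SchemaValidator = {
  validate(json, _schema) {
    return (
      typeof json === "object" &&
      json !== null &&
      !Array.isArray(json) &&
      typeof json["code"] === "string"
    );
  },
};
```

The library does the structural checking; the only code we own is the few lines that turn "this JSON passed" into a value of our own type.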