The point is to parse the input into a structure that always upholds the predicates you care about, so you don't end up continuously programming defensively with ifs and asserts.
I follow "Parse, Don't Validate" consistently in Go. For example, if I need to parse a JSON payload from an end-user for Foo, I define a struct called FooRequest, and I have exactly one function that creates a FooRequest instance, given a JSON stream.
Anywhere else in my application, if I have a FooRequest instance, I know that it's validated and well-formed because it had to have come from my FooRequest parsing function. I don't need sum types or any special language features beyond typing.
Don't enumerate the reasons why the input might be invalid; instead, define a target struct/object and parse the input into it.
if(!needToDoTheThing()) return;
DoTheThing();
We could have written it this way:
if(needToDoTheThing()) {
DoTheThing();
}
else {
return;
}
The latter is closer to how a pattern match looks. But in my experience, the majority of programmers prefer early return. I regularly see people "refactor" if-else to if-early-return, but I've never seen the opposite.

In order to be useful, type systems need to be simple, but there are no such restrictions on the rules that govern our expectations of data correctness.
OP is delusional if they think that their approach can be made practical. I mean, what if the expectation is that a value is a prime number? How are they going to encode this in their type system? And this is just a trivial example.
There are plenty of useful constraints we routinely expect in message exchanges that aren't possible to implement using even very elaborate type systems: for example, ensuring that all ids in XML nodes are unique, or that the last digit of an SSN is a checksum of the previous digits computed by some complex formula. I mean, every Web developer worth their salt knows that regular expressions are a bad idea for testing email addresses (which would be an example of parsing), and that it's really preferable to validate emails by calling a number of predicates on them.
And, of course, these aren't the only examples. Take password validation (the annoying part that asks for a capital letter, a digit, a special character): I want to see the author implement a parser for the possible inputs to a password field that also gives helpful error messages such as "you forgot to use a digit". Even though I don't doubt it's possible to do that, the resulting code would be an abomination compared to code that does the usual stuff, i.e. just checks whether a character is in a set of characters.
def TRUE(a, b):
    return a

def FALSE(a, b):
    return b

def IF(cond, a, b):
    return cond(a, b)

assert IF(TRUE, 1, 2) == 1
assert IF(FALSE, 1, 2) == 2
This gives you the conditional statement in most languages ("cond ? a : b" or "a if cond else b").

Dependent type checkers may be hard to implement, but the typing rules are fairly simple, and people have been using this correct-by-construction philosophy in dependently-typed languages for a while now.
There's nothing delusional about that.
People get too caught up in thinking that the type _has_ to express intricate properties; it doesn't. How am I going to express the expectation that something is prime? With the following closed API:
module Prime where
data PrimeNumber
parsePrime :: Int -> Maybe PrimeNumber
toInt :: PrimeNumber -> Int
Now the problem is that _leaving_ this API forgets information. Whether or not that is a problem is a different question, and very dependent on the context.

The same applies to your comment about passwords. One can quite easily create a closed module that encapsulates a ValidPassword type that simply performs runtime character tests on a string.
I want to stress that this approach is making a trade-off (as I mentioned earlier, leaving the API forgets information, forcing you to re-parse). Still, that puts this design somewhere in the middle of the spectrum: at one extreme we have primitive obsession and shotgun parsing everywhere; with this we push the parsing into a sane place and try to hold on to the parsed values as long as possible; and at the other extreme we need dependent types or sophisticated encodings where the value carries a lot more information (and here we get towards propositions-as-types).
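For what it's worth, that closed module is only a few lines of Haskell. A minimal sketch (mine, not code from the article; the module and function names are made up), which even produces the "you forgot to use a digit"-style messages mentioned upthread:

module Password (ValidPassword, parsePassword, getPassword) where

import Data.Char (isDigit, isUpper)

-- The constructor is not exported: parsePassword is the only way
-- to obtain a ValidPassword.
newtype ValidPassword = ValidPassword String

parsePassword :: String -> Either String ValidPassword
parsePassword s
  | length s < 8        = Left "too short"
  | not (any isDigit s) = Left "you forgot to use a digit"
  | not (any isUpper s) = Left "you forgot a capital letter"
  | otherwise           = Right (ValidPassword s)

getPassword :: ValidPassword -> String
getPassword (ValidPassword s) = s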
I had to work with this for a while because I wanted to implement Hello World in JavaScript. https://github.com/quchen/lambda-ski/blob/master/helloworld/...
## type Silly = Foo | Bar Int | Qux String Silly

## Constructors
def Foo(onFoo, onBar, onQux):
    return onFoo()

def Bar(arg0):
    return lambda onFoo, onBar, onQux: onBar(arg0)

def Qux(arg0, arg1):
    return lambda onFoo, onBar, onQux: onQux(arg0, arg1)

## Values of Silly type are Foo, Bar(x) and Qux(x, y)

## Destructor
def match_Silly(silly, onFoo, onBar, onQux):
    return silly(onFoo, onBar, onQux)
You can make a whole language on top of that if you don't mind effectively disabling your CPU's branch predictor.

> in my mind, the difference between validation and parsing lies almost entirely in how information is preserved
“parse don’t validate” is a pithy and easy to remember maxim for this preservation.
Because validation is implicitly necessary anyway when parsing to a representation that captures your invariants, banning validation as a separate concept ensures standalone validation doesn't get reintroduced: any validation step outside of a wider parsing process is considered incorrect.
In TypeScript we can define
type Prime = number

function isPrime(value: number): value is Prime {
  // run sieve
}

From here, you may have e.g.
function foo(value: Prime, ...) {
}

And it will be type checked.
function fooOrFail(v: number) {
  if (isPrime(v))
    foo(v)
  else
    throw new TypeError()
}
I'd summarise boolean blindness as: implicit (often unsafe) coupling/dependencies of method results; which could instead be explicit data dependencies. That article's example is 'plus x y = if x=Z then y else S(plus (pred x) y)', which uses an unsafe 'pred' call that crashes when x is 'Z'. It avoids the crash by branching on an 'x=Z' comparison. The alternative is to pattern-match on x, to get 'Z' or 'S x2'; hence avoiding the need for 'pred'.
Another alternative is to have 'pred' return 'Maybe Nat'; although that's less useful when we have more constructors and more data (e.g. the 'NonEmptyList' in this "parse, don't validate" article!)
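In Haskell-ish notation, the contrast looks like this (a sketch of the linked article's point, not code taken from it):

data Nat = Z | S Nat

-- Pattern matching binds the predecessor directly, so there is no
-- boolean test and no partial 'pred' call that could crash on Z.
plus :: Nat -> Nat -> Nat
plus Z     y = y
plus (S x) y = S (plus x y)

-- The other alternative mentioned above: a total 'pred'.
predMaybe :: Nat -> Maybe Nat
predMaybe Z     = Nothing
predMaybe (S x) = Just x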
If you nest if/else, you'll quickly approach a point where you have to keep a complex logic tree in your head to determine which states the system could be in inside of any given branch. If you use guard clauses and return early, you'll keep this complexity down to a minimum, since the list of possible states changes linearly with your code instead of exponentially.
I know not everybody likes it, but I think this makes cyclomatic complexity an extremely valuable metric for measuring "ease-of-reading".
Kind of yes, but this discussion is much dependent on definitions of `parse` and `validate`, which the article does not explicitly elaborate on. The chapter "The power of parsing" captures this difference implicitly "validateNonEmpty always returns (), the type that contains no information". Validation, in the context of all of this, can be defined as "checking conformance to a set of rules" while parsing is mostly synonymous with deserialization.
In most practical applications you explicitly do not want to only validate inputs, as you have no need to perform any computation on invalid input anyway. Sometimes you explicitly want to analyze invalid inputs, maybe to recover some information or do some other magic; sure, then go validate the input and do that magic on the invalid parts. But in most cases you simply want to reject invalid inputs.
However, when you think about it, that is exactly what parsing does. Validation happens implicitly during parsing: the parser will either return a valid object or throw an error, but parsing has the added benefit that the end result is a datum of a known datatype. Of course, this only really works in statically typed languages.
The thing is that it is rather easy to conflate the two. Take for example the following JSON: `{"foo": "bar", "baz": 3}`. A "parser" can return 1) a list of `(String, String, String)` 3-tuples for (type, key, value) that downstream has to process again, 2) a full-blown `FoobarizerParams` object, or something in between.
There are plenty of cases in real-world code where an array that's part of a struct or object may or may not contain any elements. If you're just parsing input into that, it seems like you'd either still end up doing an equivalent of checking whether the array is empty or not everywhere the array might be used later, even if that check is looking at an "array has elements" type flag in the struct/object, and so you're still maintaining a description of ways that the input may be invalid. But I'm not a world-class programmer, so maybe I'm missing something. Maybe you mean something like for branches of the code that require a non-empty array, you have a second struct/object and parser that's more strict and errors out if the array is empty?
Don't have to be 100% immutable or perfect ADTs: see Rust, Swift, Kotlin. Even TypeScript can do this, albeit it's uglier with untagged unions and flow typing instead of pattern matching.
If you're talking about consuming the looser input and producing a definitely-correct output, already, then you're talking about parsing, not validation. Most validation occurs naturally during parsing.
const published = posts.filter(post => !post.draft);
Empty lists are actually generally not a big deal - they are just lists of size 0, and they support all the same things you can do with non-empty lists. The fact that a "head" function throws an error on an empty list is really just a specific form of the more general observation that any array will throw an index-out-of-bounds exception when given an index that's... out of bounds. So any time you are dealing with arrays, you probably need to think about: "what happens if I try to index something that's out of bounds? is that possible?"
In this particular contrived example, all that mattered was the head of the array. But what if you wanted to pick out the 3rd argument in a list of command line arguments, and instead the user only gave you 2 inputs? If 3 arguments are required, then throw an IOError as early as possible after failing to parse out 3 arguments; but once you pass the point of parsing the input into a valid object/struct/whatever, from that point forward you no longer care about checking whether the 3rd input is empty or not.
So again, it depends on your scenario. Actually the more interesting variant of this issue (in OO languages at least) is probably handling nulls: empty lists are valid lists, but nulls are not lists, and they usually require different logic (hence why NullPointerExceptions, aka NPEs, are such a common failure mode).
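For the three-required-arguments case a couple of paragraphs up, a sketch in the article's Haskell style (Args and its field names are mine, purely for illustration):

-- Parse the three required arguments once, up front. Code that
-- receives an Args value never re-checks the argument count.
data Args = Args { inputPath :: String, outputPath :: String, mode :: String }

parseArgs :: [String] -> Either String Args
parseArgs [a, b, c] = Right (Args a b c)
parseArgs xs        = Left ("expected 3 arguments, got " ++ show (length xs))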
(In fact you could use an invalid password just fine: unless you're doing something really weird, your code would not misbehave because it's too short or missing digits and symbols. It's only due to security reasons that you choose to reject that string.)
But that doesn't mean that `string -> Password` isn't parsing! As long as you're outputting distinct types for ValidPassword and InvalidPassword, you are still following the advice of this article, because you can make all your internal code use ValidPassword as a type and you will not need to ever check for password validity again.*
Compare that to e.g. adding a { IsValid = true } field to the object, which would require you to defensively sprinkle `if (user.Password.IsValid)` every time you try to actually use the password field.
* One weakness arising from the fact that this is degenerate parsing, i.e. ValidPassword is just a string, is that a very stubborn fool could build a ValidPassword from any arbitrary string instead of using the proper parse function. Depending on your language, this can be prevented by e.g. hiding the constructor so that only parsePassword has access to it.
validateEmail :: String -> IO ()

and a 'parsing' function

parseEmail :: String -> Either EmailError ValidEmail

The property encoded by the ValidEmail type is available throughout the rest of the program, which is not the case if you only validate.

I genuinely wonder how one would write a proof in something like Agda, that
parseJson("{foo:"+encodeJson(someObject)+"}")
always succeedsWhile the article is titled "parse don't validate" I like it's first point of make illegal states unrepresentable much better.
This is what Applicative Functors were born to do. Here's a good article on it: https://www.baeldung.com/vavr-validation-api
Check the types:
public Validation<Seq<String>, User> validateUser(...)
Even though it's called "validation", it's still the approach the OP recommends.

It reads as "If you have a Seq of Strings, you might be able to construct a User, or get back validation errors instead".
Contrast this with the wrong way of doing things:
User user = new User(seqOfStrings);
user.validate();
My bad.
> People get too caught up in thinking that the type _has_ to express intricate properties
Where do you get this from? Did you even read what you are replying to? I never said anything like that... What I'm saying is that the approach taken by OP is worthless when it comes to real-life uses of validation.
So, continuing with your example: you will either end up doing validation instead of parsing (i.e. you will implement the parsePrime validator function), or you will not actually validate that your input is a prime number... The whole point OP was trying to make is that they wanted to capture the constraints on data in a type describing those constraints, but outside of trivial examples such as the non-empty list used by OP, that leads to programs that are either impossible or extremely complex.
> One can quite easily create a closed module that encapsulates a ValidPassword
And, again, that would be doing _validation_ not parsing. I'm not sure if you even understand what the conflict here is, or are you somehow agreeing with me w/o saying so?
Really? How is that degenerate? Compared to what?
My guess is that you just decided to use a dictionary word you don't fully understand.
> In fact you could use an invalid password
Where does this nonsense come from? No. I cannot use invalid password. That's the whole point of validation: making sure it doesn't happen. What kind of bs is this?
> But that doesn't mean that `string -> Password` isn't parsing!
It's doing nothing useful, and that's the whole point. You just patted yourself on the head in front of the mirror for using some programming technique that you believe makes you special, but accomplished nothing of value. That was the original point: if you want results, you will end up doing validation, there's no way around it. Your renaming of types is only interesting to you and a group of people who are interested in types, but it doesn't advance the cause of someone who wants their password validated.
You just admitted in this sentence that the use of opaque types achieves nothing of value. Which was my point all along: why use them if they are useless? Just to feel smart because I pulled out an academia-flavored ninety-pound dictionary word to describe it?
Now, are there really tools to make type systems with dependent types simple to prove? In reasonable time? How about the effort developers would have to put into just trying to express such types and into verifying that such an expression is indeed accomplishing its stated goals?
Just for a moment, imagine filing a PR in a typical Web shop for the login-form validation procedure, and sending a couple of screenfuls of Coq code or similar as a proof of your password validation procedure. How do you think your team would react?
Again, I didn't say it's impossible. Quite the opposite, I said that it is possible, but will be so bad in practice that nobody will want to use it, unless they are batshit crazy.
I have no problems with the way you want to interpret this claim. But, really, I'm responding to the article linked in this thread, which isn't about the point in the application at which to perform said validation or parsing.
Yes, it's fine, if you want to validate your input in this way -- I have no problems with it. It's just that you are doing validation, not parsing, at least not in the terms OP used them.
https://fsharpforfunandprofit.com/posts/designing-with-types...
https://ybogomolov.me/making-illegal-states-unrepresentable/
Meanwhile, people with more than one bit in their worldview RAM can fall back to validation when it's the only option that makes sense for their domain, and use parsing when it's appropriate, which is, notwithstanding your handful of frankly niche examples compared to the vast bulk of CRUD code, most of the time in practice.
Am I mistaken?
My mistake in the above snippets is precisely that TypeScript cannot make the type more specific, i.e. Number to Prime, because `type Prime = number` only creates an alias. I am not creating a type that is a more specific version of number, just an alias.
Had I actually created a proper type, the parsing would have been correct. The parsing component is happening in the outer function because at some point I need to make the generic input more specific, and then allow it to flow through the rest of the program. Am I mistaken?
A 'NotEmpty a' is just a subclass of a potentially empty 'a'. You also get the desirable behavior, in this scenario, of automatic upcasting of a 'NotEmpty a' into a regular old 'a'.
In Java, you'd implement this by making a class with a private constructor, no mutator methods, and a static factory method that throws an exception if the parsing fails. Since the only way to get an instance of the class is through the factory method, you've made illegal states unrepresentable and know that the class always holds to its invariants. No methods on instances of that class will throw exceptions from then on, so you've successfully applied "Parse, Don't Validate" without needing sum types.
The point of the article isn't the particular implementation in Haskell, it's the concept of pushing all data error states to the boundaries of your code, which applies anywhere as long as you translate it into the idioms of your language.
It absolutely is.
> I have no idea why would you question that.
I did not question [that they were different approaches], I explained, through example and counter-example, why they were the same approach. I will try again.
Alexis wrote both 'validate' and 'parse' examples in ML-style types:
validateNonEmpty :: [a] -> IO () // ML-typed 'validate'
parseNonEmpty :: [a] -> IO (NonEmpty a) // ML-typed 'parse'
More from the article:

> The difference lies entirely in the return type: validateNonEmpty always returns (), the type that contains no information, but parseNonEmpty returns NonEmpty a, a refinement of the input type that preserves the knowledge gained in the type system. Both of these functions check the same thing, but parseNonEmpty gives the caller access to the information it learned, while validateNonEmpty just throws it away.
I chose OO-style types for my samples, because there's a large fraction of HN users who dismiss ML-ish stuff as academic, or "practically useless outside of the most trivial cases".

// OO-typed 'validate' (my straw man)
class User {
  // returns void aka '()' aka "the type that contains no information"
  void validateUser() throws InvalidUserEx {...}
}

/* OO-typed 'parse' (as per my baeldung link)
 * "gives the caller access to the information it learned"
 * In this case it gives back MORE than just the User,
 * it also gives back 'why it went wrong', per your request above for password validation
 * (In contrast with parseNonEmpty which just throws an exception.)
 */
class UserValidator {
  Validation<Seq<String>, User> validateUser(...) {...}
}
> But, ML-style types have very limited expressive power

Hindley-Milner types are a goddamned crown jewel of computer science.
The example was written rather badly, though. It should have pointed out that the module was exporting the type and a couple helper functions, but not the data constructor.
But despite that, the key point was correct. Validating is examining a piece of data and returning "good" or "bad". Parsing is returning a new piece of data which encodes the goodness property at the type level, or failing to return anything. It's a better paradigm because the language prevents you from forgetting what situation you're in.
const info = type({ name: "string>0", email: "email" })
Yes, indeed. This is quite useful! But crabbone isn’t entirely wrong that it isn’t quite what the original article was about.
I’ve written quite a bit of code where constructive data modeling (which is what the original article is really about) was both practical and useful. Obviously it is not a practical approach everywhere, and there are lots of examples where other techniques are necessary. But it would be quite silly to write it off as useless outside of toy examples. A pretty massive number of domain concepts really can be modeled constructively!
But when they can’t, using encapsulation to accomplish similar things is absolutely a helpful approach. It’s just important to be thoughtful about what guarantees you’re actually getting. I wrote more about that in a followup post here: https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
Not at all, the article is about pushing complexity to the "edges" of your code so that the gooey center doesn't have to faff around with (re-)checking the same invariants over and over... but its examples are also in Haskell, in which it would be weird to do this without the type system.
In Python or Java or whatever, you'd just parse your received_api_request_could_be_sketchy or fetched_db_records_but_are_they_really into a BlessedApiRequest or DefinitelyForRealDBRecords in their constructors or builder methods or whatever, disallow any other ways of creating those types, and then exclusively use those types.
edit: wait, actually no we agree, I must have glossed over your second sentence, sorry
Instead it's parsing. It takes in a value of one type and returns a value of a different type that is known good. Or it fails. But what it never does is let you continue forward with an invalid value as if it was valid. This is because it's doing more than just validation.
SQL constraints are certainly useful. But they don’t really solve the same problem. SQL constraints ensure integrity of your data store, which is swell, but they don’t provide the same guarantees about your program that static types do, nor do they say much at all about how to structure your code that interacts with the database. I also think it is sort of laughable to claim that XSL is a good tool for solving just about any data processing problem in 2023, but even if you disagree, the same points apply.
Obviously, constructive data modeling is hardly a panacea. There are lots of problems it does not solve or that are more usefully solved in other ways. But I really have applied it to very good effect on many, many real engineering problems, not just toys, and I think the technique provides a nice framework for reasoning about data modeling in many scenarios. Your comments here seem almost bafflingly uncharitable given the article in question doesn’t make any absolutist claims and in fact discusses at some length that the technique isn’t always applicable.
See also: my other comment about using encapsulation instead of constructive modeling (https://news.ycombinator.com/item?id=35059113) and my followup blog post about how more things can be encoded using constructive data modeling than perhaps you think (https://lexi-lambda.github.io/blog/2020/08/13/types-as-axiom...).
Just to be clear, I actually really like the idea of parsing the input into a structure. I do the same thing in a lot of my code. I just don't see how it removes the need to also perform validation in many (maybe most) cases as soon as one gets beyond contrived examples.
The empty array example seems to be a can of worms. Maybe it's specific to the kinds of software that I've written, but in most of the cases I can think of, I wouldn't know if it was OK for a particular array within a structure to be empty until after the code had made some other determinations and branched based on them. And yet, like the example, once it got to the real handling for that case, it would be a problem if the array were empty. So the image in my mind is many layers of parsing that are much more complicated and confusing to read than validating the length of the array.
I still think it's a great idea for a lot of things, just that the "parse, don't validate" name seems really misleading. I might go with something like "parse first, validate where necessary".
However, TypeScript does not really provide any facility for nominal types, which in my opinion is something of a failure of the language, especially considering that it is at odds with the semantics of `class` and `instanceof` in dynamically-typed JavaScript (which have generative semantics). Other statically typed languages generally provide some form of nominal typing, even gradually typed ones. Flow even provided nominal types in JavaScript! But TypeScript is generally also quite unsound (https://twitter.com/lexi_lambda/status/1621973087192236038), so the type system doesn’t really provide any guarantees, anyway.
That said, TypeScript programmers have developed a way to emulate nominal typing using “brands”, which does allow you to obtain some of these benefits within the limitations of TS’s type system. You can search for “TypeScript branded types” to find some explanations.
This is similar, and is indeed quite useful in many cases, but it’s not quite the same. I explained why in this comment: https://news.ycombinator.com/item?id=35059886 (The comment is talking about TypeScript, but really everything there also applies to Java.)
Because you wrote a validation function, the exact thing OP told you not to do. Hooray?!
The goal of OP was to create a type that incorporates the constraints on the data, just as in their non-empty list example, where the type itself contains the constraint, such that it's impossible to implement the type in a way that lets it hold an empty list.
You did the opposite. You created a type without any constraints whatsoever, and then added a validation function to it to make sure you only create values validated by that function. So... you kind of proved my point: it's nigh impossible to create a program intelligible to human beings that has a "prime number" type, and that's why we use validation -- it's easy to write, easy to understand.
Your type isn't even a natural number, let alone a prime number.
When I wrote GP I had in mind branding as the "right" way to get those benefits - though I was unaware of the name - but I see that it is still subject to the TS compiler's limitations.
So then going back to the initial snippets, my issue is that the Prime type is essentially behaving like newtype, thus the inner calls can not actually rely on the value actually being prime, yes?
I have to admit that quite a few of the things in the blog are beyond my current understanding. Do you have any recommended reading for post grads with rudimentary understanding of Haskell who would like to get deeper into type systems?
See this fantastic reply by the author:
If I'm understanding the difference correctly, it's that the constructive data modeling approach can be proven entirely in the type system without any trust in the library code, while the Java approach I recommended depends on there being no other way to construct an instance of the class, which can be tricky to guarantee. Is that accurate?
validateEmail : String -> String  -- post-condition: String contains valid email

whereas parse looks like:

parseEmail : String -> Either EmailError ValidEmail

There is no problem using the `ValidEmail` abstraction. The problem is type stability: when your program enters a stronger state at runtime (i.e. certain validations have been performed), it's best to also enter a stronger state at compile time (stronger types), so that the compiler can verify these conditions. If you stay with String, those validations (that a string is a valid email) have no compile-time counterpart, so there is no way for the compiler to verify them. So use `ValidEmail` instead.

The problem with your `Prime` type is that it is just a type alias: a new way to refer to the exact same type. It's totally interchangeable with `number`, so any `number` is necessarily also a `Prime`… which is obviously not very helpful. (As it happens, the Haskell equivalent of that would be basically identical, since Haskell also uses the `type` keyword to declare a type alias.)
As for recommended reading, it depends on what you’d like to know, really. There are lots of different perspectives on type systems, and there’s certainly a lot of stuff you can learn if you want to! But I think most working programmers probably don’t benefit terribly much from the theory (though it can certainly be interesting if you’re into that sort of thing). Perhaps you could tell me which things you specifically find difficult to understand? That would make it easier for me to provide suggestions, and it would also be useful to me, as I do try to make my blog posts as accessible as possible!
What on Earth are you talking about? What dynamic enforcement?
> Though I think it is a bit silly to suggest that I have “never even considered”
In the context of this conversation you showed no signs of such concerns. Had you had such concerns previously, you wouldn't have arrived at the conclusions you apparently have.
> a dynamically-typed language
There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. Dynamically-typed is just a word combo that a lot of wannabe computer scientists use, but there's no real meaning behind it. When "dynamic" is used in the context of types, it refers to the concrete type obtained during program execution, whereas "static" refers to the type that can be deduced without executing the program. For example, union types cannot be dynamic. Similarly, it's not possible to have generic dynamic types. Every language thus has both dynamic and static types, except that in some cases the static analysis of types isn't very useful, because the types aren't expressive enough or the verification is too difficult. Conversely, in some languages there's no mechanism to find out exact runtime types, because information about types is considered extraneous to the program and is removed from the runtime.
The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without a useful static analysis method" and "languages that may be able to erase types from the runtime in most cases", where "useful" and "most cases" are matters of subjective opinion. Oftentimes such boundaries lead their proponents to ironically confusing conclusions, such as admitting that languages like Java aren't statically typed, or that languages like Bash are statically typed, and so on.
Note that "wannabe computer scientist" also applies to people with degrees in CS (I've met more than a dozen); some have even published books on the subject. This only underscores the ridiculous state this field is in.
> discusses at some length that the technique isn’t always applicable.
This technique is not applicable to the overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach, and at the moment there's no hope of making it useful.
Validation, on the other hand, is a very difficult subject, but I think that if we really want to deal with this problem, then TLA+ is a good approach, for example. But it's still too difficult for the average programmer. Datalog would be my second choice, which also seems appropriate for the general public. Maybe even something like XSL, which, in my view, lacks a small core that would allow one to construct it from first principles, but which is still able to solve a lot of practical tasks when it comes to input validation.
ML-style types aren't competitive in this domain. They are very clumsy tools for expressing the problems programmers have to solve every day. We, as a community, keep praising them because they are associated with the languages for the "enlightened" and thus must be the next best thing since sliced bread.
On what grounds did you decide that this is the requirement for validation? That's truly bizarre... Sometimes validating functions return booleans... but there's no general rule that they do.
Anyways, you completely missed the point OP was trying to make. Their idea was to include the constraints on the data (i.e. to ensure data validity) in the type associated with the data. You've done nothing of the kind: you created a random atomic type with a validation method. Your type isn't even a natural number; you definitely cannot add another natural number to it, or multiply it, etc.
Worse yet, you decided to go with a language with subtyping, which completely undermines all of your efforts. Even if you were able to write all the overloads to make this type behave like a natural number, any type you create by inheriting from this class is at liberty to violate every contract you might have established in it; yet, by the definition of your language, it would still be valid to say that the subtype thus created is a prime number, even if it implements == in a way that returns "true" (only) when compared to 8 :D
Some dude comes up with another data definition language (DDL) that uses ML-style types. Everyone jumps from their seats in standing ovation. And in the end we get another useless configuration language that cannot come anywhere close to the needs of application developers, and so they pedal away on their squared-wheel bicycles of hand-rolled very custom data validation procedures.
This is even more disheartening because we have already created tools that made very good progress toward systematic input validation. And they have been with us since the dawn of programming (well, almost: we've had SQL since the early 70's, then Prolog, then various XML schema languages, and finally TLA+). It's amazing how people keep ignoring solutions that achieved so much compared to ensuring that a list isn't empty... and yet present this as the way forward...
Perhaps you are one, perhaps you are not, I don’t know, but either way, you certainly write like one. If you want people to take you seriously, I think it would behoove you to adopt a more leveled writing style.
Many of the claims in your comment are absurd. I will not pick them apart one by one because I suspect it will do little to convince you. But for the benefit of other passing readers, I will discuss a couple points.
> What on Earth are you talking about? What dynamic enforcement?
SQL constraints are enforced at runtime, which is to say, dynamically. Static types are enforced without running the program. This is a real advantage.
> There's no such thing as dynamically-typed languages, just like there aren't blue or savory programming languages. […] The division that wannabe computer scientists are thus trying to make between "dynamically-typed" and "statically-typed" lies roughly along the lines of "languages without useful static analysis method" and "languages that may be able to erase types from the runtime in most cases".
I agree that the distinction is not black and white, and in fact I am on the record in various places as saying so myself (e.g. https://twitter.com/lexi_lambda/status/1219486514905862146). Java is a good example of a language with a very significant dynamic type system while also sporting a static type system. But it is certainly still useful to use the phrase “dynamically-typed language,” because normal people know what that phrase generally refers to. It is hardly some gotcha to point out that some languages have some of both, and there is certainly no need to insult my character.
> This technique is not applicable to overwhelming majority of everyday problems. It's so niche it doesn't warrant a discussion, but it's instead presented as a thing to strive for. It's not a useful approach and at the moment, there's no hope of making it useful.
This is simply not true. I know because I have done a great deal of real software engineering in which I have applied constructive data modeling extensively, to good effect. It would be silly to list them because it would simply be listing every single software project I have worked on for the past 5+ years. Perhaps you have not worked on problems where it has been useful. Perhaps you do not like the tradeoffs of the technique. Fine. But in this discussion, it’s ultimately just your word against mine, and many other people seem to have found the techniques quite useful—and not just in Haskell. Just look at Rust!
> Datalog would be my second choice, which also seems appropriate for general public.
The idea that datalog, a first-order relational query language, solves data validation problems (without further clarification) is so laughable that merely mentioning it reveals that you are either fundamentally unserious or wildly uninformed. It is okay to be either or both of those things, of course, but most people in that position do not have the arrogance and the foolishness to leave blustering comments making an ass of themselves on the subject on an internet forum.
Please be better.
My man... if I, the author of my program, was constructing the input, I wouldn't need no validation. Input isn't meant to be constructed by the program's author, it's supposed to be processed...
Parsing is an additional job on top of validation - providing type-level evidence that the data is good. That's what makes it valuable. It's not some theoretical difference in power. It's better software engineering.
I find it quite interesting though I never had the time to study it further until now, so any recommendations are appreciated!
> To some readers, these pitfalls may seem obvious, but safety holes of this sort are remarkably common in practice. This is especially true for datatypes with more sophisticated invariants, as it may not be easy to determine whether the invariants are actually upheld by the module’s implementation. Proper use of this technique demands caution and care:
> * All invariants must be made clear to maintainers of the trusted module. For simple types, such as NonEmpty, the invariant is self-evident, but for more sophisticated types, comments are not optional.
> * Every change to the trusted module must be carefully audited to ensure it does not somehow weaken the desired invariants.
> * Discipline is needed to resist the temptation to add unsafe trapdoors that allow compromising the invariants if used incorrectly.
> * Periodic refactoring may be needed to ensure the trusted surface area remains small. It is all too easy for the responsibility of the trusted module to accumulate over time, dramatically increasing the likelihood of some subtle interaction causing an invariant violation.
> In contrast, datatypes that are correct by construction suffer none of these problems. The invariant cannot be violated without changing the datatype definition itself, which has rippling effects throughout the rest of the program to make the consequences immediately clear. Discipline on the part of the programmer is unnecessary, as the typechecker enforces the invariants automatically. There is no “trusted code” for such datatypes, since all parts of the program are equally beholden to the datatype-mandated constraints.
They are both quite useful techniques, but it’s important to understand what you’re getting (and, perhaps more importantly, what you’re not).
Take for example APIGatewayProxyEvent [1], which has a property `queryStringParameters` with type:
export interface APIGatewayProxyEventQueryStringParameters {
  [name: string]: string | undefined;
}

You can then create a branded type like

type AuthCodeEvent = APIGatewayProxyEvent & {
  queryStringParameters: {
    code: string;
    state: string;
  };
};
The branded type here means that as soon as you verify that the event has the structure above, you can assume it is correct in the code that handles these specific cases.

Though as the blog author mentioned in the other chain, the TS compiler is not particularly sound, so it's probably entirely possible to mess up the structure and break the type without the compiler knowing about it.
[1] https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...
You only check it if it makes a difference to validity or not. There's no scenario where you keep the array and a parallel flag - either an empty array is invalid in which case you refuse to construct if it's empty, or an empty array is valid in which case you don't even check. Same thing for if you're checking whether it's got an even number of elements, got less than five elements, etc. - you don't keep the flag around, you refuse to construct your validated structure (even if that "validated structure" is actually just a marker wrapper around the raw array, if your type system isn't good enough to express the real constraint directly).
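A minimal sketch of that marker-wrapper idea, in the thread's Haskell flavor (the names are mine):

-- The wrapper records only that the check succeeded; in a real module
-- you'd hide the constructor so passing the check at construction time
-- is the only way to obtain one.
newtype NonEmptyList a = NonEmptyList [a]

mkNonEmptyList :: [a] -> Maybe (NonEmptyList a)
mkNonEmptyList [] = Nothing
mkNonEmptyList xs = Just (NonEmptyList xs)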
What context is it exactly where they don't matter?
I can tell you in practice, in the real world, they very much do.
> They are useless if you want to ensure that a given number is a prime number.
It's not useless. The point is that once you have a type `PrimeNumber` that can only be constructed after being validated, you can then write functions that live in a reality where only prime numbers exist.
https://play.haskell.org/saved/gRsNcCGo
> They are useless if you want to ensure that a given number is a prime number.
This is wrong. In the example above `addPrimes` will only take prime numbers.
As such if I make a Jira story that says "add multiply/subtract functions using the PrimeNumber type" I'll know that implementation is simplified by only being able to concern itself with prime numbers.
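The linked playground is along these lines (a reconstruction of the idea, not necessarily its exact code):

module PrimeNumber (PrimeNumber, parsePrime, addPrimes) where

newtype PrimeNumber = PrimeNumber Int

-- Naive trial division; fine for a sketch.
parsePrime :: Int -> Maybe PrimeNumber
parsePrime n
  | n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1] = Just (PrimeNumber n)
  | otherwise = Nothing

-- Both arguments are guaranteed prime; the sum is returned as a
-- plain Int, since a sum of primes need not be prime.
addPrimes :: PrimeNumber -> PrimeNumber -> Int
addPrimes (PrimeNumber a) (PrimeNumber b) = a + b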
> Still, perhaps you are skeptical of parseNonEmpty’s name. Is it really parsing anything, or is it merely validating its input and returning a result? While the precise definition of what it means to parse or validate something is debatable, I believe parseNonEmpty is a bona-fide parser (albeit a particularly simple one).
> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.
The OP is saying that a validator is a function which doesn't return anything, whereas parsing is a function which returns data. (Or in other words, validation is when you keep passing around the data in the old type, and parsing is when you pass around a new type). It is true that there is code inside the parser which you can call "validation", but the OP is labeling the function based on its signature. This is made more obvious towards the end of the article:
> Use abstract datatypes to make validators "look like" parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to "fake" a parser from a validator.
They are talking about the interface, not the implementation. They are saying that you should pass around a parsed type, even if it's only wrapping a raw value, because it carries proof that this data has been validated. They are saying that you shouldn't be validating this data in lots of different places.
> It may not be immediately apparent what shotgun parsing has to do with validation—after all, if you do all your validation up front, you mitigate the risk of shotgun parsing. The problem is that validation-based approaches make it extremely difficult or impossible to determine if everything was actually validated up front or if some of those so-called “impossible” cases might actually happen. The entire program must assume that raising an exception anywhere is not only possible, it’s regularly necessary.
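Concretely, the "abstract newtype with a smart constructor" from the quote above looks something like this (my sketch; the range-checked Percentage example is made up):

module Percentage (Percentage, mkPercentage, getPercentage) where

-- Abstract newtype: the constructor stays inside the module.
newtype Percentage = Percentage Int

-- The smart constructor "fakes" a parser from a validator: it checks
-- the range once, and the abstract type carries the proof from then on.
mkPercentage :: Int -> Maybe Percentage
mkPercentage n
  | 0 <= n && n <= 100 = Just (Percentage n)
  | otherwise          = Nothing

getPercentage :: Percentage -> Int
getPercentage (Percentage n) = n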
Obviously not a closed API since the playground only gives you one module, but I wrote an example on the Haskell playground:
{-# LANGUAGE TypeApplications #-}

import Control.Applicative (Alternative)
import Control.Monad (guard)

safeDiv :: (Monad m, Alternative m) => Int -> Int -> m Int
safeDiv x y = do
  guard (y /= 0)
  pure (x `div` y)

main :: IO ()
main = do
  print $ safeDiv @Maybe 1 0
  print $ safeDiv @[] 1 0
  -- print =<< safeDiv @IO 1 0 -- guard throws an error in IO

Try it out at https://play.haskell.org/saved/a6VsE3uQ

If you'd like to learn Haskell, I think https://www.cis.upenn.edu/~cis1940/spring13/ is still a pretty nice resource. It is quick and to the point, and it provides some exercises to work through. There are lots of things in the Haskell ecosystem that you could explore if you wanted to after getting a handle on the basics.
If you want to learn about programming languages and type systems, you could read Programming Languages: Application and Interpretation (https://cs.brown.edu/courses/cs173/2012/book/), which has a chapter on type systems. Alternatively, if you want a more thorough treatment of type systems, you could read Types and Programming Languages by Benjamin Pierce. However, both PLAI and TAPL are textbooks, and they are primarily intended to be used as supporting material in a university course with an instructor. I think PLAI is relatively accessible, but TAPL is more likely to be a challenge without some existing background in programming languages.
> You come off as a crank. [... because of X, Y, Z]

...

> Please be better. [in the following manner: ... even if it takes summarizing what was said]