I did something like this for my language. I built an interpreter and tested it by writing input programs along with their expected outputs. The test suite feeds each program through the interpreter and compares the actual output with the expected output. It's a lot like RubySpec, Ruby's executable specification. It's so nice. Every time I add a feature, I write an example program for it, and that program automatically tests the language, its features, its semantics... With good comments, these programs could conceivably be used to teach the language.
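A minimal sketch of that kind of example-based ("golden output") test runner, in Python. It assumes each example program `foo.src` sits next to a `foo.expected` file holding its exact expected stdout, and that the interpreter is a command that takes the program path as its only argument. The `.src`/`.expected` naming and the `run_example_tests` function are hypothetical illustrations, not the actual setup described above.

```python
import subprocess
from pathlib import Path


def run_example_tests(examples_dir, interpreter_cmd, program_suffix=".src"):
    """Run every example program and compare its stdout with a sibling
    '.expected' file (hypothetical layout: foo.src next to foo.expected).

    Returns a list of (program_path, expected, actual) tuples, one per
    failing example; an empty list means every example passed.
    """
    failures = []
    for program in sorted(Path(examples_dir).glob("*" + program_suffix)):
        expected = program.with_suffix(".expected").read_text()
        # Feed the program to the interpreter and capture what it prints.
        result = subprocess.run(
            [*interpreter_cmd, str(program)],
            capture_output=True,
            text=True,
        )
        # Exact comparison: any difference in output counts as a failure.
        if result.stdout != expected:
            failures.append((program, expected, result.stdout))
    return failures
```

For example, `run_example_tests("examples", ["mylang"])` would run a hypothetical `mylang` interpreter over every program in `examples/`. Comparing exact stdout keeps each example readable as documentation of the language, at the cost of some brittleness when output formatting changes.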
I eventually added support for real unit tests to my test suite as well and started using them to test parts of the runtime. Those unit tests turned out to be a lot messier than I'd hoped. Hopefully I'll be able to clean them up over time by applying the principles outlined in the article.