What is it that characterizes domain specific languages? Before you trot out something like “Programs written in them read like domain experts talk”, take a look at some examples of code written in domain specific languages:
R$*	$: < $1 >
R$+ < $* >	< $2 >
R< $* > $+	< $1 >
R<>	$@ < @ >
R< $+ >	$: $1
R@ $+ , $+	@ $1 : $2
R@ $+ : $+	$@ <@ $1> : $2
R$+ @ $+	$: $1 <@ $2>
R$+ < $+ @ $+ >	$1 $2 <@ $3>
R$+ <@ $+ >	$@ $1 <@ $2>
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
If you’re reading this on the front page, try and work out what the ones you recognise do before you dip below the fold…
… Right, that’s scared the dilettantes off.
What do they do?
The first snippet is a regular expression. It should be an efficient matcher for double quoted strings which can span multiple lines and have backslash escaped internal quote marks. It’s not exactly easy to read because it’s optimized to reduce backtracking in failing cases. A simpler, but slower form would look like
/(?ms)"((?:\\.|[^\\"])*)"/. I say ‘should’ because I wrote it from first principles without recourse to the manual except to check on the syntax of
(?>...) – I’ve been using regular expressions for so long now, they’re getting pretty instinctive1. Every regular expression is a program written in a language whose domain is concisely expressing the rules to match strings. So long as what you’re matching doesn’t require you to worry about balancing bracket like things, a regex should suit you fine.
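For the curious, here is a minimal sketch of that simpler form at work, using Python's re module rather than Perl. The helper name first_quoted is made up for illustration, and the pattern is the slow version quoted above with the surrounding slashes dropped, not the optimized one.

```python
import re

# The simpler, slower pattern from the text, without the enclosing slashes.
# (?ms) switches on MULTILINE and DOTALL, so `\\.` can also swallow an
# escaped newline; `[^\\"]` already matches a bare newline on its own.
QUOTED = re.compile(r'(?ms)"((?:\\.|[^\\"])*)"')


def first_quoted(text):
    """Return the body of the first double quoted string in text, or None."""
    match = QUOTED.search(text)
    return match.group(1) if match else None


# Backslash escaped quote marks stay inside the captured body.
print(first_quoted(r'say "hello \"world\"" now'))

# The string is allowed to span more than one line.
print(first_quoted('"spans\ntwo lines"'))

# No closing quote means no match at all.
print(first_quoted('"never closed'))
```

The whole program is one line of pattern; everything else is Python scaffolding around it, which is rather the point of having a language dedicated to matching strings.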
The second snippet is a sample sendmail.cf ruleset: the one that canonicalizes email addresses into a form that the rest of the rulesets can work on. Once upon a time I could have taken you through it line by line and explained what it did, but the world has changed since then. Sendmail’s domain was routing email back in the days when there were a million and one different email address formats and networks. Who uses addresses like
kremvax!ivanucbvax.UUCP@, @email@example.com@ or (if you’re feeling sufficiently evil) @firstname.lastname@example.org@ these days? Nowadays all email addresses look pretty much the same and routing is substantially less complicated, so tools like Postfix, Exim and qmail are far, far easier to configure because the domain has shrunk to the point where you can get away with a mere configuration file.
The last one’s a BLAST query2 which can be run against the genome of your choice. Again, if you’re a geneticist (or, more likely a geneticist’s computer), this makes perfect sense. If you’re not a geneticist however, it’s the next best thing to gobbledegook.
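If you'd like to see how little the geneticist's computer has to do, here is a rough Python sketch that picks that header line apart (it is a fasta header, as the footnote admits). It assumes, purely from the shape of this one example, that the pipe separated fields are a gi number, a source database tag, an accession and an entry name, with a free text description after the first space. The names Defline and parse_defline are invented for illustration, and real files have more header variants than this handles.

```python
from typing import NamedTuple


class Defline(NamedTuple):
    """Fields of one header line; the layout is an assumption (see above)."""
    gi: str
    database: str
    accession: str
    entry_name: str
    description: str


def parse_defline(line: str) -> Defline:
    """Split a header such as '>gi|123|sp|ABC|NAME some text' into fields."""
    if not line.startswith(">"):
        raise ValueError("a header line starts with '>'")
    # Everything before the first space is the identifier; the rest is prose.
    identifier, _, description = line[1:].partition(" ")
    fields = identifier.split("|")
    if len(fields) != 5 or fields[0] != "gi":
        raise ValueError("expected gi|number|db|accession|name")
    _, gi, database, accession, entry_name = fields
    return Defline(gi, database, accession, entry_name, description)


header = ">gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577"
print(parse_defline(header))
```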
Searching for the odd one out…
What characterises Domain Specific Languages isn’t their readability, it’s their narrow focus on their domain. The sendmail config file’s the odd one out here because it turns out that you can implement a Universal Turing Machine in a sendmail.cf file. While Turing completeness might be fun to prove, it can be problematic in a Domain Specific Language. Ask James Duncan Davidson about accidentally making Ant Turing complete some time – there’s a reason he’s run away to be a photographer you know.
Domain Agnostic Languages
General purpose languages like Smalltalk, Lisp, Perl, Ruby, Python or *spit* Java aren’t domain specific languages in the same sense (unless you reckon that ‘writing general applications’ is narrow enough to be their domain). Instead they’re domain agnostic. General purpose languages support programming ‘in the language of the domain’. So ActiveRecord’s combination of well designed class methods lets you describe the relationships between database tables by describing the relationships between objects. In Perl, Jifty has similar capabilities, and Seaside morphs programming for the web into something that feels remarkably like programming for the desktop. Paul Graham would probably tell you that Arc does the same thing with Lisp (he was also the inspiration for Seaside).
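To give a flavour of what ‘in the language of the domain’ means, here is a toy Python sketch of relationships declared through class methods. Everything in it is hypothetical: Model, has_many, belongs_to and describe are names made up for this example and are not ActiveRecord's or Jifty's real API, and nothing here talks to a database.

```python
class Model:
    """Hypothetical base class that records declared relationships."""

    relations = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Give every model its own table of relationships instead of
        # sharing the dictionary defined on the base class.
        cls.relations = {}

    @classmethod
    def has_many(cls, name, target):
        cls.relations[name] = ("has_many", target)

    @classmethod
    def belongs_to(cls, name, target):
        cls.relations[name] = ("belongs_to", target)

    @classmethod
    def describe(cls):
        """Render the declarations as something a domain expert might say."""
        return [
            f"{cls.__name__} {kind.replace('_', ' ')} {name} ({target})"
            for name, (kind, target) in cls.relations.items()
        ]


class Author(Model):
    pass


class Post(Model):
    pass


# The declarations read like statements about the domain, not about tables.
Author.has_many("posts", "Post")
Post.belongs_to("author", "Author")

print(Author.describe())
print(Post.describe())
```

Note that the declarations sit after the class bodies rather than inside them; that is a choice made to keep the sketch short, not a claim about how any real library does it.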
The more malleable of these languages let the cunning programmer embed full blown DSLs, complete with their own syntax and semantics within them. Lisp’s macros are the most obvious example of this, but Smalltalk has some pretty amazing tricks up its sleeve too – check out Scratch or Etoys. As I understand it, you can have a different compiler for every browser window if you want. Compared to these languages, Ruby, Python, Perl et al speak the language of the domain with more or less thick accents.
Do I even have a point?
Well, sort of. It’s easy and tempting to rail against the way everything and its brother is a DSL now, but that ship has probably already sailed. We can hope that as the underlying idea of the DSL – the problem shapes the tool – gains wider currency, people will incorporate it in their practice and become better at their craft as a result. We old farts will continue to chunter about the young folks of today not realising what it was like when we had to whittle our own bits and write a DSL to build the parser that would let us build the DSL we really wanted. But maybe it’s good that people get starry eyed. It’s a step down the road; even if they never take the next one, at least they’ve moved in a good direction.
The moment you take a chunk of code and give it a name, you’re in the business of language design. Use your power wisely and the programming zeitgeist will beat a path to your door – especially if you have DHH’s seemingly instinctive marketing nous. Use it less wisely and you’re welcome to your ball of mud.
With his usual aplomb (and in the very first comment), chromatic pointed out a glaring cock up in the original wording of this post so I’ve tightened it up a bit.
1 I owe the vast bulk of my understanding of the guts of regex engines to reading the first edition of Mastering Regular Expressions, Jeffrey Friedl’s masterly explanation of the subject. I understand it’s only got better with subsequent editions3.
2 Oops, no it’s not. It’s “protein sequence in fasta format in which amino acids are represented by 1 letter codes. You can use it as a BLAST query but the options are endless. It is like a calling a comma separated text file as an Excel input only.” Thanks to Darked for pointing that one out to me in the comments. Remember kids, Google + a merely cursory knowledge of a domain will only get you so far. I’d maintain that fasta format can still be thought of as a domain specific language, but it has more in common with XML or HTML than with, say, SQL.
3 I just noticed the first edition was published in 1997 and I bought it as soon as it came out. I feel old.