Just A Summary

Piers Cawley Practices Punditry

Domain Agnostic Languages

Posted by Piers Cawley Sat, 19 May 2007 23:52:00 GMT

Windmill tilting time again, I’m afraid. Blame chromatic and David A. Black.

What is it that characterizes domain specific languages? Before you trot out something like “Programs written in them read like domain experts talk”, take a look at some examples of code written in domain specific languages:

/(?ms)"((?>[^\\"]+)?(?>\\.[^\\"]*)*)"/

S3
R$*                     $: < $1 >
R$+ < $* >                 < $2 >
R< $* > $+                 < $1 >
R<>                     $@ < @ >
R< $+ >                 $: $1
R@ $+ , $+        @ $1 : $2
R@ $+ : $+         $@ <@ $1> : $2
R$+ @ $+                $: $1 <@ $2>
R$+ < $+ @ $+ >            $1 $2 <@ $3>
R$+ <@ $+ >             $@ $1 <@ $2>

>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFE
NELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLG
SVTENVIKKSNKPVLVVKRKNS

If you’re reading this on the front page, try and work out what the ones you recognise do before you dip below the fold…

... Right, that’s scared the dilettantes off.

What do they do?

The first snippet is a regular expression. It should be an efficient matcher for double-quoted strings which can span multiple lines and have backslash-escaped internal quote marks. It’s not exactly easy to read because it’s optimized to reduce backtracking in failing cases. A simpler but slower form would look like /(?ms)"((?:\\.|[^\\"])*)"/. I say ‘should’ because I wrote it from first principles without recourse to the manual except to check on the syntax of (?>...) – I’ve been using regular expressions for so long now that they’re getting pretty instinctive¹. Every regular expression is a program written in a language whose domain is concisely expressing the rules to match strings. So long as what you’re matching doesn’t require you to worry about balancing bracket-like things, a regex should suit you fine.
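
If you want to poke at the idea yourself, here’s a minimal Python sketch, assuming Python’s `re` module as the engine. Atomic groups like (?>...) only turned up in `re` with Python 3.11, so rather than the optimized pattern it uses the ‘unrolled loop’ shape Friedl describes, "[^\\"]*(?:\\.[^\\"]*)*", next to the simpler form, so you can check that the two agree. The `quoted_strings` helper and the sample text are made up for illustration.

```python
import re

# The simpler form: an escaped character, or any character that is
# neither a backslash nor a double quote, repeated until the closing quote.
SIMPLE = re.compile(r'(?ms)"((?:\\.|[^\\"])*)"')

# The "unrolled loop" form: a run of ordinary characters, then any number of
# (escaped character, run of ordinary characters) pairs. It matches the same
# strings with less work per character, but without atomic groups it can
# still backtrack on a string that never closes.
UNROLLED = re.compile(r'(?ms)"([^\\"]*(?:\\.[^\\"]*)*)"')


def quoted_strings(pattern, text):
    """Return the contents of every double-quoted string in text."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = 'say "hello" and "a \\"quoted\\" word" then "two\nlines"'
    print(quoted_strings(SIMPLE, sample))
    print(quoted_strings(UNROLLED, sample))
```

Both patterns return the inner text with the escapes left in place; unescaping is a separate job.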

The second snippet is a sample of a sendmail.cf ruleset; this is the one that canonicalizes email addresses into a form that the rest of the rulesets can work on. Once upon a time I could have taken you through it line by line and explained what it did, but the world has changed since then. Sendmail’s domain was routing email back in the days when there were a million and one different email address formats and networks. Who uses addresses like kremvax!ivan@ucbvax.UUCP, pdcawley@uk.ac.nott.cs or (if you’re feeling sufficiently evil) pdcawley@uk.org.bofh.org.uk these days? Nowadays all email addresses look pretty much the same and routing is substantially less complicated, so tools like Postfix, Exim and qmail are far, far easier to configure because the domain has shrunk to the point where you can get away with a mere configuration file.
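
To give a flavour of how a ruleset like that gets evaluated, here’s a toy Python sketch. It’s an approximation, not sendmail: real sendmail matches on tokens rather than characters, and the four rules below are loose regex translations of lines from the snippet. The assumed semantics are that rules run in order, each rule is retried for as long as its left-hand side still matches, a `$:` prefix limits a rule to a single rewrite and `$@` rewrites once and returns from the ruleset. `Rule`, `CANONICALIZE` and `run_ruleset` are hypothetical names.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: re.Pattern        # regex standing in for the $+ / $* left-hand side
    rhs: str               # replacement; \1, \2 stand in for $1, $2
    once: bool = False     # rough equivalent of a "$:" prefix on the RHS
    ret: bool = False      # rough equivalent of a "$@" prefix on the RHS


# Toy, character-based translations of four rules from the snippet.
CANONICALIZE = [
    Rule(re.compile(r"^(.+?)\s*<(.*)>$"), r"<\2>"),            # R$+ < $* >   < $2 >
    Rule(re.compile(r"^<(.+)>$"), r"\1", once=True),           # R< $+ >      $: $1
    Rule(re.compile(r"^(.+)@(.+)$"), r"\1<@\2>", once=True),   # R$+ @ $+     $: $1 <@ $2>
    Rule(re.compile(r"^(.+)<@(.+)>$"), r"\1<@\2>", ret=True),  # R$+ <@ $+ >  $@ $1 <@ $2>
]


def run_ruleset(rules, address):
    """Apply the rules in order, retrying each one while its LHS matches."""
    for rule in rules:
        while True:
            rewritten, hits = rule.lhs.subn(rule.rhs, address, count=1)
            if hits == 0:
                break              # LHS no longer matches: go to the next rule
            address = rewritten
            if rule.ret:
                return address     # "$@": leave the ruleset immediately
            if rule.once:
                break              # "$:": rewrite once, then move on
    return address


if __name__ == "__main__":
    print(run_ruleset(CANONICALIZE, "Some User <user@example.com>"))
```

Drop `once=True` from the third rule and it loops forever, because its own output still contains an `@` for it to match; that’s the sort of trap `$:` exists to get you out of.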

The last one’s a BLAST query² which can be run against the genome of your choice. Again, if you’re a geneticist (or, more likely, a geneticist’s computer), this makes perfect sense. If you’re not a geneticist, however, it’s the next best thing to gobbledegook.
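
As footnote 2 explains, that record is really in FASTA format: a header line starting with `>`, followed by sequence lines that are simply concatenated. Here’s a minimal Python sketch of a reader for it; `read_fasta` is a made-up name, and for real work you’d more likely reach for an established library such as Biopython’s `SeqIO`.

```python
def read_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # tolerate blank lines
        if line.startswith(">"):
            if header is not None:         # close off the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # header text without the ">"
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)            # sequence lines just concatenate
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


RECORD = """\
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFE
NELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLG
SVTENVIKKSNKPVLVVKRKNS
"""

if __name__ == "__main__":
    for header, sequence in read_fasta(RECORD):
        # The header's pipe-separated fields include the accession, Q57997.
        print(header.split("|")[3], len(sequence))
```

The format is little more than "header, then data", which is why it sits closer to a markup language than to anything you’d call a programming language.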

Searching for the odd one out…

What characterises Domain Specific Languages isn’t their readability, it’s their narrow focus on their domain. The sendmail config file’s the odd one out here because it turns out that you can implement a Universal Turing Machine in a sendmail.cf file. While Turing completeness might be fun to prove, it can be problematic in a Domain Specific Language. Ask James Duncan Davidson about accidentally making Ant Turing complete some time – there’s a reason he’s run away to be a photographer, you know.

Domain Agnostic Languages

General purpose languages like Smalltalk, Lisp, Perl, Ruby, Python or *spit* Java aren’t domain specific languages in the same sense (unless you reckon that ‘writing general applications’ is narrow enough to be their domain). Instead they’re domain agnostic. General purpose languages support programming ‘in the language of the domain’. So ActiveRecord’s combination of well-designed class methods lets you describe the relationships between database tables by describing the relationships between objects. Meanwhile, in Perl, Jifty has similar capabilities, and in Smalltalk, Seaside morphs programming for the web into something that feels remarkably like programming for the desktop. Paul Graham would probably tell you that Arc does the same thing with Lisp (he was also the inspiration for Seaside).

The more malleable of these languages let the cunning programmer embed full-blown DSLs, complete with their own syntax and semantics, within them. Lisp’s macros are the most obvious example of this, but Smalltalk has some pretty amazing tricks up its sleeve too – check out Scratch or Etoys. As I understand it, you can have a different compiler for every browser window if you want. Compared to these languages, Ruby, Python, Perl et al. speak the language of the domain with more or less thick accents.

Do I even have a point?

Well, sort of. It’s easy and tempting to rail against the way everything and its brother is a DSL now, but that ship has probably already sailed. We can hope that, as the underlying idea of the DSL – the problem shapes the tool – gains wider currency, people will incorporate it in their practice and become better at their craft as a result. We old farts will continue to chunter about the young folks of today not realising what it was like when we had to whittle our own bits and write a DSL to build the parser that would let us build the DSL we really wanted. But maybe it’s good that people get starry-eyed. It’s a step down the road; even if they never take the next one, at least they’ve moved in a good direction.

The moment you take a chunk of code and give it a name, you’re in the business of language design. Use your power wisely and the programming zeitgeist will beat a path to your door – especially if you have DHH’s seemingly instinctive marketing nous. Use it less wisely and you’re welcome to your ball of mud.

Updates

With his usual aplomb (and in the very first comment), chromatic pointed out a glaring cock-up in the original wording of this post, so I’ve tightened it up a bit.

1 I owe the vast bulk of my understanding of the guts of regex engines to reading the first edition of Mastering Regular Expressions, Jeffrey Friedl’s masterly explanation of the subject. I understand it’s only got better with subsequent editions³.

2 Oops, no, it’s not. It’s “protein sequence in fasta format in which amino acids are represented by 1 letter codes. You can use it as a BLAST query but the options are endless. It is like a calling a comma separated text file as an Excel input only.” Thanks to Darked for pointing that one out to me in the comments. Remember, kids, Google + a merely cursory knowledge of a domain will only get you so far. I’d maintain that fasta format can still be thought of as a domain specific language, but it has more in common with XML or HTML than, say, SQL.

3 I just noticed the first edition was published in 1997 and I bought it as soon as it came out. I feel old.

DSLs, Fluent Interfaces, and how to tell the difference

Posted by Piers Cawley Thu, 15 Mar 2007 15:39:00 GMT

I’m getting heartily fed up of people banging on about Domain Specific Languages. It seems that every time someone writes a Ruby library that uses class methods, symbols and hashes reasonably sensibly they get delusions of grandeur and call the result a Domain Specific Language (or maybe an ‘embedded’ DSL).

In a sense, they’re right, but it’s a pretty compromised language simply because you’re stuck with the Ruby parser. Scheme and Lisp hackers probably look at (say) ActiveRecord and sneer. Heck, even Perl programmers have grounds for getting their sneer on.

Now, before any Ruby programmers go getting on their high horse about Perl programmers being undisciplined louts, may I refer you to Getopt::Euclid, an alternative to Perl’s Getopt::Long library.

Getopt, or something like it, is pretty much universal among programming languages. It’s the library that makes it ‘easy’ to write command-line programs with Unix-style switches. It’s often one of those functions that ends up taking an ugly argument string which defines all the possible flags your command could have. As interfaces go, it’s often actively user-hostile – the argument string is a DSL, but it’s one that took its design cues from the notorious sendmail.cf.
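
Python’s standard `getopt` module mirrors the classic C function and shows the kind of argument string in question: each letter is a switch, and a trailing colon means that switch takes a value. Here’s a minimal sketch for a hypothetical command; the `-h`/`-v`/`-o` switches and `parse_command_line` are made up for illustration.

```python
import getopt


def parse_command_line(argv):
    """Parse argv for a hypothetical command with -h, -v and -o FILE."""
    # "hvo:" is the entire short-option grammar: h and v are flags, and the
    # colon after o says -o needs a value. Long options get their own list,
    # where a trailing "=" plays the part of the colon.
    opts, args = getopt.getopt(argv, "hvo:", ["help", "verbose", "output="])
    settings = {"help": False, "verbose": False, "output": None}
    for flag, value in opts:
        if flag in ("-h", "--help"):
            settings["help"] = True
        elif flag in ("-v", "--verbose"):
            settings["verbose"] = True
        elif flag in ("-o", "--output"):
            settings["output"] = value
    return settings, args


if __name__ == "__main__":
    import sys
    print(parse_command_line(sys.argv[1:]))
```

Unknown switches raise `getopt.GetoptError`. Note that nothing in "hvo:" says what any switch means; that knowledge lives in the loop and, separately, in whatever help text you write by hand.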

So, Damian Conway fixed it. To use Getopt::Euclid you just import the library and then write your command’s documentation using Perl’s inline POD with a couple of extra Euclidean extensions and you’re done. Getopt::Euclid treats your documentation as its specification and builds its option parser from that.
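
Getopt::Euclid itself reads POD, and its actual markup isn’t reproduced here. As a rough Python analogue of the idea, this sketch treats a usage string as the specification and builds an `argparse` parser from it. The `Usage:`/`Options:` layout, the `frobnicate` command and the `parser_from_doc` helper are all assumptions invented for this illustration; the third-party docopt package takes the same idea much further.

```python
import argparse
import re

DOC = """\
Usage: frobnicate [options] INPUT

Options:
  -o, --output PATH   Where to write the result
  -v, --verbose       Print progress messages
"""

# One option per indented line: short flag, long flag, an optional
# upper-case metavar, then at least two spaces before the help text.
OPTION_LINE = re.compile(r"^\s+(-\w), (--[\w-]+)(?: ([A-Z]+))?\s{2,}(.+)$")
# Upper-case words after "[options]" on the usage line are positionals.
USAGE_LINE = re.compile(r"^Usage: \S+ \[options\]((?: [A-Z]+)*)$")


def parser_from_doc(doc):
    """Build an argparse parser whose arguments come from the usage text."""
    parser = argparse.ArgumentParser()
    for line in doc.splitlines():
        option = OPTION_LINE.match(line)
        if option:
            short, long_, metavar, help_text = option.groups()
            if metavar:
                # An option with a metavar takes a value.
                parser.add_argument(short, long_, metavar=metavar, help=help_text)
            else:
                # A bare option is an on/off switch.
                parser.add_argument(short, long_, action="store_true", help=help_text)
            continue
        usage = USAGE_LINE.match(line)
        if usage:
            for name in usage.group(1).split():
                parser.add_argument(name.lower(), metavar=name)
    return parser


if __name__ == "__main__":
    print(vars(parser_from_doc(DOC).parse_args()))
```

The point is the direction of the dependency: edit the documentation and the parser follows, so the two can’t drift apart.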

Now that’s what I call a DSL. Entirely embedded in the Domain Specific (natural) Language of documenting a command line program.

Damian’s a genius at this sort of thing. Check out List::Maker, where he finds a little-used part of Perl’s syntax and wedges in a bunch of cunning ways of building lists, including something that looks remarkably like list comprehensions.

And this is in Perl 5, the version of the language that doesn’t have explicit support for syntax modification.

Another example of the kind of thing that’s possible without monkeying with the parser is what Jifty (originally JFDI) does, in particular the way Jifty Schemas work.

What’s my point?

I’m not saying don’t take the time to make your interfaces ‘language like’. However, there’s a lot to be learned from the way other languages have approached the idea of the DSL or ‘little’ language. Implementing something like Jifty’s Schemas in Perl is far from easy (though the techniques needed are getting better understood all the time) and involves ferreting around in dusty corners of an already arcane syntax, but the beauty of getting it right is that you simply don’t have to care about how it’s implemented. The neat bit, the bit that’s worth pinching, is the syntax of the resulting DSLs…

Oh yes, and, while you’re about it, take a look at what Why the Lucky Stiff is doing with hpricot, definitely one of those libraries that goes out of its way to make life easy for its users.


