Just A Summary

Piers Cawley Practices Punditry

Domain Agnostic Languages

Windmill tilting time again I’m afraid. Blame chromatic and David A. Black.

What is it that characterizes domain specific languages? Before you trot out something like “Programs written in them read like domain experts talk”, take a look at some examples of code written in domain specific languages:

<pre> /(?ms)"((?>[^\\"]+)?(?>\\.[^\\"]*)*)"/ </pre>

<pre>
S3
R$* $: < $1 >
R$+ < $* > < $2 >
R< $* > $+ < $1 >
R<> $@ < @ >
R< $+ > $: $1
R@ $+ , $+ @ $1 : $2
R@ $+ : $+ $@ <@ $1> : $2
R$+ @ $+ $: $1 <@ $2>
R$+ < $+ @ $+ > $1 $2 <@ $3>
R$+ <@ $+ > $@ $1 <@ $2>
</pre>

<pre>
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFE
NELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLG
SVTENVIKKSNKPVLVVKRKNS
</pre>

If you’re reading this on the front page, try and work out what the ones you recognise do before you dip below the fold…

… Right, that’s scared the dilettantes off.

What do they do?

The first snippet is a regular expression. It should be an efficient matcher for double-quoted strings which can span multiple lines and have backslash-escaped internal quote marks. It’s not exactly easy to read because it’s optimized to reduce backtracking in failing cases. A simpler, but slower form would look like /(?ms)"((?:\\.|[^\\"])*)"/. I say ‘should’ because I wrote it from first principles without recourse to the manual except to check on the syntax of (?>...) – I’ve been using regular expressions for so long now, they’re getting pretty instinctive[1]. Every regular expression is a program written in a language whose domain is concisely expressing the rules to match strings. So long as what you’re matching doesn’t require you to worry about balancing bracket-like things, a regex should suit you fine.
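As a rough illustration of the simpler form, here is a sketch in Python rather than the Perl-style syntax used above. The function name quoted_strings and the sample text are made up for the example; only the pattern itself comes from the paragraph above.

```python
import re

# The simpler (slower) form from the text. The body of the string is any
# run of escaped characters (\\.) or characters that are neither a
# backslash nor a double quote. (?ms) switches on MULTILINE and DOTALL,
# so "." can match a newline that follows a backslash.
QUOTED = re.compile(r'(?ms)"((?:\\.|[^\\"])*)"')


def quoted_strings(text):
    """Return the body of every double-quoted string found in text."""
    return [match.group(1) for match in QUOTED.finditer(text)]


# Hypothetical sample: one string with escaped quotes, one spanning two lines.
sample = 'say "hello \\"world\\"" and "two\nlines"'
print(quoted_strings(sample))
```

The captured bodies keep their backslashes; unescaping them is a separate job that the pattern does not attempt.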

The second snippet is a sample sendmail.cf ruleset: the one that canonicalizes email addresses into a form that the rest of the rulesets can work on. Once upon a time I could have taken you through it line by line and explained what it did, but the world has changed since then. Sendmail’s domain was routing email back in the days when there were a million and one different email address formats and networks. Who uses addresses like kremvax!ivan@ucbvax.UUCP, pdcawley@uk.ac.nott.cs or (if you’re feeling sufficiently evil) pdcawley@uk.org.bofh.org.uk these days? Nowadays all email addresses look pretty much the same and routing is substantially less complicated, so tools like Postfix, Exim and qmail are far, far easier to configure because the domain has shrunk to the point where you can get away with a mere configuration file.

The last one’s a BLAST query[2] which can be run against the genome of your choice. Again, if you’re a geneticist (or, more likely, a geneticist’s computer), this makes perfect sense. If you’re not a geneticist, however, it’s the next best thing to gobbledegook.
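For what it’s worth, the third snippet is simple enough to pick apart by hand. As footnote 2 admits, it is a protein sequence in FASTA format: a header line starting with > followed by lines of one-letter amino acid codes. A minimal Python sketch, assuming the snippet has a line break after the header and after each 70-character run of sequence (parse_fasta is a hypothetical helper written for this example, not part of any library):

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes off the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are simply concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The record from the post, with the assumed line breaks restored.
record = """>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFE
NELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLG
SVTENVIKKSNKPVLVVKRKNS
"""

header, sequence = parse_fasta(record)[0]
print(header)
print(len(sequence), "residues")
```

That so little code covers the whole format fits the footnote’s point that FASTA is closer to a data format like XML or HTML than to a query language like SQL.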

Searching for the odd one out…

What characterises Domain Specific Languages isn’t their readability, it’s their narrow focus on their domain. The sendmail config file’s the odd one out here because it turns out that you can implement a Universal Turing Machine in a sendmail.cf file. While Turing completeness might be fun to prove, it can be problematic in a Domain Specific Language. Ask James Duncan Davidson about accidentally making Ant Turing complete some time – there’s a reason he’s run away to be a photographer, you know.

Domain Agnostic Languages

General purpose languages like Smalltalk, Lisp, Perl, Ruby, Python or *spit* Java aren’t domain specific languages in the same sense (unless you reckon that ‘writing general applications’ is narrow enough to be their domain). Instead they’re domain agnostic. General purpose languages support programming ‘in the language of the domain’. So ActiveRecord’s combination of well-designed class methods lets you describe the relationships between database tables by describing the relationships between objects. In Perl, Jifty has similar capabilities, and Seaside meanwhile morphs programming for the web into something that feels remarkably like programming for the desktop. Paul Graham would probably tell you that Arc does the same thing with Lisp (he was also the inspiration for Seaside).

The more malleable of these languages let the cunning programmer embed full-blown DSLs, complete with their own syntax and semantics, within them. Lisp’s macros are the most obvious example of this, but Smalltalk has some pretty amazing tricks up its sleeve too – check out Scratch or Etoys. As I understand it, you can have a different compiler for every browser window if you want. Compared to these languages, Ruby, Python, Perl et al speak the language of the domain with more or less thick accents.

Do I even have a point?

Well, sort of. It’s easy and tempting to rail against the way everything and its brother is a DSL now, but that ship has probably already sailed. We can hope that as the underlying idea of the DSL – the problem shapes the tool – gains wider currency, people will incorporate it in their practice and become better at their craft as a result. We old farts will continue to chunter about the young folks of today not realising what it was like when we had to whittle our own bits and write a DSL to build the parser that would let us build the DSL we really wanted. But maybe it’s good that people get starry-eyed. It’s a step down the road; even if they never take the next one, at least they’ve moved in a good direction.

The moment you take a chunk of code and give it a name, you’re in the business of language design. Use your power wisely and the programming zeitgeist will beat a path to your door – especially if you have DHH’s seemingly instinctive marketing nous. Use it less wisely and you’re welcome to your ball of mud.

Updates

With his usual aplomb (and in the very first comment), chromatic pointed out a glaring cock up in the original wording of this post so I’ve tightened it up a bit.

[1] I owe the vast bulk of my understanding of the guts of regex engines to reading the first edition of Mastering Regular Expressions, Jeffrey Friedl’s masterly explanation of the subject. I understand it’s only got better with subsequent editions[3].

[2] Oops, no it’s not. It’s “protein sequence in fasta format in which amino acids are represented by 1 letter codes. You can use it as a BLAST query but the options are endless. It is like calling a comma separated text file an Excel input only.” Thanks to Darked for pointing that one out to me in the comments. Remember kids, Google + a merely cursory knowledge of a domain will only get you so far. I’d maintain that FASTA format can still be thought of as a domain specific language, but it has more in common with XML or HTML than, say, SQL.

[3] I just noticed the first edition was published in 1997 and I bought it as soon as it came out. I feel old.

Published on Sat, 19 May 2007 18:52:00 GMT by Piers Cawley

    By chromatic Sat, 19 May 2007 19:41:31 GMT

    Did you mean to imply that sendmail.cf is in the language of the problem domain of routing e-mail?

    Domain-specific language — the domain-driven design or ubiquitous language kind — is a design and communication technique, not a technical artifact.

    Blurring the two into something that appears to be a technical artifact seems to me to lose the important communication aspects of domain-specific language shared between developers, testers, and customers.


    By Peter Bell Sat, 19 May 2007 20:15:55 GMT

    I think you’re right. We’ve all been in the business of language design ever since we started wrapping functions and passing explicit arguments (allowing for a simplistic subset of abstract grammars to be implemented).

    I think it is nice that people are now talking more explicitly about DSLs so they can make intelligent choices about concrete syntax based on their use cases rather than just assuming they should write an API, a DTD or a DB Schema based on what they are used to. Tooling is getting there for external DSLs, and in-language DSLs are becoming better understood. Now all we need are some better answers to handle issues like the evolution of collections of non-orthogonal DSLs over time. But if we didn’t have that to worry about, what would we do while our computers were writing our applications for us?!

    Nice blog.

    Best Wishes,
    Peter


    By Piers Cawley Sun, 20 May 2007 01:32:40 GMT

    I said:

    … the underlying idea of the DSL – programming in the language of the problem domain …

    So chromatic asked:

    Did you mean to imply that sendmail.cf is in the language of the problem domain of routing e-mail?

    Oh, hell. No I didn’t – it’s a language that attempts to ‘fit’ its domain. Its domination of the space for so long seems to imply that it succeeded in that, at least for a while.

    Thanks for pointing that out though – I bet your authors love it when you do this to them during editing.


    By chromatic Sun, 20 May 2007 03:19:54 GMT

    I want to talk about software design in terms of its maintainability and suitability to the problem domain, and I think a large part of that comes from choosing the right symbol names.

    That’s independent of language and (with some respect to the differences between nouns and verbs) largely independent of syntax.

    Yet the syntax of real, recognizable DSLs such as in your examples varies wildly.

    I’m not sure that forcing everything into the everything-is-a-symbol key-value-block structure leads to good APIs, let alone good DSLs.

    Try writing Logo in such a fashion, for example. (Try writing good rhyming poetry in Esperanto for that matter.)

    Thus if the syntax of the Ruby pseudo-DSLs never varies from the syntax itself and if the symbols of DSLs do not need to read like mildly-annotated bare words in a natural language, I’m not sure exactly what the identifiable characteristics of a DSL really are anymore, and that’s really too bad, because both the notion of pervasive domain-appropriate language and the applicability of a domain-specific programming language used to be interesting, valuable concepts.

    (I’m not sure anyone likes my single-sentence “You may have undermined your point here” comments though.)


    By Darked Sun, 20 May 2007 06:07:06 GMT

    Small correction:


    >gi|2501594|sp|Q57997|Y577_METJA PROTEIN
    MSVMYKK

    is what it says it is: protein sequence in fasta format in which amino acids are represented by 1 letter codes. You can use it as a BLAST query but the options are endless. It is like calling a comma separated text file an Excel input only.

    D.


    By Paddy3118 Sun, 20 May 2007 06:07:21 GMT

    I am trying to find out what you answered to your own question “What is it that characterizes domain specific languages”, and rooted out “it’s their narrow focus on their domain”. I think it is a bit more than that: “It is their designers’ focus on a specific domain”. In the Digital Design and Verification world we have languages such as VHDL and Verilog that are developed for a task but are Turing complete, and sometimes used for tasks that are usually the domain of more general purpose languages. I have seen complex interpreters written in VHDL to create testbenches, for example. VHDL and Verilog are used by a multitude of tools from a number of vendors and open-source projects.

    My definition leads to awkward issues too: what of Fortran? Its domain could be thought of as scientific programming.

    I know, how about “It is their designers’ focus on a specific domain, and their limited number of different implementations”.

    - Paddy.


    By Adrian Howard Sun, 20 May 2007 07:02:24 GMT

    You might want to take a look at Martin Fowler’s categorisation of DSLs into ‘internal’ and ‘external’ (see DomainSpecificLanguages).


    By riffraff Sun, 20 May 2007 08:27:34 GMT

    I’ll add my 2c in support of Paddy’s opinion. The fact that you can build a regex engine in Mathematica doesn’t make it a text processing language, yet it was quite surely designed for a specific domain.

    Designed for something, not limited to it.

    On the other hand, I don’t think the number of implementations is important. As pointed out, regexen are the most popular DSL and there are a bunch of implementations.


    By Piers Cawley Sun, 20 May 2007 10:41:12 GMT

    Oh, chromatic, one of these days I’m going to have to put your name at the start of a sentence and offend either your sensitivities by capitalizing you, or mine by not doing so. Until then however…

    I suppose it comes back to the analogy of programmer as toolbuilder. The most powerful tools available to us aren’t the ones that already exist, but the ones we make ourselves and the ideas and patterns that enable us to shape the tool for the task. It hadn’t quite clicked with me that the problem with calling the Railish idioms DSLs is the narrowing of horizons that goes with it. It is, however, encouraging to see initiatives like RSpec which, through judicious use of higher order messages, enable a much more fluent environment for writing tests. I’d far rather write:

    describe ArticlesController, " GET index" do
      it "should be successful" do
        get "index"
        response.should be_success
      end

      it "should set @articles" do
        Article.should_receive(:find).and_return(mock('articles'))
        get "index"
        assigns[:articles].should_not be_nil
      end
    end

    than yet another method with an infinite number of underscores in its name and a sprinkling of assert_whatever methods. RSpec changes the language of testing very neatly and, for my money, feels Right as a result. It’s the sort of thing I could imagine Damian Conway doing in Perl…

    Sure, it would be nice to get rid of the extraneous do, but that’s a ruby artifact that’s next to impossible to work around and RSpec’s supposed to be rubyish.

    The state of the art does evolve, thank goodness.

    Paddy: I wouldn’t append the ‘limited number of different implementations’ part. Consider the sheer number of Regex and BLAST implementations. And that’s before we get onto counting yacc analogs.

    I think the boundaries start to blur when you get to Turing complete specialist languages though, but whilst inventing language taxonomies can be fun, I’m not overly sure it’s useful.


    By Piers Cawley Sun, 20 May 2007 10:45:49 GMT

    Darked: Thanks for that. That’ll teach me to pick a domain I’m barely even half smart about.


    By Edoc Wed, 23 May 2007 12:30:30 GMT

    The rebol language (http://www.rebol.com) is very simple, yet has well-integrated support for DSLs. The rebol people call it “dialecting”, but it’s the same idea. Some of rebol’s features are in fact accessed via DSLs, e.g. GUI construction. The following is a working example of a simple GUI:

    view layout [
        title "Sample GUI"
        a: area
        b: button "Fetch URL" [
            r: read http://www.bofh.org.uk
            a/text: r
            show a
        ]
        button "Close" [unview]
    ]

    Here’s a Dr. Dobb’s Journal article on REBOL.

        http://www.ddj.com/184404172

    Note the DSL examples toward the bottom, illustrating that commands such as the following are perfectly valid rebol code:

        Turn on porch light at 7:30 pm


    By Robert Fischer Mon, 09 Feb 2009 18:00:37 GMT

    There’s something uniquely awesome about a blog post whose footnotes have footnotes.


    By Robert Fischer Mon, 09 Feb 2009 18:07:31 GMT

    Also, note that OCaml and other variant-type based languages are very well suited for DSL writing. Over on my “7 Actually Useful Things You Didn’t Know Static Typing Could Do: An Introduction for the Dynamic Language Enthusiast”, I snark that what people are calling “DSLs” are called “readable code” in OCaml-land. I also give an example of a date DSL which ends with things like “let example1 = date 5 Days Ago”.

    I’m also fascinated that RSpec came up as an argument for DSLs. RSpec would be an excellent DSL, except that it’s got quotes and commas and spaces in weird places. It actually is pretty unreadable as anything other than RSpec code (where you factor out the "describe"s and "it"s as structural), and even then it’s really not expressing much.

