Just A Summary

Piers Cawley Practices Punditry

Martin Fowler's big mouthful

Martin Fowler is writing a book about Domain Specific Languages and, because you could never accuse Martin of a lack of ambition, he’s trying to write it in a reasonably (implementation) language agnostic fashion.

It’s fairly easy to write an implementation language agnostic book about old school DSLs, what used to be called little languages – there’s a fairly well established literature and theory to do with lexing, parsing and interpreting. These are all about algorithms, and algorithms are implementation language neutral by their very nature.

Where Martin has his work cut out for him is trying to talk about what he calls ‘internal DSLs’ and what I’ve been calling ‘pidgins’. These are the sorts of languages where you don’t write a lexer or parser but instead build a family of objects, methods, functions or whatever other bits and pieces your host language provides in order to create a part of your program that, while it is directly interpreted by the host language, feels like it’s written in some new dialect.

The Lisp family of languages can be said to be all about this. A good ‘bottom up’ lisp programmer will shape a language to fit the problem space, essentially building a new lisp which makes it easy to solve the problem at hand. Lisp’s minimal syntax, powerful macros and the way it blurs the boundary between code and data really support this style.

Once you move from Lisp to more ‘syntaxy’ languages, things get hairier. As Martin himself says:

Another issue with book code is to beware of using obscure features of the language, where obscure means for my general reader rather than even someone fluent in the language I’m using. […] this is much harder for a DSL book. Internal DSLs tend to rely on abusing the native syntax in order to get readability. Much of this abuse involves quirky corners of the language. Again I have to balance showing readable DSL code against wallowing in quirk.

He’s dead right. When I’m thinking about writing a pidgin in Ruby, for instance, my first thought is usually to start with some kind of tabula rasa object which I can use to instance_eval a block. That lets me start to shape my language by lexically scoping the change:

in_pidgin do

But, though it’s easy to illustrate what I’d do with my tabula rasa, the implementation is somewhat tricky, and the tricks needed are unique to Ruby.
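A minimal sketch of that tabula rasa idea, assuming a Ruby recent enough to have BasicObject (1.9 or later). The TabulaRasa class, this particular in_pidgin helper and the words used inside the block are all made up for illustration; a real pidgin would do something more useful than record what it sees.

```ruby
# Hypothetical blank slate: BasicObject defines almost no methods, so
# nearly every bare word in the block falls through to method_missing.
class TabulaRasa < BasicObject
  def initialize
    @statements = []
  end

  # Record each unknown word and its arguments as a [name, args] pair.
  def method_missing(name, *args)
    @statements << [name, args]
    self
  end

  # Oddly named accessor so it is unlikely to clash with a pidgin word.
  def __statements__
    @statements
  end
end

# Hypothetical helper: evaluate the block with a fresh slate as self,
# so the pidgin's vocabulary only exists inside the block.
def in_pidgin(&block)
  slate = TabulaRasa.new
  slate.instance_eval(&block)
  slate.__statements__
end

recorded = in_pidgin do
  greet "world"
  repeat 3, :times
end
```

Outside the block, greet and repeat are still undefined, which is the lexical scoping at work. The catch is that instance_eval changes self, so methods and instance variables from the surrounding object are no longer visible inside the block.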

That sort of construct’s not really available to someone trying to write a pidgin in Java or Perl. In Perl, there are other odd corners of the language that can be abused to good effect. Dynamic scoping can let you ‘inject’ methods into a block even though there’s no Perl equivalent to instance_eval, or you can do some quite staggering things with the otherwise really annoying Perl function prototypes. For instance, here’s part of a Jifty definition of a persistent object:

column title => 
       type is 'text',
       label is 'Title',
       default is 'Untitled post';

column body => 
       type is 'text',
       label is 'Content',
       render_as 'Textarea';

Doesn’t look much like Perl, does it? But it’s parsed and executed by perl with no source filters or eval STRING in sight. And there are no unsightly :symbols scattered about the place either, come to that.
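For comparison, here is roughly how the same declaration might look as a Ruby pidgin. This is only a sketch: the Schema class and its column method are hypothetical and have nothing to do with how Jifty is implemented. It does show where the symbols and hash keys creep in.

```ruby
# Hypothetical Ruby analogue of the Jifty declaration; not Jifty's API.
class Schema
  attr_reader :columns

  def initialize(&block)
    @columns = {}
    # Run the block with the schema as self so `column` resolves here.
    instance_eval(&block)
  end

  # Record a column name together with its keyword options.
  def column(name, **options)
    @columns[name] = options
  end
end

schema = Schema.new do
  column :title, type: 'text', label: 'Title', default: 'Untitled post'
  column :body,  type: 'text', label: 'Content', render_as: 'Textarea'
end
```

Every name that Perl lets you write as a bare word turns into a symbol or a hash key here.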

These things all work by making the language do something unexpected, and generally, the way to do that is by knowing your host language inside out and playing with it. One of Damian Conway’s more inspired moments in recent years was List::Maker, in which the good doctor managed to find a corner of Perl where he could wedge a proper old school Little Language, complete with a full-on parser to build the AST, right into the heart of Perl without it looking like he was taking a plain old string and interpreting it. Having found this odd little corner, he proceeded to implement a remarkably neat tool for building complex lists that are beyond the capabilities of Perl’s .. operator.

@odds   = <1..100 : N % 2 != 0 >;

@primes = <3,5..99 : is_prime(N) >;

@available = <1..$max : !allocated{N} >

You may not think that’s all that sexy, but, and trust me on this, it’s just gorgeous. Yet more proof that Damian Conway is an (evil) genius.
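If the notation is unfamiliar, the following Ruby spells out what those three lists contain. It is a sketch of the semantics only, not of how List::Maker works; the prime? helper stands in for the is_prime function the Perl assumes, and the max and allocated values are invented for the example.

```ruby
# Hypothetical stand-in for the is_prime() helper the Perl example assumes.
def prime?(n)
  return false if n < 2
  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

# <1..100 : N % 2 != 0>  is a range plus a filter applied to each element N
odds = (1..100).select { |n| n % 2 != 0 }

# <3,5..99 : is_prime(N)>  gives first, second..last, so the step is 2
primes = 3.step(99, 2).select { |n| prime?(n) }

# <1..$max : !allocated{N}>  filters against a lookup table
max = 10                               # invented value
allocated = { 2 => true, 5 => true }   # invented table
available = (1..max).reject { |n| allocated[n] }
```

The Ruby is perfectly readable, but each line is an ordinary method chain; the Perl version reads as a single list literal.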

Frankly, once you’ve seen the best of the pidgins available in Perl, some of the highly praised ‘DSLs’ in Ruby start to look a bit ordinary. Ruby makes a great deal of stuff that a pidgin breeder needs to do really easy. In Perl it’s often rather hard, with a huge amount of hoopage to deal with. But some of the things that are hard in Perl are impossible in Ruby.

Anyhoo… coming back to my point. I do find myself wondering if Martin’s bitten off more than he can chew in attempting to write a book that covers implementing pidgins without getting bogged down in the nitty gritty of individual languages. The problem he’s facing is that different languages don’t just have different quirks, they have different idioms too. What reads naturally in the context of a Ruby program will read very weirdly in, say, Java or a Lisp. Any patterns of implementation beyond broad (but important) strokes like “Play to your host language’s strengths” will surely end up as language specific patterns. Designing and implementing a good pidgin is hard. Doing it effectively means getting down and dirty with your host language and its runtime structures. And that’s not the sort of thing you can cover effectively in a language agnostic book.

Martin, if you’re reading this, good luck. I think you’re going to need it. I look forward to being proved wrong.

Published on Fri, 18 Jan 2008 17:19:00 GMT by Piers Cawley

  • By Daniel Berger Sat, 19 Jan 2008 02:09:22 GMT

    I’d be curious to see how the Jifty example was implemented. Looks pretty good, actually.

  • By she Sat, 19 Jan 2008 05:48:31 GMT

    hey piers, hope you dont mind a dissenting opinion about one thing (whereas I think the rest of the article is nice)

    It is about the word “pidgin”.
    I think the word is not good to address “internal DSLs”. The whole DSL area seems to confuse people. I read someone else claiming that BLAST is a DSL (but BLAST is just a simple, “stupid” dataset!)

    So the term DSL is already inflated due to various reasons (i myself prefer to define DSL as something from humans, for humans, and thus nicely readable but capable or “programmed” to do something useful), and now instead of referring as “internal DSL” someone is using “pidgin”.
    Now I dont at all think you should change your way to label this ;) but I want to present my point of view, that this adds to confusion in general about DSL and DSL usage.

    Anyway… I hope it won’t become like Monads in Haskell in the long run ;)

  • By Mark Snyder Sat, 19 Jan 2008 11:24:55 GMT

    Hmm, odd nomenclature… most of the literature I have encountered has settled on “embedded DSLs.”

    This technique of building an AST under the covers is used quite often in the Haskell community. In Haskell, monads and phantom types are the “corners” of the language that make this possible. And both are mechanisms which – to agree with you – are very specific to Haskell. Take a look at “Domain Specific Embedded Compilers” by Daan Leijen and Erik Meijer if you are interested.

    As for Martin Fowler, I appreciated some of his earlier writings on UML and Object Oriented Architecture, but I am increasingly getting the feeling that he is out of his depth when discussing programming language issues.

  • By Piers Cawley Sat, 19 Jan 2008 20:48:54 GMT

    She said:

    I read someone else claiming that BLAST is a DSL (but BLAST is just a simple, “stupid” dataset!)

    Um… the person who described BLAST as a DSL was me. And I still stand by that. Just because a language isn’t (usually) executable doesn’t disqualify it as a language.

    As for why ‘pidgin’, it’s precisely to get away from the over inflation of the term DSL and an attempt to unpack the way that these things are an amalgam of the grammar of a host and vocabulary from the domain and the host language. I blogged about it back in August.

  • By Was it a cat i saW Sun, 20 Jan 2008 01:16:16 GMT

    I agree with Mark: people are now using “embedded DSLs”. Why pollute the namespace with yet one more term that must be explained every time you use it?

  • By Piers Cawley Sun, 20 Jan 2008 03:45:32 GMT

    Why use pidgin? Precisely because it doesn’t muddy the water around a disputed term like DSL – which is becoming so overused in certain circles as to become almost meaningless.

    If you don’t want to use the term, you’re welcome not to, but I shall continue to use it, if only because, once it’s been explained, it’s shorter to type than ‘embedded DSL’ and the temptation to write ‘DSL’ where I mean ‘embedded DSL’ won’t be there at all.

  • By Daniel Berger Sun, 20 Jan 2008 07:23:53 GMT

    I vote for DSI – Domain Specific Interface. As chromatic often says, it’s just an interface. But it’s, you know, domain specific. And more descriptive than ‘pidgin’. Plus, you still get a nice, short acronym. :)

  • By Piers Cawley Sun, 20 Jan 2008 15:22:10 GMT

    Except that sometimes that sells what’s being done short. Admittedly, something like ActiveRecord’s family of class methods isn’t much more than a Domain Specific Interface (and a pretty good one at that), but pidgins like the one used in Jifty’s Schema definitions end up building an Abstract Syntax Tree (or something very like it) which is interpreted by different tree walking Interpreters depending on whether it’s being used to build the database, generate accessor methods or automagically build HTML forms and that’s language technology that is. The tree may be being built in a somewhat unorthodox fashion, but the backend(s) have all the trappings of a language.

  • By Serguei Son Wed, 24 Jun 2009 16:17:28 GMT

    Another example of your idea that writing a DSL often calls for exploitation of language corners is the C++ Boost library. It makes apparently impossible things easy for the user through techniques (mostly template meta-programming) that look like magic to the uninitiated.
