Just A Summary

Piers Cawley Practices Punditry

That's not fluent...

Posted by Piers Cawley Thu, 15 Mar 2007 23:33:00 GMT

So, I’m not a fan of static typing. It’s okay in the likes of Haskell which does type inferencing and generally goes out of its way to reduce programmer pain, but Java? C#? No ta. It’s awfully tempting to conclude that anyone who chooses to use those languages deserves to be pointed out and laughed at.

It’s especially hard to resist that temptation, when a C# blogger plays right into my hands by describing the following code:

<pre>
Pattern findGamesPattern = Pattern.With.Literal(@"<div")
    .WhiteSpace.Repeat.ZeroOrMore
    .Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
    .NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal(@"-game""")
    .NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
    .Literal(@"<!--gameStatus")
    .WhiteSpace.Repeat.ZeroOrMore.Literal("=").WhiteSpace.Repeat.ZeroOrMore
    .NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal("-->");
</pre>

as “a very nice regular expression wrapper which allows you to define a regex using a readable syntax exposed via a very elegant fluent interface”. It seemed so self-evidently silly that I read back in his blog to see if he was taking the piss. Depressingly, he appears to be serious. Ah well, I shall at least resist the temptation of supplying his name and URL.

Worse, he’s using this regular expression (rendered in Perl’s /x style, which ignores whitespace and allows comments in the body of the regex)

<pre>
qr/
    <div \s*            # can match the empty string!
    class="game" \s*
    id="(\d+)-game"
    (.*?)
    <!-- gameStatus \s* = \s*
    (\d+)
    -->
/msx
</pre>

to parse XML (one of the canonical no-nos, that one). Badly. For instance, it will match this drivel:

<pre> <divclass="game"id="0-game"<!--gameStatus=10--> </pre>

but it won’t match this perfectly valid XML:

<pre> <div id="1-game" class="game"> stuff <!--gameStatus=10--> </pre>

On reflection, that might not be matched for a couple of reasons – it depends whether Anything includes "\n".
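
Those claims are easy enough to check. What follows is a minimal sketch in Python rather than Perl or C#, with the pattern transcribed onto a single line; the variable names and the third, multi-line sample are assumptions made for illustration. Compiling the pattern with and without `re.DOTALL` stands in for the question of whether Anything includes "\n".

```python
import re

# The post's pattern transcribed as a one-line Python regex
# (a transcription for illustration, not the original Perl or C#).
PATTERN = r'<div\s*class="game"\s*id="(\d+)-game"(.*?)<!--gameStatus\s*=\s*(\d+)-->'

drivel = '<divclass="game"id="0-game"<!--gameStatus=10-->'
reordered = '<div id="1-game" class="game"> stuff <!--gameStatus=10-->'
# Hypothetical sample: attributes in the expected order, content on several lines.
multiline = '<div class="game" id="2-game">\n stuff\n<!--gameStatus=10-->'

dot_matches_newline = re.compile(PATTERN, re.DOTALL)
dot_stops_at_newline = re.compile(PATTERN)

print(bool(dot_matches_newline.search(drivel)))      # True: the junk matches
print(bool(dot_matches_newline.search(reordered)))   # False: attribute order differs
print(bool(dot_matches_newline.search(multiline)))   # True
print(bool(dot_stops_at_newline.search(multiline)))  # False: "." will not cross "\n"
```

So the attribute order alone is enough to defeat the pattern, and the newline question only decides whether multi-line content is a second way to fail.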

All of which is beside my main point. The sheer wordiness of the ‘fluent’ regex wrapper serves merely to obfuscate the intent of the pattern. You are tempted to conclude that, if it has that many words in it, it must be correct. This isn’t so much fluent as effluent.

Meanwhile, in that bastion of ‘line noise’ that is Perl, Damian Conway’s Perl Best Practices recommends writing all but the most trivial of regular expressions using a standard set of switches (msx) and using whitespace to break the pattern up into logical chunks, with comments where necessary. Given that almost everyone uses PCRE nowadays, you can follow the same good practice in your language of choice.
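
As a rough illustration of that advice outside Perl, here is a sketch using Python's `re` module (not PCRE itself, but its `re.VERBOSE` flag behaves like /x for this purpose). It keeps the pattern from above, flaws and all, and borrows the group names from the builder's NamedGroup calls; the sample HTML is made up.

```python
import re

# re.VERBOSE ignores unescaped whitespace and treats "#" as a comment
# marker, much like Perl's /x. The flags together mirror /msx.
FIND_GAMES = re.compile(
    r"""
    <div \s* class="game" \s*      # opening tag; attribute order is fixed
    id="(?P<gameId>\d+)-game"      # numeric id with a -game suffix
    (?P<content>.*?)               # lazily, everything up to the status comment
    <!-- gameStatus \s* = \s*
    (?P<gameState>\d+)             # numeric game state
    -->
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Hypothetical input, invented for this example.
html = '<div class="game" id="42-game">\n  chess\n<!--gameStatus = 3-->'

match = FIND_GAMES.search(html)
if match:
    print(match.group("gameId"), match.group("gameState"))  # 42 3
```

The named groups give you everything the builder's NamedGroup calls did, and the pattern still reads as a pattern.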

Yes, the regular expression language is terse. Yes, it can be opaque until you take the time to learn its grammar. But there aren’t that many rules to learn. The language isn’t complex, only some of the things that people use it for.

What is fluency then?

An interface isn’t fluent because it’s wordy. Fluency is about writing message protocols that make it easy for the user to solve her problem and clearly express her intent in the same bit of code. Frankly, plain old regular expressions are a damned sight more fluent in all their terseness than the above bad joke.

Comments


  1. Dave Cross about 12 hours later:

    Ah well, I shall at least resist the temptation of supplying his name and URL

    But by including a large quotation from the post, you make it pretty simple to track him down :-)

  2. Piers Cawley about 13 hours later:

    There you go, stating the obvious again.

  3. Garth about 21 hours later:

    I think the C# blogger should learn the true power of regex and its application rather than trying to wrap it into a clunky and verbose mush. It’s really scary that someone decided to do something like this, but I’m not surprised it came from the .Net world. Entertaining read, tho’.

  4. Kris 1 day later:

    I recall reading that post the other day and wondering to myself: “Why would anyone write that much code just to run a regex?” new Regex.Match() seems much easier. You may be a bit over the top with “It’s awfully tempting to conclude that anyone who chooses to use those languages deserves to be pointed out and laughed at.” but whatever. Trying to write desktop apps in Ruby would probably deserve some pointing and giggling.

  5. Reinier Zwitserloot 2 days later:

    Idiots exist. News at 11!

    Seriously though – every other ‘java/ C#’ bashing post amounts to nothing more than taking the mickey out of some well-intentioned poor sap’s attempt at learning programming. Yes, the C# ‘library’ shown here is ridiculous, but what, exactly, does this prove?

    The problem is simply that C# and especially Java get taught a lot as a first course in lots of computer science curricula, AND it’s the “go to” language for jobs. Want a job? Learn java, or C#/.NET. Maybe PHP. Definitely don’t learn ruby, haskell, python, lisp, or smalltalk, because you’ll be hard pressed to find any ordinary job request that lists those particular languages.

    End result: Your average programmer wannabe overwhelmingly flocks to java or C#. This, however, has nothing to do with the strengths or weaknesses of that language.

    Take, for example, your own ludicrous notion that THIS particular disaster is a good reason to “point and laugh” at statically typed languages. Excuse me? What the hell is wrong with you? I point and laugh at YOU for deigning yourself above an open mind.

    This kind of “reasoning” does nothing to further programming languages in general. For example, in explicit static languages (Like C# and Java), refactorings basically can’t possibly go wrong. It is trivial to ‘come from’ on any code block (figure out where the code block is being called from). These things are fundamentally undoable in many languages that you seem to like (python and any other all objects are dictionaries languages, like e.g. Javascript or ruby).

    Is that property worth all the supposed ‘pain’ of explicit static typing? I don’t really know. It depends on the project; for some projects I can’t live without it, for others there’s no appreciable benefit to the extra code analysis.

    But, perhaps, think about this, as closing thought: GWT (Google Web Toolkit, a java to javascript compiler) is capable of determining which code needs to be ‘compiled’ into the javascript files with such fine granularity that the raw size of the JS files produced by it is pretty much impossible to beat. The reason for this granularity is simply because in java it is trivial to determine which code blocks do, and which code blocks couldn’t possibly be, run – which means you can import huge libraries and use only tiny fractions of them, with no appreciable impact on code size. Tell that to the dojo boys, who have had to resort to compilers of their own that create a custom monolithic version, which still depends in large part on yourself to specify what you do and don’t use.

    Long story short: Do not laugh at languages, lest people ignore you as yet another fanboy. Even (especially? given their status as everyone’s favourite toilet) java and C# have plenty of nuggets to learn from.

  6. Piers Cawley 2 days later:

    I believe that the Ruby/Cocoa bridge is getting better all the time, Adobe built Lightroom with Lua, Civilization IV is at least partially implemented in Python.

    Dynamically typed languages are definitely gaining traction on the desktop.

    Certainly Cocoa plays well with dynamic languages – after all, ObjectiveC is remarkably dynamic for a C-like language; you can declare pretty much everything in the id type if you want and, arguably, you should.

  7. Piers Cawley 2 days later:

    Thank you, Mr Zwitserloot, for your considered response to what was, essentially, a throwaway rant.

    However, I am always amused to see people arguing that statically typed languages are preferable for refactoring reasons. They seem to forget that automated refactoring is a child of Smalltalk, a language that couldn’t be more dynamic if it tried. Sure, static typing helps, but it’s not necessary.

    The real issue I have with most statically typed languages is that their type systems are so bloody awful. Haskell’s been a real eye opener here. Static typing that gets out of the way. Lovely.

    As for why this particular disaster is something to beat C# with? Well, why not? It plays well to my particular prejudices.

  8. Reinier Zwitserloot 2 days later:

    Smalltalk pioneered pretty much the entire ‘Use an IDE and make sure it deserves that “I” in the name’ stuff, but given that smalltalk is such a nice language, that should be ample suggestion that they are on to something there.

    Python, JS, and Ruby make it intractably difficult to do a lot of the things you want to do, and for some reason Haskell isn’t popular enough, I guess, to warrant the work required to build such a thing.

    As for java (and C#)’s type system: Yes, it is bloody awful! So much more can be done there. I thought for a moment there that Sun saw the light (generics is surely a move towards a type system that, ya know, represents types, instead of only a subset!) but then there’s the current lack of traction on a NonNull marker for all types, and the closure idiocy (CICE are the properly typed closure material, not BGGA closures. Google CICE closures for more info :-P), so there’s apparently a bit of a hole opening up here, and I don’t see C# filling it up.

    Opportunities abound!

  9. Piers Cawley 3 days later:

    My gut tells me that, out of Python, JS and Ruby, JavaScript’s going to be the easiest to get across the refactoring Rubicon (Extract Method) with. I have a proof by implementation that you can cross it in Perl.

    I believe there is a refactoring tool for ruby that uses a customized interpreter to provide the necessary hooks, but I’ve not investigated very far there. I hear good things about the latest builds of Netbeans though.

  10. Jon Galloway 5 days later:

    Your previous post on fluent interfaces was pretty persuasive. I was disappointed that you ignored the main point of my post and provided a critique which only works when done out of context; I’d have much preferred you to tear up my more central message so we could get a better idea of what you’re saying.

    Here’s the original post:
    http://weblogs.asp.net/jgalloway/archive/2006/12/06/a-simple-example-of-a-fluent-interface.aspx

    The bit on regular expressions was little more than a footnote to point out some applications of fluent interfaces. My main example talked about chained image manipulations, an operation which doesn’t have a declarative expression language analogous to regex for pattern matching (at least, not in common use). I also mentioned Rhino Mocks, which exposes a fluent interface to mock setup. Your criticism isn’t applied to any of those uses, which is unfortunate because I think you’d make a much better point by attacking the concept rather than the execution.

    I think you could have gone farther in attacking the concept of a fluent regex builder; you just tore up the sample code I quoted which is built on top of a regex builder. It seems similar to dismissing the concept of e-mail because some e-mails contain poor grammar. While I understand that you are saying that the verbose interface may disguise logic errors, you’ve done little beyond pointing out the logic error in the regex. Pointing out an implementation bug may point out that the API is error prone, but definitely doesn’t invalidate the API. As an aside – I just quoted the regex sample to illustrate the idea, but it looks as if the author tested the regex against a block of HTML, so it probably “worked” based on an incomplete test which didn’t include incorrect matches.

    Off the subject of fluent interfaces and on to the concept of regex builders: I think we’ll just have to agree to disagree there. I’ve written more than my share of regex’s since I was first exposed to them in AWK ten years ago. I very much like the concept and can hold my own when it comes to writing them. I think the language is very powerful but unnecessarily terse. Many developers consider regex to be a “write-only” language because it’s very difficult to discern the original developer’s intent. Another problem (and I realize this is due to my predisposition towards static typing) is that I don’t like encoding logic in a big old string. In concept, I’d rather put as much of the logic in logical constructs and just put string constants in strings, which is why I like the concept of Joshua’s regex builder. I wrote about that back in 2005: http://weblogs.asp.net/jgalloway/archive/2005/11/02/429218.aspx

  11. Piers Cawley 5 days later:

    I didn’t take issue with the bulk of the post because there’s nothing to take issue with (apart from an implementation suggestion I’ve just made in a comment there).

    Your claim that the regex builder is “readable syntax exposed via a very elegant fluent interface” just struck me as ludicrous. I fail to see how it’s supposed to help me divine the programmer’s intent any more clearly than a real regular expression.

    And, in a language like Perl or Ruby, a regular expression isn’t a big old string – it’s a regular expression. Once you start encoding regular expressions in plain old strings you end up with ugliness like "<div\\s*class=\"game\"\\s*id=\"(\\d+)-game\"(.*?)<!--gameStatus\\s*=\\s*(\\d+)-->" where a large part of the ugliness comes from the need to string escape everything. Thankfully it’s not as bad as dealing with escaping in a shell script, but it’s still not fun.

    As I pointed out in the original post, modern regular expressions are not your father’s regular expressions. It’s a small thing, but the ‘new’ x switch for Perl compatible regular expressions enables programmers to format regular expressions in logical chunks, with comments to explain intent where necessary. Or you chunk expressions into multiple explanatorily named variables and compose them.

    Or you can wait for Perl 6 Rules to get implemented.

  12. Jon Galloway 7 days later:

    Thanks, I see your point a lot better now. I’m used to working in a language which, though it has nice library level support for RegEx, sees the actual expressions as nothing more than strings of text (which, as you point out, need to be escaped). In that world, a builder abstracts the ugliness of long escaped RegEx strings. I see how that looks pretty stupid when you work in a language which sees RegEx’s as more than just a string.

    Microsoft’s been adding a lot of dynamic-ish features to .NET with LINQ; seems like a RegEx implementation would be in order.

  13. fasteez 7 months later:

    As said, the only “improvement” with the regexp “fluent interface” is that you end up using ‘first class objects’ of your host language. True, you lose some readability and brevity compared to Perl regexps, but you gain some grammar features inside your IDE, as opposed to dealing with hand-coded escaped strings.

    The cool point with these fluent things, like LINQ, is that they build an internal representation of the “thing” you want to express; it’s like thinking of your programming in terms of an AST. So your user can build embedded expressions easily.

    my 2 cents
