A Handy Builder Pattern
I'm working on a web service, and that means that I need to build lots and
lots of mildly different looking HTTP requests with various combinations of
headers and requested URLs. The camel's back got broken this morning when I
realised I didn't want to be writing a method called
ssl_request_from_uk_with_bad_cert, which builds me an HTTP::Request with a
particular combination of headers, that I can use with Plack::Test to
test our webservice. The method name describes what's wanted, but the code is
sopping wet and in desperate need of DRYing up.
My first cut was to write a simple request builder:
package RequestBuilder;
use Moose;
with 'MooseX::OneArgNew' => {
type => 'HTTP::Request',
init_arg => '_req',
}
has _req => (
is => 'ro',
isa => 'HTTP::Request',
handles => [qw(header headers)],
);
sub ssl_request {
my $class = shift;
$class->new(HTTP::Request->new(
GET => 'https://localhost/',
[
SSLSessionID => 'deadbeef',
],
);
}
sub from_uk {
my $self = shift;
$self->header('X-IP-Is-UK-Combined' => 'yes');
return $self;
}
sub with_bad_cert {
my $self = shift;
$self->header('SSLClientCertStatus' => 'NoClientCert');
return $self;
}
sub final {
my $self = shift;
$self->_req;
}
Now I have a builder, I can compose meaningful fragments to build a final request. So I might write:
my $r = RequestBuilder->ssl_request
->from_uk
->with_bad_cert
->final;
But things got sticky when I wanted to to take a valid request and bend it out of shape to ensure that it failed correctly. I was writing things like:
use RequestBuilder;
sub valid_request {
my($self, $is_final) = @_;
my $r = RequestBuilder->ssl_request->from_uk;
return $is_final ? $r->final : $r;
}
test "we reject requests from outside the UK" => sub {
my $self = shift;
my $r = $self->valid_request(0)->from('Turkmenistan')->final;
...
};
which works, but is uglier than a very ugly thing indeed. What I wanted was some way of having the finalization magically happen at the point of use but have some way of extending any intermediate results. The interface I came up with looks like this:
use RequestBuilder;
sub valid_request {
build_request {
$_->ssl_request->from_uk;
}
}
test "we reject requests from outside the UK" => sub {
my $self = shift;
my $r = build_request {
$self->valid_request->from('Turkemenistan');
};
...
};
When the builder is returned from the outermost build_request block, it gets
finalized.
"How does that work then?" I hear you ask. It's quite simple, once you know about Moose::Exporter, old fashioned perl prototypes and the Tao of dynamic scope. We just add something like the following to the end of our RequestBuilder:
our $building;
use Moose::Exporter;
Moose::Exporter->setup_import_methods(
as_is => ['build_request'],
);
sub build_request (&) {
my $block = shift;
my $builder = do {
local $building = 1;
local $_ = __PACKAGE__;
$block->();
};
return $building ? $builder : $builder->_req;
}
So, build_request has a & prototype, which means it takes a block as its
first argument. Perl treats an initial prototyped block argument as slightly magical and
doesn't require the use of the sub keyword (though you can use it if you
want). It then uses a do block to dynamically set $building to 1 and $_
to be the current __PACKAGE__ name (because $_ is shorter to type
than RequestBuilder) before calling the block to get a builder. Then it
checks whether we're still building. If we are, it returns the builder. If we
aren't, then the block was the outermost block, so build_request returns the
built request.
I'm not quite ready to extract this pattern into a parameterized role - I need to make sure it's robust enough, but it's certainly something to think about when you next need to make a builder for your tests.
Asynchronous Streams
In Higher Order Javascript, I introduced Streams and showed how to use them to implement a
lazy sort. I think that's neat all by itself, but it's not directly useful in
the asynchronous, event driven execution environment that is the average web
page. We'd like a structure where we spend less time twiddling our thumbs as
we wait for force to return something to us.
Non blocking streams
What if we change the protocol of our stream to something more asynchronous?
Obviously, we'd still have a head element which is immediately available and
some kind of promise to compute the next stream. But rather than promising to
return a new stream, the promise of a non-blocking stream is a promise to
call a function we supply with the value it computes. Here's a
CoffeeScript implementation of what we're
talking about:
the_empty_stream =
is_empty: true
continue_with = (b) -> window.setTimeout((=> b()), 0)
class NonBlockingStream
constructor: (@head, promise) ->
this.force_into = (block) ->
if promise.length == 0
continue_with -> block promise()
else
continue_with -> promise block
is_empty: false
I've written NonBlockingStream to work with both 0 argument functions (like
the promise of an ordinary stream) and with 1 argument promises that will be
responsible for calling their block when appropriate. force_into doesn't
block by virtue of the fact that we execute our promise using
setTimeout - a timeout of 0 milliseconds doesn't mean execute the block
immediately but asks for it to be executed as soon as possible. The fat arrow
(=>) used to build the function passed to setTimeout ensures that the
code will be executed with this bound to the stream instead of the global
window object.
This new non blocking stream is a much better citizen on a webpage. It yields to the event loop at every opportunity and gets out the way of other, possibly more important events. But so far we know how to use it for things like finding primes or the top 5 entries of 1000. Not exactly useful in the browser…
Streams in the real world
Listen to any web usability guru for more than about ten minutes and they'll tell you that pagination is evil. If you have a resource (say a blog index page) that logically should have 1000 entries on it, then breaking it up into multiple pages is a sin. The reader should simply be able to scroll through all 1000 entries. But most users don't scroll through every entry, and rendering 1000 entries is time consuming. Sites like Google reader and Twitter solve this by doing 'just in time' fetching.
Suppose we wanted to re-engineer this blog to use the endless page pattern, then I might think of using streams. We'd like to serve up an index page that sets up all the headers and navigational stuff, but which populates its articles via an unbounded stream of articles. Let's say that an article looks something like this:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>The current article</title>
<link rel="next" href="the-next-article.xhtml" />
</head>
<body>
<article>
<h2>The current article</h2>
<p>Yada yada yada...</p>
</article>
</body>
</html>
You can think of this as a kind of stream. The 'head' of the stream is the
contents of the html body tag, and the promise is the <link rel="next" ... /> tag in the head. Let's write a function which, given a document like
this will make us a stream. We'll use jQuery because, well, why not?
$ = jQuery
doc2stream = (data) ->
new NonBlockingStream $("body", data), (block) ->
next = $("head link[rel=next]", data).attr('href')
if next
$.ajax
url: next
success: (data) -> block(doc2stream data)
dataType: 'xml'
else
block the_empty_stream
Note that our promise doesn't call the block it's forced with immediately. Instead it fires off an asynchronous request for the next article in the stream with a success callback that (finally) calls the block.
Putting it together
The trick now is to get the stream up and running from our article index. Let's assume we have an initial page along the lines of:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Some blog articles</title>
<link rel="start" href="the-first-article" />
<script
type="text/javascript"
src="http://code.jquery.com/jquery.min.js"></script>
</head>
<body>
<header>
<h1>Some blog articles</h1>
</header>
<section id="articles">
<footer>
<p>If you can see this, I'm probably fetching a real article</p>
</footer>
</section>
<script
type="text/javascript"
src=".../asynch-fetcher.js"></script>
<footer>
...
</footer>
</body>
</html>
This is where we shove all the navigational bits and pieces of our blog, the links to atom feeds, sidebars with associate links and whatever. But it's devoid of content. We need to fix that by finishing off our asynch-fetcher.coffee1. First, we need to setup our stream:
articles = new NonBlockingStream '', (block) ->
$.ajax
url: $("head link[rel=first]").attr('href')
success: (data) -> block doc2stream data
dataType: 'xml'
We can't simply call doc2stream because that expects to find the link to the
next article in a [rel=next] link in the head, but we're using a 'first'
link here. So we make a stream with the empty string a as a dummy head and a
promise to fetch the article linked to by our [rel=first] and turn that into
a stream via doc2stream
Next we need a function to update the articles variable and extract the
article element from the head of a stream and insert it in our #articles
section just before the footer. Once this is defined we force the first
article into it.
show_next = (stream) ->
articles = stream
$("#articles footer").before(
$("article", stream.head).clone().addClass("last-art")
)
articles.force_into show_next
We also arrange for show_next to add a last-art class to the newly
inserted article, which we'll use as a target in the watcher function we set
up to handle fetching new articles as they are needed:
$.fn.is_in_view = () ->
$(this).position().top < ($("body").scrollTop() + $(window).height())
watcher = ->
last_art = $(".last-art")
if last_art.is_in_view()
last_art.removeClass('last-art');
articles.force_into (str) ->
unless str.is_empty
show_next str
window.setTimeout watcher, 100
else
window.setTimeout watcher, 100
window.setTimeout watcher, 100
Our heuristic for judging when to fetch the next article is simple: if the
article tagged with the last-art class is in the viewport, then it's time to
go about fetching the next one. This assumes that our writing is compelling
enough that by the time the reader gets to the bottom of an article, the next
one will have been succesfully fetched. This may be an optimistic heuristic,
but we're all about "for the purposes of illustration" here.
To get this to work, we add a simple is_in_view method to jQuery.fn. This
tests if the selected element's top is placed higher than the bottom of the
viewport. With that in place, we can write watcher which checks if the last
article is in view. If it when it is, watcher removes the last-art marker class and
kicks off the process of fetching the next article. We use window.setTimeout to
ensure that we keep fetching next articles as long as their are articles to
fetch and our reader is reading them.2
Notes
-
A similar stream could be set up for each article's comments, after which
we might find that we should parameterize
show_nextandwatcherin some fashion. - Real world use would probably involve serving up a few articles in the body of a blog front page - if only so that Google has something to index. tag indices or search results could be served up empty and populated on demand.
- I've not (yet) redone this blog to use this pattern, but I've tested the code presented here and it does work.
-
For caching purposes, it may better for searchs and the like to return
a small lump of JSON with a
headlink to the statically cached article document and apromiselink to the next lump of JSON in the results. If we write doc2stream right, it should be possible to completely isolate the rest of the page from this decision, which seems like a win to me.
Higher Order Javascript 9
To my surprise, several people have asked for the slides from my Øredev talk on Higher Order Javascript, and I’ve followed my usual practice of saying “Sorry, no”. Slide decks are a terrible teaching medium – they’re fine if they come with the presenter, but if they contain enough information to read as if they were a book, then I’m prepared to bet that they made a terrible presentation. Good presentations have a synergy; slides illustrate what the speaker is saying and neither the speech nor the slides should really stand alone. After all, if either could, why bother with the other?
But, the requests mean that some people found the information useful, so here’s the content (or something like it) written up as a blog post instead.
Higher order what?
Yes, yes. I know. I’m a Perl programmer. What am I doing writing about higher ordered javascript? In part it’s because Higher Order Javascript was the title of the talk that Giles Bowkett was due to give before he had to bale out and, while I’m sure I could give a Higher Order Perl talk, I doubt that it would have gone down particularly well in the webprogramming track of a conference that was heavily slanted towards Java, .NET and Agile. Nowadays Perl gets even less respect than Javascript and I simply wasn’t prepared to go kicking that particular stone uphill with a bunch of programmers who made their minds up on that particular score ages ago.
Anyway, higher ordered programming is language agnostic. So long as your language has first class functions (and ideally lexical closures) then you’re good to go. I first learned about higher order functions from Mark Jason Dominus’s Higher Order Perl talk when he was still calling it “Stolen Secrets of the Wizards of the Ivory Tower” and some time before he’d written the book on higher order programming in perl. In that talk, Dominus pointed me towards Abelson and Sussman on the Structure and Interpretation of Computer Programs (one of the great texts on programming – everyone who is serious about their craft should read it). Since then I’ve mucked about in Smalltalk, Ruby, Haskell, Scheme, Javascript and Clojure as well as Perl and higher order techniques had stood me in good stead in all of them.
Before we start
There will be code snippets here which I don’t pretend are written in what is normally considered good javascript style. In particular there’s no error checking and variable names are short and almost devoid of meaning. The short variable names help avoid line wrapping headaches within the blog format and the lack of error checking is to enable us to focus on the higher ordered techniques that we’re interested in. In the real world, we don’t live on the happy path and you are expected to think before you code. And write tests.
I do try to follow the Haskell convention that a list or array of stuff will be called something like xs and x will the variable used to hold the current element of xs as we iterate over it.
I’ve not seen that in Fowler…
Higher ordered programming is just another way of removing repetition from your program by extracting the ‘shape’ of an algorithm into a separate function. In the presentation I asked for a show of hands from people who’d done higher ordered programming before. A scattering of hands went up. Then I asked who’d used jQuery’s $.each(function {...}) and most hands in the room went up. Which means that most of the people in the room had used higher order programming and never realised it.
If you look at it right, $.each is just the result of applying the Extract Method refactoring on a for loop in a way that’s impossible to do in, say, Java. Here’s a barebones version of $.each
each = function (xs, f) {
var i, len, each;
for(i = 0, len = xs.length; i < len; i++) {
f(xs[i]);
}
}The jQuery each does rather more than this, but at it’s core it’s about turning an iterative loop into a function that takes a collection and a callback function which is applied to every element in the collection. What could be easier?
Higher Leverage Javascript
What Javascript’s designers thinking? Seriously, if you’re going to provide a lovely language with first class functions, why on earth would you want to spell the keyword that makes a function ‘function’? Lisp is stuck with lambda because, well, because it’s a bloody old language and they didn’t know any better then. Arguably it’s why they ended up with the macro capabilities they did – anything to avoid spelling out lambda every time. Anything longer than Lambda’s just silly. So, for the rest of the examples I’m switching to CoffeeScript. There’s a crib sheet over on that site which should allow you to work out what’s going on, and if you take away nothing else from this article than “Investigate CoffeeScript”, you’ll be ahead of the game.
Refactoring with higher order functions
Consider sum and prod below:
sum = (xs) -> r = 0 (r = r + x) for x in xs r prod = (xs) -> r = 1 (r = r * x) for x in xs r
Notice how they both have the same shape. They both set an initial result and then, for each element in the collection, the intermediate result with an element of the collection using an operator and, when they’ve finished iterating over the collection, they return the last value of the intermediate result.
Let’s apply Extract Temporary a few times to make the repetition even more obvious:
sum = (xs) -> init = 0; f = (a,b) -> a + b r = init (r = f(r,x)) for x in xs r prod = (xs) -> init = 1; f = (a,b) -> a * b r = init (r = f(r,x) for x in xs r
Now we can just Extract Method and replace the repetition:
inject = (i, f, xs) -> r = i (r = f(r,x)) for x in xs r sum = (xs) -> inject 0, ((a,b) -> a+b), xs prod = (xs) -> inject 1, ((a,b) -> a*b), xs
If we were writing this in a language that didn’t have an OO feel to it, we’d probably call the extracted function foldl, but the OO languages that already have it, it tends to be called inject or, in Smalltalk inject: into: and implemented as part of the enumerating protocol of a collection. Smalltalkers will tell you that inject:into: is the fundamental method for enumerating a collection, pretty much everything else in the protocol can be implemented in terms of it (so it’s a good idea to make it as fast as you possibly can). In Ruby, meanwhile, inject is implemented in terms of each, which seems the wrong way around for me. Here’s a few typical functions for messing with collections, implemented in terms of inject:
grep = (pred, xs) -> inject [], ((rs, x) -> if pred x then rs + [x] else rs), xs map = (f, xs) -> inject [], ((rs, x) -> rs + [f x]), xs each = (f, xs) -> inject undefined, ((_, x) -> f x; _), xs
I see repetition!
Have you noticed it? Everything written in terms of inject has the form:
something = (..., xs) -> inject ..., xs
If we weren’t higher order programmers, we should probably just shrug our shoulders and give it up as a bad job, but we are higher order programmers, and we’re working in a language that allows functions to return functions. We’d like to be able to write:
sum = injector 0, (a,b) -> a + b
where injector is a function like inject but which returns a function of one argument which will iterate over the supplied collection.
We could write injector directly, but where’s the fun in that? Instead let’s write:
with_list = (f) ->
arity = f.length
(args...) ->
if arity > 0 && args.length != arity - 1
throw "Oops!"
(xs) -> f args..., xsSo, with_list is a function which takes a function, f, of n arguments an returns a function of n – 1 arguments (args) that returns a function of one argument (xs) that applies f to a list of arguments made up of appending xs to args.
So that’s code that operates on code to write code that writes code. Confused? Don’t worry – it gets worse.
Once we have with_list we can dispose of sum and prod by writing:
injector = with_list inject sum = injector 0, (a,b) -> a + b prod = injector 1, (a,b) -> a * b
It doesn’t help us out with grep, map and each, but I’m not feeling clever enough to fix that at the moment so I’ll leave it as an exercise for the interested reader.
Functional vs. OO Style.
So far, I’ve been writing everything in a very functional style. If I’m honest, I prefer it that way for a lot of things, but I’m aware that I’m not in the majority on this front and some people would rather see inject and co as methods on objects like Allan Kay intended. So let’s see if we can write something which will allow us to transform our functions into methods. Essentially, we need to get rid of the last argument (xs) and make the functions operate on this instead.
We already have with_list which transforms a function on n arguments into a function of n – 1 arguments. Maybe we can write with_this like so:
with_this = (f) -> (args...) -> f args..., this
And, because I’m feeling mischievous, why not write:
Function.prototype.to_method = with_this with_this
And then we have:
Array::sum = sum.to_method() Array::prod = prod.to_method() Array::inject = inject.to_method() Array::grep = grep.to_method() Array::map = map.to_method() Array::each = map.to_method()
Hopefully it’s obvious what’s going on here. If it isn’t, you might want to reread everything so far and press on when you’ve got the hang of it. Remember, higher order programming arises from applying the following principles:
Principle 1: Functions can take functions as arguments
Principle 2: Functions can return new functions
That’s all there is too it. Everything else is just working through the implications. Just remember that if counter_from = (n) -> () -> n = n + 1, then c1 = counter_from 1 and c2 = counter_from 2 are two different functions that do not share the same n. Easy, no?
Right, now you’ve got that straight, let’s move on to
Memoization
Let us suppose that you are a naïve mathematician who has decided to implement a function to find the nth number in the fibonacci sequence. You know that Fib0 is 0, Fib1 is 1 and that Fibn is Fibn-1 + Fibn-2 otherwise. So you write:
fib = (n) ->
switch n
when 0 then 0
when 1 then 1
else fib(n-1) + fib(n-2)Obvious. Everything’s fine for small values of n, but it takes an age, and an awful lot of memory to find Fib50. What can be wrong.
Because you know something about higher order functions, you write:
calls = 0; counted = (f) -> (args...) -> calls += 1 f args... fib = counted fib
You worry a little about that free variable, calls, but you’re busily debugging and you’re going to throw it away once you’ve worked out what’s going on. Instead you run a test program:
calls = 0;
print "#{fib 30} called 'fib' #{calls} times\n"
calls = 0;
print "#{fib 31} called 'fib' #{calls} times\n"And you are shocked when the output is:
832040 called 'fib' 2692537 times 1346269, called fib 4356617 times
Clearly you do not have an efficient algorithm.
You could, at this point, sit down and think and come up with a more efficient algorithm (which, frankly, isn’t all that hard in the simple case of finding fibonacci numbers, but which can get harder in the case of, say, the Ackermann function). But, dammit, you’ve reduced the problem of finding Fibn to adding together two numbers you’ve already found, why not just make the computer remember that?
memoize1 = (f) ->
results = {}
return (arg) ->
if arg of results
results[arg]
else
results[arg] = f arg
fib = memoize1 fibAnd now when you run your test you get:
832040, called fib 31 times 1346269, called fib 1 times
I think we can safely say that that’s an improvement. And it’ll work for any function of one argument, where the argument is of a type that can be used as a key in a Javascript dictionary. Hmm… that’s possibly not as wide as we’d like.
However, usually when we go to memoize a function, we know something about that function and its expected arguments. If we can write a function to normalize the function arguments into a valid key, then we can write a more general memoize that takes the function to memoize and an optional normalization function:
memoize = (f, normalize) ->
results = {}
normalize ||= (x) -> x
(args...) ->
key = normalize args...
if key of results
results[key]
else
results[key] = f args...Of course, given enough thought and/or access to Wikipedia, we could well come up with the closed form function to find Fibn
fib = (n) ->
Math.round(
Math.pow(
(1+Math.sqrt 5) / 2,
n
)
/ Math.sqrt 5
)But for sheer readability, I’ll take the memoized form.
I promise to pay the bearer on demand…
Suppose you have 1000 things which are sortable, and you want to find the first ten. You could sort the entire list and just grab the first ten elements of the sorted list. But that’s an O(nlog2n) algorithm. What if we had a means of only sorting the first m elements and then stopping (or pausing)?
We do. Or we will. And to do this, we’re going to use a functional data structure called a Stream.
If you’ve spent any time around functional programmers, you will have come across a fundamental data structure called the Pair, which is usually used int list structure. A Pair is simply a pair of pointers. When pairs are used in what’s called list structure, the first (head, or car) pointer points to the value of the cell, and the second (tail or cdr) points to the next element in the list, or to a special value called nil or ‘the empty list’. Essentially a singly linked list then. When you’re following most introductions to lisp, the pair is pretty much all you’ll need to use because given that simple structure, you can build pretty much any structure you desire. Real world lisp programmers other, more specialised (faster) data structures available, but anything new or complicated tends to get built with pairs (including the data structure we’re actually interested in right now).
A stream is like a pair in that it has a head value and a tail. But a stream’s tail isn’t another stream, it’s a promise to compute another stream. So the head is akin to a lump of silver or gold – something with inherent value – and the tail is like a bank note, which is simply a ‘promise to pay the bearer on demand’ with another stream.
Show me the code!
This is all a bit abstract, so let’s look at a stream implementation.
the_empty_stream =
is_empty: true
take: (n) -> []
class Stream
constructor: (@head, p) ->
tail = p
forced = false
this.force = ->
if forced then tail
else
forced = true
tail = tail()
take: (n) ->
if n == 0
[]
else
[this.head].concat this.force().take n - 1
is_empty: false
force = (s) -> s.force()
take = (n, s) -> s.take n
is_empty = (s) -> s.is_emptyThere you go. Admittedly, force is a little tricky. Calling stream.force() makes the stream evaluate the promise to calculate the tail, and returns that value. It’s written the way it is so that, once we’ve evaluated the promise function, we drop our reference to it, which should make our memory usage a little more efficient.
None of this makes it clear how we’ll actually use streams, so let’s add a few utility functions:
take_stream = (n, s) ->
if is_empty(s) || (n == 0)
the_empty_stream
else
new Stream s.head, -> take_stream n-1, force s
filter_stream = (pred, s) ->
if is_empty s
the_empty_stream
else if pred s.head
new Stream s.head, ->
filter_stream pred, force s
else
filter_stream pred, force s
map_stream = (f, s) ->
if is_empty s
the_empty_stream
else
new Stream f(s.head), ->
map_stream f, force s
cat_streams = (s, t) ->
if is_empty s
t
else
new Stream s.head -> cat_streams force(s), t
print_stream = (s) ->
if is_empty s
return
print "#{s.head}\n";
t = force s
if t.is_empty then return
else print_stream t
Array::as_stream = ->
switch this.length
when 0 then the_empty_stream
else
[x,xs...] = this
new Stream x, -> xs.as_stream()
array2stream = (xs) -> xs.as_stream()It doesn’t look like much so far, but there’s more than enough to solve our top ten out of a thousand problem.
First let’s build ourselves an array of 1000 integers chose at random.
make_rands = (ceil) -> new Stream Math.floor(Math.random() * ceil), -> make_rands max big_list = take 1000, make_rands 1000
The function make_rands returns an infinite stream of random numbers and we take the first 1000. What we want to do now is a partial quicksort. First, we write qspart which partitions an array of integers into three arrays. All the members of the original array that are less than its first element, all the members that are equal to it and all the members that are greater than it.
qspart = (xs) ->
[p, ys...] = xs
inject(
[[], [p], []],
((bins, each) ->
i =
if each < p then 0
else if each > p then 2
else 1
bins[i].push each
bins
),
ys
)This would turn the list [2,3,5,2,1,0] into [[1,0],[2,2],[3,5]]. Now we just have to work out how to make that into a sorted stream.
make_sorted_stream = (xs) ->
switch xs.length
when 0 then the_empty_stream
else
[ls,[p,ps...],hs] = qspart xs
cat_streams(
make_sorted_stream(ls),
new Stream p, ->
cat_streams(
ps.as_stream(),
make_sorted_stream(hs)
)
)We’re partitioning the list into a list of those less than the pivot (ls), a list of elements equal to the pivot ([p,ps]) and a list of those greater than the pivot (rs). Then we build a stream by concatenating the sorted stream built on ls with stream whose head is the pivot and which promises to supply a stream made by concatenating the stream made from all the other elements equal to the pivot with the sorted stream made on rs. We’re reasonably confident that this should terminate because the sorted stream on an empty array is the empty stream, and every sub list made by qspart is shorter than the list in its argument.
If we wrap qspart to get some kind of trace then get the top 3 entries from a list of 10 random numbers, like so:
qs = ((f) -> (xs) ->
res = f xs;
print "# qspart([#{xs}] -> #{res[0]}|#{res[1]}|#{res[2]}\n"
res)(qs)
print_stream(
take_stream(
3,
make_sorted_stream(
take(10, make_rands 50)
)
)
)We see output like:
# qspart([41,7,6,13,43,47,23,15,34,22]) -> 7,6,13,23,15,34,22|41|43,47 # qspart([7,6,13,23,15,34,22]) -> 6|7|13,23,15,34,22 6 # qspart([13,23,15,34,22]) -> |13|23,15,34,22 7 # qspart([23,15,34,22]) -> 15,22|23|34 # qspart([15,22]) -> |15|22 13
Which gives us some idea of the shape of the computation. Notice how we only call qspart when necessary and we’re doing the minimum amount of work needed to get the first three entries from the sorted list.
Higher Order Javascript in the Real World
Now you know what higher order programming looks like, you’ll start seeing it everywhere in libraries like jQuery, YUI and Node.js. I’ve not covered what’s called ‘continuation passing style’, but it’s almost impossible to do asynchronous stuff without it.
So, don your HOF sensing glasses and go play.
One last thing
Why does this stuff get called higher order programming?
I think the idea is that functions that manipulate functions are of a higher order than functions that manipulate other ‘lower’ data structures. I tend to think that it belongs in a class with ‘metaprogramming’ – words and phrases that may lead one to infer that the meta- or higher order programmer is doing something inherently harder than the bread and butter programming indulged in by all those plain old programmers.
When it comes down to it, that higher order functions are just functions and metaprogramming is just programming and the sooner we learn that, the faster we can improve our programming practice.
Update
For those of you who are finding the CoffeeScript a wee bit daunting (I personally like the lack of noise, but your mileage may vary), there’s a thread on sitepoint which translates many of the examples here into more idiomatic Javascript than the CoffeeScript compiler can manage. You may find it useful.
I’ve written a followup post on asynchronouse streams which discusses using higher order techniques in the browser window to do something a little more ‘real’ than the examples shown here.
Class decomposition and a handy delegation pattern 3
There’s something satisfying about reaching the point when you can’t decompose an object any further and all your methods are tiny and do one thing – it’s especially gratifying when you learn something new in the process. Sadly, it doesn’t happen as often as I’d like, there’s usually annoying bits and pieces where you have to placate the language in some fashion that breaks the flow of what you’re writing.
As I get a better handle on the way MooseX::Declare has changed Perl, I’m finding I have to do much less in the way of placation.
Here’s an example. For context, I’m writing a traffic shaping tool. The basic client interface needs to look something like:
$policy->current_weight # => a percentage between 0 and 100
Not much of an interface really. The requirements state that we should be able to specify weights with 15 minute granularity for every day of the week. Our problem becomes one of mapping from a time to a number between 0 and 7 (days) * 24 (hours) * 4 (quarters) - 1 and looking up the weight in an array.
Here’s my first cut:
use MooseX::Declare;
class WeightVector {
use DateTime;
has vector => (
isa => 'Array[Int]',
is => 'ro',
required => 1,
);
method current_weight {
my $now = DateTime->now;
my $offset = ($now->wday_0 * 7 * 24 + $now->hours) * 4 + int($now->minutes / 15);
return $self->vector->[$offset];
}
}Which is, I suppose, perfectly respectable. However, current_weight isn’t filling me with delight. First it finds the current time, then it converts the time into an offset, then it uses the offset to lookup the weight in the vector. Let’s introduce a method to find the weight at a specific time"1":#decomp-motivation, the relevant code becomes:
method current_weight {
$self->weight_at(DateTime->now);
}
method weight_at (DateTime $time) {
my $offset = ($now->wday_0 * 7 * 24 + $now->hours) * 4 + int($now->minutes / 15);
return $self->vector->[$offset];
}And again, we could rest here, but again, we’re doing two things. We’re converting from a time to an offset, then we’re looking up the value in the vector. Type conversions tend to happen again and again, so it’s good if we can specify them separately. We could write a time_to_offset helper method, but we’re in Mooseland now; there’s a better way. Let’s introduce a formal Moose type and define a coercion for it. Here’s the type definition stanza of the code. I’ve taken the opportunity to add types which do bounds checking for the vector as well, while I’m about it.2
use MooseX::Declare;
class WeightVector {
use DateTime;
use Moose::Autobox;
use MooseX::Types -declare => [qw(SlotOffset VectorOfWeights PercentageInt)];
use MooseX::Types::Moose qw(ArrayRef Int);
use constant TOTAL_SLOTS = 7 * 24 * 4;
BEGIN {
subtype PercentageInt,
as Int,
where { 0 <= $_ && $_ <= 100 },
message { "$_ does not is not an integeter between 0 and 100" };
subtype VectorOfWeights,
as ArrayRef[PercentageInt],
where { $_->length == TOTAL_SLOTS }
message { "Vector must have ".TOTAL_SLOTS." entries, not ".$_->length };
subtype SlotOffset,
as Int,
where { 0 <= $_ && $_ < TOTAL_SLOTS };
class_type 'DateTime';
coerce SlotOffset,
from 'DateTime',
via { ($_->wday_0 * 7 * 24 + $_->hours) * 4 + int($_->minutes / 15) };
# Let's allow clients not to care about using DateTime by allowing
# them to simply pass the results of calling 'time()' - It's not like it's
# still 1970...
coerce SlotOffset
from subtype(as => Int, where { $_ > TOTAL_SLOTS }
via { to_SlotOffset(DateTime->from_epoch(epoch => $_) };
}
has vector => (
is => 'ro',
isa => VectorOfWeights,
required => 1,
}
...Now, if we were programming in plain old Moose, we could rewrite weight_at like so:
sub weight_at {
my $self = shift;
my $offset = to_SlotOffset(shift);
$self->vector->[$offset]
}Which would be pretty sweet, but we’re using MooseX::Declare; there’s an even better way:
method weight_at (SlotOffset $offset does coerce) {
$self->vector->[$offset];
}Sweet!
We could stop there, but I had an insight. What we’ve got here is basically a wrapper around a delegation to our vector, and Moose’s new native types feature let us express the delegation to the vector quite neatly, like so:
has vector => (
isa => 'VectorOfWeights',
is => 'ro',
required => 1,
traits => ['Array'],
handles {
weight_at => 'get',
},
);
...
around weight_at (SlotOffset $offset does coerce) {
$self->$orig($offset);
}This could be overkill when vector is a simple ArrayRef as we have here, but the pattern of delegating declaratively in the attribute definition and then munging arguments in an around handler is applicable to more than just argument transformation. A typical delegation pattern involves having the delegating object passing itself in as an argument to the method delegated to. The nature of Moose’s handles declarations makes that impossible to do within the attribute declaration, but it’s easy to fix with an around helper:
around delegated_method (Any @args) {
$self->$orig($self, @args);
}(If you’re wrapping more than one method in this fashion, you should probably consider using a plain old Moose style around handler, which lets you wrap multiple methods with around @delegated_methods => sub {...}
So, at the end of all that, and after we’ve extracted our Type declarations into WeightVector::Types, we have:
use MooseX::Declare;
class WeightVector {
use WeightVector::Types qw(VectorOfWeights PercentageInt SlotOffset);
has vector => (
isa => 'VectorOfWeights',
is => 'ro',
required => 1,
traits => ['Array'],
handles {
weight_at => 'get',
},
);
method current_weight {
$self->weight_at(time());
}
around weight_at (SlotOffset $offset does coerce) {
$self->$orig($offset);
}
}And we’ve pushed all knowledge of DateTime off onto our type declarations and gained a boatload of handy bounds checking. We’ve also got a new tool for handling tricky delegation setups in the handles/around combo.
Notes
Motivation
I realise that this looks like a radical decomposition of the class with very little motivation, but it was driven by tests and by some other requirements that I’ve removed from the body of the post. In particular, the type coercions were driven by the need to build particular vectors for testing, a key method being:
method set_weight (PercentageInt $weight,
SlotOffset $from does coerce,
SlotOffset $to does coerce)
{
...
}Type coercion is wonderful
Generally, I’m not a fan of static typing. I’m from the “duck type all the way” school of programming,3 so most of my method declarations have no type declarations. But type declarations, especially ones that coerce, make so much sense on methods that make up the public protocol of a class. I only use type declarations on internal methods when I need a narrower coercion, or if I’m using MooseX::Multimethods, which I still haven’t used for anything but exploration.
Updates
Thanks to Chris Dolan for spotting that I’d got the SlotOffset coercion completely wrong. The real code’s doing the right thing, but that’s what comes of recreating code from memory.
1 This was actually motivated by trying to write tests to verify that the weights were correctly set.
2 I’m declaring these in a BEGIN block of the class itself mostly for explanatory purposes – there’s a good case for moving them out into a separate file and pulling it in with use.
3 Except during my periodic attempts to learn Haskell. I’ve learned Haskell at least three times now.
Twice now 4
In Ruby, when you’re doing division on integers, things can get a little counter intuitive. For instance, 6/4 is obviously 1. At least, it is until you decide that you’d rather have numbers that behave a little more like ‘real’ numbers and you do require 'mathn', a module in the Ruby standard library (ie, it comes as part of ruby). Then you enter a whole new world of rational maths, where 6 / 4 returns 3/2.
Several very fine and useful Ruby gems rely on the workings of mathn, including ruby-units, which is a spiffy tool for avoiding problems when one team is working in kilometres and the other in miles and it’s no fun at all when your space probe is suddenly incommunicado.
Other fine and dandy Ruby gems include ultrasphinx and webrat. Both of these two (and no doubt others) rely on the the fact that 302/100 == 3.
Hmm… can you see my problem?
Please, if you’re working on a gem that you intend to publish widely, then adopt the practice of never trusting that dividing an integer by another integer will return a third integer. You’re not even making yourself a hostage to some other gem, you’re making yourself a hostage to the standard library. Always do (an_integer / another).to_i and your code will be so much more robust.
I’ve got a pull request and lighthouse ticket in for webrat and, once I’ve hit ‘publish’ on this post, I shall be doing the same thing for ultrasphinx, but I’m sure there are other gems out there with the same problems. Please people, check your assumptions.
London.pm Presentation Video
Back in (crikey) February, I gave a talk at the London Perl Mongers’ technical meeting about Moose for Ruby Programmers and wrote it up here. Mike Whittaker was in the front row of the audience with his iPhone and, a couple of minutes in, started a voice recording and gave me a copy.
So… finally… I’ve taken the time I should have been using to write another article for The H and wrestled the slides and the audio into something like sync and uploaded the results to Vimeo for your viewing pleasure.
An introduction to MooseX::Declare from Piers Cawley on Vimeo.
Thinking about the virtues 3
I got a bit of stick on IRC last night for some of the choices I’d made when I was writing Test::Class::Sugar, in particular because one of the prerequisites is chromatic’s handy and opinionated Modern::Perl module. The ‘controversial’ aspect of Modern::Perl is that, when you use it, your code won’t run on any Perl before Perl 5, version 10.
The thing is, I don’t care about older perls any more. Version 10 features like ~~ and // are too convenient to fart about writing circumlocutions just to run on a version of Perl that I have no intention of ever using again.
Actually, that’s not quite true, those version 10 features are too convenient for me to fart about working around their absence. If you think that a module I write is useful enough that you want it to work on version 8, then of course I’ll accept your patch. But don’t be surprised if, when I start adding new features, I break the backward compatibility.
Also, on the happy day that version 10.1 escapes the pumpking patch, I’ll be setting that as my minimum perl version even if chromatic doesn’t bump the version number in Modern::Perl.
It’s all about the virtues
Your context is different from mine. I’m writing in Perl again for my own amusement more than anything. There are developments in modern Perl – tools like Moose and Devel::Declare – that I think are exciting and important. The Announcements project I started was as much about playing with the new tools as it was about trying to write something of wider utility. Test::Class::Sugar arose as a direct result of attempting to write Announcements and the desire to write test classes without hoopage. My principle drive then, is impatience to get Test::Class::Sugar to the point where I can get back to writing Announcements.
But then laziness and hubris kick in. So the code needs some polish. The parser and the code generator need to be disentangled, I need to get Adrian Howard to apply the little patch I had to make to Test::Class. Laziness demands I document it.
Impatience tells me to lean on the features of modern perl – that way I can get back to being a user of the new library as quickly as possible. Laziness tells me that I’m not going to need backwards compatibilty. Hubris tells me my work is good enough that someone who does will like it enough to send me a patch.
Everybody wins.
Magic vs Mundane: Keeping them apart
In which your correspondent does magical battle with the guts of Perl and emerges bloodied, but unbowed with a useful principle to code by.
Skip to the conclusion if you’re uncomfortable with the guts of the Perl runtime
Test::Class had me tearing my hair out earlier. There I was, happily transforming
test something {
ok 1;
};into something very like1:
*test_something =
Sub::Name::subname(
'test_something'
=> sub : Test { ok 1 }
);through the magic of Devel::Declare, but Test::Class didn’t seem to be playing fair. Instead of letting my tests run happily, it was complaining that it:
cannot test anonymous subs – you probably loaded a Test::Class too late (after the CHECK block was run). See ‘A NOTE ON LOADING TEST CLASSES’ in perldoc Test::Class for more details
The thing is, I wasn’t loading Test::Class too late. The problem is that, at the point I applied the Test attribute to my sub, the sub didn’t have a name and, because of the constraints you’re operating under when you’re using Devel::Declare to do code transformation, there was no obvious way to give it a name in time.
Incompatible magics
The trouble is, Test::Class does what it does through the magic of compile time code attributes, and, further, it relies on the fact that if a perl subroutine that gets inserted into the symbol table like this:
sub has_a_name {...}Then, when you get hold of a reference to that code by other means (say, in the subroutine that handles the setting of an attribute, that code ref knows its own name. However, if a subroutine that ends up in the symbol table like this:
*anonymous_ref = sub {...};Doesn’t know its name, unless you take advantage of the Sub::Name module.
So, in my generated code, I was giving my coderef a name, but it was happening to late. At the point that Test::Class::Test method was seeing the coderef, the coderef was anonymous.
My magic and Test::Class’s magic were incompatible.
The thing is, both sorts of magic are really just sugar for some pretty mundane donkey work. Test::Class does what it does through attributes because no flesh and blood programmer in their right mind would want to write something like this every time they wanted to write a test method:
sub test_something {
...
}
__PACKAGE__->mark_as_special_method('test_something', 'test', '3');In fact, mark_as_special_method doesn’t even exist as its own subroutine. The code that marks a method as special is just part of the body of the Test attribute handler.
Conclusion
Which brings me neatly to my conclusion.
When you’re designing a module that does anything magical, consider starting with a mundane core API that handles the business side of things. Then layer your magic on top of that API. Then document the API and the magic. Obviously the magic bits go up front in the docs, and the API goes in its own section (or even podfile) down at the bottom, where only eejits like me, who want their magic to work slightly different to yours, will bother reading it.
Obviously, I’m motivated by an issue I’m having with a particular module from CPAN, but the principle of separating the magic and the mundane is applicable everywhere. It’s called Separation of Concerns, or The Single Responsiblity Pattern. I call it a Just Story.
You’ll find the pattern in well designed websites that are using unobtrusive javascript to wave an AJAX wand over the site. You’ll see it woven through books like The Structure and Interpretation of Computer Programmers – where it’s called an Abstraction Barrier.
Patches sent
It turns out to be very easy to add mark_as_special_method (though I actually wrote it as ‘add_testinfo’ in the patch) to Test::Class. It’s about as straightforward an Extract Method refactoring as I’ve ever done – even without automated tools, I managed not to fuck it up. There’s a patch in Adrian Howard’s inbox, and I’m hopeful that it’ll be applied soon.
1 Not exactly – that’s the result of calling the shadowed test subroutine which was the result of the code transformation.
Check out the osfameron fork of Devel::Declare for the beginnings of some decent documentation which explains what’s going on.
Writing parsers for fun and convenience 4
One aspect of coming back to Perl for ‘recreational’ programming is that if, like me, you’ve declared war on @_ and boilerplate code, then testing can be somewhat trying. The Perl testing framework that best fits my head is Test::Class, which is an excellent Perlish implementation of xUnit style testing. If you’re unfamiliar with the, library, Ovid is writing what’s shaping up to be an excellent series of introductory articles about it at http://www.modernperlbooks.com/.
The problem I’m having with Test::Class at the moment is that I can’t write:
use MooseX::Declare
class Test::Person
extends Test::Class
{
use Test::Most;
method class_under_test {'Person'}
method startup : Test(startup => 1) {
use_ok $test->class_under_test
}
...
}Test::Class is doing too much in its initialization phase, and relies too heavily on code attributes, for it to play well with MooseX::Declare. Drat.
On reflection though, this might be a good thing, because maybe MooseX::Declare isn’t really what’s needed. What I’d like to write is something like:
use ...;
testclass Test::Person
exercises Person
{
startup class under test should be usable (1 test) {
use_ok $test->class_under_test
}
}And have the library ‘…’ expand the testclass declaration into something that looks like the first code snippet. After all, if MooseX::Declare can work without source filters, it should be possible to come up with something nicely declarative for specifying test classes.
Obviously, there’s nothing on CPAN that does this yet though. So I went fossicking through MooseX::Declare to see how it works1 and discovered thing of Lovecraftian beauty that is…
Devel::Declare
Devel::Declare is possibly the most hostilely documented library I’ve ever come across. Its documentation only begins to make sense when you already understand enough about how it works that you don’t really need the docs. What it does is to let you declare your own Perl keywords. You could, for instance use it to introduce given/when into versions of Perl that don’t have it yet. You declare your keywords and associate them with parsers. When, during its compilation phase, perl hits one of your keywords in the right context, it hands off to your parser which can then do what the hell it likes in the way of code transformation, before handing control back to Perl, which then parses the transformed code as if that was what was there all along.
So, to want to transform that testclass syntax I just pulled out of my ass into a real Test::Class package, I just need to write an appropriate parser and code generator, perform the appropriate Devel::Declare incantations, and I’m laughing.
Making progress
So far, I’ve got to the point where I have a working testclass keyword, but nothing yet for the ‘inner’ bits (setup, test, teardown, etc). I can write:
testclass Test::Person
exercises Person
{
...
}
testclass Test::Person::Employee
extends Test::Person
exercises Person
{
...
}and, as I write this, I’m realising that the syntax I’d cooked up for using extra test helper modules:
testclass AnotherTest
helpers -More, -Exception, Carp
{
# use Test::More;
# use Test::Exception;
# use Carp;
...
}would probably read better as:
testclass AnotherTest
+uses Carp
{
# use Test::Most;
# use Carp;
...
}and also that I want this:
testclass exercises Person {
...
}to build me a Test::Person class.
What’s still blowing my mind about Devel::Declare’s possibilities is that I’m no longer constrained to writing a Domain Specific Pidgin which works by building a tower of proxy objects and weird evaluation contexts to produce something that’s legal code in the host language, but which has the feel of another language. With Devel::Declare, I control the horizontal and the vertical until I choose to hand control back to Perl. Right now that means my error reporting is disgracefully bad, but it also means that I can roll a syntax that makes sense without worrying about how I’m going to get perl to parse it.
One of the things I find frustrating about writing RSpec specifications is that describe and it both want to be first class keywords – it feels like you should be able to write:
describe SomeClass, "in some context"
before each
# set things up
end
it "should do something or another"
...
end
endBut, because RSpec works by taking advantage of Ruby’s block magic, you have to write:
describe SomeClass, "in some context" do
before :each do
# set things up
end
it "should do something or another" do
...
end
endI definitely prefer the version without the extraneous dos and the gratuitious : before each in the before declaration. Does anyone feel like writing devel/declare.rb?
Show us the code!
If you want to see the current state of my Test::Class::Sugar art, the place to look is http://www.github.com/pdcawley/test-class-sugar. At the time of writing it relies on http://www.github.com/rafl/devel-declare and doesn’t have anything so useful as documentation, a Makefile.PL or even any tests beyond the collection of code samples that is t/initial.t. Expect all those when and if I push it to CPAN.
Caveats
Yes, I know that this sort of metasyntactic abstraction is trivial in a Lisp. I just happen to like syntax, okay?
Update 20090312
use Test::Class::Sugar
testclass exercises Foo
+uses -Warn
{
...
}Now generates
{
package Test::Foo;
use base qw/Test::Class/;
use Test::Most;
use Test::Warn
...
}So that’s one hurdle jumped. And I now know how to write the various method helpers and, when I get the appropriately shaped tuits, I shall actually write the damned things.
Then all I have to do is document it.
And write up a proposal about it for YAPC::Europe.
Update 20090314
I now know what a plan looks like:
test with multiple assertions << 3 {...}And, more importantly, I’ve implemented, and documented everything and am almost good to cut a 0.001 distribution. I need a few ducks up on CPAN, but once that’s done, we’re good and I can get on with parameterizing some of the assumptions that are hard coded at the moment.
1 Something I swore blind I wasn’t going to do in my London.pm presentation. Seems my word isn’t to be trusted…
Perl: Test Infected since 1987 3
Here’s something interesting. This is the test directory from the very first version of Perl, released in 1987 and described by Larry as ‘a “replacement” for awk and sed’. Day one of Perl, and we already have the bare bones of the Test Anything Protocol and a prototype for Test::Harness in the TEST file.
If we truffle about in the various other branches we find other useful milestones for module developers:
- 5.000, in 1994 came with the first iteration of
h2xswhich could be used to generate the basic boilerplate for a perl module. Even today, with more sophisticated module starters available, you won’t find a CPAN module of repute that doesn’t follow the basic directory structure laid down in this utility.ExtUtils::MakeMakergenerates a Makefile with a test target which runs all the tests in thetsubdirectory - 5.002, in 1996,
h2xsstarts stubbing test.pl - 5.003_12, late 1996, first version of CPAN in the Perl distribution. From day one, CPAN would not install a module if any tests failed, unless you forced it.
Meanwhile, Ruby:
- Has only recently embraced a language test suite
- Appears to have no standard layout for gem distributions
- Doesn’t run tests as part of the installation process for a new gem
Is it any wonder that chromatic gets a little cranky when sweeping claims are made about how spiffy Ruby’s testing culture is?
There are those who claim that CPAN is Perl’s shining glory, but it’s not really the collection of servers, it’s the toolchain that enables it, and that toolchain can exist because so many libraries follow a pretty minimal set of conventions.
I’d love to see the Ruby community settle on a similar, single, set of conventions for the way things should work. Start with a guarantee of a either a top level Rakefile or setup.rb with build, test and install tasks. Make rubygems run the tests before installation, if the target is available, and halt the installation if they fail. Make it easy to send reports of test failures to module authors (look at the Perl CPAN and CPAN Testers sites, and their associated tooling for ideas).
I know… I should STFU And Write Some Code.
Update
Further investigation shows that gem install -t whatever does run the tests as part of the installation process. The capability is there, but it’s turned off by default. How depressing.
