Asynchronous Streams

Written by Piers Cawley

In Higher Order Javascript, I introduced Streams and showed how to use them to implement a lazy sort. I think that's neat all by itself, but it's not directly useful in the asynchronous, event-driven execution environment that is the average web page. We'd like a structure where we spend less time twiddling our thumbs while we wait for `force` to return something to us.

Non-blocking streams

What if we change the protocol of our stream to something more asynchronous? Obviously, we'd still have a head element which is immediately available and some kind of promise to compute the next stream. But rather than promising to return a new stream, the promise of a non-blocking stream is a promise to call a function we supply with the value it computes. Here's a CoffeeScript implementation of what we're talking about:

# The terminal stream: forcing never reaches past this.
the_empty_stream =
  is_empty: true

# Run a function as soon as the event loop is next free.
continue_with = (b) -> window.setTimeout((=> b()), 0)

class NonBlockingStream
  constructor: (@head, promise) ->
    this.force_into = (block) ->
      if promise.length == 0
        # A zero-argument promise computes the next stream itself...
        continue_with -> block promise()
      else
        # ...a one-argument promise calls the block when it's ready.
        continue_with -> promise block
  is_empty: false

I've written `NonBlockingStream` to work both with zero-argument functions (like the promise of an ordinary stream) and with one-argument promises that are responsible for calling their block when appropriate. `force_into` doesn't block because we execute our promise via `setTimeout`: a timeout of 0 milliseconds doesn't mean "execute the block immediately", it asks for the block to be executed as soon as possible. The fat arrow (`=>`) used to build the function passed to `setTimeout` ensures the callback keeps the `this` that `continue_with` itself was called with, rather than picking up the global `window` object that `setTimeout` would otherwise supply.
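
To see the protocol in action, here's a minimal sketch (the names `integers_from` and `print_first` are my own, purely for illustration) that counts upwards using zero-argument promises:

# Purely illustrative: an unbounded counting stream whose promise is a
# zero-argument function, just like an ordinary stream's promise.
integers_from = (n) ->
  new NonBlockingStream n, -> integers_from(n + 1)

# Log the first `count` heads, yielding to the event loop between each.
print_first = (count, stream) ->
  return if count == 0 or stream.is_empty
  console.log stream.head
  stream.force_into (rest) -> print_first(count - 1, rest)

print_first 5, integers_from(1)   # logs 1, 2, 3, 4, 5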

This new non-blocking stream is a much better citizen on a web page. It yields to the event loop at every opportunity and gets out of the way of other, possibly more important, events. But so far we only know how to use it for things like finding primes or taking the top 5 entries of 1000. Not exactly useful in the browser…

Streams in the real world

Listen to any web usability guru for more than about ten minutes and they'll tell you that pagination is evil. If you have a resource (say, a blog index page) that logically should have 1000 entries on it, then breaking it up into multiple pages is a sin. The reader should simply be able to scroll through all 1000 entries. But most users don't scroll through every entry, and rendering 1000 entries is time-consuming. Sites like Google Reader and Twitter solve this by doing 'just-in-time' fetching.

Suppose we wanted to re-engineer this blog to use the endless page pattern; streams seem like a natural fit. We'd like to serve up an index page that sets up all the headers and navigational stuff, but which populates its articles via an unbounded stream of articles. Let's say that an article looks something like this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>The current article</title>
    <link rel="next" href="the-next-article.xhtml" />
  </head>
  <body>
    <article>
      <h2>The current article</h2>
      <p>Yada yada yada...</p>
    </article>
  </body>
</html>

You can think of this as a kind of stream. The 'head' of the stream is the contents of the HTML body tag, and the promise is the `<link rel="next" ... />` tag in the head. Let's write a function which, given a document like this, will make us a stream. We'll use jQuery because, well, why not?

$ = jQuery

doc2stream = (data) ->
  # Head: the fetched document's body. Promise: fetch the document named
  # by the <link rel="next"> in its head and stream that in turn.
  new NonBlockingStream $("body", data), (block) ->
    next = $("head link[rel=next]", data).attr('href')
    if next
      $.ajax
        url: next
        success: (data) -> block(doc2stream data)
        dataType: 'xml'
    else
      block the_empty_stream

Note that our promise doesn't immediately call the block it's forced with. Instead, it fires off an asynchronous request for the next article in the stream, with a success callback that (finally) calls the block.
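
As a quick illustration (with a made-up `log_titles` helper and a hypothetical starting URL), forcing such a stream repeatedly walks the whole chain of articles, one request at a time:

# Purely illustrative: log each article's <h2> text as its page arrives.
log_titles = (stream) ->
  return if stream.is_empty
  console.log $("h2", stream.head).text()
  stream.force_into log_titles

$.ajax
  url: 'some-first-article.xhtml'   # hypothetical starting point
  success: (data) -> log_titles doc2stream data
  dataType: 'xml'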

Putting it together

The trick now is to get the stream up and running from our article index. Let's assume we have an initial page along the lines of:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Some blog articles</title>
    <link rel="start" href="the-first-article" />
    <script 
       type="text/javascript"  
       src="http://code.jquery.com/jquery.min.js"></script>
  </head>
  <body>
    <header>
      <h1>Some blog articles</h1>
    </header>
    <section id="articles">
      <footer>
        <p>If you can see this, I'm probably fetching a real article</p>
      </footer>
    </section>
    <script
       type="text/javascript"
       src=".../asynch-fetcher.js"></script>      
    <footer>
      ...
    </footer>

  </body>
</html>

This is where we shove all the navigational bits and pieces of our blog: the links to Atom feeds, sidebars with associated links, and whatever. But it's devoid of content. We need to fix that by finishing off our asynch-fetcher.coffee[1]. First, we need to set up our stream:

# Bootstrap: a dummy head and a promise to fetch the first real article.
articles = new NonBlockingStream '', (block) ->
  $.ajax
    url: $("head link[rel=first]").attr('href')
    success: (data) -> block doc2stream data
    dataType: 'xml'

We can't simply call `doc2stream` here because that expects to find the link to the next article in a `[rel=next]` link in the head, but we're using a 'first' link. So we make a stream with the empty string as a dummy head and a promise to fetch the article linked to by our `[rel=first]` and turn that into a stream via `doc2stream`.

Next we need a function that updates the `articles` variable, extracts the article element from the head of a stream, and inserts it into our `#articles` section just before the footer. Once this is defined, we force the first article into it.

show_next = (stream) ->
  # Remember where we've got to, then splice the new article in just
  # before the #articles footer, tagging it so watcher can find it.
  articles = stream
  $("#articles footer").before(
    $("article", stream.head).clone().addClass("last-art")
  )

articles.force_into show_next

We also arrange for `show_next` to add a `last-art` class to the newly inserted article, which we'll use as a target in the `watcher` function we set up to handle fetching new articles as they are needed:

# True if the selected element's top is above the bottom of the viewport.
$.fn.is_in_view = () ->
  $(this).position().top < ($("body").scrollTop() + $(window).height())

watcher = ->
  last_art = $(".last-art")
  if last_art.is_in_view()
    # The reader has reached the last article; fetch the next one.
    last_art.removeClass('last-art')
    articles.force_into (str) ->
      unless str.is_empty
        show_next str
        window.setTimeout watcher, 100
  else
    window.setTimeout watcher, 100

window.setTimeout watcher, 100

Our heuristic for judging when to fetch the next article is simple: if the article tagged with the `last-art` class is in the viewport, then it's time to go about fetching the next one. This assumes that our writing is compelling enough that, by the time the reader gets to the bottom of an article, the next one will have been successfully fetched. This may be an optimistic heuristic, but we're all about "for the purposes of illustration" here.

To get this to work, we add a simple `is_in_view` method to `jQuery.fn`. This tests whether the selected element's top sits higher than the bottom of the viewport. With that in place, we can write `watcher`, which checks if the last article is in view. When it is, `watcher` removes the `last-art` marker class and kicks off the process of fetching the next article. We use `window.setTimeout` to ensure that we keep fetching next articles as long as there are articles to fetch and our reader is reading them.[2]

Notes

  • A similar stream could be set up for each article's comments, after which we might find that we should parameterize show_next and watcher in some fashion.
  • Real-world use would probably involve serving up a few articles in the body of a blog front page, if only so that Google has something to index. Tag indices or search results could be served up empty and populated on demand.
  • I've not (yet) redone this blog to use this pattern, but I've tested the code presented here and it does work.
  • For caching purposes, it may be better for searches and the like to return a small lump of JSON with a head link to the statically cached article document and a promise link to the next lump of JSON in the results (see the sketch below). If we write `doc2stream` right, it should be possible to completely isolate the rest of the page from this decision, which seems like a win to me.
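
To make that concrete, here's a hedged sketch of what a JSON-backed cousin of `doc2stream` might look like. The lump shape and field names (article_url, next_url) are entirely hypothetical, and since the head would then be a URL rather than a parsed document, `show_next` would need a matching tweak:

# Hypothetical sketch: a JSON cousin of doc2stream. Assumes each lump of
# JSON looks like { article_url: "...", next_url: "..." } (invented field
# names, not a real API).
json2stream = (lump) ->
  new NonBlockingStream lump.article_url, (block) ->
    if lump.next_url
      $.getJSON lump.next_url, (next_lump) -> block json2stream(next_lump)
    else
      block the_empty_stream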

Footnotes:

[1] Which our server autogenerates from the CoffeeScript we're actually writing.

[2] Or, at least, scrolling past them.
