Get sophisticated
Ruby’s primitives (Strings, Hashes, Arrays, Numbers – anything that has a literal syntax) are fine things. But that doesn’t mean you should use them everywhere. You’re often much better off wrapping them up in your own Value Objects.
Ruby’s primitives (Strings, Hashes, Arrays, Numbers – anything that has a literal syntax) are fine things. But that doesn’t mean you should use them everywhere. You’re often much better off wrapping them up in your own Value Objects.
Something I was working on at Railscamp this weekend threw up a great example of why it makes sense to replace primitives with more specific objects as soon as possible. Tom Morris asked me to take a look at Rena, an RDF/Semantic Web thingy.
The RDF spec describes two types of literals: a plain literal, which is a string with an optional language attribute and a typed literal, which is a string and an encoding (so the string might represent an integer, float, or anything else that your schema feels like expressing).
These literals can be output in one of (at least) two formats. We’ll
start by looking at Literal#to_trix
and see where that takes us:
def to_trix
if @lang != nil && @lang != ""
out = "<plainLiteral xml:lang=\"" + @lang + "\">"
else
out = "<plainLiteral>"
end
out += @contents
out += "</plainLiteral>"
return out
end
If we look over at TypedLiteral#to_trix
we see a much more
straightforward implementation:
def to_trix
"<typedLiteral datatype=\"#{@encoding}\">#{@contents}</typedLiteral>"
end
How do we eliminate that ugly conditional at the beginning of
Literal#to_trix
, and analogous conditionals in Literal#to_n3
and
TypedLiteral#to_n3
?
My first thought was that I wanted to be able to write something like:
def to_trix
"<plainLiteral#{@lang.to_trix}>#{@contents}</plainLiteral>"
end
But I didn’t want every string in the world suddenly acquiring a
to_trix
method. So, the solution was to intoduce a Literal::Language
class and coerce our language into it, so Literal#initialize
became:
def initialize(contents, lang = nil)
@contents = contents
@lang = Language.coerce(lang)
end
And Language would look something like:
def self.coerce(lang)
if lang.is_a?(self)
return lang
end
new(lang.to_s.downcase)
end
def initialize(lang)
@value = lang
end
def to_trix
if @value == ''
''
else
" xml:lang=\"#{@value}\""
end
end
That ugly conditional’s still there though, so we introduced the Null Object pattern, and things started to look a good deal cleaner:
class Language
class Null
include Singleton
def to_trix
''
end
end
def self.coerce(lang)
case lang
when self
return lang
when nil, ''
return Null.instance
else
return new(lang.to_s.downcase)
end
end
...
def to_trix
" xml:lang=\"#{@value}\""
end
end
At this point, we’re still just pushing code around. If anything, we’ve
got more lines of code now than when we started, but we’re starting to
move behaviour nearer to the data it relates to, and our objects are
starting to look like objects rather than data structures. So, we press
on and make a TypedLiteral::Encoding
class and, at this point things
start to look interesting. TypedLiteral is starting to look almost
exactly the same as Literal, but with an Encoding rather than a
language.
That strange leading space in Language#to_trix
is starting bug me.
Let’s rewrite like so:
class Literal
class Language
def format_as_trix(literal)
"<plainLiteral xml:lang=\"#{@value}\">#{literal}</plainLiteral>"
end
class Null
def format_as_trix(literal)
"<plainLiteral>#{literal}</plainLiteral>"
end
end
end
def to_trix
@lang.format_as_trix(@contents)
end
end
If we make analogous change to TypedLiteral
and
TypedLiteral::Encoding
it’s obvious that TypedLiteral and Literal were
essentially the same class. Renaming @lang
and @encoding
to
@language_or_encoding
makes this blindingly obvious, so we’ll remove
all of TypedLiteral’s methods except initialize. All that remains is to
introduce Literal.untyped
and Literal.typed
factory methods to
Literal, and make Literal.new
into a private method and we can remove
TypedLiteral in its entireity. So we change the specs to reflect the new
API (wrong way round I know). Now we have a chunk of shorter, clearer
code that will hopefully be easier to extend to cope with outputting
literals in other formats.
Retrospective
I realise that patterns aren’t the goal of development, but by the end
of the process we have a Strategy (Language/Encoding), a couple of
Factory Methods (Literal.typed
, Literal.untyped
) and a couple of
factoryish methods (Language.coerce
, Encoding.coerce
).
The most important aspect of the change was the introduction of the two
new value object classes. Once they were introduced, they became the
obvious places in which to put the varying behaviour and eliminate the
repeatition of conditional code from the to_*
methods. If there were
to be a third output style, I would look at introducing classes like
N3Stream
, TrixStream
and WhateverStream
and have a scheme like:
def to_n3
print_on( N3Stream.new )
end
def print_on(stream)
language_or_encoding.print_on(stream, value)
end
but that’s almost certainly over complicating things right now.
The other thing I like about this kind of refactoring is that it drives the code towards methods and classes which obey the single responsibility principle and, at the end of the process, not only do we have fewer lines of code in total, but the individual methods involved are all substantially shorter and closer to the left hand margin.
I really should start doing this kind of thing more in my Rails practice
– I keep being put off by the fact that the composed_of
helper is so
annoyingly not quite right and, rather than submitting a patch or making
a plugin I go “Ah well… I can live with a string for a bit longer…”
and I know. From hard won experience at that, that it’s going to come
and bite me. It’s already bitten Rails recently when Ruby got a new
@String#to/chars@
which doesn’t work like the ActiveSupport version.
Notes
If you want to see the gory details of how the change got made, Tom has merged this weekends changes into the github repository. It didn’t happen in quite the order I’ve described it in this post, but neither is this post a complete fabrication.
Changes
Corrected a stupid typo in the first block of code. Ugly condition is
actually if @lang != nil && @lang != ''
6 historic comments »
By Antoine Wed, 20 Aug 2008 12:42:32 GMT
About that ugly condition… ain’t
<pre>if @lang !
nil && @lang != false</pre>=
equivalent to<pre>if @lang</pre>
?
By Antoine Wed, 20 Aug 2008 12:51:38 GMT
Tried that in irb:
>> @contents = "something"
=> "something"
>> puts "<plainLiteral#{" xml:lang='" + @lang + "'" if @lang }>#{@contents}</plainLiteral>"
<plainLiteral>something</plainLiteral>
=> nil
>> @lang = "en_US"
=> "en_US"
>> puts "<plainLiteral#{" xml:lang='" + @lang + "'" if @lang }>#{@contents}</plainLiteral>"
<plainLiteral xml:lang='en_US'>something</plainLiteral>
=> nil
>>
By Antoine Wed, 20 Aug 2008 12:52:37 GMT
Well, my XML is a bit garbled by Textile + pre. It works fine though.
By Piers Cawley Wed, 20 Aug 2008 14:01:50 GMT
@Antoine # 1: Yes, it is the same, but it’s also a typo. I’ve corrected it in the post.
@Antoine # 2: Well, it works, but it’s code that only a mother could love. You’ve just moved the unpleasantness inline.
By Antoine Wed, 20 Aug 2008 17:10:34 GMT
#1: Ah, makes way more sense, thanks for clarifying.
#2: yep. Well, it’s just that I would rather use Stax and Java to do things cleanly like you describe in this post ; in Ruby I would go for this trick as I find it readable. But that’s just me.
By chromatic Mon, 25 Aug 2008 13:32:07 GMT
Dropping that primitive obsession in a web toolkit not only helps you avoid the fragile monkeypatching situation, but it gives you a sane, single place to prevent all XSS errors from ever occurring. The same goes for Unicode-conversion problems.