john@syntropy: Data/Object Anti-Symmetry

I’ve been reading Robert Martin’s book Clean Code. This is an important book because, almost uniquely, it tries to improve the quality of software from the bottom up. It doesn’t tell you how to improve your system design or management processes; it tells you, for example, how to lay out your code.

It’s not perfect, though. Some of the examples are unconvincing – Dave Cleal has already written about one poor example. I want to focus on another.

Chapter 6, “Objects and Data Structures”, has a section entitled “Data/Object Anti-Symmetry”. It argues that a procedural style of programming, where the data structure is separate from the functions that act on it, is sometimes more appropriate than an object-oriented style, where data is hidden behind interfaces. Taken at face value this is undoubtedly true, and Uncle Bob goes on to discuss the very common and frequently justifiable case of Data Transfer Objects. The specific anti-symmetry claim (pg 101) is:

“Objects expose behavior and hide data. This makes it easy to add new kinds of objects without changing existing behaviors. It also makes it hard to add new behaviors to existing objects. Data structures expose data and have no significant behavior. This makes it easy to add new behaviors to existing data structures but makes it hard to add new data structures to existing functions.”

However, the example used to support the anti-symmetry, based on manipulation of different shapes, seems to me a perfect example of a situation where you’d nearly always favour the object-oriented approach. In the procedural solution (pg 95) the function to compute the area of a shape looks like this:

public double area(Object shape) throws NoSuchShapeException {
    if (shape instanceof Square) {
        Square s = (Square)shape;
        return s.side * s.side;
    }
    else if (shape instanceof Rectangle) {
        Rectangle r = (Rectangle)shape;
        return r.height * r.width;
    }
    else if (shape instanceof Circle) {
        Circle c = (Circle)shape;
        return PI * c.radius * c.radius;
    }
    throw new NoSuchShapeException();
}

Note that this function contains, effectively, a switch statement that selects between all the available shapes. Every other function that operates on the shapes will contain a switch statement with an identical form. This is a bad code smell that Martin himself criticises on page 37. In that critique he advocates – wait for it – having a single switch statement in a factory method that creates objects of the appropriate classes and then using polymorphism to access the required behaviour. If you applied that transformation in this example you’d end up with… the object-oriented solution!
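To make that transformation concrete, here is a minimal sketch of where it might end up. It is not the book's listing: the Shape interface, the private fields, the constructors and the ShapeFactory class (including its string keys) are all assumptions made for illustration.

```java
// Hypothetical sketch: Shape, ShapeFactory and the constructors are assumed names,
// not taken from Clean Code.
interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    @Override public double area() { return side * side; }
}

final class Rectangle implements Shape {
    private final double height;
    private final double width;
    Rectangle(double height, double width) {
        this.height = height;
        this.width = width;
    }
    @Override public double area() { return height * width; }
}

final class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public double area() { return Math.PI * radius * radius; }
}

// The single remaining switch: it chooses a class once, at creation time.
final class ShapeFactory {
    static Shape create(String kind, double... dims) {
        switch (kind) {
            case "square":    return new Square(dims[0]);
            case "rectangle": return new Rectangle(dims[0], dims[1]);
            case "circle":    return new Circle(dims[0]);
            default: throw new IllegalArgumentException("Unknown shape: " + kind);
        }
    }
}
```

Once a shape has been created, a caller simply writes shape.area(); no further type tests are needed anywhere else in the code.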

If I decide I’m happy with the duplicated switch statements and stick with the procedural solution then to add, say, a perimeter() function I’ll actually write more lines of code than in the object-oriented solution because the polymorphic dispatching replaces the switch statement. In what sense, then, does the procedural approach make it “easy” to add new functions?
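For comparison, this is roughly what a procedural perimeter() might look like. The data classes are simplified stand-ins with public fields, and the Geometry class name and the perimeter() body are assumptions modelled on the area() function above.

```java
// Bare data structures in the procedural style: public fields, no behaviour.
// These are simplified, hypothetical versions of the shape classes.
class Square { public double side; }
class Rectangle { public double height; public double width; }
class Circle { public double radius; }

class NoSuchShapeException extends Exception {}

class Geometry {
    // Hypothetical perimeter(): it repeats the instanceof chain used by area().
    public double perimeter(Object shape) throws NoSuchShapeException {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return 4 * s.side;
        } else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle) shape;
            return 2 * (r.height + r.width);
        } else if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return 2 * Math.PI * c.radius;
        }
        throw new NoSuchShapeException();
    }
}
```

Only three lines of this function are about perimeters; the rest is dispatch. In an object-oriented version the same three formulas would become three short perimeter() methods plus one line in the shared interface.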

One important difference between the two solutions is in the scope of the change. When I add the perimeter() function to the procedural solution my change is all in one place, whereas with the OO solution the change is spread across multiple shape classes. Martin acknowledges this when he says (pg 97) “OO code makes it hard to add new functions because all the classes must change.” So perhaps for Martin “hard” means “touches lots of software entities” and “easy” means “touches only one software entity”. If that’s the case I have a little more sympathy with his position, but not much. Of all the evils in code, duplication is perhaps the worst, a point made several times in Clean Code. I’m prepared to pay the price of having to touch multiple classes in order to eliminate those evil switch statements.

In fact there’s no need to pay that price. If you want a procedural style – because you foresee that adding new functions is more likely than adding new structures – then the best way to achieve it is by using the visitor pattern, as Martin himself points out in a footnote. So why didn’t he show us a solution based on visitors? It’s a puzzle.
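A visitor-based version might look something like the sketch below. It is one possible shape for such a solution, not anything shown in the book: ShapeVisitor, accept(), AreaVisitor and PerimeterVisitor are all assumed names.

```java
// Hypothetical visitor sketch; none of these interface or visitor names come from the book.
interface ShapeVisitor<R> {
    R visitSquare(Square s);
    R visitRectangle(Rectangle r);
    R visitCircle(Circle c);
}

interface Shape {
    <R> R accept(ShapeVisitor<R> visitor);
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override public <R> R accept(ShapeVisitor<R> v) { return v.visitSquare(this); }
}

final class Rectangle implements Shape {
    final double height;
    final double width;
    Rectangle(double height, double width) { this.height = height; this.width = width; }
    @Override public <R> R accept(ShapeVisitor<R> v) { return v.visitRectangle(this); }
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public <R> R accept(ShapeVisitor<R> v) { return v.visitCircle(this); }
}

// Each operation lives in one class, and dispatch happens through accept()
// rather than through an instanceof chain.
final class AreaVisitor implements ShapeVisitor<Double> {
    @Override public Double visitSquare(Square s) { return s.side * s.side; }
    @Override public Double visitRectangle(Rectangle r) { return r.height * r.width; }
    @Override public Double visitCircle(Circle c) { return Math.PI * c.radius * c.radius; }
}

// Adding perimeter means adding this class only; the shape classes are untouched.
final class PerimeterVisitor implements ShapeVisitor<Double> {
    @Override public Double visitSquare(Square s) { return 4 * s.side; }
    @Override public Double visitRectangle(Rectangle r) { return 2 * (r.height + r.width); }
    @Override public Double visitCircle(Circle c) { return 2 * Math.PI * c.radius; }
}
```

A caller would write shape.accept(new PerimeterVisitor()). The trade-off is the mirror image of the object-oriented one: a new operation is a single new class, but a new shape means adding a method to ShapeVisitor and to every existing visitor.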

In truth I hesitate to criticise Clean Code at all because its heart is in exactly the right place and I hope everyone who writes code reads it. I criticise only in the hope that the second edition is even better.

Monday, 12 January 2009
