Friday 23 April 2010
Back in the days when modelling wasn't a dirty word, Steve Cook and I wrote a book that showed how to create precise graphical models of situations and software, and how to use those models to drive a development process. It was called Designing Object Systems: Object-Oriented Modelling with Syntropy, and it was published in 1994.
We didn't get rich from sales of the book but it was influential. Many of our ideas ended up in the UML, and the UML's Object Constraint Language was directly based on our work. Also, the book remains probably the most comprehensive reference on the use of state machines to describe object behaviour.
The book has been out of print for some while, but if you would like to take a look, the whole text is now available online at www.syntropy.co.uk/syntropy. I'd love to know what you think.
Monday 29 March 2010
Why you should start with end-to-end tests
I spent yesterday afternoon at GOOSgaggle, an event that provided an opportunity to learn more about, and discuss, the ideas in Nat Pryce and Steve Freeman's book Growing Object-Oriented Software, Guided by Tests (GOOS).
In his talk Nat expressed surprise that a large number of people using mock objects for testing start by creating unit tests for, and implementations of, domain objects, whereas the GOOS way is to start by creating end-to-end tests that exercise execution paths through all the layers of the system (aka "system tests"). If the entry point to the system is in layer 1, the test will initially mock the objects in layer 2; those mocks are then replaced by real layer-2 implementations backed by mocks in layer 3, and so on.
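To make the first of those steps concrete, here is a minimal sketch in Java. The names (OrderController, OrderService) and the three-layer structure are hypothetical, and a hand-rolled mock stands in for a mocking library such as jMock: the test drives the system from its layer-1 boundary while layer 2 is still a mock.

import java.util.ArrayList;
import java.util.List;

interface OrderService {                        // the layer-2 boundary
    String placeOrder(String item);
}

class OrderController {                         // layer 1: the system's entry point
    private final OrderService service;
    OrderController(OrderService service) { this.service = service; }
    String handleRequest(String item) {         // an event at the system boundary
        return "OK:" + service.placeOrder(item);
    }
}

public class EndToEndSketch {
    public static void main(String[] args) {
        // Step 1: drive the system from its boundary with layer 2 mocked.
        List<String> received = new ArrayList<>();
        OrderService mockService = item -> { received.add(item); return "id-1"; };
        OrderController controller = new OrderController(mockService);

        assert controller.handleRequest("book").equals("OK:id-1");
        assert received.contains("book");
        // Step 2, later: swap in the real OrderService, itself backed by a
        // mocked layer-3 repository, and so on down the stack.
    }
}

(Run with assertions enabled, java -ea, for the checks to fire.)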
I share Nat's surprise that this isn't standard practice, but what particularly interests me is the justification for it. I discussed this a bit with Nat, and then some more with Rachel Davies and Willem van den Ende, and here are the thoughts we had.
It seems to me that for Nat and Steve the primary justification is one of process. First, I make the assumption that there is a direct correspondence between a set of end-to-end tests and a story: the implementation of a story is driven by the creation of a set of end-to-end tests whose starting points are events detected by the system-under-test at its boundary. The resulting top-down decomposition and refinement ensures there is a natural order of development that creates exactly the software required to pass the tests, and hence implement the story. No extraneous lines of code are created, and the successive refinement makes it clear what you need to do next.
By contrast, starting the implementation of a story at the domain layer requires assumptions about how the triggering events will translate into domain model invocations. Get those assumptions wrong and you find that the code you've written in the domain model isn't a good fit when, eventually, you come to hook it to the system's external interfaces. And you may find you've written code you don't need.
This alone is probably all the justification you need for following the GOOS approach, but I think there are other considerations, the primary one being risk management.
My rule of thumb is that when creating a software system you should tackle the riskiest parts first, or at least as early as is compatible with the overriding need to demonstrate progress. In my experience the major risks to project success are not in the domain model; they are in how all the layers of the system fit together, and in the interactions with external agents, such as users and other systems. Therefore it makes sense to start with end-to-end tests because these tests expose exactly those issues. It's true that the design of the domain model may affect system-level characteristics, such as performance, but even there you are more likely to detect these effects through end-to-end tests than via domain model unit tests.
Given these two compelling justifications for starting with end-to-end tests, why is it that many people apparently don't start there? We came up with two possibilities, although there may be many others:
- Starting with the domain model can provide an illusion of rapid progress. You can show business features working while ignoring the realities of the larger system environment. Clearly, this is not normally an approach that addresses the biggest risks first. But it's an easy option and attractive when you're under pressure.
- For some reason the system environment is not available to you; perhaps, for example, the team creating the infrastructure is late delivering. So rather than taking the correct – and brave – option of loudly declaring progress on your project to be blocked, you restrict yourself to creating those parts of the system that are within your control.
Wednesday 4 March 2009
Report on UK/Europe Sun SPOT Symposium
The first UK/Europe Sun SPOT Symposium was held on 24th February 2009 at the British Computer Society’s excellent London HQ. The event was sponsored by the BCS’s Software Practice Advancement Specialist Group, to whom many thanks are due.
A total of 28 people attended the Symposium. The programme can be seen at http://sunspotsymposium.wikidot.com/programme.
First up was a presentation by Robert Taylor from Manchester University on The Yggdrasil Data Collection Framework. This was a good overview of the data collection projects using SPOTs and of some of the planned future work, but we could have done with more technical detail about Yggdrasil and perhaps a demonstration.
Bart Braem, University of Antwerp, explained how SPOTs are used within their Masters course as part of the Computer Networks and Distributed Systems option. His presentation can be found here. Although the use of SPOTs is in its early days, they have been well received by students, and a number of interesting projects have already been undertaken.
The replacement radio stack developed at Universität Karlsruhe was the subject of the talk by Markus Bestehorn. His presentation can be found here. This stack completely replaces all the pieces above the MAC layer but provides some backwards compatibility with the standard stack at the application API level. Markus presented a very convincing demo, backed by statistics, to illustrate the unreliability of the standard stack when transferring large data sets (e.g. library suites) across multiple hops. He also demonstrated just how resilient the KSN stack is when faced with changing network topology.
Most of the afternoon was devoted to ad hoc Open Space-style sessions.
There were four lightning talks. John Nolan, in many ways the instigator of the SPOT project, talked about the original objectives and urged participants to focus on novel applications rather than just improvements to the SDK. Kurt Smolderen used his talk to request more general interfaces for network routing, arguing that the current interfaces are biased towards AODV. Daniel Van Den Akker described his project to rework the lower part of the radio stack to make it practical to replace either the MAC layer or the physical radio. John Daniels talked about some of the add-on boards created in Sun Labs and the upcoming revision 7 main board, and gave a demonstration of the “mega SPOT”.
There were several sessions in the longer breakout slots:
- A demonstration by Stephan Kessler of the KSN management application, including OTA deployment.
- A technical question-and-answer session hosted by Dave Cleal and John Daniels.
- A demonstration by Kurt Smolderen of using Wireshark to analyse the output of his SPOT radio-traffic packet sniffer.
- A demonstration by Daniel Van Den Akker of radio communication between a SPOT and a Telos mote.
- A discussion of possible thesis projects, led by Bart Braem.
Everyone I spoke to enjoyed the day and thought it worthwhile. Hopefully another Symposium will be held later in the year.
Tuesday 3 March 2009
Why OCL shouldn't die
It perhaps won’t come as much of a surprise to those of you who have barely heard of OCL to discover that some people think it should die.
In his blog article “OCL is a dead language. Universities: stop wasting time teaching it”, Jason Gorman argues that universities are wasting their time teaching students the Object Constraint Language, a somewhat obscure part of the Unified Modeling Language, since it has almost no users in the real world. He’d prefer that universities spent their time on “writing better unit tests, or learning to automate acceptance tests.” I agree those things are important, but I’m going to argue here that the OCL has its uses, and that those uses are important enough to justify its inclusion – albeit in a minor way – in any serious software design class.
Of course, I would say that, wouldn’t I? I’m one of the people Jason describes as being “one degree of separation from the chap who invented OCL in the first place.” I’ve written books that use OCL or its predecessor extensively. So I’m prejudiced. Or perhaps I can just see the situation more clearly.
I’m going to start by assuming that UML itself is still alive. You see it being used less these days, now that Big Design Up Front has been discredited. But I still see people using it extensively in an ad hoc fashion, on whiteboards and to capture high-level abstractions. I don’t see anyone arguing that having a graphical language in which to describe the structure and behaviour of systems is a bad thing, so I’ll assume UML is sticking around.
Sometimes we want to describe systems that are not wholly – or even partially – implemented in software. It’s useless to argue that “the model is in the code” in situations like this, and UML certainly has a role to play there.
So now we come to my claims for OCL:
OCL is very helpful in defining and explaining the meaning of UML diagrams
Lots of people draw UML diagrams with only a slender understanding of how the diagrams should be interpreted. Although not a topic for beginners, any serious user of UML will need to understand precisely what the diagrams mean, and OCL is invaluable for this. The UML specification is itself partially written in OCL.
OCL allows you to say things that can’t be said in UML diagrams
Diagrams have their limitations – the graphical notations can’t express everything you might want to say. You don’t really want to clutter up UML diagrams with lots of text annotations but sparing use of OCL to express key rules that would otherwise be missing can be very valuable.
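For example, a class diagram can show that a Flight is associated with both a departure airport and an arrival airport, but not that the two must be different. In OCL that missing rule is a one-line invariant (the Flight model here is hypothetical; the syntax is standard OCL):

context Flight
inv: departureAirport <> arrivalAirport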
OCL is (or should be) the language of Model-Driven Engineering
MDE is the generic term for software development processes that focus on producing high-level models that can then be executed. The best known approach is MDA™, promoted by the OMG™. MDE isn’t as popular right now as it was, but many people still see it as the obvious long term goal for the software industry.
The most popular language for MDE is UML. For models to be executable they must contain complete behavioural specifications and the only way to write such specifications in UML is to use OCL. So OCL had better not die.
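As a sketch of what such a behavioural specification looks like, here is a pre- and postcondition pair for a withdraw operation on a hypothetical Account class; balance@pre denotes the value of balance before the operation ran:

context Account::withdraw(amount : Real)
pre:  amount > 0 and amount <= balance
post: balance = balance@pre - amount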
OCL helps you appreciate the value of precision and abstraction
For most software developers the only tool they have that takes a firm position on precision is the compiler. But not every program that compiles is correct, and my belief is that people who understand how to express their designs precisely yet abstractly in UML/OCL write better programs, even if the programs aren’t explicitly designed using those tools.
So ultimately I think it is worth teaching university students OCL – provided it’s done as part of a sensible approach to modelling – even if prospective employers aren’t asking for it. Students who get that education will think about their software at a level of abstraction higher than the code and will be better developers.
Monday 12 January 2009
Data/Object Anti-Symmetry
I’ve been reading Robert Martin’s book Clean Code. This is an important book because, almost uniquely, it tries to improve the quality of software from the bottom up. It doesn’t tell you how to improve your system design or management processes; it tells you, for example, how to lay out your code.
It’s not perfect, though. Some of the examples are unconvincing – Dave Cleal has already written about one poor example. I want to focus on another.
Chapter 6, Objects and Data Structures, has a section entitled Data/Object Anti-Symmetry that argues that a procedural style of programming, where the data structure is separate from the functions that act on the structure, is sometimes more appropriate than an object-oriented style, where data is hidden behind interfaces. Taken at face value this is undoubtedly true, and Uncle Bob goes on to discuss the very common and frequently justifiable case of Data Transfer Objects. The specific anti-symmetry claim (pg 101) is:
Objects expose behavior and hide data. This makes it easy to add new kinds of objects without changing existing behaviors. It also makes it hard to add new behaviors to existing objects. Data structures expose data and have no significant behavior. This makes it easy to add new behaviors to existing data structures but makes it hard to add new data structures to existing functions.
However, the example used to support the anti-symmetry, based on manipulation of different shapes, seems to me a perfect example of a situation where you’d nearly always favour the object-oriented approach. In the procedural solution (pg 95) the function to compute the area of a shape looks like this:
public double area(Object shape) throws NoSuchShapeException {
    if (shape instanceof Square) {
        Square s = (Square)shape;
        return s.side * s.side;
    }
    else if (shape instanceof Rectangle) {
        Rectangle r = (Rectangle)shape;
        return r.height * r.width;
    }
    else if (shape instanceof Circle) {
        Circle c = (Circle)shape;
        return PI * c.radius * c.radius;
    }
    throw new NoSuchShapeException();
}
Note that this function contains, effectively, a switch statement that selects between all the available shapes. Every other function that operates on the shapes will contain a switch statement with an identical form. This is a bad code smell that Martin himself criticises on page 37. In that critique he advocates – wait for it – having a single switch statement in a factory method that creates objects of the appropriate classes and then using polymorphism to access the required behaviour. If you applied that transformation in this example you’d end up with… the object-oriented solution!
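For reference, the object-oriented solution that the transformation leads to looks roughly like this (a sketch in the spirit of the book’s polymorphic version, not a verbatim quote):

interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Rectangle implements Shape {
    private final double height, width;
    Rectangle(double height, double width) { this.height = height; this.width = width; }
    public double area() { return height * width; }
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}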
If I decide I’m happy with the duplicated switch statements and stick with the procedural solution then to add, say, a perimeter() function I’ll actually write more lines of code than in the object-oriented solution because the polymorphic dispatching replaces the switch statement. In what sense, then, does the procedural approach make it “easy” to add new functions?
One important difference between the two solutions is in the scope of the change. When I add the perimeter() function to the procedural solution my change is all in one place, whereas with the OO solution the change is spread across multiple shape classes. Martin acknowledges this when he says (pg 97) “OO code makes it hard to add new functions because all the classes must change.” So perhaps for Martin “hard” means “touches lots of software entities” and “easy” means “touches only one software entity”. If that’s the case I have a little more sympathy with his position, but not much. Of all the evils in code, duplication is perhaps the worst, a point made several times in Clean Code. I’m prepared to pay the price of having to touch multiple classes in order to eliminate those evil switch statements.
In fact there’s no need to pay that price. If you want a procedural style – because you foresee that adding new functions is more likely than adding new structures – then the best way to achieve it is by using the visitor pattern, as Martin himself points out in a footnote. So why didn’t he show us a solution based on visitors? It’s a puzzle.
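For what it’s worth, here is a minimal sketch of what a visitor-based version might look like; the names are mine, not the book’s. Each new function becomes one new visitor class, so additions stay in one place, and the switch statement is replaced by double dispatch:

interface ShapeVisitor<T> {
    T visit(Square square);
    T visit(Rectangle rectangle);
    T visit(Circle circle);
}

interface Shape {
    <T> T accept(ShapeVisitor<T> visitor);
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public <T> T accept(ShapeVisitor<T> v) { return v.visit(this); }
}

class Rectangle implements Shape {
    final double height, width;
    Rectangle(double height, double width) { this.height = height; this.width = width; }
    public <T> T accept(ShapeVisitor<T> v) { return v.visit(this); }
}

class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public <T> T accept(ShapeVisitor<T> v) { return v.visit(this); }
}

// Adding perimeter() later means writing one more visitor like this,
// without touching any of the shape classes.
class AreaVisitor implements ShapeVisitor<Double> {
    public Double visit(Square s) { return s.side * s.side; }
    public Double visit(Rectangle r) { return r.height * r.width; }
    public Double visit(Circle c) { return Math.PI * c.radius * c.radius; }
}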
In truth I hesitate to criticise Clean Code at all because its heart is in exactly the right place and I hope everyone who writes code reads it. I criticise only in the hope that the second edition is even better.
Tuesday 6 January 2009
Am I a Master?
Several people have observed, with a snigger, that I am the only person registered for the forthcoming Software Craftsmanship conference who is listed as a Master. Ade Oshineye has gone further by writing “Anyone who seriously claimed [to be a master] would suddenly find themselves having to explain why they were better than everyone around them. Someone could attempt it but they’d need a lot of ego and a diminished capacity for self-doubt and self-awareness.” (original here)
Gosh. The main reason I listed myself as a master is that Jason Gorman was kind enough to describe me as one in his promotional material for the event. Seriously, though, should I be prepared to call myself a Master of Software Development? It’s a tricky question because traditional craftsmanship is not an accurate parallel to what I do at work. As Oshineye points out, there’s no agreed way of determining what mastery of software means. Like many of the analogies applied to software development, a consideration of “craftsmanship” can improve our understanding but it is misleading to assume craftsmanship – or any other analogy – will provide a complete and useful model. Software development is like… software development. Perhaps the most useful insight that comes from comparing software development with craft is the realization that apprenticeship might be the most appropriate way of learning a set of poorly understood and rapidly evolving skills.
If I really am a Master I should be able to point to my masterpiece. But, as is the way with most software development, all the successful systems I’ve been involved with have been team efforts. Even my books have been co-authored. So if I had to go before my peers to argue my case as a Master, what would I say?
I think the most important considerations in this assessment are whether your peers recognize that you have:
- advanced the body of knowledge of the field
- made efforts to pass knowledge on to others
- carefully and consistently met high standards in your own work.
As for the third category, I’ll leave it to the people I’ve worked with over the years to decide.
Tuesday 30 December 2008
Modelling with a Sense of Purpose
Steve Cook and I, in our 1994 book Designing Object Systems, were the first people to set out clearly the different purposes of object models, especially the distinction between a model of a situation (sometimes called a model of the world) and a model of a software system. It seems ridiculous now, but back in the early 1990s many people believed that models of situations in the world could just be treated as software designs if you squinted a bit.
I say it seems ridiculous, and I thought the whole issue was settled back in 1997 when Martin Fowler explicitly endorsed our approach in UML Distilled, but I was disturbed to hear from Keith Braithwaite recently that he still encounters many developers who have been taught UML without grasping this essential point.
There has been a backlash against modelling, driven by the wastefulness of many large projects during the last fifteen years. These projects valued models over code and paid the price. But there is still huge value in having a language at a higher level of abstraction than code that developers can use to organize and communicate their thoughts. The UML – for all its faults – can be this language, but not unless we teach people how to use it properly. Understanding the different purposes of models is absolutely key to this. How can we make sure this subject is properly covered whenever and wherever UML is taught?