Introduction
In the past there has been an assumption that agile software development techniques only work in small projects. I think that this view is shifting among people who keep an open mind on the topic. However, scaling up software projects does present new challenges that are not seen in small ones.
I have worked on several very large, in some cases globally distributed, projects which have benefited hugely from agile development practices. This article is intended to highlight some of the differences and additions that I and my colleagues on those projects have found to be necessary in order to ensure that the benefits we see in these larger-scale settings are in line with the benefits that are commonplace for smaller agile teams.
The Problems of big projects
Big projects are fundamentally different from small ones. It is not simply a matter of doing more of the same things; different things become vitally important to the success of a big project. Some agile practices, unless handled carefully, can become causes of problems rather than solutions to them.
One fundamental of the way that my organization delivers software is through the use of Test Driven Development (TDD). I think of TDD as a way of harnessing the power of an evolutionary approach to ‘growing’ software to meet its intended function.
This evolutionary approach clearly has some wonderfully powerful properties, but the larger and more complex the project becomes the closer the development process itself comes to forming a genuinely evolutionary system.
There are several drawbacks to unchecked evolution in our software development process that will work against the success of the project unless they are recognized and mechanisms are put in place to correct them.
Genuine evolutionary systems are focussed wholly on fitness to function, and not on providing any declaration of intent. Try finding any statement of requirement or intent, or, indeed, any direct causal link for the appearance of opposable digits. We can theorize about how such a remarkably useful feature evolved but we cannot be sure that our explanation will be accurate, and to be honest we are probably more likely to settle on the wrong theory, based on our history and prejudices, than light upon some fundamental truth.
There is no direct selection pressure in evolution for simplicity; evolutionary solutions get stuck at points of high-complexity when there is no selection pressure for them to find alternative mechanisms. So software that genuinely evolves to fit a set of test criteria may be much more complex than is needed; it may have a large amount of redundancy and can be very tightly coupled unless there is an explicit reason for it not to be so.
When a code base, and its associated development team, reaches a certain size, the evolutionary nature of TDD starts to have a significant impact. Without great care such teams can end up with a large code-base, including a massive maze of unit tests, which together represent the environmental niche in which the application genuinely evolves throughout the development process. This evolution may easily drive the application toward local minima on the complexity scale, which may not represent a good solution even though, by definition, it fulfills the requirements because it passes all the tests.
Like its biological equivalent, this niche can completely fail to define the intent of the code. The problem here is one of local-optimization of development process and code, and so a lack of selection pressure to avoid these local minima. Under such circumstances what has gone wrong is that the code-base is too large for anyone to know it all. In big projects, understanding the business process for the whole application can be close to impossible, let alone understanding the implementation in any depth. In such environments people will tend to focus-in. They will look for an understandable ‘chunk’ of the application that is the right size for them to understand.
In terms of requirements this will, of necessity, be defined by the cards that they work upon. In terms of the code it will be the structures and patterns that are close to hand. When they need to do something new they will look for convenient examples of design that solve similar problems to those involved in the task they are working on. This means that it is all too easy to re-evolve parts of the solution multiple times, resulting in a code-base that grows rapidly in complexity and becomes harder and harder to understand. All this, of course, only makes matters worse. This is a vicious circle, a self-reinforcing feedback loop, or, to maintain my theme, evolution driving the system toward the wrong local minimum.
Another not necessarily obvious aspect of big projects is that the development team is generally much more fluid through the life of the project than in smaller developments. A proportion of the team is always joining, leaving and changing roles.
This makes the need for clarity-of-intent in the code even more important, but in the absence of such clarity there is often a reduction in the shared understanding of the project amongst the development team. This lack of understanding in turn often results in people tending to focus, at least in large Agile projects, on the tests themselves rather than the intent behind the tests and as a result the consistency and continuity of the application begins to diverge. This process is now a genuinely evolutionary one, as described earlier, rather than a process of directed evolution which is where TDD is at its strongest. The result is that the application becomes harder and harder to change as it evolves to more closely fit its niche and only its niche.
This is a massive problem to a large development team and can be one of the principal limiting factors in how well an agile project will scale.
When a code-base is in this state many follow-on problems emerge; it is therefore a key problem to address in working with large agile projects.
The goal then is to achieve a balance, a more clearly directed evolution that will tend to find optimal solutions to important problems but will also tend to capture the intent of the application at the same time.
Symptoms of Unchecked Evolution
In many respects the symptoms of unchecked, direction-free evolution, are the same or, at least, they are analogs of the classic failures in any large project, even those developed using traditional waterfall techniques. This often comes as a surprise to agilists who aim, and expect, to avoid these very pitfalls.
Some symptoms are listed below. In many respects these symptoms are attributes of some more fundamental failures, and so in many cases they are reflections of one another. The really unpleasant aspect of dealing with big projects is that there is a kind of anti-synergy at play, where the effects of poor control, local optimization and lack of direction tend to interact and combine to make the whole picture worse. This is one reason why big projects are so difficult: if they do start to diverge from control, they tend to veer off very rapidly, and once they have diverged, it is very difficult to regain control without some decisive and usually painful actions.
Large complex code-base with many tests
A naïve approach to TDD is to assume that the quality of the code can be measured through test-coverage. This can be a very poor indicator if the tests are not focussed on the main benefits of TDD.
It is not uncommon in some agile projects to see tests that are very focussed on the implementation of the application rather than on its intent. Under these circumstances changing an implementation can result in whole collections of tests becoming useless. This doesn’t help at all in proving that the changes have retained the function of the application, a primary function of tests in TDD.
I have seen large code-bases with test-coverage figures well above 90% which are very hard to change, and still exhibit lots of bugs as a result of this problem of implementation-focussed tests.
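The difference between the two kinds of test can be shown with a minimal sketch. The `ShoppingBasket` class and both tests are hypothetical, invented purely for illustration. The first test is coupled to how the basket stores its data, while the second only states what the basket is for.

```python
class ShoppingBasket:
    """Hypothetical domain class, used only for illustration."""

    def __init__(self):
        self._lines = []  # internal detail: list of (price, quantity) pairs

    def add(self, price, quantity=1):
        self._lines.append((price, quantity))

    def total(self):
        return sum(price * quantity for price, quantity in self._lines)


def test_focused_on_implementation():
    # Brittle: asserts on the private storage format. Changing _lines to a
    # dict keyed by product would break this test even if totals stay correct.
    basket = ShoppingBasket()
    basket.add(250, quantity=2)
    assert basket._lines == [(250, 2)]


def test_focused_on_intent():
    # Robust: survives any refactoring that preserves observable behaviour.
    basket = ShoppingBasket()
    basket.add(250, quantity=2)
    basket.add(100)
    assert basket.total() == 600
```

Both tests contribute equally to a coverage figure, but only the second one still helps after the implementation changes.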
Hard to refactor
If tests are not focussed on the intent of the application, and are implemented merely to prove that a specific implementation works as expected, then these tests often make the application more difficult to refactor rather than less.
When relatively small changes result in large re-work in tests, as well as in the application itself, this is a strong warning sign that the application is in a poor state.
Scary to restructure
Larger-scale moves of the code-base become almost impossible, because there are so many tests to change, and since the tests are focussed on implementation rather than intent, it is not possible to understand what the application is meant to do, without understanding all of the code.
This is most obvious when it is not possible to identify anyone who is confident that they understand the application widely enough to make such a change. This is a symptom of the more fundamental problem of expertise fragmentation.
Difficult to add new function
Adding new behaviour to the application becomes increasingly difficult, because the logic of the application is spread across many locally optimal locations. A small change may then go one of three ways. It may be duplicated by different people addressing the same issue at multiple sites in the code. It may be fixed in one place but not another, because the other place is unknown to the developer, resulting in an application with unstable and hard-to-predict characteristics. Or, least likely of all, the code may be restructured at huge cost in order to avoid the duplication, which, while clearly the right thing to do, can be very hard to justify if the change that initiates the restructure is tiny.
I have seen systems where the disparity between doing the wrong thing and doing the right thing is the difference between 5 minutes and 20 days of effort. This kind of disparity makes the pressure against doing the right thing very hard to resist.
Perhaps the most unpleasant aspect of this symptom is that it increases the coupling in the code. Because developers only have a sketchy understanding, at best, of the code in neighbouring areas of function, they will tend to interact with it in tactical ways, instead of adhering to the ‘design’ of the team that developed it. As coupling increases, sensible ways to interact with other modules become hidden and confused by tactical short-cuts, and become hard, or even impossible, to understand, even for those developers who are working hard to find a clean approach.
Fragile to change
Small code changes break tests in supposedly unrelated areas of code. Or worse, small changes kill-off tests and the resulting effort to re-instate them starts to escalate.
This is the slippery slope to the nightmare of an unmaintainable legacy system.
Poor layering/encapsulation and coupling
It is hard to identify layers in the code; the responsibilities of parts of the application are blurred together. Changes in technical approach become increasingly hard.
I have seen single applications with multiple (more than 5!) different persistence mechanisms because it was too hard to remove the old ones, because the code was tightly-coupled to the underlying mechanism.
The principle of “Tell, don’t ask” tends to disappear with an associated poor adoption of the Law of Demeter. The resultant code works against the XP goal of never implementing the same thing twice, with many examples of very procedural programming, with all the work of the application done in the client methods that treat the server classes as simply complex data-structures and little more.
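A minimal sketch of the contrast follows. The `Account` class and the overdraft rule are hypothetical and exist only to illustrate the point. In the "ask" style the client treats the server class as a data structure and makes the decision itself. In the "tell" style the rule lives once, next to the data it protects.

```python
class Account:
    """Hypothetical server class that owns its own rules."""

    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):
        return self._balance

    def withdraw(self, amount):
        # "Tell": the overdraft rule is implemented here and nowhere else.
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


def withdraw_by_asking(account, amount):
    # "Ask": the client pulls state out, decides, and pushes state back in.
    # Every client written this way re-implements the overdraft rule.
    if account.balance >= amount:
        account._balance = account.balance - amount
        return True
    return False
```

A client written in the "tell" style simply calls `account.withdraw(30)` and never needs to know how the balance is represented.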
Expertise Fragmentation
As these symptoms arise the fragmentation of, and reduction in, understanding of the application by the development team accelerates.
Common code ownership evaporates as developers become increasingly insecure outside their immediate area of familiarity in the code.
The worst part of this symptom is that it is such a slippery slope. This failure accelerates the appearance of the other symptoms described here.
As the fragmentation of understanding progresses, approaches to development diverge; common solutions and patterns become less and less common, and the developers become even more deeply entrenched in their own area of the application, because when they look outside it everywhere else looks so different and hard to understand.
Why is this worse on Big Projects?
Agile methods are focused on an iterative, guided (or steered) approach to development. All of the problems outlined here are endemic to some degree in software development in teams. However in most projects below a certain size these effects are dealt with implicitly in Agile projects by encouraging good communications and creative thinking and most importantly by not assuming, up-front, that anyone has all the answers. As problems arise, solutions are revisited and re-evaluated, and if found wanting, they are redesigned.
To put it another way, the reason that XP works is that it:
- Creates an environmental niche. – The tests of TDD
- Evolves a solution to fit that niche. – The code that makes the tests pass
- Selects for simplicity. – The rule that you do the simplest thing that works, with no duplication
- Makes changes so that the solution is the simplest that still fits the niche. – Refactoring
The first two steps clearly scale well. Each developer in a large agile team can write the tests that define the desired behaviour of the application. They can use those tests to make design decisions that make the interfaces to their components tidy and usable, and they can prove that it works by running the tests. With suitable focus and understanding, the developers will create good tests that capture the intent of the code and not simply prove that the particular solution, as implemented, works.
The real problems come with the latter two steps. When selecting the “simplest solution that works”, if the cost differential between the “simple-clever” solution and the “simple-stupid” solution is crazily high, which version of simple does the developer choose?
What does simple mean anyway if the application is too complex for any one person to understand? Part of the definition of simple is related to meeting all of the requirements. What happens when the project is big enough that no one can know all of the requirements? How do we define simple then?
In smaller teams solutions are consistently selected for simplicity by the development team, because each time some new function is added to the code its “fitness” is judged and the team will decide if the code needs to be refactored to support the new feature in a tidy way.
To put this another way, there is an implicit selection pressure for good design imposed by the XP rule that code must not be duplicated and by the sense of good taste in the heads of the development team. This sense of taste will be, and should be, informed by the experience of the developers, the nature of the project that they are working on, the technologies in use and many other factors. However, in larger projects the remove-duplication rule and the sense of taste tend to get focused into ever-shrinking sub-regions of code because of the effects of expertise fragmentation.
It is also true that, simply because they are smaller, small projects can tolerate the consequences of poorly factored code, or of solutions that are more complex than is strictly required, more easily than large projects. If I write code that is complex and hard to understand it may not be good, but it is acceptable as long as I understand it. If I am part of a team, the whole team must understand it and so the need for clarity over cleverness is heightened.
The inertia of big projects, that is the difficulty in changing the direction of such projects, and the nasty tendency of effects like Expertise Fragmentation and causes like Optimization to local Minima to take hold at what feels like an exponential rate when things do start to go wrong, mean that there is less time to react before you are in a bad situation and when you do react it seems to take longer for the effects to be felt. In some ways this means that you need to be more agile and more focussed on not letting things get too far off track before you start steering back in the right direction.
In big projects the effects of Expertise Fragmentation militate strongly against the ability of teams to understand all of the needs of the application, let alone agree on appropriate solutions. The steep growth in communication overhead that occurs as teams get bigger also means that shared understanding becomes harder and harder to build and maintain as the size of the project grows, and as time passes.
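One common way to make the communication overhead concrete is to count the distinct pairs of people who may need to talk to each other. This pairwise model is an assumption of the sketch, not something measured on the projects described here. Under it, the count grows with the square of the team size.

```python
def communication_paths(team_size):
    """Number of distinct pairs of people in a team of the given size."""
    return team_size * (team_size - 1) // 2


for size in (5, 10, 50, 100):
    print(size, communication_paths(size))
# prints:
# 5 10
# 10 45
# 50 1225
# 100 4950
```

A team twenty times larger has, by this measure, roughly five hundred times as many potential conversations to keep consistent.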
So if steering in big projects is even more important than in small and if such big projects seem able to veer off-course very quickly how do we steer them towards success?
We must find techniques that will allow us to apply positive selection pressures to our evolutionary development system, in a similar manner to the implicit selection pressures found in small projects.
We must ensure that we have the right selection pressures in place to direct the evolution of our software toward the destination we wish. Some of those pressures are made explicit in techniques like TDD. For the others, we need to work hard to make them explicit, and not rely on them happening by accident in an environment where communications, team size, project complexity and technological complexity can all militate against them.
We must look to steer the project more decisively. Large projects must be effectively and continuously directed toward the goals, both technical and commercial.
In agile development we use an evolutionary development process, but it cannot be allowed to be a process of un-checked evolution, or we may end up with a payroll system when we wanted a point of sale system. Instead we must establish a process of directed evolution.
Statement of Destination and iteration zero
Of immense importance in establishing, and gaining the benefits of, such a directed evolution is a statement of destination for the project. You need to establish a destination toward which the evolution of the project will be directed.
Since this is an agile project, the destination can change, but the cost of change can be very large if the destination is outlined in too much detail. So an ideal destination is concrete in that it defines a collection of constraints within which the development will occur, is simple enough that all of the development team can understand its use, and is general enough that small-scale changes in scope, technology and deployment decisions will have little or no impact on it.
Getting this right is difficult, but can, in large part, be done in an iterative fashion and so this is not an insurmountable problem.
An important piece of guidance is that it doesn’t matter if this description of the destination is formal, informal or a combination, if it consists of Wiki pages or UML, or if it is drawn on the back of a beer-mat. The important thing is that it is real, defines a collection of constraints that rule-out some things and some guidelines that address real problems, or areas of doubt, in the development.
Actually the beer mat won’t work, because a requirement of this description of the system is that it is available to all developers. This big picture of the project should be the thing that new starters are pointed at when they join the project and need to understand how it works. It should be the thing that people refer back to when they have a problem that doesn’t seem to be similar to others that have been tackled in the project before. It should be the thing that is revisited and maintained each time new solutions are identified, or old ones are found wanting.
“What’s the creation pattern for our MVC implementation?”, “What are the layers in our application?”, “What are the rules and constraints that define what must/must-not be in each layer?”, “How do we ensure that our functional tests keep working when we change code?” are all good questions of the type that should be answered globally for the project. The answers will of course vary for each project, but so will the collection of questions.
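Some of these answers can be made executable, so that they act as an explicit selection pressure rather than a document that drifts out of date. The sketch below is one possible approach and is not taken from the projects described here. The layer names and the `ALLOWED` rule table are hypothetical, and it assumes each layer is a top-level Python package. It uses the standard `ast` module to find which layers a piece of source code imports and reports any that the rule table forbids.

```python
import ast

# Hypothetical layering rule: each layer may import only from the layers listed.
ALLOWED = {
    "presentation": {"application"},
    "application": {"domain"},
    "domain": set(),
    "persistence": {"domain"},
}


def imported_layers(source):
    """Return the top-level package names imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


def violations(layer, source):
    """Layers imported by `source` that `layer` is not allowed to depend on."""
    known = imported_layers(source) & set(ALLOWED)
    return sorted(known - ALLOWED[layer] - {layer})
```

Run as part of the test suite over every file in a layer, a check like this turns "what must not be in each layer" into something that fails the build instead of something that has to be remembered.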
Being too formal about these questions is a mistake, but ignoring them and assuming that solutions will evolve completely bottom-up is also a mistake: while solutions will indeed evolve, there will be many solutions to single problems, for the reasons outlined in the preceding sections. In essence, the organizational structures must be in place to ensure that these issues can be tackled as they arise, that when they do arise they will be recognized as problems that need to be solved generally across the project and not within the context of a sub-team, and that the solution can be reached quickly enough not to hold up the development.
It is my opinion that in a large development, where there are many of these problems, timely solutions can only be made with a small amount of look-ahead. Leaving such design choices too late will lead to local optimization and its associated problems. If the implementation of a story that needs some general solution or guideline is happening before that solution or guidance is in place, it is too late to solve the general problem; a locally optimal, but generally sub-optimal, solution will be found and you are quickly in the game of producing “tomorrow’s legacy system today!”.
My own preference for starting out is for a high-level diagram of the Domain Model of the system, concentrating at the Class-Responsibility-Collaborator (CRC) level of abstraction. In addition I prefer an informal architectural model that captures some basic assumptions and rules about gross system structure – what layers are there in the software? what should be in each layer? what should not?
And finally I like a minimalist set of coding standards that act as a placeholder for adding general detail as the project proceeds – e.g. logging strategies, exception handling policies and so on.
I used to use UML for these initial models, and it can work, but the problem with UML at this stage is that it can give an illusion of accuracy, implying more precision than we really have. Save the precision for the code and the tests; any models we create at this stage are meant to be indicative and are expected to change and evolve.
None of these things are defined in isolation; they are all built up incrementally at the start of the project. However, something of each should be in place before the development team starts to grow beyond 4-6 people. This should be part of the project initiation, and part of establishing a core team of developers who will help to ‘carry the torch’ through the next phase of the project.
There is also nothing wrong with making assumptions at this stage. There are two extremes to avoid. I think of them as failure modes of the Waterfall and of overly simplistic approaches to Agile development.
The first failure mode to avoid, Waterfall failure, is the classic Big-Up-Front-Design and its all-too-common side-effect, analysis paralysis. I think of this failure as providing an illusion of rigour, a false sense of security that you have foreseen all of the problems of the project.
The second mode, a failure of an overly simplistic approach to Agile development, is to do no preparation at all, and to assume that the evolution of a new system must always start from a blank page. For me this failure mode is the “President of the Universe” failure mode.
In ‘The Hitchhiker’s Guide to the Galaxy’ series, Douglas Adams introduces the president of the Universe, who takes nothing for granted, and so every morning when he wakes up he starts everything from first principles and as a result achieves nothing. This is a great guard against meddling politicians, but not a good start for a software project.
Achieving an appropriate balance between these two extremes can be tricky, but is not an insurmountable problem.
I prefer intentionally woolly definitions of what comprises an architectural big-picture:
“The collection of stuff, sufficient to ensure that when any one of us is presented with a problem we will tend to come up with essentially the same solution.”
This definition is vague enough to leave plenty of room for nuance, difference and human judgement, but it does state our goal, which is to establish an approach to design that will tend to make us converge on common approaches rather than develop a collection of silos in the application which represent little shared experience or share little common structure. It is a useful guideline as to when we have done “enough”, because it states a goal rather than a prescriptive collection of artifacts.
I prefer to establish some initial broad-brush guidelines. As a minimum I like to have a defined tiering model that is appropriate to the expected application, with each tier described by some straightforward statements: rules that give guidance on what should and shouldn’t happen in each tier. Statements like:
“The presentation layer is responsible for the pixel-painting and the most basic of message routing. It consists solely of GUI components and event listeners. The event listeners do no more than delegate events to application layer components. The presentation layer contains no logic, only delegation.”
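A rule of this kind is easy to recognize in code. The following is a minimal sketch with hypothetical names (`OrderService`, `SubmitButtonListener`, and an event modelled as a plain dictionary), not tied to any particular GUI toolkit. The listener does nothing except hand the event on to an application-layer component, where the decisions are made.

```python
class OrderService:
    """Hypothetical application-layer component; all decisions live here."""

    def __init__(self):
        self.submitted = []

    def submit_order(self, order_id):
        if not order_id:
            raise ValueError("order id is required")
        self.submitted.append(order_id)


class SubmitButtonListener:
    """Presentation-layer listener: no logic, only delegation."""

    def __init__(self, service):
        self._service = service

    def on_click(self, event):
        # Passes on what the event carries; validation is the service's job.
        self._service.submit_order(event["order_id"])
```

Because the listener contains no logic, the behaviour can be tested against `OrderService` directly, without driving the user interface.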
It is also valuable to focus on specific problem areas that you would prefer to avoid. This should not be a complex modeling or design exercise, although some of each will probably be useful as it seems relevant to particular problems. Instead this is really some scene-setting technical positioning work, before the team starts to grow too big. These initial efforts are best tackled in the classic Agile fashion, but this is a very dangerous period, because it is so easy to become seduced by the lure of Big-Up-Front-Design.
It is wise to concentrate on a few basic things, perhaps the construction order for your MVC implementation, or defining your approach to isolating your persistence mechanism from your business logic. Almost certainly it will include things like selecting a communication mechanism that you will use to cross process boundaries and a broad brush selection of application patterns (e.g. Thin/Thick Client, Front side Controller, Dependency Injection etc.). To carry my theme of evolution perhaps a step too far, you may think of this as defining some important structural genes (architectural patterns) which will populate the genome of your project and start it in the right direction.
All of these decisions should be worked out in the context of real business stories, but the focus is on establishing some design guidelines for the start of the project proper, when the rest of the development team arrive and start work on the bulk of the stories.
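As an illustration of one such guideline, isolating the persistence mechanism from the business logic, here is a minimal sketch. All of the names (`CustomerStore`, `InMemoryCustomerStore`, `credit`) are hypothetical. The business rule depends only on a small interface, so the storage mechanism behind it can be replaced without touching the rule.

```python
from typing import Protocol


class CustomerStore(Protocol):
    """Hypothetical port: the only view of persistence the business logic sees."""

    def load_balance(self, customer_id: str) -> int: ...

    def save_balance(self, customer_id: str, balance: int) -> None: ...


class InMemoryCustomerStore:
    """One interchangeable adapter; a database-backed one would have the same shape."""

    def __init__(self):
        self._balances = {}

    def load_balance(self, customer_id):
        return self._balances.get(customer_id, 0)

    def save_balance(self, customer_id, balance):
        self._balances[customer_id] = balance


def credit(store: CustomerStore, customer_id: str, amount: int) -> int:
    """Business rule that knows nothing about how balances are stored."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    balance = store.load_balance(customer_id) + amount
    store.save_balance(customer_id, balance)
    return balance
```

With a seam like this agreed at the outset, retiring an old persistence mechanism means replacing one adapter rather than hunting through the business logic.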
Experience is still of enormous value to Agile developers and should not be discounted. There is no need to be like Douglas Adams’ “President of the Universe” and make no assumptions at all.
All experienced developers know that some things are bad. It would be foolish to wait until there is a specific requirement to reuse a constant value before deciding to implement it as a named constant rather than as a magic-number spread through the code. There are a certain number of fundamentals that can be decided at the outset.
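The named-constant case can be sketched in a few lines. The session-timeout example and its value are hypothetical, chosen only for illustration.

```python
# Before: a bare literal whose meaning is left to the reader, and which
# tends to be copied wherever the same rule is needed.
def is_session_expired_magic(idle_seconds):
    return idle_seconds > 1800


# After: one named constant states the intent and gives a single place to change it.
SESSION_TIMEOUT_SECONDS = 30 * 60  # hypothetical value, for illustration only


def is_session_expired(idle_seconds):
    return idle_seconds > SESSION_TIMEOUT_SECONDS
```

Nobody needs a story card to justify the second form; it is one of the fundamentals that can simply be agreed at the start.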
The core set of developers at the beginning of a big project should work together to agree what some of these starting points are.
It is immensely important for the project destination to be maintained in order that it remains useful. It needs to be updated and shared as decisions are made that change the collective understanding of the project. In a large project this requires active effort, sufficient to ensure that good ideas, general principles, common project-specific design patterns and so on are shared by enough people to achieve consistency. This information can never be wholly captured in any model, text or artifact. It lives in people’s heads, and the pictures, models and text are merely mechanisms for describing aspects of it.
In large projects it is vital that the organizational structures that the technical team use encourage and assist in the construction and maintenance of this shared mental model.
Development Team Organizational Structure
In reality the only way to keep this model for the project strong, fresh and relevant is to ensure that it remains fresh in the heads of a core group of developers. It is essential to success that this group of developers regularly work together, in order to keep the model fresh and rich, but they also need to work apart in order to spread the word. We call them tech-leads, but their role is very closely related to that of XP coach.
This group is the principal tool in supplying the needed pressure for simplicity in a large project. This does not happen implicitly in the way that it can, and often does, in smaller-scale agile projects.
This group also plays a central role in the establishment and maintenance of effective cross-team communications. A good test of the effectiveness of the tech-lead team is that you should be able to ask any of them about the solution to a particular problem within the application and expect to get very similar responses no matter which member of the team you ask. This team will hold strongly consistent views of the application if they are effective. They will share a model of the way that the application works, and they will work to promote the understanding of this model and to ensure that when it is found lacking, it is changed to match the new needs of the project.
Members of the tech-lead group each work in separate functional teams; they are responsible for representing the overall goals of the project into their teams, and representing their teams out to the rest of the project. They are treated as design authorities, in that they have the say-so in any dispute over a design issue, and they work closely together, providing a technical perspective across the collection of development teams.
One individual has overall responsibility for the design and technical quality of the system. This person is treated as the design authority for the whole project. We refer to this role as the project architect, or Uber-TechLead when we want to make even more fun of them.
This role is much closer to the model of Fowler’s ‘Architect Oryzus’ than to ‘Architect Reloadus’. This individual works to develop and maintain consistency of vision, though the vision is not necessarily always theirs. They are the keepers of the vision and they are responsible for ensuring that it is effectively communicated to the development team, and for maintaining a good enough picture of the reality of the project to understand when the vision is becoming outdated, or too narrow to be effective.
Tech-leads rarely override design decisions unless those decisions compromise the metaphor, and architects very rarely override the decisions of tech-leads.
A common mistake is to select Tech-Leads and Architects only on the basis of their skill as developers. Tech-leads and architects in this model don’t have to be the best developers, or the best designers, but they do need to have certain qualities of leadership, they must have active experience of the code-base, and they do need to be the type of people that will be able to earn the respect of developers for their technical abilities in some measure.
The ability to maintain the big picture of the project and communicate it well to the development team can be hard to achieve, but the people that are best at it usually have a good balance of being: technically strong – though not necessarily the strongest; top-down thinkers – though not to the exclusion of an ability to reach developer-level detail in code; assertive – but not overbearing; and collaborative – but not at the cost of indecision. These people can be hard to find!
In the past some agile practitioners have reacted badly to hierarchical organizational structures, I believe this to be more a reaction to more traditional development practices than against any sense of division of responsibilities or even hierarchical structures per-se. In reality, in any large project, skill differentiation is essential. No one can do everything so some form of division of responsibility must exist to maximize the productivity of the doers by removing the overhead of the responsibility for things like code-base-management, communication of the vision and agreeing on project wide change into other people’s heads. This is clearly hierarchical in several senses, but it is more about thinking about the application, it’s problems and solutions, at differing levels of abstraction.
In one sense the most important aspect of my role as an architect is to provide context for the decisions of my team-mates. Everyone has favorite metaphors for software development, mine is making a movie, my role is a lot like the that of a movie director. I try to get the best possible performance out my actors (developers) by ensuring that they always have the right context in mind for their actions when they are ready to perform (write code).
I am asked questions, every day, about detailed decisions in code, “Should we return a value here, or throw an exception?”, about cost benefit trade-offs “Should we spend the time to refactor this now or leave it as it is?”. I make decisions on technical priorities for my project “Because of our experience in production, we need to modify the logging so we can better focus it to the needs of the support team”. About half the time, the context I provide is that I know who the questioner should really talk to, because I don’t have the answers they need.
All of these decisions require a broad view of the application, which, in large projects, is different from the view of the system that is based on dealing with cards and Customers, who will, of necessity, be focussing on a specific functional piece of the whole project themselves.
This big-picture view is difficult to achieve as a side benefit of development on any single team, and so requires some investment of effort to achieve.
The fundamental point here is that it is important to establish roles and responsibilities across the team that ensure that the project is covered at several different levels of granularity, or abstraction. If not, important aspects of the drive for simplicity and consistency will be lost.
Continuous Integration and Functional Testing
Both TDD and unit testing are wonderful; however, the tests that they produce are not necessarily the same thing.
For genuinely agile development on a large scale to work it is essential that there exists a body of tests that are focussed on the intent of the application. These tests need to operate at a variety of levels; tests that are implementation-focussed at the level of a single class struggle to achieve the essential coverage of intent.
Functional tests are clearly directed at the behavior of the application that is important to the Customer. Good automated functional tests are invaluable in providing a stable base for changing the application.
The most effective use of functional tests is in combination with a rich Domain Model where the functional tests focus on proving the effectiveness of the model in meeting the needs of the application. These tests are easy to isolate from technical dependencies, because they are focused on the Domain Model, and this is a very effective way of getting excellent functional test coverage, because though they are functional in focus, they are unit-test-like in implementation. This kind of testing needs some relaxation of the rule that unit tests should test a single class. This sort of test tends to test clusters of functionally related classes together.
Unfortunately, as code size and functional coverage grow, the cost of executing tests grows too. This is particularly true when the application does not have a strong Domain Model, because the technical dependencies are often harder to isolate, and so the tests themselves become more expensive to execute. This cost of test execution makes Continuous Integration (CI), a core tenet of most agile development practices, difficult to achieve.
We have projects that use clusters of build machines operating in parallel on pipelined builds. This level of sophistication is essential in order to keep our builds fast and frequent enough to support CI. We have started to discuss possible future build systems built on highly parallel builds, using clustered parallel hardware to support builds fast enough for our needs. Incidentally, this is a reflection of the value we place on CI in our development process! Even so, we are not always able to maintain tests that run quickly enough that we can afford to run all of the functional tests at check-in time.
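As a minimal sketch of what such a test can look like, assume a tiny ordering domain. Every name here (Order, OrderLine, BulkDiscountPolicy and the 10% discount rule) is hypothetical and invented purely for illustration; the point is only that the test exercises a cluster of related domain classes together, with no database, network or framework involved.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    """One product line on an order (hypothetical domain class)."""
    sku: str
    unit_price: Decimal
    quantity: int

    def subtotal(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass
class Order:
    """An order aggregates lines; it knows nothing about storage or UI."""
    lines: list[OrderLine] = field(default_factory=list)

    def add(self, line: OrderLine) -> None:
        self.lines.append(line)

    def total(self) -> Decimal:
        return sum((line.subtotal() for line in self.lines), Decimal("0"))


class BulkDiscountPolicy:
    """Assumed business rule: 10% off orders totalling 100 or more."""
    threshold = Decimal("100")
    rate = Decimal("0.10")

    def price(self, order: Order) -> Decimal:
        total = order.total()
        if total >= self.threshold:
            return total - total * self.rate
        return total


# Functional in focus, unit-test-like in implementation: each test states
# a business outcome and drives three cooperating classes to prove it.
def test_large_order_gets_bulk_discount() -> None:
    order = Order()
    order.add(OrderLine("widget", Decimal("30"), 3))
    order.add(OrderLine("gadget", Decimal("20"), 1))
    assert BulkDiscountPolicy().price(order) == Decimal("99")


def test_small_order_pays_full_price() -> None:
    order = Order()
    order.add(OrderLine("widget", Decimal("30"), 1))
    assert BulkDiscountPolicy().price(order) == Decimal("30")
```

Because nothing in the sketch touches infrastructure, tests of this shape stay cheap to run even as their number grows.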
We have found that even in those cases where we are unable to include our functional tests as part of our check-in stage, having them as part of our CI pipelined build is still important. On my current project our functional tests only run two or three times a day, but we prioritize fixing functional tests above all other activities apart from check-in build breakages, which are based on unit tests. We use a semi-CI approach.
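The two rules in this arrangement can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of our actual build tooling: the Revision record and both function names are hypothetical. The first function picks the revision that the slower functional stage should test next; the second encodes the priority order of check-in breakages first, functional failures second, story work last.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    """A checked-in revision and what the pipeline knows about it."""
    number: int
    unit_tests_pass: bool                         # fast check-in stage
    functional_tests_pass: Optional[bool] = None  # None means not yet run


def next_functional_candidate(history: list[Revision]) -> Optional[Revision]:
    """Newest revision that passed check-in and has no functional result yet.

    The functional stage runs only a few times a day, so it skips over
    intermediate revisions and always tests the most recent good one.
    """
    for revision in reversed(history):
        if revision.unit_tests_pass and revision.functional_tests_pass is None:
            return revision
    return None


def top_priority(history: list[Revision]) -> str:
    """What the team should be working on, given the pipeline state."""
    if not history:
        return "continue story work"
    if not history[-1].unit_tests_pass:
        return "fix check-in build"
    tested = [r for r in history if r.functional_tests_pass is not None]
    if tested and tested[-1].functional_tests_pass is False:
        return "fix functional tests"
    return "continue story work"
```

For example, with a history in which revision 1 failed its functional run and revision 2 has only passed check-in, next_functional_candidate returns revision 2 and top_priority returns "fix functional tests".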
Allowing too big a lag between functional test failures and applying fixes is another project failure mode that escalates in impact rapidly if not checked early. The development of functional tests needs to be intimately part of the development process. Ensuring that developers spot breakages quickly and fix them rapidly is the best way of maintaining a body of functional tests that stay working through the project without the need for huge, hard to predict, integration phases.
One technique that we use to great effect is the use of Acceptance Test Suites. Acceptance Test Suites are collections of automated tests, organized by story card, that are focussed specifically on proving that the acceptance criteria of the story are met. Acceptance Test Suites have lots of nice properties, including making Customers and Business Analysts write nicer success criteria, but their principal benefit is that they put the writing and maintenance of functional tests clearly into the hands of developers and so focus the developers on the intent of the code they are writing, not just the implementation.
Developers are not considered to have finished a card until the acceptance test suite passes. We build the success criteria collaboratively, using skills from different parts of the team with developers, Business Analysts and Quality Analysts working together to specify the desired outcomes.
Design is not Dead
There is a myth that Agile development removes the need for design, but the reality is that for any project, Agile development is ALL about design. The mistake is to assume that because Big-Up-Front-Design is bad that all design is bad.
In large projects appropriate attention to, and communication of, design decisions is essential for project success. This needs to be a little more formalized than in small projects, because the rate of divergence in large projects is so high.
As in all design, simple is better than clever. The common failures of more traditional, less iterative, approaches to design are important to guard against, but guarding against them must not mean abdicating responsibility for maintaining a coherent, consistent view of the application in the heads of the development team.
The information here represents no more than sketches of techniques that we have found to be successful in implementing large complex projects, while retaining the ability to steer the project in desired directions, and respond to change effectively throughout the project’s life.
The use of a rich domain model at the core of the application is of immense benefit. Of necessity, developers have to understand something of the problem domain in order to understand the cards that they are implementing. So if all of the business logic of the application is encapsulated within the confines of a domain model that represents the business of the application, a domain model that is clean of technology dependencies which otherwise clutter the picture, the code becomes immediately more navigable and the vocabulary of the project becomes standardized. Both are assets of vital importance in the large complex code-bases we are discussing here.
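One common way of keeping a domain model clean of technology dependencies is to have the domain code depend only on a small interface and to supply the technology behind it separately. The sketch below assumes a hypothetical banking domain; Account, AccountStore, transfer and InMemoryAccountStore are illustrative names, and the interface-based separation is one possible technique rather than the only way to achieve this.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Account:
    """Domain object: business rules only, expressed in domain vocabulary."""
    account_id: str
    balance: int  # held in minor units (e.g. pence) to avoid rounding

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def deposit(self, amount: int) -> None:
        self.balance += amount


class AccountStore(Protocol):
    """The only thing the domain knows about persistence."""
    def load(self, account_id: str) -> Account: ...
    def save(self, account: Account) -> None: ...


def transfer(store: AccountStore, source_id: str, target_id: str,
             amount: int) -> None:
    """Business operation written purely against the domain model."""
    source = store.load(source_id)
    target = store.load(target_id)
    source.withdraw(amount)  # raises before anything is saved
    target.deposit(amount)
    store.save(source)
    store.save(target)


class InMemoryAccountStore:
    """Stand-in for a database-backed store, handy for fast tests."""
    def __init__(self) -> None:
        self._accounts: dict[str, Account] = {}

    def load(self, account_id: str) -> Account:
        return self._accounts[account_id]

    def save(self, account: Account) -> None:
        self._accounts[account.account_id] = account
```

A database-backed store could replace InMemoryAccountStore without touching Account or transfer, so the business logic stays readable in the vocabulary of the cards.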
The adaptive nature of an agile approach to development means that all projects are always individual exercises in process development and optimization, but through experience on a variety of large complex projects I believe that the problems and techniques to address them described here are generic to all large projects. These techniques will always help, though each needs to be tailored and modified dynamically as the project progresses in order to meet the needs of the moment.
I have heard large projects sometimes likened to Super-Tankers: they are big, unwieldy, have huge inertia and take many miles to change direction. In my experience a successful large agile project is more like a flock of birds. Some birds lead sections, because the followers suffer less aerodynamic drag as a result, and the flock as a whole can turn at a moment’s notice in reaction to change.
Projects like this are a pleasure to work on, and the sense of confidence and empowerment amongst the development team is palpable: they look forward with relish to the next challenge that the project will throw at them.