I inadvertently found myself in the middle of a minor Twitter storm on the topic of diversity. The organisers of a conference that I attended made some, to me, intemperate remarks on the subject.

They were asked why there were so few women on the programme and responded in a way that came across as being, again, to my mind, overly aggressive and defensive.

I am a big fan of Twitter, but it is not really an effective vehicle through which to explore complex ideas. You can’t represent nuance in 140 characters.

So here are my thoughts on the topic.

I believe that our industry has a shameful history in terms of diversity. It is largely populated by young, white guys, at least in Western Europe and the USA. This is disproportionately the case, even compared to other technical disciplines.

This matters for a variety of reasons: social, emotional and political, but also pragmatic.

It seems obvious to me that if I recruit people who are passionate about programming, enjoy science fiction, are obsessed with aeroplanes, aerobatics and physics, like playing the guitar and driving too fast (people like me) then we will tend to jump to similar conclusions and make similar mistakes.

I like working with people that have different ideas to me. I know that when I do this, it brings the best out of me, and I think it brings the best out of them. I think that we are at our most creative when we value ideas and work in teams that trust one another sufficiently to feel free to debate those ideas freely and vigorously.

Software development is a VERY difficult thing to do well. I think we should maximise our chances of success by doing whatever it takes to be intelligent, creative and do great work. I believe that part of that is creating diverse teams. Not just having a smattering of women around, but teams populated with people from all sorts of different backgrounds, education, ethnic groups, sex or sexual orientation. I believe that this is one of the hallmarks of truly great teams.

The problem is that we live in an unfair world. I don’t know enough to solve the problems of inequality in our world. I am a software developer. I care deeply about these problems and believe that they represent an injustice, and a loss of potentially great contributions. I believe that we can do better.

However, I also think that this is not a problem of the software conference industry. Poor representation of women at conferences did not get us into the problematic position that we find ourselves in. I have some sympathies for the conference organisers. Even though they responded in a manner that I thought at best intemperate, at worst inappropriate, there were mitigating circumstances.

At this conference there were 25 speakers. Of those, 2 were women. The organisers were criticised for the lack of women speakers, but actually, in terms of the representation of women in our industry, that does not feel too far off being a proportionate number. The organisers’ words came across badly, but English is not the first language of these conference organisers and I think that a reasonable interpretation of what they said is that they did not really get the nuance of what they implied with their comments.

I suppose that the key question is, what can be done to change the situation that we, as an industry, find ourselves in?

Well, there are several positive examples that I am aware of, even if we constrain ourselves to the conference arena. The Pipeline Conference, in London, does an excellent job of actively encouraging female speakers. They also work to eliminate subconscious bias in the selection process for talks by using a blind evaluation process. Submissions are stripped of identifiers so that the selection committee can’t be swayed by the sex, ethnic background, or fame of the prospective speakers.

QCon work hard to encourage and promote female speakers for their events and operate a code of conduct for speakers and other participants that encourages an open and respectful approach to all.

These are important things. Conferences are one public face of our industry and the roster of speakers will provide, albeit subliminally, an impression of what a software professional looks like and how they act.

However, even if we had a 50/50 split of men and women and a perfect representative sample of all cultural or ethnic groups at every conference, it would mean nothing if we don’t address the real problem, which is that there aren’t enough of these people in our industry. Worse than that, focusing only on women for a moment (they are, after all, 50% of the population), our industry no longer appeals to women and girls. We have driven them away to the extent that few even consider software development as a career.

I think that software is an important thing for the world. I feel privileged to have found a career that I love, that also happens to be interesting, challenging and pays pretty well. I don’t want that privilege to belong only to people like me.

I have done what I could in my career to treat people with respect, whatever their sex, sexual orientation, ethnic group or religious persuasion. That is not enough.

Posted in Culture | 1 Comment

Mob Rule?

I was at a conference last year where I saw Woody Zuill talking about “Mob Programming”. You can see that talk here

A very simple description of Mob programming, for those of you who don’t have time to watch Woody’s presentation, is “All the team working together on the same stuff”. Instead of dividing work up and allocating it to individuals or pairs, the whole team sits together and advises a “Driver” who has the only keyboard.

Interested but Skeptical

I thought it was an interesting, thought-provoking, even challenging idea. I confess that my first reaction was skepticism. To be honest, as a grumpy old man in my 50s, with a scientific-rationalist world-view my reaction to most things is skepticism. I think of it as a healthy first response 😉

I thought about my skeptical responses, “That can’t possibly be as efficient as the team dividing the work”, “Surely some people will just sit back and not really contribute”, “It is going to be dominated by a few forceful individuals”, “How could you design anything complex?”. Once I started voicing them, they seemed familiar. These are precisely the kind of responses that I get when I talk to teams about adopting Pair Programming.

Now, I am a passionate advocate for Pair Programming. I believe that it makes teams stronger, more efficient, more cohesive and allows them to produce significantly higher-quality work. So I was in a quandary.

On one hand, “this can’t possibly be efficient, can it?” on the other “Working collaboratively produces better results”. I am scientific rationalist enough to retain an open-mind. I assumed that that would be it. I assumed that, given the nature of my work as a consultant, I would never experience Mob Programming personally and so would only ever see it from the perspective of a distant, outside observer. I was wrong.

Invited to join the Mob

I have some friends working in a start-up company, http://www.navetas.com/ who have recently experimented with Mob Programming and have been practicing it for a few months now. I know them through my good friend, Dave Hounslow, who used to work there. The team very kindly invited both of us to spend the day with them and “join the mob”.

Before trying Mob Programming, the team was already fairly advanced in their use of agile development and Continuous Delivery. Dave had helped them to establish a strong culture of automated testing and an effective deployment pipeline. They were used to working collaboratively and doing pair programming, TDD and automated acceptance testing.

The team is fairly small, with 5 developers. They saw a presentation on Mob Programming and decided to try it as an experiment for a single iteration. They have never gone back to their previous mode of work, pair programming.

Introductions, Process and People

Dave and I arrived and, after coffee and introductions, we attended the team stand-up meeting. The team are thinking of changing the stand-up because it no longer has a role in establishing a shared understanding; they do that all day, every day, working together in the Mob. However, there is one team member who works remotely on customer service and so this is an opportunity to catch up with her.

We then spent a bit of time in front of a whiteboard while the team described their system to us. Dave knew it a bit from when he used to work there; it was new to me. They described their architecture and then the story that we would be working on, and then we began.

The team have invested a bit of time and infrastructure to support their mob programming. They have a couple of nice big monitors so that everyone can see what is going on as the code evolves, and a timer that reminds them when to swap who is driving.

The approach works by giving everyone a turn at the keyboard, including newbies like Dave and me. The timer is set for 15 minutes. Each person gets 15 minutes at the keyboard and then everyone shifts where they are sitting, with a new person taking the keyboard. It is not totally rigid, so if you are at the keyboard and in the middle of typing a word, you can finish, but the team try to respect the time-slots. They avoid one person hogging the keyboard.

We sat in a loose semi-circle with the person at the keyboard placed at the centre. We all then offered advice on what to do and where to go next. When the time was up everyone shuffled round one place, musical chairs without the music. The next person in sequence would take the keyboard and we would continue the conversation and design of the code. I think that the simple act of getting up and shifting seats helped keep the engagement going throughout the day. Tiny as it was, the act of moving sharpened the focus, just a little bit.
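As a rough illustration of the rotation described above, here is a small Python sketch. The names in the mob and the `rotation_schedule` function are invented for the example; the only detail taken from the day itself is the 15 minute timer.

```python
from itertools import cycle, islice

TURN_MINUTES = 15  # the timer length the team used


def rotation_schedule(mob, turns):
    """Return (start_minute, driver) pairs for the first `turns` turns.

    Drivers are taken in seating order, and the order repeats once
    everyone has had a turn at the keyboard.
    """
    drivers = islice(cycle(mob), turns)
    return [(i * TURN_MINUTES, driver) for i, driver in enumerate(drivers)]


if __name__ == "__main__":
    # Hypothetical names, purely for illustration.
    mob = ["Ann", "Bob", "Cat", "Dan", "Eve", "Fay"]
    for start, driver in rotation_schedule(mob, 8):
        print(f"{start:3d} min: {driver} drives, everyone else advises")
```

In practice the team were not this rigid; the schedule is only a reminder of whose turn is next.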

Sounds chaotic doesn’t it? However, it wasn’t. There were occasions when the discussion went on enough that there was no typing during the 15 minute period but on the whole that wasn’t the case. On the whole we made steady progress towards the aims of the story.

There was a lot of talking, but this wasn’t dominated by one or two people; everyone had their say. There was, inevitably, some variance. The level of experience of the team varied widely, both in terms of general software development experience and in terms of experience of this project. So at different points different people had more or less to contribute. I was a complete newbie to the system and so I asked more questions than the others and could mostly contribute on general issues of design and coding rather than specifics of the existing system. Others knew the system very well and so would contribute more when specific questions arose. This is one of the benefits!

Optimise for Thinking, Not Typing

The story we were working on was one of those stories that was exploring some new ideas in the system. It was gently challenging existing architectural assumptions. At least that was the way that I perceived it. This was a good thing. This wasn’t a trivial change, we had new territory to explore. We certainly spent more time talking than typing, but then I have never admired “lines of code” as a useful metric for software. I much prefer to optimise for thinking rather than typing. So while the amount of code that we produced during the day was relatively small, I think that it was higher quality than any of us would have produced alone.

The conversations were wide ranging, and often I felt that the presence of two outsiders, Dave and me, tilted the conversation more in the direction of reviewing some past decision than may otherwise have been the case. However, I also felt as though we both added some value to the discussions and the final output of the day, working code.

Assumptions Fulfilled?

So what about the questions that I started with?

“It can’t possibly be as efficient…”. Well, the team have tracked their velocity and have seen a significant increase in the number of stories that they produce. I know that velocity is really only a pseudo-measure of progress; who is to say that this week’s stories are the same size as last week’s? Nevertheless, the team subjectively feel that they are now moving significantly more quickly and producing a higher-quality output, and the data that they have seems to back this up.

“People will sit back and not contribute”. There were some people that spoke less than others, but everyone was engaged throughout the day and everyone contributed to some degree.

“It will be dominated by forceful individuals”. There were a few “forceful personalities” present, myself and Dave included. Nevertheless the team seemed to me to listen carefully and consider ideas from everyone. It is certainly true that some of us present talked more and were a bit more directive than others in the group, nevertheless I felt as though the outcome was a group-driven thing.

“How could you design anything complex?”. This is an old chestnut that people new to TDD and Pair Programming raise all the time. I am completely relaxed about the ideas of incremental design. In fact I believe that it is the only way that we can solve complex problems with high quality. I was pleased that the story that was chosen for the day had a bit of substance to it. It wasn’t a deeply challenging problem, but did stress some architectural assumptions. I am confident that we, the whole group, moved the thinking about the architecture of the system forwards during the course of our work that day.

Justified Skepticism?

So what do I think about Mob Programming now that I have tried it? Well I am still on the fence. I finished that day having had my beliefs challenged. This is not a crazy, inappropriate waste of time by any stretch of the imagination.

This was a small team that was considerably disrupted by having two newbies, Dave and me, drop in for the day. I think that for any team this small, this would have been disruptive. I seriously doubt that, for most teams, the new people would have made much of a contribution. Dave and I are both experienced software developers and both used to working as consultants and adding value quickly. Nevertheless I think that we added more value, more quickly than would usually be the case. We made suggestions that the team tried immediately. We learned about the system more quickly, to a level of detail that surprised me. Again, in any other process, I think that we would have spent more time either doing introductory stuff or applied a narrower focus and worked in a smaller area of the code – on our first day.

I was extremely impressed that this small team could accommodate the disruption of, nearly, doubling in size for a day and not only do useful work, but actually do so in a way that moved on their thinking.

At the human level this is a nice way to work. You have the conversations that are important. You share the context of every decision with the whole team all the time. You laugh and joke and share the successes as well as the disappointments.

These days I work as an independent consultant, advising people on how to create better software faster. I am intrigued at the prospect of using Mob programming as a mechanism to introduce small teams to new ideas. I think it could be a wonderful instructional/learning tool. If anyone fancies carrying out that experiment you should give me a call 😉

So where are my reservations, why do I feel like I am still “on the fence” rather than a true believer? Well, there was a bit of coaching by one of the participants to organise the others. I honestly don’t know if the process would have worked better or worse without him directing it quite so strongly, but I can imagine it breaking down if you had the wrong mix of people. That is a weak criticism; any process or approach can be disrupted by the wrong person or group of people. My point is that if you have a bad pair, you can move on and pair with someone else the next day. If you have a bad Mob you have to be fairly strong-minded to face the problem and fix it. It will force you to have some difficult conversations. Perhaps no bad thing if you do it, but I know of many teams that would shy away from such a social challenge.

There must be some natural limit. The idea of a Mob of 200 people working on the same thing is ludicrous. So where is the boundary? I am a believer in the effectiveness of small teams. So I wouldn’t have a team of 200 people in the first place. However, I wonder how well this would work with larger, still small, teams? There were 6 of us in this Mob and it worked well. Would it have continued to do so with 8, 10 or 12? If I am honest I think that a team of 12 is too big anyway, but it is a valid question, what are the boundaries?

There are times when I want some time to think. If I am pairing and I hit such a point it is simple to say to my pair – “let’s take a break and come back to this in 30 minutes”. It is harder to make that call if everyone is working in a Mob.

On the whole I had a fascinating day. I would like to extend my thanks to the folks at Navetas for inviting me to join their Mob and experience this approach first hand. It was a lot of fun and I learned a lot.

Posted in Agile Development, Culture, Effective Practices | Leave a comment

Test *Driven* Development

Before Test Driven Development (TDD) the only thing that applied a pressure for high-quality in software development was the knowledge, experience and commitment of an individual software developer.

After TDD there was something else.

High quality in software is widely agreed to include the following properties.

High quality software:

    • Is modular.
    • Is loosely-coupled.
    • Has high cohesion.
    • Has a good separation of concerns.
    • Exhibits information hiding.

Test Driven Development is only partially about testing; of much greater importance is its impact on design.

Test Driven Development is development (design) driven by tests. In Test Driven Development we write the test before writing code to make the test pass. This is distinct from Unit Testing. TDD is much more than “Good unit testing”.

Writing the test first is important: it means that we always end up with “testable” code.

What makes code testable?

    • It is modular.
    • Loosely-coupled.
    • Has high cohesion.
    • Has a good separation of concerns.
    • Exhibits information hiding.

Precisely the same properties as those of high quality code. So with the introduction of TDD we now have something else, beyond the knowledge, experience and commitment of a programmer to push us in the direction of high-quality. Cool!
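To make the idea concrete, here is a minimal test-first sketch in Python. The `Greeter` class and its `clock` parameter are invented for illustration. The tests were the starting point, and because they need to control the time of day, the clock ends up being passed in rather than read directly from the system, which is the kind of loose coupling described above.

```python
from datetime import time


# Written first: these tests describe the behaviour we want.
def test_greets_with_good_morning_before_noon():
    greeter = Greeter(clock=lambda: time(9, 0))
    assert greeter.greet("Ada") == "Good morning, Ada"


def test_greets_with_good_afternoon_from_noon():
    greeter = Greeter(clock=lambda: time(12, 0))
    assert greeter.greet("Ada") == "Good afternoon, Ada"


# Written second: just enough code to make the tests pass.
class Greeter:
    def __init__(self, clock):
        # `clock` is any zero-argument callable returning a datetime.time.
        # Injecting it keeps Greeter independent of the system clock.
        self._clock = clock

    def greet(self, name):
        period = "morning" if self._clock() < time(12, 0) else "afternoon"
        return f"Good {period}, {name}"


if __name__ == "__main__":
    test_greets_with_good_morning_before_noon()
    test_greets_with_good_afternoon_from_noon()
    print("all tests pass")
```

Had the code been written first, reading the system clock inside `greet` would have been the easy option, and the result would have been much harder to test.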

Like to know more?…

Posted in Agile Development, Continuous Delivery, Effective Practices, Software Design, TDD | 2 Comments


Motivation is a slippery thing. My favourite example is described by the writer Dan Pink. He tells the true story of a Nursery who, like many Nurseries, had a problem with parents turning up late to collect their children. This is a big problem for such organisations. So what do you do in this situation? Well the obvious answer, which this Nursery tried, was to introduce a series of fines. If you arrived later than you should, you incurred a fine. If you arrived very late you incurred a bigger fine.

It is obvious that that is a deterrent, right? Well no. In fact the late collection problem got dramatically worse. What had happened is that previously the parents understood that there was a social compact. You collected your children on-time because it was bad manners, if nothing else, not to. Now the Nursery had put a price on it. The fine was the fee for looking after your children for longer. The Nursery had removed the social incentive for doing the right thing and replaced it with a financial one that described how much it cost to do the wrong thing. Parents decided to pay for the extra child-care!

These things are done with good intentions but the results are counter-productive. As I said, motivation is a slippery thing!

Metrics in particular, one form of motivation, are very difficult to get right. We often hear organisations talk about being “data-driven” and “the importance of KPIs” in driving good behaviours, but do these things really work? It is all too easy to define a metric that drives exactly the wrong behaviour.

How many times have you seen something that looks crazy and asked “why on Earth do we do that?” only to hear the answer “It is because that group of people are measured on xyz”.

“Why do our sales people sell stuff that isn’t ready?”, “Because they are incentivised to sell”.

“Why do our developers create poor quality code?”, “Because they are incentivised to create lots of features”.

“Why do our operations people slow things down?”, “Because they are incentivised on the stability of the system”.

However, there is one measure that, as far as I have seen so far, does not lead to any inappropriate gaming or misdirected incentives – Cycle Time!

I argue that Continuous Delivery is really about one thing: having an idea, getting that idea into the hands of our users and figuring out what they make of it. So we should optimise our software development processes for that. Whatever it takes.

Cycle Time is a measure of that process. Imagine the simplest change to your production system that you can think of. We want it to be simple so that we can ignore the variable cost of development. Now imagine that change going through all of the normal processes to get it prioritised, scheduled, defined, implemented, tested, verified, documented and deployed into production. Every step that a change to production would normally take. The time that it takes to complete all of those steps, plus the time that the change spends waiting between steps, is your Cycle Time. This is a great proxy for measuring the time from “idea” to “valuable software in the hands of users”.
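The definition above is simple enough to sketch in a few lines of Python. The `Step` type, the step names and all of the timings below are invented for illustration; the point is only that Cycle Time counts the waiting between steps as well as the work within them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    work_minutes: float  # time spent actively performing the step
    wait_minutes: float  # time the change sat idle before the step began


def cycle_time(steps):
    """Total elapsed minutes: every step's work plus every wait between steps."""
    return sum(s.work_minutes + s.wait_minutes for s in steps)


def slowest_step(steps):
    """The step contributing most elapsed time, a candidate to improve first."""
    return max(steps, key=lambda s: s.work_minutes + s.wait_minutes)


if __name__ == "__main__":
    # Invented figures, for illustration only.
    steps = [
        Step("prioritise", work_minutes=10, wait_minutes=120),
        Step("implement", work_minutes=15, wait_minutes=30),
        Step("test", work_minutes=45, wait_minutes=240),
        Step("deploy", work_minutes=5, wait_minutes=60),
    ]
    print(f"Cycle Time: {cycle_time(steps)} minutes")
    print(f"Largest contributor: {slowest_step(steps).name}")
```

In a made-up example like this one, most of the elapsed time is waiting rather than working, which is often where the easiest improvements are found.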

I believe that if you take an empirical, iterative approach to reducing Cycle-Time then pretty much all of Agile Development, Lean thinking, DevOps and Continuous Delivery practice will fall out as a natural consequence.

I once worked on a demanding, high-performance, complex system that processed billions of dollars of other people’s money on a daily basis. This was a complex enterprise system; it included high-performance services, account management, public APIs, Web UIs, administration tools, multiple third-party integrations in a variety of different technologies, data-warehouses, the lot. We had a Cycle Time of 57 minutes. In 57 minutes we could evaluate any change to our production system and, if all the tests passed, be in a position to release that change into the hands of users.

Now think about the consequences of being able to do that.

If you have a Cycle Time of 57 minutes, you can’t afford the communications overhead of large teams. You need small, compact, cross-functional, efficient teams.

You can’t afford the hand-overs that are implicit in siloed teams. If you divide your development effort up into technical specialisms you will be too slow. You need cross-functional collaborative teams to ensure a continual flow of changes.

You can’t rely on manual regression testing. You need a great story on automated testing. Human beings are too slow, too inefficient, too error prone and too expensive.

You can’t rely on manual configuration and management of your test and production environments. You need to automate the configuration management, automate deployment and you will need a good story on “Infrastructure as code”.

You can’t have a Cycle Time of 57 minutes and have hand-offs between Dev and Ops.

You can’t have a Cycle Time of 57 minutes if your business can’t maintain a constant smooth flow of ideas.

You have to be very good at a lot of aspects of software development to achieve this kind of cycle-time. If you can confidently evaluate your changes to the point where you are happy to release into production in under an hour, without any further work, you are doing VERY well!

Optimising for short Cycle Time drives good behaviours. It is not just that you have to be good to achieve this; striving to improve your cycle-time will help you to improve your development process, culture and technology. It will force you to address impediments and inefficiencies that get in your way. This is a metric that doesn’t seem to have any bad side effects.

Many people are nervous that reducing Cycle Time will reduce quality. My experience, and that of the industry, is that the reverse is true. What happens is that by reducing Cycle Time you reduce batch-size. By reducing batch-size you reduce the risk of each change. Each change becomes simpler and lower risk. 66% of organisations that claim to practice Continuous Delivery say that quality goes up, not down [1]. Personally I am not too sure what the other 34% are doing wrong 😉

If you have a short Cycle-time, you can, and will, release change in small batches. Think about each change. Each change will be small and simple. Easy to reason about. If you release only once every few months, then you will be storing up lots of changes. Let’s imagine that each change has a small amount of risk associated with it. So the total risk for any release is going to be the sum of all of those risks.

Hmmm, not quite! As well as the sum of the risks associated with each change, there is going to be a combinatorial effect. What if my change interacts with your change? So there is an additional risk associated with the collection of changes. The number of ways in which changes can combine grows exponentially as more changes are released together, and with it the chance that two, or more, changes will interact in unexpected ways. So the total risk is going to be something like the sum of all the risks associated with each change plus the risk that two or more changes will interact badly. Now imagine releasing changes one at a time: the second set of risks, the ones that grow so quickly with the number of changes, disappears altogether. So overall, many small changes is a much less risky strategy than fewer larger changes.
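The combinatorial effect can be made concrete by simply counting. The Python sketch below counts how many pairs, and how many groups of two or more, exist among n changes released together. It says nothing about how likely any particular interaction is to cause a problem; it only shows how quickly the number of opportunities grows. The function names are my own, for illustration.

```python
from math import comb


def pairwise_interactions(n):
    """Number of distinct pairs among n changes released together."""
    return comb(n, 2)


def possible_interacting_groups(n):
    """Number of distinct groups of two or more changes among n.

    All 2**n subsets, minus the empty set and the n single-change subsets.
    """
    return 2**n - n - 1


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(f"{n:2d} changes: {pairwise_interactions(n):3d} pairs, "
              f"{possible_interacting_groups(n):7d} groups of two or more")
```

With a single change per release both counts are zero, which is the whole argument for small batches.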

A few years ago I worked with a team building some complex software in C++. This development team was very good. They had adopted an automated testing approach some years before. They were well ahead of industry norms in that they operated a process based on an overnight build. Each night their automated systems would run, build the software and run their automated tests against it. The build and tests took about 9 hours to complete. Each morning the team would look at the results and there would be a significant number of test failures.

I spoke to one of the developers who had been working this way for the past three years. He told me that in that three year period there had been four occasions when all of the tests had passed.

So, the team did what teams do and adapted. Each morning they would look at the test results and only release the modules for which all of the tests had passed. This is a reasonable strategy as long as none of the components interact with one another. Mostly they didn’t, but some components did. So now the team was releasing untested combinations of software into production which may or may not have worked together. As a result this team often saw problems deploying new features into production because of incompatibilities with older versions of components that they depended upon.

I argued that cycle-time was important, a driver for good behaviour and outcomes. I won the argument enough to give it a try.

We worked hard on the build. We invested a lot of time, money and effort on experimenting with different approaches. We parallelised the build, improved incrementalism, bought some massive servers and triaged the tests into groups, dividing the build into a deployment pipeline. We moved from a 9 hour overnight build to a 12 minute commit stage (running the vast majority of the tests) followed by a slower (1 hour) Acceptance Test stage. The “Acceptance Test” designation was fairly arbitrary in this case. If a test was too slow, we moved it to the “Acceptance Test Stage”.
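The triage rule was crude enough to sketch in a few lines of Python. The one second threshold and the test names below are assumptions made up for the example, not the figures we actually used; the real decision was made test by test, based on what kept the commit stage fast.

```python
COMMIT_STAGE_LIMIT_SECONDS = 1.0  # hypothetical cut-off, tune for your own build


def triage(test_durations, limit=COMMIT_STAGE_LIMIT_SECONDS):
    """Split tests into (commit_stage, acceptance_stage) by measured duration.

    `test_durations` maps a test name to its last measured run time in
    seconds. Tests slower than `limit` are moved to the acceptance stage.
    """
    commit_stage, acceptance_stage = [], []
    for name, seconds in sorted(test_durations.items()):
        if seconds <= limit:
            commit_stage.append(name)
        else:
            acceptance_stage.append(name)
    return commit_stage, acceptance_stage


if __name__ == "__main__":
    # Invented test names and timings, for illustration only.
    durations = {"parse_order": 0.02, "price_book": 0.4, "end_to_end_trade": 95.0}
    fast, slow = triage(durations)
    print("Commit stage:", fast)
    print("Acceptance stage:", slow)
```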

The results were quite dramatic. In the first two week period, following the introduction of this new build, we saw three builds where all of the tests passed – compared to four in the previous three years. In the following two week period there were multiple successful (all tests passing) builds every day. The process now switched, instead of cherry-picking modules with passing tests, we could release all of the software together, or not at all. Each morning we would simply deploy the newest release candidate that had passed all the tests.

Now we could have more confidence that these components would work together. Now we could begin to improve our testing of scenarios that crossed the boundaries between components. Now we could be more thorough!

Reducing cycle-time drives good behaviours. It encourages us to establish concrete, efficient feedback loops that allow us to learn and adapt. The team in my war-story above was not different before and after the change in process. The change in approach, the focus on cycle-time, gave them insight into what was going wrong and an opportunity to learn. They could quickly and efficiently experiment with solutions to any problems that arose. This is a very powerful thing!

Cycle-time drives us in the direction of lower-risk release strategies. It encourages good practice and it moves us in the direction of higher-quality development practices. I encourage you to optimise your development process to reduce cycle-time. I believe that you will find that it improves almost everything that you do.

1 CA Technologies “DevOps Survey” 2015

Posted in Agile Development, Continuous Delivery, Culture, Effective Practices | 2 Comments

RedGate Webinar Q&A

I recently took part in a Webinar for Database tools vendor Redgate. At the end of the Webinar we ran out of time for some of the questions that people had submitted, so this blog post provides my answers to those questions.

If you would like to see the Webinar you can find it here.


Q: “How can we overcome the “we’ve always done it that way” group-think mentality?”

Dave: For me the question is, “is your process working now as well as you want it to?” If not, I think you should try something else.

I believe that we have found a better way to deliver valuable, high-quality, software to the organisations that employ us. The trouble is that it is a very different way of working. Mostly people are very wary of change, particularly in software development, where we have promised a lot before and not delivered.  

The only way I know to move a “group-think” position is gradually. You need to make a positive difference and win trust. It is about looking at real problems and solving them, often one at a time.  

I believe that we, the software industry, are in a better place than we were, because we finally have the experience to know what works and what does not. The trick now is to migrate to the approaches that work. This takes learning, because the new approaches are very different to the old and challenge old assumptions. It is helpful to get some guidance: hire people that have some experience of this new way of working, read the literature, and carry out small, controlled experiments in areas of your process and business that will make a difference.

I often recommend to my clients that they perform a “Value-stream analysis” to figure out where they are efficient at software delivery and where they are not. This is often an enlightening exercise, allowing them to easily spot points that can be improved. Sometimes this is technology, more often it is about getting the right people to communicate effectively. 

Once you have fixed one problem, you will have improved the situation and gained a little “capital” in the form of trust that will allow you to challenge other sacred cows. This is a slow process, but for a pre-existing organization it is the only way that I know.

Q: “What advice would you have for gaining management buy-in for continuous delivery?”

Dave: Continuous Delivery is well-aligned with management ambitions. We optimise to deliver new ideas, in the form of working software, to our users as quickly and efficiently as possible. The data from companies that have adopted CD is compelling: it improves their efficiency and their bottom-line. Many of the most effective software companies in the world employ CD.

The problem is not really the ideals, it is the practice, what it takes to get there. CD organizations look different to others. They tend to have many small teams instead of fewer large ones. Each team has a very high degree of autonomy, many don’t really have “management” in the traditional sense. So this can be very challenging to more traditional organizations. 

The good news is that the way to adopt CD is by incremental steps. Each of these steps is valuable in its own right, and so each can be seen as a positive step. If you don’t use version control – start. If you don’t work iteratively, and regularly reflect on the outcomes of your work so that you can correct and improve – start that. If you don’t employ test automation, or deployment automation, or effective configuration management, start those things too. Each of these steps will bring a different benefit, and over time they reinforce one another so you get more than the sum of the parts.

There are several CD maturity models that can offer guidance on what to try next. There is one in the back of my book, and here is another that I have used: http://www.infoq.com/articles/Continuous-Delivery-Maturity-Model

Q: “We are very early in the stages of DB CD process changes, what are the most important issues to tackle early?” 

Dave: That is quite a tough question to answer without the consultant’s lament, “It depends” 😉

I think that the fundamental idea that underpins CD is to take an experimental approach to everything, technology, process, organization the lot. Try new things in small controlled steps so that if things go wrong you can learn from it rather than regret it. 

At a more technical level, I think that version controlling pretty much everything, automated testing and continuous integration are corner-stones. If you are starting from scratch, it is much easier to start well with automated testing and continuous integration than to add these later. It is not impossible to add them later, it is just more difficult. 

So be very strict with yourselves at first and try working so that you don’t make ANY change to a production system without some form of test. This will feel very hard at first if you are new to this, but it really is possible. 

Q: “Are there any best practices you’d especially recommend we bear in mind?” 

Dave: There is a lot to CD. I tend to take a very broad view of its scope and so it encompasses much of software development. At that level the best practices are grounded in Lean and Agile principles. Small, self-directed teams, working to very high quality standards, employing high levels of automation for tests and deployment are foundational. 

At the technical level there are lots of practices, at all different levels of granularity. I guess the key idea from my book is the “Deployment Pipeline”: the idea of automating the route to production. A good mental model for this is to imagine that every change that is destined for production gives birth to a release candidate. The job of the deployment pipeline is to prove that a release candidate is NOT FIT to make it into production. If a test fails, we throw the release candidate away.
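The mental model above can be sketched in a few lines of Python. This is only an illustration, not the implementation from the book: the names `ReleaseCandidate`, `run_pipeline` and the stage names are assumptions made up for the example. Each stage tries to reject the candidate, and the first failure stops it from going any further.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ReleaseCandidate:
    """One change destined for production, identified by a hypothetical revision id."""
    revision: str
    rejected_by: Optional[str] = None  # name of the stage that rejected it, if any


# A stage is a name plus a check that returns False when it finds a reason to reject.
Stage = Tuple[str, Callable[[ReleaseCandidate], bool]]


def run_pipeline(candidate: ReleaseCandidate, stages: List[Stage]) -> bool:
    """Try to prove the candidate is not fit for production.

    Returns True only if no stage managed to reject it.
    """
    for name, passes in stages:
        if not passes(candidate):
            # A single failure is enough: the candidate is thrown away
            # and later stages never see it.
            candidate.rejected_by = name
            return False
    return True
```

In a real pipeline the checks would run the commit-stage tests, the acceptance tests and so on; here they are just callables so that the control flow is easy to see.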

Q: “What are some common practical issues that people encounter during the implementation of CD?” 

Dave: I have covered some of this in the preceding answers. Most of the problems are people problems. It is hard to break old habits. At the technical end, the most common problems that I have seen have been very slow, inefficient builds, poor, or non-existent, automated deployment systems and poor, or non-existent, automated tests. 

Q: “What would be the fastest way to actually perform CD?” 

Dave: The simplest way to start is to start from scratch, with a blank sheet. It is easier to start a new project or new company this way than to migrate an existing one.

I think it helps to get help from people that have done this before. Hire people with these skills and learn from them. 

Q: “We deal with HIPAA regulated data and I am personally unsure of letting this data out. How does CD typically get implemented in highly regulated environments? Are there particular challenges?”

Dave: The only challenge that I perceive is that regulators are often unfamiliar with these ideas, so their assumptions about what good regulatory conformance looks like are based on what, to me, looks like an outdated view of development practice.

My experience of working in heavily regulated industries, mostly finance in different countries, is that the regulators quickly appreciate this stuff and they *love* it.  

CD gives almost ideal traceability. Because of our very rigorous approach to version control and the high levels of automation that we employ, we get FULL traceability of every change, almost as a side-effect. In the organizations where I have worked in the finance industry, we have been used as benchmarks for what good regulatory compliance looks like.

So the challenge is educating your regulators, once they get it they will love it. 

Q: “How should a data warehouse deal with a source database which is in a CD pipeline?” 

Dave: As usual, it depends. The simplest approach is to treat it like any other part of the system and write tests to assert that changes work. Run these tests as part of your deployment pipeline.  

If that is not practical, you need to take a more distributed, micro-service style approach. In this approach, try to minimize the coupling between the Data Warehouse and the up-stream data sources. Provide well-defined, general interfaces to import data and make sure that these are well tested.

Q: “How do you recommend we use CD to synchronize, deploy and verify complex projects with database, Agent Jobs, SSIS packages and SSRS reports?”

Dave: I would automate the integration of new packages as part of my deployment pipeline. I would also look to create automated tests that verify each change, and run these as part of my pipeline.

Q: “How would you deal with multiple versions of a database (e.g. development, internal final test, and a version for customer), and do you have any advice for the automatic build and deploy of a database?”

Dave: I recommend the use of the ideas in “Refactoring Databases” by Scott Ambler and Pramod Sadalage.

Q: “Do you have any tips for enabling rapid DB ‘resets’ during build/test? E.g. How to reset DB to known state before each test?” 

Dave: A lot depends on the nature of the tests. For some kinds of test, such as low-level, unit-like tests, it can be good to use the transactional scope for the duration of the test. At the start of the test, open a transaction and do what you need for the test, including any assertions. At the end of the test, abort the transaction.
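As a minimal sketch of the transactional approach, assuming an SQLite database and Python’s standard `sqlite3` module (the `account` table and the helper names are invented for the example): the test body runs inside a transaction that is always rolled back, so the database returns to its known baseline without being rebuilt.

```python
import sqlite3


def make_db():
    """Create a small in-memory database with known baseline data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    conn.execute("INSERT INTO account (owner, balance) VALUES ('baseline', 100)")
    conn.commit()
    return conn


def run_in_rolled_back_transaction(conn, test_body):
    """Run test_body(conn), then abort the transaction whatever happened."""
    try:
        test_body(conn)
    finally:
        # sqlite3 opens a transaction implicitly before the first UPDATE/INSERT,
        # so rollback() discards everything the test changed.
        conn.rollback()


def deposit_check(conn):
    """Example test body: changes data and asserts on it inside the transaction."""
    conn.execute("UPDATE account SET balance = balance + 50 WHERE owner = 'baseline'")
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE owner = 'baseline'"
    ).fetchone()
    assert balance == 150
```

After `run_in_rolled_back_transaction(conn, deposit_check)` returns, the baseline row holds its original balance again, ready for the next test.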

For higher-level tests I like the use of functional isolation, where you use the natural functional semantics of your application to isolate one test from another. If you are testing Amazon, every test starts by creating a user account and a book. If you are testing eBay, every test starts by creating a user account and an auction…
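A rough sketch of functional isolation, using a made-up in-memory `Shop` class as a stand-in for the real system under test (every name here is hypothetical): each test creates its own uniquely named account and book, so many tests can share one running system without resetting it.

```python
import uuid


class Shop:
    """Hypothetical stand-in for the deployed application, shared by all tests."""

    def __init__(self):
        self.accounts = {}  # account name -> list of purchased titles
        self.books = {}     # title -> price

    def create_account(self, name):
        if name in self.accounts:
            raise ValueError(f"account {name!r} already exists")
        self.accounts[name] = []

    def add_book(self, title, price):
        self.books[title] = price

    def buy(self, account, title):
        if title not in self.books:
            raise KeyError(title)
        self.accounts[account].append(title)

    def purchases(self, account):
        return list(self.accounts[account])


def unique(prefix):
    """Generate a name that no other test will use."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


def check_purchase_is_recorded(shop):
    # Every test starts by creating its own account and its own book,
    # so it never depends on, or disturbs, data created by another test.
    user = unique("user")
    book = unique("book")
    shop.create_account(user)
    shop.add_book(book, price=10)
    shop.buy(user, book)
    assert shop.purchases(user) == [book]
```

Because nothing is shared between tests except the system itself, the same check can run repeatedly, or in parallel with other tests, against one long-lived environment.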

You can see me describing this in more detail in this presentation – I am speaking more generally about testing strategies and not specifically about the DB, but I think that the approach is still valid. https://vimeo.com/channels/pipelineconf/123639468 

Q: “I’m concerned about big table rebuilds not being spotted until upgrade night.  Also obscure feature support like FILESTREAM. Do you have any tips for avoiding these kinds of last-minute surprises or dealing with a wide mix of systems?” 

Dave: I tend to try to treat all changes the same. I don’t like surprises either, so I try to find a way to evaluate every change before it is released into production. So I would try to find a way to automate a test that would highlight my concerns and I would run this test in an environment that was sufficiently close to my production environment to catch most failures that I would see there.  

Q: “Do you have any advice for achieving zero down time upgrades and non breaking on-line database changes?” 

Dave: I have seen two strategies work. They are not really exclusive of one another. 

1) The microservice approach: keep databases scoped to single applications and create software that is tolerant of a service not being available for a while. I have done some work on an architectural style called “Reactive Systems” which promotes such an approach.

2) Work in a way that every change to your database is additive. Never delete anything, only add new things, including schema changes and transactional data. So ban the use of UPDATE and DELETE 😉 
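To make the second strategy concrete, here is a small sketch of an additive-only table, assuming SQLite via Python’s standard `sqlite3` module. The `customer_email` table and the function names are invented for illustration. A change of email address is recorded as a new row and the current value is simply the latest one, so no UPDATE or DELETE is ever issued.

```python
import sqlite3


def open_store():
    """Create an in-memory database with an append-only table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_email ("
        " version INTEGER PRIMARY KEY AUTOINCREMENT,"
        " customer_id INTEGER NOT NULL,"
        " email TEXT NOT NULL)"
    )
    return conn


def set_email(conn, customer_id, email):
    # Additive: record a new fact rather than overwriting the old one.
    conn.execute(
        "INSERT INTO customer_email (customer_id, email) VALUES (?, ?)",
        (customer_id, email),
    )
    conn.commit()


def current_email(conn, customer_id):
    """The current value is the most recently added row for this customer."""
    row = conn.execute(
        "SELECT email FROM customer_email"
        " WHERE customer_id = ? ORDER BY version DESC LIMIT 1",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


def email_history(conn, customer_id):
    """Older values are never lost, so old readers and audits still work."""
    rows = conn.execute(
        "SELECT email FROM customer_email WHERE customer_id = ? ORDER BY version",
        (customer_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

The cost of this style is that the data grows and reads must pick out the latest version. The benefit is that software written against the older data keeps working while a newer version is being rolled out.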

Q: “Managing db downtime and replication during CD” 

Dave: See comments to preceding question 

Q: “How do you craft data repair scripts that flow through various development environments?” 

Dave: I generally encode any changes to my database as a delta. Deployment starts from a baseline database image and from then on changes are added as deltas. Each copy of my database includes a table which records which delta version it is at. My automated deployment scripts will interrogate the DB to see which version it is at. It will look at the deltas to see which is the newest and it will apply all of the deltas between those two numbers. This approach is described in more detail in Pramod and Scott’s book. 

I think of the delta table as describing a “patch-level” for my DB. So two DBs at the same “patch-level” will be structurally identical, though they may contain different transactional data. 
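The delta approach described above might look something like this sketch, which assumes SQLite and Python’s standard `sqlite3` module. The `patch_level` table name, the `DELTAS` dictionary and the example schema are assumptions for illustration. In practice the deltas would usually be numbered script files kept in version control.

```python
import sqlite3

# Hypothetical deltas, keyed by version number.
DELTAS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",
    3: "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)",
}


def current_patch_level(conn):
    """Interrogate the DB to see which delta version it is at (0 for a bare baseline)."""
    conn.execute("CREATE TABLE IF NOT EXISTS patch_level (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM patch_level").fetchone()
    return row[0] or 0


def upgrade(conn, deltas=DELTAS):
    """Apply, in order, every delta newer than the DB's current patch level."""
    applied = []
    level = current_patch_level(conn)
    for version in sorted(v for v in deltas if v > level):
        conn.execute(deltas[version])
        # Record the new patch level alongside the change itself.
        conn.execute("INSERT INTO patch_level (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```

Running `upgrade` against any copy of the database brings it to the newest patch level, and running it a second time does nothing. The same script can therefore flow unchanged through development, test and production environments.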

Q: “What are some of the community-supported open source C.D. applications that would work well for an enterprise org that currently doesn’t have C.D.?” 

Dave: If you are going to take CD seriously you are going to want to create a pipeline and so coordinate different levels of testing for a given release candidate. So build management systems are a good starting point, Jenkins, TeamCity and Go from ThoughtWorks are effective tools in this area. 

I think that the tools for automated testing of DBs are still relatively immature, most places that I have seen use the testing frameworks from application programming languages and grow their own tools and techniques from there.  

RedGate have tools for versioning DBs. I haven’t used them myself, but they have a good reputation. My own experience is that up to now I have used conventional version control systems, Subversion or Git, and stored scripts and code for my DB there.

Q: “Which tools make CD (in general, and for the database) easier?” 

Dave: See above. 


Pair Programming – The Most Extreme XP Practice?

I was an early adopter of Extreme Programming (XP). I read Kent’s book when it was first released in 1999 and, though skeptical of some of the ideas, others resonated very strongly with me.

I had been using something like Continuous Integration, though we didn’t call it that, in projects from the early-1990s. I was immediately sold when I saw JUnit for the first time. This was a much better approach to unit testing than I had come across before. I was a bit skeptical about “Test First” – I didn’t really get that straight away (my bad!).

The one idea that seemed crazy to me at first was Pair Programming. I was not alone!

I had mostly worked in good, collaborative, teams to that point. We would often spend a lot of time working together on problems, but each of us had their own bits of work to do.

With hindsight I would characterise the practices of the best teams that I was working with then as “Pair Programming – Light”.

Then I joined ThoughtWorks, and some of the teams that I worked with were pretty committed to pair programming. Initially I was open-minded, though a bit skeptical. I was very experienced at “Pairing – Light”, but assumed that the only real value was at the times when I, or my pair, was stuck.

So I worked at pairing and gave it an honest try. To my astonishment in a matter of days I became a believer.

Pair programming worked, and not just on the odd occasion when I was stuck, but all the time. It made the output of me and my pair better than either of us would have accomplished alone. Even when there was no point at which we were stuck!

Over the years I have recommended, and used, pair programming in every software development project that I have worked on since. Most organisations that I have seen try it, wouldn’t willingly go back, but not all.

So what is the problem? Well as usual the problems are mostly cultural.

If you have grown up writing code on your own it is uncomfortable sitting in close proximity to someone else and intellectually exposing yourself. Your first attempt at pair-programming takes a bit of courage.

If you are one of those archetypal developers, who is barely social and has the communication skills of an accountant on Mogadon[1], this can, no doubt, feel quite stressful. Fortunately, my experience of software developers is that, despite the best efforts of pop-culture, there are fewer of these strange caricatures than we might expect.

If you are not really very good at software development, but talk a good game, pair-programming doesn’t leave you anywhere to hide.

If you are a music lover and prioritise listening to your music collection above creating better software, pairing is not for you.

There is one real concern that people ignorant of pair-programming have, and that is “Isn’t it wasteful?”.

It seems self-evident that two people working independently, in-parallel, will do twice as much stuff as two people working together on one thing. That is obvious, right?

Well interestingly, while it may seem obvious, it is also wrong.

This may be right for simple repetitive tasks, but software development isn’t a “simple repetitive task”. Software development is an intensively creative process. It is almost uniquely so. We are limited by very little in our ability to create the little virtual universes that our programs inhabit. So working in ways that maximise our creativity is an inherent part of high-quality software development. Most people that I know are much more creative when bouncing ideas off other people (pairing).

There have been several studies that show that pair-programming is more effective than you may expect.

In the ‘Further Reading’ section at the bottom of this post, I have added a couple of links to some controlled experiments. In both cases, and in several others that I have read, the conclusions are roughly the same. Two programmers working together as a pair finish a task nearly twice as fast, but not quite twice as fast, as one programmer working alone.

“Ah Ha!”, I hear you say, “Nearly twice as fast isn’t good enough” and you would be right if it wasn’t for the fact that in nearly half the time it takes one programmer to complete a task, two programmers will complete that task and do so with significantly higher quality and with significantly fewer bugs.

If output was the only interesting measure, I think that the case for pair-programming is already made, but there is more…

Output isn’t the only interesting measure! I have worked on, and led, teams that adopted pair-programming and teams that didn’t. The teams that adopted pairing were remarkably more effective, as teams, than those that didn’t.

I know of no better way to improve the quality of a team. To grow and spread an effective development culture. I know of no better way, when combined with high-levels of automated testing, to improve the quality of the software that we create.

I know of no better way to introduce a new member of the team and get them up to speed, or coach a junior developer and help them gain in expertise or introduce a developer to a new idea or a new area of the codebase.

Finally, the most fun that I have ever had as a software developer has been when working as part of a pair. Successes are better when shared. Failures are less painful when you have someone to commiserate with.

Most important of all, the shared joy of discovery, when you have that moment of insight that makes complexity fall away from the problem before you, is hard to beat.

If you have never tried pair programming, try it. Give it a couple of weeks, before assuming that you know enough to say it doesn’t work for you. If your manager asks why you are wasting time, make an excuse. Tell them that you are stuck, or just “needed a bit of help with that new configuration”. In the long run they will thank you, and if not find someone more sympathetic to what software development is really about.

Pair programming works, and adds significant value to the organisations that practice it. Give it a try.

[1] A medical drug used as a heavy sedative.

Further Reading:

‘The Case for Collaborative Programming’, Nosek, 1998
‘Strengthening the Case for Pair Programming’, Williams, Kessler, Cunningham & Jeffries, 2000
‘Pair Programming is about Business Continuity’, Dave Hounslow


The Anatomy of an Experimental Organisation

I am a software developer. I see the world from that perspective. In reality though that is only one viewpoint. While it is important that we are effective at delivering software, what really matters is that we are effective at delivering business value.

When I describe Continuous Delivery to people I generally spend a fair amount of time impressing on them that it is not about tools and technicalities. It is not even about the relationship between developers and operations or product owners and testers.

Continuous Delivery is about minimising the gap between having an idea and getting that idea, in the form of working software, into the hands of users and seeing what they make of it.

This vital feedback loop is at the core of not just good development, but of good business too.

I have been lucky to work in, or with, several companies that I would describe as Agile companies, not just companies practicing Agile development. These organisations are fundamentally different in approach in almost everything that they do.

The principal characteristic of this kind of organisation is that they are experimental in their approach to everything.

For some more traditional organisations this sounds scary. “Experimenting, that sounds like you don’t know what you are doing”. Well, yes, we don’t know what we are doing, none of us really know!

To quote one of my heroes, Richard Feynman:

“It doesn’t matter how intelligent you are, if you guess and that guess cannot be backed up by experimental evidence – then it is still a guess!”

Research into the performance of successful organisations says that two thirds of their ideas that are implemented in software produce zero or negative value:

“Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve”

“Avinash Kaushik wrote in his Experimentation and Testing primer that ‘80% of the time you/we are wrong about what a customer wants’”

“Mike Moran wrote that Netflix considers 90% of what they try to be wrong”
(Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, Nils Pohlmann 2013).


Two thirds, or more, of the ideas of GOOD companies are waste.

Further, the experience of these companies, and the data, says that nobody can tell which third of ideas are the good ones before they are delivered. Clearly there is a place for guessing and predicting customer demand for innovative ideas, but it is important to remember that for every iPhone that you invent, you will have to go through three or four Newtons first.

If we are generally so bad at guessing, then the only sane strategy is to embrace the uncertainty. Optimise to have lots of ideas and to evaluate them quickly so that you can discard the poor ones as fast, and cheaply, as possible.

This is what really effective organisations do. Watch Henrik Kniberg’s wonderful descriptions of the Spotify Culture.
See how Netflix work.

These companies know that their ability to guess is poor, because they are becoming more scientific in their approach. Instead of guessing what their customers want, and then assuming that their customers like what they are getting, they are measuring. These companies are designing experiments. They define metrics that will identify what their customers think, then carry out the experiment and reflect on the results in order to learn and improve.

So how do these experimental companies differ from others?

Value Innovation over Prediction

Innovation is the thing that differentiates the great companies from the rest. Great companies create new markets, provide new products or services that change how people do things.

More traditional companies value predictability. The trouble is that you can’t be both predictable and innovative at the same time. They are different ends of a spectrum. The only really predictable thing is status-quo. So if you want your company to be great, you need to value innovation and discard your reliance on predictability – at least for some of your products.

One aspect of this is launch-dates. Apple don’t pre-announce their products, they are very secretive. Only when the products are ready do they announce. This allows them to strive for “Insanely great” as Steve Jobs so memorably put it.

State your hypothesis

This shared vision is kind of fractal. It operates at all levels in effective organisations. Being hypothesis driven helps to establish this clarity, this shared purpose. Useful hypotheses can range from stating what you expect the outcome of a test to be before you execute it, to stating how you think your global business strategy will work out.

Some people even talk about “Hypothesis Driven Development“.

Seek the right experiment

As soon as you have a new idea, whatever its nature, the next question should be “How can we test this idea?”. If it is an idea about how to improve your process, figure out an experiment to see if it works: “Ok, let’s try doing without the estimation meeting for a while and see if we save time”. If it is an idea about improving your product, figure out an experiment to test that too: “Let’s release this feature and A/B test it against the existing service to see which one makes more money”.
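The A/B test mentioned above could be sketched like this. It is only an illustration under stated assumptions: the function names, the experiment label and the choice of “revenue per user” as the metric are made up for the example, and a real experiment would also need enough traffic and a proper statistical test before acting on the difference.

```python
import hashlib


def variant_for(user_id, experiment):
    """Deterministically assign a user to variant 'A' (existing) or 'B' (new feature).

    Hashing the experiment name together with the user id means a user always
    sees the same variant, without storing any assignment table.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def revenue_per_user(events, experiment):
    """Compute mean revenue per user for each variant.

    events is an iterable of (user_id, revenue) pairs.
    """
    totals = {"A": 0.0, "B": 0.0}
    users = {"A": set(), "B": set()}
    for user_id, revenue in events:
        variant = variant_for(user_id, experiment)
        totals[variant] += revenue
        users[variant].add(user_id)
    return {
        v: (totals[v] / len(users[v]) if users[v] else 0.0)
        for v in totals
    }
```

The important part is that the metric is decided before the experiment runs, so the result can be compared against the stated hypothesis rather than explained away afterwards.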

Establish Feedback Loops

Experimentation means nothing without the closure of the feedback loop. After each experiment there should be a way to evaluate the results and figure out what to do next. The outcome from each experiment should be a real action of some kind. Depending on the results of the experiment you should either: drop the change; adopt the change; or run through the cycle again with a new experiment to learn more.

The establishment of effective feedback loops is a vital attribute of experimental organisations. Continuous Integration, Continuous Delivery, Experiments in Production, Test Driven Development, Retrospective meetings and Incident Reviews are all mechanisms for establishing effective feedback.

Assume the fallibility of experts

All of the really high-performance organisations that I have seen are notable in their attempts to minimise hierarchy. Decisions based on the opinion of the highest paid person in the room are always only guesses! Seek evidence, allow individuals and teams the freedom to experiment and make decisions based on what they find. Treat people’s opinions with respect, but recognise that whoever it is that holds them, they are still only opinions. Test the hypothesis!

Question Everything

There should be no sacred cows. Everything should be up for evaluation, and if it is found wanting should be amenable to change. Technology, practice, process, office-space, team organisation, even personnel.

If it is not working, try something else.

Eliminate waste

Successful organisations avoid doing unnecessary work. The Lean mantra of eliminating waste is a powerful one. We should be rational and objective in assessing everything that we do in the light of our real goals and finding how to make our work more efficient.

The most common response that I get when coaching teams how to do better is “We don’t have time to improve”. Investing in making your team and their working practices more efficient is almost never a cost!

Leadership is Different to Management

Spotify talk of having “Loosely-coupled, tightly aligned teams” as being one of their goals. They aim for high autonomy and high alignment.

Dan Pink talks of the importance of “Autonomy, Mastery and Purpose” in motivating people towards excellence.

These ideas require leadership, not management. The goal of leadership should be to establish a common vision, a shared purpose for the organisation without telling people how to achieve it. Ideally leadership is about inspiring people.

My experience has been that experimental organisations are generally much more successful than their more traditional counterparts. They also tend to appeal to the most talented people, because of the creative freedom that they offer.

People sometimes ask me, “How do you know when your organisation is working well?” I think that you know things are on the right track when your first response to any challenge or question is “How could we try that out?” rather than “I think this is the answer”.


The Next Big Thing?

A few years ago I was asked to take part in a panel session at a conference. One of the questions asked by the audience was what we thought the “next big thing” might be. Most of the panel talked about software. I recall people talking about Functional Programming and the addition of Lambdas to Java, amongst other things.

At the time this was not long after HP had announced that they had cracked the Memristor, and my answer was “Massive scale, non-volatile RAM”.

If you are a programmer, as I am, then maybe that doesn’t sound as sexy as Functional Programming or Lambdas in Java, but let me make my case…

The relatively poor performance of memory has been a fundamental constraint on how we design systems pretty much from the beginning of the digital age.

A foundational component of our computer systems, since the secret computers at Bletchley Park that helped us to win the Second World War, is DRAM. The ‘D’ in DRAM stands for Dynamic. What that means is that this kind of memory is leaky. It forgets unless it is dynamically refreshed.

The computers at Bletchley Park had a big bank of capacitors that represented the working memory of the system and this was refreshed from paper-tape. That has been pretty much the pattern of computing ever since. We have had a relatively small working store of DRAM, backed by bigger, cheaper, store of more durable non-volatile memory of some kind.

In addition to this division between the volatile DRAM and non-volatile backing storage, there has also, always been a big performance gap.

Processors are fast with small storage, DRAM is slow but stores more, Flash is VERY slow but stores lots, Disk is even slower, but is really vast!

Now imagine that our wonderful colleagues in the hardware game came up with something that started to blur those divisions. What if we had vast memory that was fast and, crucially, non-volatile?

Pause for a moment, and think about what that might mean for the way in which you would design your software. I think that this would be revolutionary. What if you could store all of your data in memory, and not bother with storing it on disk or SSD or SAN? Would the ideas of “Committing” or “Saving” still make sense? Well, maybe they would, but they would certainly be more abstract. In lots of problem domains I think that the idea of “Saving” would just vanish.

Modern DRAM requires that current is supplied to keep the capacitors, representing the bits in our programs and data, charged. So when you turn off your computer at night it forgets everything. Modern consumer operating systems do clever things like implement complicated “sleep” modes so that when you turn off, the in-memory state of the DRAM is written to disk or SSD. If we had our magic, massive, non-volatile storage, then we could just turn off the power and the state of our memory would remain intact. Operating Systems could be simplified, at least in this respect, and implement a real “instant-on”.

What would our software systems look like if they were designed to run on a computer with this kind of memory? Maybe we would all end up creating those very desirable “software simulations of the problem domain” that we talk about in Domain Driven Design? Maybe it would be simpler to avoid the leaky abstractions so common with mismatches between what we want of our business logic and the realities of storing something in an RDBMS or column store? Or maybe we would all just partition off a section of our massive-scale non-volatile RAM and pretend it was a disk and keep on building miserable 3-tier architecture based systems and running them wholly in-memory?

I think that this is intriguing. I think that it could change the way that we think about software design for the better.

Why am I talking about this hypothetical future? Well, Intel and Micron have just announced 3D XPoint memory. This is nearly all that I have just described. It is 10 times denser than conventional memory (DRAM) and 1000x faster than NAND (Flash). It also has 1000x better endurance than NAND, which wears out.

This isn’t yet the DRAM replacement that I am talking about. That is because although this memory will be a lot denser than DRAM and a lot faster than NAND it is still a lot slower than DRAM, but the gap is closing. If the marketing specs are to be believed then the new 3D XPoint memory is about 10 times slower than DRAM and has about half the endurance. In hardware performance terms, that is really not far off.

I think that massive scale non-volatile RAM of sufficient performance to replace DRAM is coming. It may well be a few years away yet, but when it arrives I think it will cause a revolution in software design. We will have a lot more flexibility about how we design things. We will have to decide explicitly about stuff that, over recent years, we have taken for granted and we will have a whole new set of lessons to learn.

Thought provoking, huh?


Test Maintainability

At LMAX, where I worked for a while, they have extensive, world-class, automated acceptance testing. LMAX tests every aspect of their system and this is baked in to their development process. No story is deemed complete unless all acceptance criteria associated with it have a passing automated, whole-system acceptance test.

This is a minimum; usually there is more than one acceptance test per acceptance criterion. This triggers the question: “What is an acceptance test?”. I recently had a discussion on this topic with some friends, trying to define the scope of acceptance tests more clearly. This was triggered by an article published by Mike Wacker of Google who claimed that it was not practical to keep “end-to-end tests passing”.

My ex-colleague Adrian replied. To summarise Adrian’s point, LMAX has been living with exactly this kind of complex end-to-end test for the past eight or nine years. This sparked a debate on the meaning of end-to-end testing which I will skip for now. I will use the term “acceptance testing” to mean the sort of testing described in the Google article; I think their intent is what I mean by acceptance tests. There is a serious problem to address here, that of test-maintainability.

As soon as you adopt an extensive automated testing strategy you also take on the problem of living with your tests. I don’t know the details of Google’s testing approach, but there are several things in Mike’s article that suggest that Google is succumbing to some common problems:

Firstly, their feedback cycle is too long! The article talks about building and testing the latest version of a service “every night”. That is acceptable in a few limited, difficult circumstances, if you are burning your software into hardware devices, for example. Otherwise it is unacceptably slow and will compromise the value and maintainability of your tests.

As my ex-colleague Mike Roberts used to say: “Continuous is more often than you think”. Testing every night is too slow; you need valuable feedback much more frequently than that. I think that you should be aiming for commit stage feedback in under 5 minutes (under 10 is survivable, but unpleasant) and acceptance stage feedback in under 30 minutes (60 is survivable, but unpleasant). I think that unit testing alone is insufficient, for some of the reasons that the Google article cites.
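To make those targets concrete, here is a minimal sketch of how a pipeline might grade its own feedback times. The thresholds are the ones given above; the stage names, the `FeedbackBudget` type, the `classify_feedback` function and the exact treatment of the boundary values are my assumptions for illustration, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackBudget:
    target_minutes: float      # what we should be aiming for
    survivable_minutes: float  # slower than the target, but tolerable


# Thresholds from the text; the dictionary keys are hypothetical stage names.
BUDGETS = {
    "commit": FeedbackBudget(target_minutes=5, survivable_minutes=10),
    "acceptance": FeedbackBudget(target_minutes=30, survivable_minutes=60),
}


def classify_feedback(stage: str, duration_minutes: float) -> str:
    """Grade a stage run as 'good', 'survivable' or 'too slow'."""
    budget = BUDGETS[stage]
    if duration_minutes < budget.target_minutes:
        return "good"
    # Assumption: a run that lands exactly on the outer limit still counts.
    if duration_minutes <= budget.survivable_minutes:
        return "survivable"
    return "too slow"


# A nightly build, at roughly 8 hours between results, is far outside the budget.
print(classify_feedback("acceptance", 8 * 60))
```

Failing a build on "too slow" is one way of treating feedback time as something the team owns, rather than something that drifts.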

There are hints of other problems. “Developers like it because it off-loads most, if not all, of the testing to others”. I think that this is a common anti-pattern. It is vital that developers own the acceptance tests. It may be that in the very early stages of their initial creation someone in a different role may sketch the test, but developers are the people who will break the tests and so they are the people who are best placed to fix them and maintain them. This is, for me, an essential part of the Continuous Delivery feedback loop. I have never seen a successful automated testing effort based on a separate QA team writing and maintaining tests. The testing effort always lags, and there is no “cost” to the development team of completely invalidating the tests. Make the developers own the maintenance of the tests and you fix this problem. Prevent release candidates that fail any test from progressing by implementing a deployment pipeline. Make it a developer’s priority to keep the system in a “releasable state” – meaning “all tests pass”.

The final vital aspect of acceptance tests is that they should be simple to create and easy to understand. This is all about ensuring that the infrastructure supporting your acceptance tests is appropriately designed, allowing for a clear separation of the “What” from the “How”. We want each test case to assert only “What” the system under test should do, not “How” it does it. This means that we need to abstract the specification of test cases from the technicalities of interacting with the system under test.
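As an illustration of that separation, here is a minimal sketch in Python. Every name in it (`ProtocolDriver`, `InMemoryDriver`, `TradingDsl`, the trading scenario itself) is hypothetical and chosen only for the example; it is not a description of LMAX’s actual test infrastructure. The test case talks only to a small domain-specific layer, and the driver behind that layer is the only place that knows how to reach the system.

```python
from typing import Protocol


class ProtocolDriver(Protocol):
    """The "How": knows the technicalities of talking to the system."""

    def place_order(self, trader: str, instrument: str, quantity: int) -> None: ...

    def position(self, trader: str, instrument: str) -> int: ...


class InMemoryDriver:
    """Stand-in driver for this sketch; a real one would drive a UI or an API."""

    def __init__(self) -> None:
        self._positions: dict[tuple[str, str], int] = {}

    def place_order(self, trader: str, instrument: str, quantity: int) -> None:
        key = (trader, instrument)
        self._positions[key] = self._positions.get(key, 0) + quantity

    def position(self, trader: str, instrument: str) -> int:
        return self._positions.get((trader, instrument), 0)


class TradingDsl:
    """The "What": test cases speak only in the language of the problem domain."""

    def __init__(self, driver: ProtocolDriver) -> None:
        self._driver = driver

    def buy(self, trader: str, instrument: str, quantity: int) -> None:
        self._driver.place_order(trader, instrument, quantity)

    def confirm_position(self, trader: str, instrument: str, expected: int) -> None:
        actual = self._driver.position(trader, instrument)
        assert actual == expected, f"expected {expected}, got {actual}"


def buying_increases_position(trading: TradingDsl) -> None:
    # No URLs, element ids or message formats here: only what should happen.
    trading.buy("alice", "EURUSD", 10)
    trading.buy("alice", "EURUSD", 5)
    trading.confirm_position("alice", "EURUSD", 15)


buying_increases_position(TradingDsl(InMemoryDriver()))
```

If the way we talk to the system changes, only the driver should need to change; the test case itself stays the same, which is where the maintainability comes from.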

The Google article is right that unit tests, particularly those created as part of a robust TDD process, are extremely valuable and effective. They do, though, only tell part of the testing story. Acceptance tests, testing your system in life-like circumstances, are, to me, a fundamental part of an effective testing strategy. Although theoretically you could cover everything you need in unit tests, in practice we are never smart enough to figure that out. Evaluating our software from the perspective of our users is at the core of a CD testing strategy.


So here are my guidelines for a successful test strategy:

Automate virtually all of your testing.

Don’t look to tests to verify, look to them to falsify.

Don’t release if a single test is failing.

Do Automate User Scenarios as Acceptance Tests.

Do focus on short feedback loops (roughly 5 minutes for commit stage tests and 45 minutes for acceptance tests)

You can find a video of me presenting in a bit more detail on some of these topics here: https://vimeo.com/channels/pipelineconf/123639468

Posted in Acceptance Testing, Agile Development, Continuous Delivery, LMAX, TDD

How many test failures are acceptable?

Continuous Delivery is getting a lot of mileage at the moment. It seems to be an idea whose time has come. There was a survey last year that claimed that 66% of companies had a “Strategy for Continuous Delivery”. I am not sure that I believe that; nevertheless, it suggests that CD is “cool”. I suppose that it is inevitable that such a popular, widespread idea will be misinterpreted in some places. Two such misinterpretations seem fairly common to me.

The first is that Continuous Delivery is really just about automating deployment of your software. If you have written some scripts or bought a tool to deploy your system you are doing Continuous Delivery – wrong!

The second is that automated testing is an optional part of the process, that getting your release frequency down to a month is a big step forward (which it is for some organisations) and that that means you are doing CD, despite the fact that your primary go-live testing is still manual – wrong again!

I see CD as a holistic process. Our aim is to minimise the gap between having an idea and getting working software into the hands of our users to express that idea so that we can learn from their experience. When I work on a project my aim is always to minimise that cycle-time. This has all sorts of implications, and affects pretty much every aspect of your development process, not to say your business strategy. Central to this is the need to automate, in order to reduce the cycle time.

The most crucial part of that automation, and the most valuable, is your testing. The aim of a CD process is to make software development more empirical. We want to carry out experiments that give us new understanding when they fail, and a higher level of confidence in our assumptions when they don’t. The principal expression of these experiments is as automated tests.

The best projects that I have worked on have taken this approach very seriously. We tested every aspect of our system – every aspect! That is not to say that our testing was exhaustive (you can never test everything), but it was extensive.

So what does such a testing strategy look like?

The deployment pipeline is an automated version of your software release process. Its aim is to provide a channel to production that verifies our decision to release. Unfortunately we can never prove that our code is good; we can only prove that it is bad when a test fails. This is the idea of falsifiability, which we learn from science. I can never prove the theory that “All swans are white”, but as soon as I see a black swan I know that the theory is wrong.

Karl Popper proposed the idea of falsifiability in his book “The Logic of Scientific Discovery” in 1934. Since then it has become pretty much the defining characteristic of science. If you can falsify a statement through experimental evidence it is a scientific theory; if you cannot, it is a guess.

So, back to software. Falsifiability should be a cornerstone of our testing strategy. We want tests that will definitively pass or fail, and when they fail we want that to mean that we should not release our system, because we now know that it has a problem.
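A release gate built on this idea is very simple. The following is a minimal sketch, with hypothetical names (`Outcome`, `is_releasable`, `failures`), of a gate that treats any single failure as falsifying the claim that the candidate is fit to release. Treating an empty test run as not releasable is my assumption: no evidence is not the same as passing evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    passed: bool


def failures(outcomes: list[Outcome]) -> list[str]:
    """Names of the tests that falsified the release candidate."""
    return [o.name for o in outcomes if not o.passed]


def is_releasable(outcomes: list[Outcome]) -> bool:
    """Releasable only if some tests ran and none of them failed."""
    # Assumption: a run with no tests at all gives us no grounds to release.
    return len(outcomes) > 0 and not failures(outcomes)


results = [
    Outcome("places order", True),
    Outcome("cancels order", True),
    Outcome("rejects bad price", False),
]
print(is_releasable(results), failures(results))
```

Note that there is no percentage threshold anywhere in this gate. A thousand passing tests do not outvote one failing test, just as a thousand white swans do not outvote one black one.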

I am sometimes asked the question, “What percentage of tests do you think should be passing before we release?”. I think that people think that I am an optimistic fool when I answer “100%”. What is the point of having tests that tell us that our software is not good enough, and then ignoring what they tell us?

In the real world this is difficult for some kinds of tests in some kinds of system. There have been times when I have relaxed this absolute rule. However, there are only two reasons why tests may be failing and it still makes sense to release:

1) The tests are correctly failing and showing a problem, but this is a problem that we are prepared to live with in production.
2) The tests or system under-test (SUT) are flaky (non-deterministic) and so we don’t really know what state we are in.

In my experience, maybe surprisingly, the second case is the more common. This is a pretty serious problem because we don’t really know what is going on now.

Tests that we accept as “Oh that one is always failing” are subversive: they acclimatise us to accepting a failing status as normal.

It is vital to any Continuous Integration process, let alone a Continuous Delivery process, that we optimise to keep the code in a releasable state. Fixing any failing test should take precedence over any other work. Sometimes this is expensive! Sometimes we have a nasty intermittent test that is extremely hard to figure out. Nevertheless, it must be figured out. The intermittency is telling us something very important. Either our test is flaky, or the SUT is flaky. Either one is bad, and you won’t know which it is until you have found the problem and fixed it.
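One cheap first step in hunting an intermittent test is simply to run it many times and see whether its result is stable. Here is a minimal sketch; the function name `classify_by_rerun`, the labels it returns and the default of 20 runs are all my assumptions. It can reveal intermittency, but it cannot tell you whether the test or the SUT is at fault, and a clean run of passes does not prove that the test is deterministic.

```python
from typing import Callable


def classify_by_rerun(test: Callable[[], None], runs: int = 20) -> str:
    """Run `test` repeatedly; return 'stable-pass', 'stable-fail' or 'intermittent'."""
    passes = 0
    for _ in range(runs):
        try:
            test()
            passes += 1
        except AssertionError:
            pass  # a failed run; keep going so that we can see the pattern
    if passes == runs:
        return "stable-pass"
    if passes == 0:
        return "stable-fail"
    return "intermittent"


def make_every_third_call_fails() -> Callable[[], None]:
    """Build a deliberately flaky check, used only to demonstrate the classifier."""
    calls = {"n": 0}

    def check() -> None:
        calls["n"] += 1
        assert calls["n"] % 3 != 0, "simulated intermittent failure"

    return check


print(classify_by_rerun(make_every_third_call_fails(), runs=6))
```

Anything reported as intermittent still needs a person to find the cause. The rerun only stops us from pretending that the failure was a one-off.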

If you have a flaky system, with flaky tests and lots of bugs in production, this may sound hard to achieve, but this is a self-fulfilling approach. To get your tests to be deterministic, your code needs to be deterministic. If you do this your bug count will fall!
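A common source of non-determinism is code that reads the system clock directly. The following is a minimal sketch of one way to remove it, by passing the clock in; `Session`, `FakeClock` and `system_clock` are hypothetical names for illustration. The same idea applies to random numbers, thread scheduling and anything else that the test cannot otherwise control.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """The production clock: the real current time in UTC."""
    return datetime.now(timezone.utc)


class Session:
    """Expires a fixed time after creation, according to the injected clock."""

    def __init__(self, ttl: timedelta, clock: Clock = system_clock) -> None:
        self._clock = clock
        self._expires_at = clock() + ttl

    def is_expired(self) -> bool:
        return self._clock() >= self._expires_at


class FakeClock:
    """A clock that only moves when the test tells it to."""

    def __init__(self, start: datetime) -> None:
        self.now = start

    def __call__(self) -> datetime:
        return self.now

    def advance(self, delta: timedelta) -> None:
        self.now += delta


# The test controls time, so it gives the same answer on every run.
clock = FakeClock(datetime(2015, 1, 1, tzinfo=timezone.utc))
session = Session(ttl=timedelta(minutes=30), clock=clock)
assert not session.is_expired()
clock.advance(timedelta(minutes=30))
assert session.is_expired()
```

The design change that makes the test deterministic also makes the production code more explicit about its dependency on time, which is the sense in which fixing the tests tends to improve the code.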

I recently read a good article on the adoption of Continuous Delivery at PaddyPower (http://www.infoq.com/articles/cd-benefits-challenges), in which the author, Lianping Chen, claims “Product quality has improved significantly. The number of open bugs for the applications has decreased by more than 90 percent.” This may sound surprising if you have not seen what Continuous Delivery looks like when you take it seriously, but this is completely in line with my experience. This kind of effect only happens when you start being aggressive in your denial of failure – a single test failure must mean “Not good enough!”

So take a hard line with your automated tests: test everything, and ensure that a single failure means that your system is not fit to release.

Posted in Acceptance Testing, Agile Development, Continuous Delivery, TDD