The Anatomy of an Experimental Organisation

I am a software developer. I see the world from that perspective. In reality though that is only one viewpoint. While it is important that we are effective at delivering software, what really matters is that we are effective at delivering business value.

When I describe Continuous Delivery to people I generally spend a fair amount of time impressing on them that it is not about tools and technicalities. It is not even about the relationship between developers and operations or product owners and testers.

Continuous Delivery is about minimising the gap between having an idea and getting that idea, in the form of working software, into the hands of users and seeing what they make of it.

This vital feedback loop is at the core of not just good development, but of good business too.

I have been lucky to work in, or with, several companies that I would describe as Agile companies, not just companies practicing Agile development. These organisations are fundamentally different in approach in almost everything that they do.

The principal characteristic of this kind of organisation is that they are experimental in their approach to everything.

For some more traditional organisations this sounds scary: “Experimenting? That sounds like you don’t know what you are doing.” Well, yes, we don’t know what we are doing; none of us really does!

To quote one of my heroes, Richard Feynman:

“It doesn’t matter how intelligent you are, if you guess and that guess cannot be backed up by experimental evidence – then it is still a guess!”

Research into the performance of successful organisations says that two thirds of the ideas that they implement in software produce zero or negative value:

“Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve”

“Avinash Kaushik wrote in his Experimentation and Testing primer that ‘80% of the time you/we are wrong about what a customer wants’”

“Mike Moran wrote that Netflix considers 90% of what they try to be wrong”
(Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, Nils Pohlmann 2013).

(http://ai.stanford.edu/~ronnyk/2013%20controlledExperimentsAtScale.pdf)

Two thirds, or more, of the ideas of GOOD companies are waste.

Further, the experience of these companies, and the data, says that nobody can tell which third of ideas are the good ones before they are delivered. Clearly there is a place for guessing and predicting customer demand for innovative ideas, but it is important to remember that for every iPhone that you invent, you will have to go through three or four Newtons first.

If we are generally so bad at guessing, then the only sane strategy is to embrace the uncertainty: optimise to have lots of ideas and to evaluate them quickly, so that you can discard the poor ones as quickly and cheaply as possible.

This is what really effective organisations do. Watch Henrik Kniberg’s wonderful descriptions of the Spotify Culture.
See how Netflix work.

These companies know that their ability to guess is poor, so they are becoming more scientific in their approach. Instead of guessing what their customers want, and then assuming that their customers like what they are getting, they are measuring. These companies are designing experiments. They define metrics that will identify what their customers think, then carry out the experiment and reflect on the results in order to learn and improve.

So how do these experimental companies differ from others?

Value Innovation over Prediction

Innovation is the thing that differentiates the great companies from the rest. Great companies create new markets, provide new products or services that change how people do things.

More traditional companies value predictability. The trouble is that you can’t be both predictable and innovative at the same time; they are different ends of a spectrum. The only really predictable thing is the status quo. So if you want your company to be great, you need to value innovation and discard your reliance on predictability – at least for some of your products.

One aspect of this is launch dates. Apple don’t pre-announce their products; they are very secretive. Only when a product is ready do they announce it. This allows them to strive for “insanely great”, as Steve Jobs so memorably put it.

State your hypothesis

This kind of shared vision is fractal: it operates at all levels in effective organisations. Being hypothesis driven helps to establish this clarity, this shared purpose. Useful hypotheses can range from stating what you expect the outcome of a test to be before you execute it, to stating how you think your global business strategy will work out.

Some people even talk about “Hypothesis Driven Development“.

Seek the right experiment

As soon as you have a new idea, whatever its nature, the next question should be “How can we test this idea?”. If it is an idea about how to improve your process, figure out an experiment to see if it works: “OK, let’s try doing without the estimation meeting for a while and see if we save time”. If it is an idea about improving your product, figure out an experiment to test that too: “Let’s release this feature and A/B test it against the existing service to see which one makes more money”.
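
Purely as an illustration, here is a minimal sketch of what such an A/B experiment might look like in code; the bucketing rule, variant names and revenue metric are all hypothetical, not taken from any real system:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// A toy A/B experiment: assign each user to a variant deterministically,
// record the revenue each variant generates, and compare the results.
public class CheckoutExperiment {

    enum Variant { EXISTING_SERVICE, NEW_FEATURE }

    private final Map<Variant, DoubleAdder> revenue = new ConcurrentHashMap<>();

    public CheckoutExperiment() {
        for (Variant v : Variant.values()) {
            revenue.put(v, new DoubleAdder());
        }
    }

    // Deterministic bucketing: the same user always sees the same variant.
    Variant variantFor(String userId) {
        return Math.floorMod(userId.hashCode(), 2) == 0
                ? Variant.EXISTING_SERVICE
                : Variant.NEW_FEATURE;
    }

    void recordPurchase(String userId, double amount) {
        revenue.get(variantFor(userId)).add(amount);
    }

    public static void main(String[] args) {
        CheckoutExperiment experiment = new CheckoutExperiment();
        experiment.recordPurchase("alice", 25.0);  // lands in one variant
        experiment.recordPurchase("bob", 40.0);    // lands in the other
        System.out.println("Existing service revenue: "
                + experiment.revenue.get(Variant.EXISTING_SERVICE).sum());
        System.out.println("New feature revenue:      "
                + experiment.revenue.get(Variant.NEW_FEATURE).sum());
    }
}
```

In a real experiment the comparison would of course be statistical rather than a simple sum, but the shape is the same: a deterministic assignment, a measured outcome, and a decision based on the evidence.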

Establish Feedback Loops

Experimentation means nothing without the closure of the feedback loop. After each experiment there should be a way to evaluate the results and figure out what to do next. The outcome from each experiment should be a real action of some kind. Depending on the results of the experiment you should either: drop the change; adopt the change; or run through the cycle again with a new experiment to learn more.

The establishment of effective feedback loops is a vital attribute of experimental organisations. Continuous Integration, Continuous Delivery, Experiments in Production, Test Driven Development, Retrospective meetings and Incident Reviews are all mechanisms for establishing effective feedback.

Assume the fallibility of experts

All of the really high-performance organisations that I have seen are notable in their attempts to minimise hierarchy. Decisions based on the highest paid person in the room are always only guesses! Seek evidence, and allow individuals and teams the freedom to experiment and make decisions based on what they find. Treat people’s opinions with respect, but recognise that, whoever it is that holds them, they are still only opinions. Test the hypothesis!

Question Everything

There should be no sacred cows. Everything should be up for evaluation, and if it is found wanting should be amenable to change. Technology, practice, process, office-space, team organisation, even personnel.

If it is not working, try something else.

Eliminate waste

Successful organisations avoid doing unnecessary work. The Lean mantra of eliminating waste is a powerful one. We should be rational and objective in assessing everything that we do in the light of our real goals and finding how to make our work more efficient.

The most common response that I get when coaching teams to do better is “We don’t have time to improve”. Investing in making your team and their working practices more efficient is almost never a cost!

Leadership is Different to Management

Spotify talk of having “Loosely-coupled, tightly aligned teams” as being one of their goals. They aim for high autonomy and high alignment.

Dan Pink talks of the importance of “Autonomy, Mastery and Purpose” in motivating people towards excellence.

These ideas require leadership, not management. The goal of leadership should be to establish a common vision, a shared purpose for the organisation without telling people how to achieve it. Ideally leadership is about inspiring people.

My experience has been that experimental organisations are generally much more successful than their more traditional counterparts. They also tend to appeal to the most talented people, because of the creative freedom that they offer.

People sometimes ask me “How do you know when your organisation is working well?”. I think that you know things are on the right track when your first response to any challenge or question is “How could we try that out?” rather than “I think this is the answer”.


The Next Big Thing?

A few years ago I was asked to take part in a panel session at a conference. One of the questions asked by the audience was what we thought the “next big thing might be”. Most of the panel talked about software. I recall people talking about Functional Programming and the addition of Lambdas to Java amongst other things.

At the time this was not long after HP had announced that they had cracked the Memristor, and my answer was “Massive scale, non-volatile RAM”.

If you are a programmer, as I am, then maybe that doesn’t sound as sexy as Functional Programming or Lambdas in Java, but let me make my case…

The relatively poor performance of memory has been a fundamental constraint on how we design systems pretty much since the beginning of the digital age.

A foundational component of our computer systems, ever since the secret computers at Bletchley Park that helped us to win the Second World War, is DRAM. The ‘D’ in DRAM stands for Dynamic. What that means is that this kind of memory is leaky: it forgets unless it is dynamically refreshed.

The computers at Bletchley Park had a big bank of capacitors that represented the working memory of the system, and this was refreshed from paper tape. That has been pretty much the pattern of computing ever since. We have had a relatively small working store of DRAM, backed by a bigger, cheaper store of more durable, non-volatile memory of some kind.

In addition to this division between volatile DRAM and non-volatile backing storage, there has also always been a big performance gap.

Processors are fast with small storage, DRAM is slow but stores more, Flash is VERY slow but stores lots, Disk is even slower, but is really vast!

Now imagine that our wonderful colleagues in the hardware game came up with something that started to blur those divisions. What if we had vast memory that was fast and, crucially, non-volatile.

Pause for a moment, and think about what that might mean for the way in which you would design your software. I think that this would be revolutionary. What if you could store all of your data in memory, and not bother with storing it on disk or SSD or SAN. Would the ideas of “Committing” or “Saving” still make sense? Well, maybe they would, but they would certainly be more abstract. In lots of problem domains I think that the idea of “Saving” would just vanish.

Modern DRAM requires that current is supplied to keep the capacitors, representing the bits in our programs and data, charged. So when you turn off your computer at night it forgets everything. Modern consumer operating systems do clever things like implement complicated “sleep” modes so that when you turn off, the in-memory state of the DRAM is written to disk or SSD. If we had our magic, massive, non-volatile storage, then we could just turn off the power and the state of our memory would remain intact. Operating systems could be simplified, at least in this respect, and implement a real “instant-on”.

What would our software systems look like if they were designed to run on a computer with this kind of memory? Maybe we would all end up creating those very desirable “software simulations of the problem domain” that we talk about in Domain Driven Design? Maybe it would be simpler to avoid the leaky abstractions so common with mismatches between what we want of our business logic and the realities of storing something in an RDBMS or column store? Or maybe we would all just partition off a section of our massive-scale non-volatile RAM, pretend it was a disk, and keep on building miserable 3-tier architecture based systems and running them wholly in-memory?
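
As a deliberately speculative sketch (the hardware it assumes does not exist yet, and all class and method names are invented), this is roughly what a domain model with no explicit “save” step might look like:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Speculative sketch: if main memory were vast and non-volatile, the in-memory
// domain model could simply *be* the system of record. There is no save(),
// no ORM mapping, no serialisation step; mutating the object graph is enough.
// (On today's machines the map below would of course vanish at power-off.)
public class AccountLedger {

    public static final class Account {
        final String id;
        long balancePence;
        Account(String id, long balancePence) {
            this.id = id;
            this.balancePence = balancePence;
        }
    }

    // The "database": just an ordinary in-memory map of domain objects.
    private final Map<String, Account> accounts = new ConcurrentHashMap<>();

    public void open(String id, long openingBalancePence) {
        accounts.put(id, new Account(id, openingBalancePence));
    }

    public void credit(String id, long amountPence) {
        // The state change is complete here - no commit or flush follows.
        accounts.get(id).balancePence += amountPence;
    }

    public long balanceOf(String id) {
        return accounts.get(id).balancePence;
    }
}
```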

I think that this is intriguing. I think that it could change the way that we think about software design for the better.

Why am I talking about this hypothetical future? Well, Intel and Micron have just announced 3D XPoint memory. This is nearly all that I have just described. It is 10 times denser than conventional memory (DRAM) and 1000x faster than NAND (Flash). It also has 1000x better endurance than NAND, which wears out.

This isn’t yet the DRAM replacement that I am talking about. Although this memory will be a lot denser than DRAM and a lot faster than NAND, it is still a lot slower than DRAM. The gap is closing though. If the marketing specs are to be believed, the new 3D XPoint memory is about 10 times slower than DRAM and has about half the endurance. In hardware performance terms, that is really not far off.

I think that massive scale non-volatile RAM of sufficient performance to replace DRAM is coming. It may well be a few years away yet, but when it arrives I think it will cause a revolution in software design. We will have a lot more flexibility about how we design things. We will have to decide explicitly about stuff that, over recent years, we have taken for granted and we will have a whole new set of lessons to learn.

Thought provoking, huh?


Test Maintainability

At LMAX, where I worked for a while, they have extensive, world-class, automated acceptance testing. LMAX tests every aspect of their system and this is baked in to their development process. No story is deemed complete unless all acceptance criteria associated with it have a passing automated, whole-system acceptance test.

This is a minimum; usually there is more than one acceptance test per acceptance criterion. This triggers the question: “What is an acceptance test?”. I recently had a discussion on this topic with some friends, trying to define the scope of acceptance tests more clearly. This was triggered by an article published by Mike Wacker of Google, who claimed that it was not practical to keep “end-to-end tests passing”.

My ex-colleague Adrian replied. To summarise Adrian’s point, LMAX has been living with exactly this kind of complex end-to-end test for the past eight or nine years. This sparked a debate on the meaning of end-to-end testing which I will skip for now. I will use the term “acceptance testing” to mean the sort of testing described in the Google article; I think their intent is what I mean by acceptance tests. There is a serious problem to address here, that of test maintainability.

As soon as you adopt an extensive automated testing strategy you also take on the problem of living with your tests. I don’t know the details of Google’s testing approach, but there are several things in Mike’s article that suggest that Google is succumbing to some common problems:

Firstly, their feedback cycle is too long! The article talks about building and testing the latest version of a service “every night”. That is acceptable in a few limited, difficult circumstances, if you are burning your software into hardware devices for example. Otherwise it is unacceptably slow and will compromise the value and maintainability of your tests.

As my ex-colleague Mike Roberts used to say: “Continuous is more often than you think”. Testing every night is too slow, you need valuable feedback much more frequently than that. I think that you should be aiming for commit stage feedback in under 5 minutes (under 10 is survivable, but unpleasant) and acceptance stage feedback in under 30 minutes (60 is survivable but unpleasant). I think that unit testing alone is insufficient, for some of the reasons that the Google article cites.

There are hints of other problems. “Developers like it because it off-loads most, if not all, of the testing to others”. I think that this is a common anti-pattern. It is vital that developers own the acceptance tests. It may be that in the very early stages of their initial creation someone in a different role sketches the test, but developers are the people who will break the tests, and so they are the people who are best placed to fix them and maintain them. This is, for me, an essential part of the Continuous Delivery feedback loop.

I have never seen a successful automated testing effort based on a separate QA team writing and maintaining tests. The testing effort always lags, and there is no “cost” to the development team of completely invalidating the tests. Make the developers own the maintenance of the tests and you fix this problem. Prevent release candidates that fail any test from progressing by implementing a deployment pipeline. Make it a developer’s priority to keep the system in a “releasable state” – meaning “all tests pass”.

The final vital aspect of acceptance tests is that they should be simple to create and easy to understand. This is all about ensuring that the infrastructure supporting your acceptance tests is appropriately designed, allowing for a clear separation of the “What” from the “How”. We want each test case to assert only “What” the system under test should do, not “How” it does it. This means that we need to abstract the specification of test cases from the technicalities of interacting with the system under test.
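
As a rough sketch of that separation, assuming a JUnit-style test framework and an entirely invented trading domain, the test case states only the “What” while a small DSL class hides the “How”:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Sketch of separating the "What" from the "How" in an acceptance test.
// The test case states only WHAT the system should do, in domain language.
// All knowledge of HOW to drive the system (HTTP, UI, messaging, fixtures)
// is hidden behind the TradingDsl, so tests survive changes to the interface.
class PlaceOrderAcceptanceTest {

    private final TradingDsl trading = new TradingDsl();

    @Test
    void aFilledOrderShouldAppearInTheAccountPositions() {
        trading.registerUser("alice");
        trading.placeOrder("alice", "buy 10 MSFT at market");
        trading.waitForOrderToFill("alice");

        assertEquals("10 MSFT", trading.positionsFor("alice"));
    }
}

// The "How": one place that knows how to talk to the system under test.
// Everything in here is illustrative - substitute your real protocol driver.
class TradingDsl {
    void registerUser(String name)             { /* drive the real API here */ }
    void placeOrder(String user, String order) { /* e.g. a REST call or a message */ }
    void waitForOrderToFill(String user)       { /* poll until the order fills */ }
    String positionsFor(String user)           { return "10 MSFT"; /* stubbed */ }
}
```

When the system’s interface changes, only the DSL/driver layer needs to change; the test cases themselves, written in domain language, remain stable.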

The Google article is right that unit tests, particularly those created as part of a robust TDD process, are extremely valuable and effective. They do, though, only tell part of the testing story. Acceptance tests, testing your system in life-like circumstances, are, to me, a fundamental part of an effective testing strategy. Although theoretically you could cover everything you need in unit tests, in practice we are never smart enough to figure that out. Evaluating our software from the perspective of our users is at the core of a CD testing strategy.

Summary

So here are my guidelines for a successful test strategy:

Automate virtually all of your testing.

Don’t look to tests to verify, look to them to falsify.

Don’t release if a single test is failing.

Do Automate User Scenarios as Acceptance Tests.

Do focus on short feedback loops (roughly 5 minutes for commit stage tests and 45 minutes for acceptance tests).

You can find a video of me presenting in a bit more detail on some of these topics here: https://vimeo.com/channels/pipelineconf/123639468


How many test failures are acceptable?

Continuous Delivery is getting a lot of mileage at the moment. It seems to be an idea whose time has come. There was a survey last year that claimed that 66% of companies had a “Strategy for Continuous Delivery”. I am not sure that I believe that, but nevertheless it suggests that CD is “cool”. I suppose that it is inevitable that such a popular, widespread idea will be misinterpreted in some places. Two such misinterpretations seem fairly common to me.

The first is that Continuous Delivery is really just about automating deployment of your software. If you have written some scripts or bought a tool to deploy your system you are doing Continuous Delivery – wrong!

The second is that automated testing is an optional part of the process, that getting your release frequency down to a month is a big step forward (which it is for some organisations) and that that means you are doing CD, despite the fact that your primary go-live testing is still manual – wrong again!

I see CD as a holistic process. Our aim is to minimise the gap between having an idea and getting working software into the hands of our users to express that idea so that we can learn from their experience. When I work on a project my aim is always to minimise that cycle-time. This has all sorts of implications, and affects pretty much every aspect of your development process, not to say your business strategy. Central to this is the need to automate, in order to reduce the cycle time.

The most crucial part of that automation, and the most valuable, is your testing. The aim of a CD process is to make software development more empirical. We want to carry out experiments that give us new understanding when they fail, and a higher level of confidence in our assumptions when they don’t. The principal expression of these experiments is as automated tests.

The best projects that I have worked on have taken this approach very seriously. We tested every aspect of our system – every aspect! That is not to say that our testing was exhaustive, you can never test everything, but it was extensive.

So what does such a testing strategy look like?

The deployment pipeline is an automated version of your software release process. Its aim is to provide a channel to production that verifies our decision to release. Unfortunately we can never prove that our code is good; we can only prove that it is bad when a test fails. This is the idea of falsifiability, which we learn from science. I can never prove the theory that “all swans are white”, but as soon as I see a black swan I know that the theory is wrong.

Karl Popper proposed the idea of falsifiability in his book “The Logic of Scientific Discovery” in 1934. Since then it has become pretty much the defining characteristic of science. If you can falsify a statement through experimental evidence it is a scientific theory; if you cannot, it is a guess.

So, back to software. Falsifiability should be a cornerstone of our testing strategy. We want tests that will definitively pass or fail, and when they fail we want that to mean that we should not release our system, because we now know that it has a problem.

I am sometimes asked the question, “What percentage of tests do you think should be passing before we release?”. I think that people think that I am an optimistic fool when I answer “100%”. What is the point of having tests that tell us that our software is not good enough, and then ignoring what they tell us?
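
A minimal sketch of that rule as a release gate follows; the types and wiring here are hypothetical, and real pipelines express the same decision in their own configuration:

```java
import java.util.List;

// A minimal sketch of a release gate: the candidate only progresses if 100%
// of its tests passed. The TestResult type and gate wiring are hypothetical.
public class ReleaseGate {

    public record TestResult(String name, boolean passed) {}

    public boolean fitToRelease(List<TestResult> results) {
        // A single failure means "not good enough" - there is no threshold
        // below 100% that still gives a meaningful release decision.
        return results.stream().allMatch(TestResult::passed);
    }

    public static void main(String[] args) {
        ReleaseGate gate = new ReleaseGate();
        List<TestResult> run = List.of(
                new TestResult("placesOrders", true),
                new TestResult("settlesTrades", false));
        System.out.println(gate.fitToRelease(run)
                ? "Release candidate may progress"
                : "Release blocked: fix the failing test first");
    }
}
```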

In the real world this is difficult for some kinds of tests in some kinds of system. There have been times when I have relaxed this absolute rule. However, there are only two reasons why tests may be failing and it still makes sense to release:

1) The tests are correctly failing and showing a problem, but this is a problem that we are prepared to live with in production.
2) The tests or the system under test (SUT) are flaky (non-deterministic), and so we don’t really know what state we are in.

In my experience, perhaps surprisingly, the second case is the more common. This is a pretty serious problem, because it means that we don’t really know what is going on.

Tests that we accept with “oh, that one is always failing” are subversive: they acclimatise us to accepting a failing status as normal.

It is vital to any Continuous Integration process, let alone a Continuous Delivery process, that we optimise to keep the code in a releasable state. Fixing any failing test should take precedence over any other work. Sometimes this is expensive! Sometimes we have a nasty intermittent test that is extremely hard to figure out. Nevertheless, it must be figured out. The intermittency is telling us something very important: either our test is flaky, or the SUT is flaky. Either one is bad, and you won’t know which it is until you have found the problem and fixed it.
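
One way to flush out an intermittent test, sketched here with invented names, is simply to run it repeatedly and treat any disagreement between runs as a defect to investigate:

```java
import java.util.function.BooleanSupplier;

// Sketch: flush out intermittency by running a suspect test many times.
// Any disagreement between runs is itself a defect to investigate - either
// the test or the system under test is non-deterministic.
public class FlakinessDetector {

    public static boolean isDeterministic(BooleanSupplier test, int runs) {
        boolean firstOutcome = test.getAsBoolean();
        for (int i = 1; i < runs; i++) {
            if (test.getAsBoolean() != firstOutcome) {
                return false; // outcomes disagree: the test (or SUT) is flaky
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A deliberately flaky "test" that fails roughly one run in ten.
        BooleanSupplier suspect = () -> Math.random() > 0.1;
        System.out.println(isDeterministic(suspect, 100)
                ? "No flakiness observed in 100 runs"
                : "Flaky: investigate before trusting this test");
    }
}
```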

If you have a flaky system, with flaky tests and lots of bugs in production, this may sound hard to achieve, but the approach reinforces itself: to get your tests to be deterministic, your code needs to be deterministic. If you do this, your bug count will fall!

I read a good article on the adoption of Continuous Delivery at PaddyPower recently (http://www.infoq.com/articles/cd-benefits-challenges), in which the author, Lianping Chen, claims “Product quality has improved significantly. The number of open bugs for the applications has decreased by more than 90 percent.”. This may sound surprising if you have not seen what Continuous Delivery looks like when you take it seriously, but it is completely in line with my experience. This kind of effect only happens when you start being aggressive in your denial of failure – a single test failure must mean “Not good enough!”

So take a hard-line with your automated tests, test everything and ensure that a single failure means that your system is not fit to release.


Incremental Design – Part II

In my earlier blog post on incremental design I suggested that we need to allow for failure. So how do we limit the impact of failure, how do we tell when our design choices don’t work and how do we organise our world in a way that allows us to build upon what we learn from our experiments?

Perhaps it will surprise you that I begin this post on design with a section on team structures…

Team Structure and Architecture

Dev teams work best in small groups; team sizes of up to about 8 people seem optimal. So organising development groups into many small teams of 8 or fewer people is a common and effective strategy. Conway’s law tells us that this has implications for the design of our systems. If we want our teams to be autonomous, self-organising and self-directing, but to comprise fewer than 8 people, then how we can allocate work to these teams is seriously constrained. The other really important thing for enabling the continuous flow of valuable software is that we want each of these teams to deliver end-to-end user value without having to wait for another team to complete its work – we want to eliminate cross-team dependencies for any given feature.

I think that this is one of the strengths of the microservice architectures that people are currently excited about. Different flavours of this idea have been around for a long time, but implementing a system as a collection of loosely-coupled, independent services is a pretty good approach. This approach is also reflected in the team structure of companies like Amazon, who structure their entire development effort as small, independent (“two-pizza”) teams, each responsible for a specific function of the business.

A good mental model for this kind of architecture is to imagine an organisation without any computers. Each department has its own office, is wholly responsible for the parts of the business that it looks after, and decides for itself how best to accomplish that. Within each office they can work however they like; that is wholly their decision and their responsibility. Communication between departments is in the form of memos. Now, replace the office with a service and the memos with asynchronous messaging, and you have a microservice architecture.
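
Here is the analogy in miniature, as a hedged sketch rather than a recommended implementation; the departments, queue and message shape are all illustrative:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The office-and-memo analogy in miniature: two "departments" (services)
// share nothing and communicate only by dropping asynchronous messages
// (memos) onto a queue. Each decides for itself how to handle its own work.
public class MemoDemo {

    record Memo(String from, String to, String body) {}

    static class SalesDepartment implements Runnable {
        private final BlockingQueue<Memo> outbox;
        SalesDepartment(BlockingQueue<Memo> outbox) { this.outbox = outbox; }
        public void run() {
            try {
                outbox.put(new Memo("sales", "billing", "invoice order #42"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static class BillingDepartment implements Runnable {
        private final BlockingQueue<Memo> inbox;
        BillingDepartment(BlockingQueue<Memo> inbox) { this.inbox = inbox; }
        public void run() {
            try {
                // Waits for work; knows nothing of how sales does its job.
                Memo memo = inbox.take();
                System.out.println("billing received: " + memo.body());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Memo> memos = new LinkedBlockingQueue<>();
        new Thread(new SalesDepartment(memos)).start();
        new Thread(new BillingDepartment(memos)).start();
    }
}
```

In a real system the in-process queue would be a durable message broker or log, but the important property is the same: neither side calls the other directly or shares its internal state.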

A great tool for helping to design such an organisation, and so encourage this kind of architecture is the idea of Bounded Contexts, from Eric Evans’ book “Domain Driven Design”. A Bounded Context is the scope within which a domain model applies. Any complex problem domain will be composed of various Bounded Contexts, within which a domain model will have a consistent meaning. This idea effectively gives you a coherent scope within your problem domain that sensibly maps to business value.

Bounded Contexts are a great tool to help you organise your development efforts. Look for the Bounded Contexts in your problem domain and use that model to organise your teams. This is rarely a team per service, or even per context, but it is a powerful approach to looking at how to group contexts and services to allocate to teams, so ensuring that teams can autonomously deliver end-to-end value to the business.
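
To make the idea concrete, here is an invented example of what a Bounded Context looks like in code: the same business word modelled differently, and kept separate, in two contexts:

```java
// The same word, two bounded contexts, two deliberately different models.
// Neither class is "the" Order; each is the Order as its own context needs it.

// In the sales context an order is about what the customer asked for.
class SalesOrder {                      // e.g. lives in a sales package/service
    String customerId;
    String productCode;
    int quantity;
    long quotedPricePence;
}

// In the fulfilment context an order is about getting a parcel shipped.
class FulfilmentOrder {                 // e.g. lives in a fulfilment package/service
    String warehouseId;
    String pickingLocation;
    String courier;
    String trackingReference;
}
```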

Maintaining Models

A common concern of people new to iterative design is the loss of a “big picture”. I learned a trick years ago for maintaining a big-picture, architectural view. I like to maintain what I call a “Whiteboard Model” of the system that I am working on. The nature of the diagram doesn’t matter much, but its existence does. The “Whiteboard Model” is a high-level, abstract picture of the organisation of the system. It is high-level enough that any of the senior members of the team should be able to recreate it, on a whiteboard, from memory, in a few minutes. That puts a limit on the level of detail that it can contain.

A more detailed description of the model in a CD context should exist as automated tests. These tests should assert anything and everything important about your system: that the system performs correctly from a functional perspective; that it exhibits the desired performance characteristics; that it is secure, scalable, resilient, and conforms to your architectural constraints – whatever matters in your application. These are our executable specifications of the behaviour of the system.
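
As one hedged example of such an executable specification, assuming a JUnit-style framework, an invented OrderProcessor and an arbitrary 5 ms budget, a test can assert a performance characteristic just as readily as a functional one:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertTrue;

// Sketch of an executable specification for a non-functional property:
// the test asserts a performance characteristic, not just correct output.
// The 5 ms budget and the OrderProcessor API are illustrative only.
class OrderLatencySpecification {

    @Test
    void ninetyNinthPercentileLatencyStaysWithinBudget() {
        OrderProcessor processor = new OrderProcessor();
        long[] latenciesNanos = new long[1000];

        for (int i = 0; i < latenciesNanos.length; i++) {
            long start = System.nanoTime();
            processor.process("order-" + i);
            latenciesNanos[i] = System.nanoTime() - start;
        }

        java.util.Arrays.sort(latenciesNanos);
        long p99 = latenciesNanos[(int) (latenciesNanos.length * 0.99) - 1];
        assertTrue(p99 < 5_000_000, "99th percentile exceeded 5 ms: " + p99 + " ns");
    }
}

// Stand-in for the real system under test.
class OrderProcessor {
    void process(String orderId) { /* the real work would happen here */ }
}
```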

In a big system, the Whiteboard Model probably identifies the principal services that collaborate to form the system. If it is too detailed it won’t serve its purpose. This model needs to be in the heads of the development team. It needs to be a working tool, not an academic exercise in system design.

I also like creating little throwaway models for tricky parts of the system that I am working on. These can be simple notes on bits of scrap paper that last a few minutes to explore ideas with your pair, or they can be longer-lived models that help you to organise a collection of stories that, over time, build up a more complex set of behaviours. I once worked as part of a team that designed and built a complex, high-performance, reliable messaging system using these incremental techniques. We created a model of the scenarios that we thought our reliable messaging system would need to cope with. It took several of us several hours around a whiteboard to come up with the first version. It looked a bit like a couple of robots, so it was ever after referred to as the “Dancing Robot Model”. Often in stand-ups you would hear comments like “I am working on Robot 1’s left leg today” 😉

Over the years I think that I have seen two common failure modes with modelling. The first is over-modelling. For me, trying to express the detail of an algorithm, or all of the methods or attributes of a class, is a waste of time and effort. Code is a much more efficient, much more readable model of structure at this level of detail. Models should be high-level abstractions of ideas that inform decision making; striving for detail or some level of formal completeness is a big mistake.

The second failure is a lack of modelling. I think that the widespread adoption of agile approaches to development has made this failure mode even more common. I often interview developers and commonly ask them to describe a system that they have worked on. It is surprising how many of them have no obvious organising principles, certainly nothing like a whiteboard model.

There is a third failure, but it is so heinous that I shudder to mention it – lack of tests! Of course, once our models exist in code, they should be validated by automated tests!

The Quality of Design

Personally I believe that an iterative approach is a more natural approach to problem solving. The key is to think about the qualities of good design. Good designs have clean separation of concerns, are modular, abstract the problem and are loosely-coupled. All of these properties are distinct from the technology. This is just as true of the design of a database schema or an ANT script as it is of code written in an OO or Functional language. These things matter more than the technology, they are more fundamental.

One of the reasons why these things are so important is that they allow us to make mistakes. When you write code that has a clean separation of concerns it is easier to change your mind when you learn something new. If the code that places orders or manages accounts is confused with the code that stores them in a database, communicates them across a network or presents them in a user interface then it will be tough to change things when your assumptions change.

I once worked on a project to implement a high-performance financial exchange. We had a pretty clean codebase, with very good separation of concerns. In my time there we changed the messaging system by which services communicated multiple times, growing it from a trivial first version that sent XML over HTTP into a world-class, reliable, asynchronous binary messaging system. We started with the XML over HTTP not because we thought it would ever suffice, but because I already had some code that did this from a pet project of mine. It was poor in terms of performance, but the separation of concerns was what we needed. Transport, messaging, persistence and logic were all independent and pluggable. So we could start with something simple and ready to go, to get the bare bones of our services in place and communicating. We then evolved the messaging system as we needed to, replacing the transport at one point, the message protocol at another, and so on.
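
A simplified sketch of that kind of pluggable seam might look like this; the interface and class names are invented for illustration, not the exchange’s actual code:

```java
// Sketch of the separation of concerns described above: the domain code sees
// only a small Transport abstraction, so the implementation can evolve from a
// naive first version to a high-performance one without touching the callers.
interface Transport {
    void send(String destination, byte[] message);
}

// The "ready to go" first version: simple, slow, good enough to get started.
class HttpXmlTransport implements Transport {
    @Override
    public void send(String destination, byte[] message) {
        // POST the payload as XML over HTTP (omitted).
    }
}

// A later replacement: asynchronous binary messaging, plugged into the same seam.
class BinaryAsyncTransport implements Transport {
    @Override
    public void send(String destination, byte[] message) {
        // Enqueue onto a reliable, asynchronous binary channel (omitted).
    }
}

// Domain code places orders; it neither knows nor cares which transport is plugged in.
class OrderPlacer {
    private final Transport transport;
    OrderPlacer(Transport transport) { this.transport = transport; }

    void placeOrder(byte[] encodedOrder) {
        transport.send("orders", encodedOrder);
    }
}
```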

The messaging wasn’t the only thing that evolved; every aspect of the system evolved over time, at any given point in its life doing just enough to fulfil all of its requirements. You can’t make this kind of evolutionary change without good quality automated testing, but when you have that insurance, any aspect of the system is amenable to change. As well as messaging, we replaced our relational database vendor and we changed the technology that rendered our UI several times. We even made a dramatic change in the presentation of our system’s UI at one point, moving from a tabular, data-entry style user interface to a graph-based, point-and-click trading interface. All this while keeping to our regular release schedule, releasing other new features in parallel and, vitally, keeping all the tests running and passing.

Worrying about good separation of concerns, modularity and isolation is important.

The secret to incremental design… it is simply good design!


Is Continuous Delivery Riskier?

I read an article in CIO magazine today “Three Views on Continuous Delivery“. Of course the idea of such an article is to present differing viewpoints, and I have no need, or even desire, for the world to see everything from my perspective.
However the contrary view, expressed by Jonathan Mitchell, seemed to miss one of the most important reasons that we do Continuous Delivery. He assumes that CD means increased risk. Sorry Jonathan, but I think that this is a poorly informed misunderstanding of what CD is really about. From my perspective, certainly in the industries where I have worked of late, the reverse is true: CD is very focussed on reducing risk.
Here is my response to the article:
If Continuous Delivery “strikes fear into the heart of CIOs” they are missing some important points.
Continuous Delivery is the process that allows organisations to “maintain the reliability of the complex, kaleidoscope of interrelated software in their estate”, it doesn’t increase risk, it reduces it significantly. There is quite a lot of data to support this argument.
Amazon adopted a Continuous Deployment strategy a few years ago; they currently release into production once every 11.6 seconds. Since adopting this approach they have seen a 75% reduction in outages triggered by deployment and a 90% reduction in outage minutes. (Continuous Deployment is subtly different to Continuous Delivery in that releases are automatically pushed into production when all tests pass. In Continuous Delivery the release is a human decision.)
I think that it is understandable that this idea makes Jonathan Mitchell nervous, it does seem counter-intuitive to release more frequently. However, what really happens when you do this is that it shines a light on all the places where your process and technology are weak, and helps you to strengthen them.
Continuous Delivery is a high-discipline approach. It requires more rigour not less. Continuous Delivery requires significant cultural changes within the development teams that adopt it and beyond. It is not a simple process to adopt, but the rewards are enormous. Continuous Delivery changes the economics of software development.
Continuous Delivery is the process that is chosen by some of the most successful companies in the world. This is not an accident. In my opinion Continuous Delivery finally delivers on the promise of software development. What businesses want from software is to quickly and efficiently put new ideas before their users. Continuous Delivery is built on that premise.

Incremental Design – Part I

Continuous Delivery is all about making small changes. Work flows more easily, planning is simpler, error detection is helped and the time from idea to value is reduced when we make changes in small increments, but how do you solve big problems in small pieces? How do you maintain a coherent design when each change is small?

There are many facets to this problem. The organisations that are best at it tend to take a very broad view of design and think about things like how teams are structured, how work is defined and how design works when delivering a flow of many small features.

The traditional approach seems, on the face of it, sensible but is not. “Let’s think very hard and design everything in detail before we start”. Big-up-front design has a long history of not working very well, principally because at the point when you do the design you know the least about the problem that you ever will.

For a while we tried something else…

“We’re Agile We Don’t Need Your Stinking Design”(1)

In the early days of agile adoption there were some very naive things said about software design. This was largely a reaction to the failures of big-up-front design. Some of the advice used to remind me of the “President of the Universe” from Douglas Adams’ “The Restaurant at the End of the Universe”. In this excellent book, it had been decided, by the advanced civilisations of the universe, that the desire to become a politician should exclude anyone from becoming one. They had taken this to its logical conclusion and decided that the most senior person should have no pre-conceived ideas about anything. So, the President of the Universe lived in a shack on a beach and started every day making no assumptions at all. On waking he would wonder at the big hot thing in the sky, muse on the nature of existence and so on. This left no time at all for any decision making: ideal for a politician, not quite so good for a software developer.

Some people advised a similar approach to design. We should dump our assumptions and previous experience, and instead go straight to code and not think about design at all. If you weren’t sure what to do next, write a test!

Now there are few bigger fans of TDD than me, but I have always thought that this fear of design was stupid.

My own view is that design is all that we, as software developers, do. There is no distinction between coding and design. Coding *is* design, but it is not all that there is to it. Software development is an intriguingly difficult exercise in design, and we should use all of the tools and experience at our disposal to accomplish it. We should also value good design and strive for elegance and simplicity in all that we do.

A More Experimental Approach

If you are building a large complex system, or indeed any system, design is important, but we have learned through painful experience that trying to do all of the design up-front doesn’t work. Does this mean that we have reached an impasse?

Not really! Agile development practices are all about being experimental. So we should be experimental in our approach to design. If we want to be experimental what will that take?

Perhaps the most important thing is to allow for failure. By definition not all experiments succeed.

This need to allow for failure has several implications:

o If a design choice fails, how do we limit the impact of this failure?
o If we expect some of our design choices not to work, how do we tell?
o How do we organise our work to allow us to build upon what we have learned from our experiments?

I am going to suggest approaches to dealing with each of these in my subsequent blog posts. As a teaser here are some of the factors that I think are important in enabling a more incremental approach to design:

Team structure, Business-Alignment, Modelling and Models, Bounded Context, Isolation, Reactive Architectures, Loose-Coupling and Separation of concerns – there may be more 😉
(1) A quote from Andrew Phillips of XebiaLabs – with tongue firmly in cheek.


New Job, New Business

I have very recently given up my job at KCG (Formerly Getco) Ltd. KCG have been a very good employer and I thank them for the opportunities that they gave me.
The reason that I left though is, I hope, understandable. I have finally, after the long-time nagging of my friends and relatives, decided to try independence. This is a very exciting time for me and I thank my friends and relatives for giving me the push that I needed
;-)
I have set up my own consulting company called ‘Continuous Delivery Ltd’ – what else? I intend to offer advice to companies travelling down, or embarking on, the complex road to Continuous Delivery. I also plan to work on some software that I think is missing from the Continuous Delivery tool-set – I have to feed my coding-habit somehow (stay tuned).
I hope that this will, if anything, give me a bit more time for writing blog-entries here. I have a LOT to say that I haven’t got around to yet.
If you are curious, you can read more about my new venture at my company website http://www.continuous-delivery.co.uk/
Naturally, if you feel that my services can be of any help, please get in touch.

Cargo-cult DevOps

My next blog post in the XebiaLabs “CD Master Series” is now available.
DevOps is a very successful meme in our industry. Most organisations these days seem to be saying that they aspire to it, though they don’t necessarily know what it is.
I confess that I have a slight problem with DevOps. Don’t get me wrong, DevOps and Continuous Delivery share some fundamental values. I see the people that promote DevOps as allies in the cause of making software development better. We are on the same team.
At its most simple level I don’t like the name ‘DevOps.’ It implies that fixing that one problem, the traditional barrier between Dev and Ops, is enough to achieve software nirvana. Fixing the relationship between Dev and Ops is not a silver bullet…

The Reactive Manifesto

Over the past couple of months I have been helping some friends to update the Reactive Manifesto.
There are several reasons why I agreed to help. First I was asked to, by my old friend Martin Thompson. The most important reason though is because I think that this is an important idea.
The Reactive Manifesto starts from a simple thought. 21st Century problems are not well-served by 20th Century assumptions of software architecture. The game is moving on!
There are lots of reasons for this: the problems that we are asked to tackle are growing in scale, and sometimes in complexity too; the demands of our users are changing; the hardware environment has changed, and continues to change; and the rate of change in our best businesses is increasing.
Talk to any of my friends and they will, no doubt, tell you that I am a bore on the topic of software design – as well as several other subjects
;-)
I think that we, as an industry, don’t spend enough time thinking about the design of our solutions. Too often we start our projects by saying “I have my language installed, my web-server, I have Spring, Hibernate, Ruby-on-Rails, <insert your favourite framework here> and my database ready to go – now, what is the problem?”. We have become lazy and look for cookie-cutter solutions. We then proceed to write code in straight lines – poor abstraction, little modelling, rotten separation of concerns. Where is the fun in any of that?
I get genuine pleasure from creating solutions to problems, but I don’t get pleasure from just any old solution. Code that only does the job is not enough for me. I want to do the job with as few instructions as possible, as little duplication. I want the systems that I write to be efficient, readable, testable, flexible, easy to maintain, high-quality, dare I say elegant!
I have been lucky enough to work on a few systems that looked like this. Do you know what? When we achieve those things we are also more efficient and more cost-effective as developers. The software that we create is more efficient too: it runs faster, does more with fewer instructions and is more flexible. This is not over-engineering; this is professionalism.
Interestingly there are sometimes similarities in the coarse-grained architecture of such systems, at least the larger ones that I have worked on. They are loosely-coupled, based on services that implement specific bounded contexts within the problem domain, and that communicate with each other only via asynchronous messaging. These systems almost never look like the standard, out-of-the-box three-layer architecture built on top of a relational database, although pieces of them may use some of the standard technologies, including an RDBMS.
The hardware environment in which our software executes is changing. The difference in cost per byte between RAM and disk is reducing. The capacity of RAM is increasing dramatically. Distributed programming is the norm now, and the relative performance of some of our hardware infrastructure has changed (e.g. the network is now faster than disk). Large-scale non-volatile RAM is on the horizon. All this means that the assumptions that underpinned the ‘standard approach’ have changed. The old assumptions match neither the hardware environment nor the problems that we are solving.
The Reactive Manifesto is about discarding some of those assumptions. About more effectively modelling the problems in our problem domain, writing code that is easier to test, more efficient to run, easier to distribute and that is dramatically more flexible in use.
Take a look at the Reactive Manifesto. If you think we are right, please sign it – more than 8,000 other people have already done so. If you think we are wrong, tell us where.
Most importantly of all, please don’t assume that the same old way of doing things is the best approach to every problem.