What is Modern Software Engineering?

I have a new book out. It’s called “Modern Software Engineering” and I have been working on it for the past few years. 

The ideas in it grew out of a growing realisation that the way I approach software development, and the way that all of the teams I was familiar with and considered excellent at software development worked, shared some fundamental characteristics.

This got me interested in trying to nail down those characteristics and formulate them into a model that I could use to explain what it seemed to me worked best for software development.

Applying Science

I recognised that my own thinking has long been influenced by one of my hobbies, I like to read and learn about science. I am interested not just in the findings of science, but also in the organised approach to knowledge acquisition that it represents. Science is humanity’s best approach to problem solving, so it should certainly be applicable to a difficult, technical discipline like software development.

I had long described my preferred approach to software development, Continuous Delivery, as a simple, pragmatic application of scientific-style reasoning to solving problems in software. This led me to become really interested in exploring the idea in more depth. What does an informal approach to scientific learning and discovery mean when we apply it to solving practical problems? There’s a word for that: we call it “Engineering”.

Engineering != Bureaucracy

At this point I got rather nervous. It seems to me that in our discipline of software development the term “engineering” has become either incorrectly loaded with meaning, or emptied of it altogether.

On one hand, many people assume that “Engineering” means stifling bureaucracy and heavy-weight process control. 

On the other, “Engineering” simply means writing code and nothing else.

Both of these are profoundly wrong. In other disciplines, “Engineering” is simply the stuff that works. It is practical, pragmatic and more, not less, efficient. 

Sure, it may offer some guide-rails, constraining our thinking, but it does that in a way that helps us to rule out, or at least steer away from, dumb ideas. This is a really good thing!

Avoiding Bad Ideas

In software development we have a poor history of being able to eliminate bad ideas. We tend to make the same mistakes over and over again. 

Data scientists not using version control, resulting in 90% of ML projects never making it into production.

Low-code environments that work on the assumption that you know exactly what you want at the start of a project and that that requirement will never change – good luck with that idea! 

So something that was able to steer us away from bad ideas would be a very valuable thing to have.

The Billion Dollar Mistake

The idea that “Engineering == Bureaucracy” is wrong. It comes from a completely incorrect, but understandable, mis-categorisation of what engineering in other disciplines is about, and from applying that mis-categorisation to software.

Humans are used to building physical things, so the production of physical things is front and centre in our minds when we think about making things. The inspiration and design of a physical thing is certainly a challenging problem, but it is so much more difficult to scale that up to produce those things en masse that we assume that is where the only real challenge lies, and so we assume that is all that engineering does.

We assume that “Engineering == Production Engineering” and so we, as an industry, made the billion dollar mistake of attempting to improve the efficiency of software development by applying production-line techniques. Didn’t work!

Production is not Our Problem

In software “production” is not our problem! Our product is a sequence of bytes, and we can recreate any sequence of bytes essentially for zero cost. 

This means that we NEVER have a production problem! Our problem is always one of learning, discovery and design. Engineering for software, then, needs to focus very firmly on that part of the challenge, and ignore, or at least automate, our production process.

Design Engineering NOT Production Engineering

So how do we optimise for exploration, learning and design?

If we want to look for examples outside of software, this is much more closely related to the innovative creation of new things, than it is to production engineering. Design engineering is a very different discipline. Think NASA designing Mars rovers, or Apple designing the first iPhone or SpaceX designing their Starship.

For this kind of engineering you optimise to be great at learning. Modern engineering consciously designs systems in ways that allow the engineers to iterate quickly and efficiently so that they can learn what works, and what doesn’t. We need to do the same.

Designing Complex Systems

The systems that modern engineers create are increasingly complex and sophisticated, so as well as focusing on learning, modern engineering in general, but certainly modern software engineering, needs to focus us on managing that complexity. We need to focus our tools, techniques and mindset on dealing with the complexity that is always at the root of our discipline. 

Software Development as an Engineering Discipline

I came to the view that this assumption that “software development isn’t really engineering” is correct in practice, but very wrong in principle.

How I worked for most of my career certainly did not qualify as engineering; it was certainly closer to craft. However, in the latter part of my career I started to think more consciously about how I could do better.

I started to take a consciously more rational approach to decision making in all aspects of software development. I started to apply some heuristics that would guide me, and the teams that I worked with, more reliably towards better outcomes. It works!

Modern Software Engineering

I have tried to outline this organised, but pragmatic and low-ceremony, approach to software development in my new book, capturing some principles that I think are generic to all software development in a way that allows us to adopt them and use them to steer us in the direction of more successful outcomes.

My thesis is this: if an engineering approach to software development doesn’t help us to create better software faster, then it’s wrong and doesn’t qualify as “Engineering”.

Like most authors, I was nervous about how these ideas would be received, but “Modern Software Engineering”, my new book, is starting to gather some great reviews and people are finding the ideas in it as helpful as I have. If you read it, I hope that you enjoy it.


Women in Computing

March 8th is International Women’s Day, which got me thinking again about why we have so few women programmers (well, in Europe and the USA anyway).

Things didn’t start out this way. Many of the first programmers and pioneers of computing were women. Women like:

Ada Lovelace – who wrote the first algorithm intended to be executed by a computer

Grace Hopper – the first person to design a compiler for a programming language

The history-making Bletchley Park code-breakers.

Katherine Johnson whose contribution to NASA was told in the film “Hidden Figures”

And one of my personal heroes –

Margaret Hamilton who led the team responsible for programming the onboard flight software for the Apollo mission computers and invented the term “Software Engineering”

But the proportion of women in computing peaked in 1984 and has declined ever since! This is the opposite of the trend in other science, medicine and engineering disciplines. Recent figures I’ve seen suggest that as few as 15% of the people studying computer science subjects, or wanting to work in this field, are women and girls. (The gap is worse in Europe and the USA than in India, Malaysia, Africa and China.)

Now I think being a Programmer is a wonderful, challenging, rewarding, rapidly changing career, and I have had the privilege of working with some very talented women. So what is happening in our industry that deters women from joining us?

There are many strongly-held opinions about why this is the case. Misperceptions about what it takes to be a ‘good programmer‘ pervade, coupled with a well-established ‘computer geek stereotype’.

In 2017, James Damore, a senior engineer at Google, was famously fired in response to his memo claiming that there was a ‘biological reason’ for a lack of female computer scientists. Even a cursory reading of science shows that the differences between men and women, whether cultural or biological, are well within the range of variance for either men or women, so James Damore was talking rubbish, from the perspective of the science, and simply voicing his personal prejudice.

I am convinced that what differences there are, if any, are cultural: we have done this. There is a wide-spread ‘toys for boys‘ culture which has developed in our industry over decades, and which will be very hard to break down.

“Computing is too important to be left to men” 

Professor Karen Sparck Jones, pioneering British computer scientist

Is it impossible to change? It’s a difficult problem – but problem-solving is what SW engineering is all about! What can we do more of to provide women with opportunities and to ensure this talent pool isn’t lost to computing? Tackle recruitment practices. Create more positive attitudes to diversity and inclusive organisational culture. Support the work of organisations like “Girls Who Code”. Recognise women role models – women like:

Shafi Goldwasser – the most recent woman winner of the Turing award for her work on cryptography.

…. and, of course, encourage more girls and women to study computer science and learn software engineering.

To mark International Women’s Day, I am offering 

50% off  any of my Continuous Delivery Training & DevOps Courses

Follow this link for more details: CD.Training

I know that gender isn’t the only Diversity issue that needs to be addressed in computing, and Diversity in all its forms is increasingly important, particularly as we are now entering the era of AI, algorithms and machine learning. I wrote more generally on Diversity in computing here. However, this article was inspired by International Women’s Day and inspirational women in computing.


Is SAFe safe?

I recently made a passing comment in one of my videos about SAFe. It was a bit of a cheap shot on my part, if I am honest, and I got picked up on it, appropriately, by a viewer.

It is too easy to take cheap shots at things, so here is my slightly more reasoned explanation of why I am not a fan of SAFe.

My Experience of SAFe

My direct experience of SAFe is limited, but I have worked with several clients that have adopted it as their corporate agile strategy.

The results that I have seen have been in line with the predictions that I would have made based on my reading of the SAFe approach, and so have tended to reinforce my opinions and, no doubt, prejudices.

Not Obviously Wrong But…

I have said for years, since I first became aware of SAFe’s existence, that if I look at any small piece of it, while it looks too bureaucratic to me, it is mostly not wrong.

Many people dislike the commercial stuff around SAFe, but for good or ill we live in a capitalist world, so I have no objection to that.

My reticence, and experience, is that I have never seen it work successfully, and I don’t know anyone – anyone that has seen a real, successful agile project at least – that claims to.

This is a limited sample of course, and it depends on how we measure success, but the orgs that I have seen try SAFe look, to me, indistinguishable from the orgs that pay lip-service to Scrum. In both cases, they look like waterfall orgs.

How Orgs Change

The problem is less a problem of SAFe itself, and more, in my perception, a problem of how people, and in particular organisations, adopt change. On the whole they try really hard not to adopt any change at all!

Organisational and cultural inertia is a real problem, and a real barrier to progress. This means that these orgs will read SAFe and then try to force-fit it into their pre-existing mental model. I think that SAFe’s problem is that it looks too much like what went before, therefore fits too neatly into the wrong mental models.

Revenge of the Mutant Methodologies

When I first saw SAFe it reminded me, very strongly, of a previous attempt, RUP (Rational Unified Process). If you looked at RUP from the perspective of small, iterative – what these days we’d call “agile” – teams, then RUP made good sense. I was involved in several successful projects based on a very light-weight adoption of RUP. I think that is probably what its creators intended.

The problem was that almost no-one in industry saw it that way. RUP was designed as a kind of self-build kit for development processes, attempting to point, strongly in my reading, at a more iterative, collaborative (we’d now say “agile”) approach.

What nearly everyone read, though, was “Ok, so we have all this extra paper-work and bureaucracy to do”. One of the first steps in RUP was “select the important artifacts that you will use”; what nearly everyone did was pick all of the artifacts that RUP ever mentions. In practice it was often the most bureaucratic version of waterfall that you have ever seen.

The Release Train Plateau

SAFe looks like that to me. There are things in SAFe that I think are wrong – more complex than they need to be, and wrong.

I dislike the idea of Release Trains and think that they are an anti-Continuous Integration step; in my experience they often move teams further from where they need to be rather than closer. In some ways you can see Release Trains as a step along the way to Continuous Delivery, but in practice they seem to me, much more commonly, to represent a plateau that halts teams’ progress towards a more effective flow-based approach.

CD Works Better!

In one of my more successful consulting projects I worked with a moderately large team in a very big organisation that had nominally adopted SAFe. There were several hundred people in the team that I was working with.

Inevitably, I guided them using Continuous Delivery principles. Starting points were “work so that your SW is always releasable” and “optimise for speed of feedback”. This team outperformed, based on internal company metrics, every other team in the org and now acts as an example and coach for others in the org.

It is Hard to Change

I don’t think SAFe is evil; I can imagine some circumstances, with the right combination of people, where it would work. The trouble is that you can say that about anything – even waterfall sometimes works, by accident, if you have people in place that can navigate it.

I think that SAFe is misguided.

It is incredibly difficult to make changes in big orgs, but I don’t see any evidence that SAFe helps on that journey.

Having said all of that, my experience is limited, so please do let me know in the comments where I am wrong in this.


You can watch me explain more of my thoughts over on my Continuous Delivery YouTube Channel


Are we ‘Deploying’, ‘Releasing’ or ‘Delivering’?

This year I started a YouTube channel in which, every week, I discuss ideas related to Continuous Delivery. I have been very pleased with its success.

As part of that, but also because of working from home more as a result of the COVID pandemic, I began to take a more active part in Twitter. The combination of these things is that I now feel, much more strongly than I did before, part of a community. There is a large group of us that regularly communicate and discuss ideas on Twitter, and sometimes in the comments on my YouTube channel.

A regular contributor, and digital friend, on both, Jim Humelsine, commented on my latest YouTube video, which discusses Continuous Delivery and Continuous Deployment – two closely related ideas that are subtly different, and that are the topic of this week’s episode on my channel.

I thought that it was a really good question, and that it might help to clarify my answer here as well as in the comments section of YouTube. So this post is about definitions, and the subtle differences in what we mean by the language that surrounds the practice of Continuous Delivery and DevOps.

Jim’s question

“Can you please restate definitions for Release, Deploy and Delivery? I’m still a bit confused among them, especially the distinction between Deploy and Delivery.”

My answer

It is confusing and my wife, Kate, pointed out to me recently that I am not completely consistent in how I use the words. I use the words in the following way:

Deploy – The technical act of copying some new software to a host environment and getting it up-and-running, ready for use. In the context of ‘Continuous’, as in ‘Continuous Deployment’, I use it to mean ‘automating the decision to release’ – if your deployment pipeline passes, you push the change to production with no further actions.

Release – Making a new feature available for a user to use. (Note: we can deploy changes that aren’t yet ready for use by a user; when we make them available to a user, with or without a new deployment, we release them – I plan to do a video on strategies for this.)

Delivery – In the context of ‘Continuous’ I generally mean this in a broader sense. For me, ‘Continuous Delivery’ makes most sense as used in the context of the principles listed in the Agile Manifesto – ‘Our highest priority is the early and continuous delivery of valuable software to our users’. So it is about a flow-based (continuous) approach to delivering value. That means that the practices needed to achieve CD are the practices needed to maintain that flow of ideas – so it touches on all of SW dev. I know that this is an unusually broad interpretation, but it is the one that makes the most sense to me, and the one that I find helps me to understand what to do if I am stuck trying to help a team to deliver.
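The Deploy/Release distinction is easiest to see with a feature flag. Here is a minimal sketch in Python – the flag store, the flag name and the checkout functions are all hypothetical, invented purely for illustration:

```python
# A hypothetical in-memory feature-flag store. In a real system this
# would live in configuration or a flag service.
FEATURE_FLAGS = {
    "new_checkout": False,  # deployed to production, but not yet released
}

def is_released(feature: str) -> bool:
    """A feature counts as 'released' only when its flag is switched on."""
    return FEATURE_FLAGS.get(feature, False)

def old_checkout(cart):
    return sum(cart)

def new_checkout(cart):
    return round(sum(cart) * 1.0, 2)

def checkout(cart):
    # Both code paths are deployed; the flag decides which is released.
    if is_released("new_checkout"):
        return new_checkout(cart)
    return old_checkout(cart)

# Flipping the flag IS the release - no new deployment is needed.
FEATURE_FLAGS["new_checkout"] = True
print(checkout([10, 20]))
```

The new code is present in production (deployed) either way; turning the flag on is the act of releasing it to users, and needs no new deployment.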

There is, as far as I know, one place where my language is regularly a bit inconsistent – at least one place that Kate has told me about. I tend to talk about working towards “Repeatable, Reliable Releases”, but if I were willing to drop the alliteration and speak more accurately, what I really mean is “Repeatable, Reliable Deployments”.

I hope that this helps a bit?

If you are interested in taking a look at my YouTube channel you can find it here.

If you would like to join a group of opinionated people over on Twitter you can see my stuff here.


101 Pieces of Advice for New Developers


First, I’d like to say welcome. You’ve chosen a great industry and profession in which to pursue a career. We software developers are changing the world. We are changing the world with a unique form of creativity, and that is something to keep with you while continuing to challenge yourself throughout your career. So welcome. I think you’ve made a fantastic choice.

Whose Advice?

I recently asked my Twitter and LinkedIn followers what their advice for junior developers might be, and I have gathered responses from a wide range of professionals – from those that are only a few steps ahead of you, to others at the very top of the industry. Some of these people are likely to be your leaders and the people that make hiring decisions.

“Everyone is ‘Junior'”

The most common response, in one way or another, was to remember that everybody is a junior at something. Everybody, including those professionals at the top of the industry.

All software developers should be learning something new, all of the time. I am considered an expert in a number of development practices, yet I have recently picked up a project in which I am learning to program in AngularJS – something I have never done before.

This constant learning is part of our discipline and it differentiates us from many other industries. It is true of other industries too, but I think that it is more true of ours. This, in many ways, is the real essence of what we do. So this is always true. Keep on learning!

Not Knowing the Answer

This leads to my first, and potentially the most important piece of advice. It’s OK to say ‘I don’t know’. Nobody knows everything. Software development is a vast and extensive thing. There are so many technologies and disciplines that nobody in the world is an expert at everything.

More than that though, “not knowing” is fundamental to what we do. We are always creating something new, and exploring ideas that are new to us, and sometimes to the world. It takes a little while to learn this, but not knowing isn’t something to be embarrassed about, in fact, you will spend your career not knowing. 

I like to apply scientific-style thinking to software development, after all, if software really is about “not knowing” then we need to learn to be really good at learning, and humanity has no better approach to learning than science.

Failure is ALWAYS an Option!

One of the key lessons in science is to proceed in a series of experiments, validating our guesses as we go. This means that you shouldn’t be afraid of failure. Failure is where we learn our most valuable lessons. Focus on what is in front of you right now and apply effective learning techniques to solve those problems.

Try not to become overwhelmed by the complexity or the size of a task. Take it one step at a time. If we want to minimize the danger of making a mistake, then we want to be able to make those mistakes on a small scale. Dividing our work up into small, controlled, units of progress allows us to see whether it is worth taking that next step or not. Does this change move us forwards, or is it a step backwards?

Choose Your Own Boss

A good mentor can guide your path to learning, so look out for somebody who can fill that role. I put it like this, choose your own boss. This won’t necessarily be your boss by hierarchy, but somebody you can look up to, learn from, and call on for help. 

If you are in the process of interviewing for jobs, try to recognise where you might be best supported as a new developer. Make sure you show up to interviews with your own questions on that topic. A good team and organisation will have no problem with you questioning the support system that they provide. Whether you are currently in a position of work or making applications, take yourself and your career seriously. Evaluate your best routes for success and take them.

Learn by Playing

Not only should you think about your professional progression, but take time to recognise what you really enjoy about writing code and developing software. Play with software. Do things for yourself.

I started my fascination with software by creating games and then went on to develop other silly things for my own entertainment. I actually have a video on YouTube about the dumbest things I’ve ever programmed (Watch here: https://bit.ly/Top5DumbCode).

They might have been dumb, but they provided significant entertainment to myself and coworkers around me. There’s no better avenue for success than enjoying what you do.

Focus on the Fundamentals

The fundamentals of software development do not change, which means they are more important than any tool, language or framework. Those fundamentals are where the real skill of a developer lies, so focus on the fundamentals.

When you are starting out you will inevitably have a strong focus on learning the tools of our trade, but be careful. The tools that really matter aren’t all obvious. Sure you need to know the language, dev environments and frameworks that you are using. However, over the course of your career, these will come and go. The real tools are more foundational. 

It is unfortunate that recruitment doesn’t always work this way. Recruitment often works from checklists of tools and technologies, rather than recognising the problem-solving techniques that are the real heart of our industry. This is particularly true when recruiting junior developers: because it’s hard to demonstrate your capability, let alone your potential, many organizations resort to these simple check-lists.

This is, of course, a poor way to go about recruitment, but it is the approach taken by many firms. Remember that this is not what makes you a great software developer. A long list of tech in your Resume or CV doesn’t make you good at software development.

Learn the Skills to Compartmentalise Problems

My final piece of advice goes back to the idea that it is ok to say, ‘I don’t know’. The human brain is limited in capacity; there’s only so much stuff we can genuinely understand, and these days we build software systems far beyond that capacity. We need ways to manage that complexity.

As developers we need to become experts at the techniques of compartmentalising problems and our solutions to them. Take that idea seriously!

Compartmentalise and create ‘modular systems’ to divide up any problem into pieces that are more manageable. Related ideas in your code should be close together; this is called Cohesion.

Each piece of your code should be focussed on achieving one thing. Use separation of concerns as a tool to help you to create better designs with better Modularity & Cohesion. Value the readability of the code that you create, and do what you can to make your code easy to work on. Seek out examples of good code, and maybe bad. Be opinionated about what makes good code good!

One of my favourite descriptions of “Good Design” comes from Kent Beck:

“Good design is moving things that are related closer together and things that are unrelated further apart”
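As a tiny, hypothetical sketch of these ideas (the functions and names are invented for illustration, not taken from the post), compare one function that mixes several concerns with a version that separates them:

```python
# Low cohesion: one function mixes parsing, calculation and formatting,
# so any change to one concern risks breaking the others.
def report_total_mixed(csv_line: str) -> str:
    prices = [float(p) for p in csv_line.split(",")]
    total_amount = sum(prices)
    return f"Total: {total_amount:.2f}"

# Separation of concerns: each small function does one thing, so each
# can be read, tested and changed independently.
def parse_prices(csv_line: str) -> list:
    return [float(p) for p in csv_line.split(",")]

def total(prices: list) -> float:
    return sum(prices)

def format_report(amount: float) -> str:
    return f"Total: {amount:.2f}"

print(format_report(total(parse_prices("1.50,2.25"))))
```

Both versions compute the same answer; the second keeps related ideas together and unrelated ideas apart, which is exactly the property Kent Beck’s description points at.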

Enjoy the Journey

I am attempting to do a small part in helping with your learning on my YouTube Channel…

and My MailList, where I regularly publish useful guides on ideas and practices that seem important to me…

I hope that these pieces of advice can be helpful to you. Take time to appreciate the journey. Look to where you can best succeed and work on a career that can leave you feeling truly fulfilled. And I shall say it again. Welcome. You’ve made a great choice.

For more top tips and advice for junior developers watch my YouTube video here!


The Impact of Continuous Delivery

When Jez and I wrote our book, we knew that we were describing a powerful approach. We were very nervous of claiming a “Methodology” though. Instead we saw the Continuous Delivery book as describing, in some detail, an approach to Build, Test and Deployment Automation and hinting at something broader in-scope.

I am less reticent these days. I have seen the philosophy of Continuous Delivery transform organisations, and I have been personally involved in helping many firms make this shift. I now make no bones about it: CD is a holistic approach that extends a long way beyond merely Build, Test and Deployment Automation.

Working so that you are in a position to deliver value into the hands of your users and customers continuously is a radical change, but it is now measurably, demonstrably the most effective way that we humans currently know how to create great software.

The impact on teams and organisations is profound. Continuous Delivery as a discipline is not just about the technicalities either in terms of practice or impact.

From my own experience, but also backed up by over 10 years of robust research, studies and the experience of software developers around the world, I now know that Continuous Delivery has the following benefits:

  • We can create better software faster, with no trade-offs between those two ideas.
  • We can reduce defect counts by multiple orders of magnitude.
  • We can spend a significantly greater proportion of our time on new ideas and less on unplanned tasks.
  • We can innovate more quickly and steer our businesses towards greater financial success.
  • We can solve harder problems.
  • We can have a better work/life balance while doing all of these things.
  • We can work with less stress, and more creativity.
  • We can be more compliant and safer in regulated and safety-critical industries.
  • We can revolutionise the organisations in which we work.
  • We can do all of these things whatever the nature of the software.

I now believe that this engineering-led approach is not only generally applicable, but is the best way to create any software.

I asked some people that I know, who have personal experience of employing Continuous Delivery in businesses of all sizes, to comment on this impact:

“Continuous Delivery helps us focus effort on things that bring value to our customers, instead of wasting time on repeatable and automate-able tasks in configuration, testing and deployment. It enables us to move fast with confidence, frequently applying small changes and keeping our product quality high and deployment risk low. MindMup.com is a product of a two-person team, serving millions of users across the world. The two of us do everything from user research through development, testing and operations tasks, to customer support. There’s no way we’d be able to achieve any kind of delivery speed if our releases required a lot of work or caused customer support requests. Software quality and stability are essential so we can focus on adding value instead of fixing bugs. Investing in continuous delivery practices pays off big because it delegates repeatable tasks to machines and frees up our time to work on things that actually require human insight, allowing us to successfully compete with several orders of magnitude larger organisations.”

Gojko Adzic (http://gojko.net)

The automation that Gojko describes is a central practice in CD. I like that he uses the word “Investing” though. I speak to many teams starting out who claim that they don’t have time to automate. This is a bit like saying that you are going to walk from London to Edinburgh because you don’t have the time to put petrol in your car.

The State of DevOps reports say that teams that practise CD spend 44% more of their time on new work than teams that don’t.

“Working with large enterprises, I’ve seen that the technical agility Continuous Delivery supports can be leveraged for greater business agility. With that in mind, it was easy to believe Dr. Forsgren’s research showing better business performance from teams adopting DevOps. I put my money where my mouth is, so to speak, and invested in companies who convince me that they’re dedicated to DevOps and Continuous Delivery. My only regret to date is not investing more.” 

Eric Minick (IBM) (https://devops.com/revisiting-my-bet-on-devops/)

Since the publication of my book, I have grown to believe that the reason that CD works is quite profound. I believe that CD is founded on the application of scientific principles to solving problems in software. We make evidence-based decisions, use falsification, and form hypotheses that we test in the form of (often automated) experiments. We proceed in small steps, validating our learning and understanding as we progress. We control the variables with techniques like Infrastructure as Code.
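You can picture each automated check in a deployment pipeline as one of these experiments – a falsifiable hypothesis about the code’s behaviour that the pipeline tries, and hopefully fails, to refute. A minimal, hypothetical sketch (the function and values are invented for illustration):

```python
# The code under test: a hypothetical discount calculation.
def apply_discount(amount: float, rate: float) -> float:
    return round(amount * (1 - rate), 2)

def experiment() -> bool:
    # Hypothesis: a 10% discount on a 200.00 order yields 180.00.
    # The assertion below is the attempted falsification: if it fails,
    # the hypothesis (and the change) is rejected before release.
    return apply_discount(200.00, 0.10) == 180.00

assert experiment(), "Hypothesis falsified - change rejected"
```

Run on every commit, a suite of such experiments is what lets a team make evidence-based decisions about whether a change is safe to release.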

For me CD is a genuine “Engineering Discipline for Software“. I too find Dr Forsgren’s work (described in the excellent “Accelerate” book) compelling. We have a measuring stick (Stability & Throughput) and a correlative model that can guide our efforts as we learn to create “Better Software Faster“.

“Working on a software product with long upgrade cycles, my team felt caught between sales demanding features immediately and customers who couldn’t immediately adopt and didn’t want frequent releases. We rearchitected to enable more of a Continuous Delivery approach and zero downtime updates to just one area of the product that was a magnet for these sales requests. That investment gave us the agility we needed where we needed it. The developer/sales relationship improved as did the business.” 

Eric Minick (IBM) (https://dzone.com/articles/know-the-business-case-for-rearchitecting)

Eric points out that this is NOT just a technical impact. CD has a revolutionary impact on the way in which the businesses that employ it can operate. Organisations that practise Continuous Delivery “have a 50% higher market cap growth over 3 years”.

“I would love to say that it made NS better, faster, cheaper, and happier – although many believe this we still can’t really prove that with data! I can say however that our CD efforts in 2016/17, and the Agile and DevOps transformation that followed, has led to greater ownership by the teams, to fundamental discussions about organization and governance of software development, and to improved understanding and alignment between business and IT. And of course, in the meanwhile, teams have automated everything they could, supported by a fine tool suite and plenty of coaches.”

Huub van der Wouden
Dutch National Railways

The adoption of Continuous Delivery is not easy. Because of its holistic nature the changes sometimes proceed slowly, but even then the impact is significant. The data from CD teams is interesting in terms of cultural performance as well as technical. Teams that practise CD claim better work/life balance, lower stress, and a greater sense of ownership.

The State of DevOps Report found that “The No.1 predictor of high performance in teams is Job Satisfaction“.

“At Siemens Healthineers, we have a growing number of teams adopting Continuous Delivery in the heavily regulated medical device industry. Currently, we have 20+ teams on the journey to Continuous Delivery. Different teams adopt various Continuous Delivery ways of working at different rates. Some teams are further ahead than others. What we have seen over the years is a growing appetite for Continuous Delivery, which is set to continue in future. Teams making breakthroughs in e.g. testing, deployment etc. never look back but rather want to achieve more!”

Vladyslav Ukis
Siemens Healthcare

Vlad describes another, growing, impact of CD. It is almost impossible to imagine the creation of a Deployment Pipeline without getting a perfect audit-trail of production changes as a side-effect. This, and other properties, make CD the perfect approach for regulated and safety-critical industries. I wrote about this in this blog post on “Continuous Compliance“.

“On the cost side, the software delivery has to be optimized for small chunks, in order to support the revenue side of the story. This requires the delivery organization to be set up in such a way that development teams can make releases independently. This holds true irrespective of the number of teams. That is, the cost of running the organization (overhead) does not significantly increase when the number of teams increases. Additionally, the teams can be trained to work in a way that allows them to accelerate over time. This reduces the cost per small batch delivery over time.

On the capital allocation side, the software delivery in small chunks has another business advantage. If an organization has a software delivery capability with small chunks, it enables investments in software products to be placed with a stop option, which can be exercised at any time. This way, you can invest a little, see whether the results are promising, invest a little more if they are, reduce the investment if they are not, and keep going this way. It is an efficient way to allocate the capital bit by bit taking small risks and evaluating results along the way. The stop option allows the capital to be invested with an in-built risk reduction strategy.

For the reasons above, I would not want to invest in software products that are not built using Continuous Delivery!”

Vladyslav Ukis
Siemens Healthcare

Working in smaller steps is natural in CD. It gives faster, clearer feedback on each change and is promoted by an efficient Deployment Pipeline. It is also extremely valuable to businesses. It allows them to try new ideas at lower cost and to be more responsive to customer or market demand. It also allows them to take the safety of the systems that they create more seriously. If a change is small and simple, it is lower risk, easier to diagnose if something goes wrong, and easier to remove if necessary.

In a paper on “Online Experiments at Large Scale”, looking at Microsoft, the authors found that “2/3 of ideas produce zero or negative value”.

“CI is a communication tool.  It alerts people working on a system that they need to talk to other people to resolve a potential conflict between the work they are doing separately.  Or at least, appeared to be doing separately until CI detected the work wasn’t as separable as they thought. That’s why integrating “at least once a day” is not a useful definition any more.  Do you want to waste a day’s work before dealing with any conflicts?  CI should provide that alert as fast as possible.”

“Integration should happen so frequently that when a conflict occurs, it’s quicker for the people involved to have a discussion, discard some of the changes and write them again than it is to merge their changes.”

“Continuous Deployment changes the way you can think about architecture.  It gives you more choices about where data and logic can be placed. When you can safely/rapidly/automatically redeploy any components of an application at any time, you can choose to place data or logic in those components rather than in separate services or databases.”

“CI/CD pipelines are not a “dev” environment.  They are critical to the functioning of the production system, because they are the way fixes get deployed into production when there is an incident.”

Nat Pryce (http://www.natpryce.com)

Nat points out the profound impact that thinking about fast feedback and testability has on the software that we create. In general, “quality” in code and systems has been a matter of experience and talent. Continuous Integration, and its big brother Continuous Delivery, apply a separate pressure for higher quality.

If you can’t release into production if a single test fails (a recommended CD practice) then keeping the tests passing is central to the approach. To keep the tests passing you need “repeatable reliable” tests. If you want your tests to be deterministic, then you need to make your code testable. The properties of testable code are the properties that we value as the hallmarks of high-quality software.
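As a hypothetical illustration of that pressure (the names here are mine, not from any real codebase): code that reads the real clock is non-deterministic and so gives flaky tests; injecting the clock as a dependency controls that variable and, as a side-effect, improves the design.

```python
from datetime import datetime, timedelta, timezone

class SessionTokens:
    """Issues expiring tokens; the clock is injected so tests control time."""
    def __init__(self, ttl_seconds, clock=lambda: datetime.now(timezone.utc)):
        self.ttl_seconds = ttl_seconds
        self.clock = clock

    def issue(self):
        return {"issued_at": self.clock()}

    def is_expired(self, token):
        age = (self.clock() - token["issued_at"]).total_seconds()
        return age > self.ttl_seconds

# A "repeatable, reliable" test: we advance a fake clock instead of sleeping.
now = datetime(2021, 1, 1, tzinfo=timezone.utc)
tokens = SessionTokens(ttl_seconds=60, clock=lambda: now)
token = tokens.issue()
assert not tokens.is_expired(token)
now += timedelta(seconds=61)   # control the variable: time
assert tokens.is_expired(token)
```

The testable version is also the better-designed version: the time dependency is explicit rather than hidden.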

Of course this still depends on the talent and ingenuity of development teams, but it applies a pressure to do the right things that otherwise only exists as a matter of personal discipline. CD is a GREAT tool to drive a better focus on higher-quality work and provides a mechanism to give clearer feedback to developers on the quality of their work. (I described this idea, in part, here)

“I highly recommend this book <Continuous Delivery>. Working at LMAX where we put into practice many of these things fundamentally changed the way I think about software development”

“I’ve seen that organisations that have adopted CD spend significantly less time on releases. Higher levels of automation means much less (if not zero) time spent by developers and operations team members on evenings and weekends. This is beneficial in a number of ways: less overtime; less time spent on releases means more time to develop features; less human interaction means a lower risk of human-introduced errors; and a happier, more productive team environment (which is great for recruitment and retention).”

Trisha Gee, JetBrains

Trisha describes how the technical practices of CD reinforce the desirable human outcomes for the people working on these teams: less stress, better work/life balance and, generally, a better, more creative, higher-quality work environment.

“Our CI/CD environment is consistently humbling our development team with failing tests that were not expected. Most of these failures would have silently made their way to production unnoticed.”

“Our CI/CD environment is able to provide fast accurate feedback because our developers are empowered to write tests that stand up to a rigorous cost benefit analysis rather than simply adding tests to meet some arbitrary code coverage metric”

Judd Gaddie (TransFICC)

Judd’s team operates a sophisticated CD approach, delivering very high-quality software. CD is an engineering discipline in that, if you follow the practices, you will certainly get a better result, measured in terms of both speed and quality.

“We are able to make large, cross cutting changes to our system with high-levels of confidence due to the quality of our CD pipeline. We were able to completely re-write the architecture of our system and know it worked from our pipeline (we didn’t have to change any acceptance tests either!).
Some things that spring to mind but are not necessarily CD related…

The first thing we built when we started the company was our CD pipeline and a feedback screen. Whilst it meant we went slowly at the beginning, it has meant we have always had a strong testing pipeline, allowed us to focus on automating everything, and provided an easy way of showing whether our software works or not. It has also provided credibility to the company by showing we know what we are doing.
Zero time spent “preparing” a release as trunk is always releasable”

Tom McKee (TransFICC)

Tom takes us back to the start with the idea of investing in our approach and our work. Work done to make us more efficient is not waste, it helps us to eliminate waste. For Continuous Delivery teams there is no trade-off between speed and quality. In fact investing so that you can go quickly naturally leads to working in smaller steps and promotes higher-quality outcomes.

I believe that Continuous Delivery represents the “state of the art” for software development. I also believe that this approach is a genuine “engineering discipline” for software development.

Continuous Delivery does what engineering does in other fields, it amplifies our craft and creativity and allows us to create machines and systems with greater speed, quality and efficiency. There is no better way, that we know of, to create high quality software efficiently.

It takes hard work, and often ingenuity, to adopt Continuous Delivery. To be world-class you need to consider cultural, organisational and technical performance. Organisations that are world class at this are different in approach, and effectiveness, to organisations that aren’t. This is the approach behind many of the leading companies in the world.

So, ten years after my book was first published, I am very proud of the impact of these ideas, and in my part in popularising them, but there is still a lot more work to do, so I am looking forward to the next 10.

You can find out more about some of these ideas on my “Continuous Delivery” YouTube channel here.

I have also recently released an on-line version of some of my successful CD training courses here:

Finally, if you haven’t had enough of me yet, you can subscribe to my mail list here.


10 Years of “Continuous Delivery”

10 Years Since My Book Was Published

My book, “Continuous Delivery” was launched on 10th August 2010, so in a few weeks time it will be the 10th anniversary of its publication. Jez and I spent 4 years writing the book, and several years before that doing the work that informed that writing. Continuous Delivery has been a feature of my life for a long while now.

I very clearly recall the sense of pride when our book was published. Nevertheless, neither of us thought that it would have the impact that it has, or that the ideas would become so widely recognised as the state of the art in software development. Continuous Delivery is now the approach behind the work of many of the biggest and most successful software-driven organisations on the planet.

Originally we had a few celebratory things planned for this 10th anniversary year. Jez and I spoke together for the first time at the DeliveryConf in Seattle, at the start of the year (you can watch it here: https://youtu.be/FVEWdatM8Uk). Then a global pandemic reminded us of the limits of our planning.

The Importance of CD

I have spent the last few years working as an independent software consultant, advising clients on how to improve their software engineering practices, with Continuous Delivery at the heart of those improvements. I have become more convinced, rather than less, that the ideas in Continuous Delivery are important, and bigger than I thought when we wrote the book.

I believe that the reason why CD works is that it is rooted in some deep, important ideas. It is primarily focussed on learning efficiently. CD works by creating fast, efficient, high-quality feedback loops, from the few seconds of feedback of a TDD test run to the feedback generated by creating a releasable thing multiple times per day. It also facilitates the most important feedback loop of all, from customer to producer, allowing organisations to experiment with their products and hone them to better meet the needs of customers, and so create great products.

When we came up with the ideas and practices of CD it was done as an exercise in empirical learning and pragmatic discovery. We did none of this based on theory, all was based on practical experience in real software projects. Since then, through my experience of helping people to understand and adopt these practices in all sorts of organisations, for all kinds of software, I now recognise some deeper explanations for why CD works.

CD As an “Engineering Discipline”

I believe that CD represents a genuine “engineering” approach to solving problems in software. By that I mean that we are applying some important scientific principles to software. Despite us thinking of our discipline as technical, it has been surprisingly unscientific in approach. Most software development proceeds as a series of guesses: we guess what users want, we guess at a design, we guess (usually based on a convincing expert or colleague) which tech we will use, and we guess whether there are bugs in it on release. I believe that CD, when taken seriously and practised as a core, organising discipline for software development, rather than interpreted as meaning only “deployment automation”, helps us to eliminate much of this guesswork. Instead we create hypotheses, try them out as mini-experiments, accurately measure the results, and work to control the variables so that we can distinguish signal from noise. This is “engineering”, and the results when we apply it are astonishing and dramatic, as shown by Jez’s work with Nicole Forsgren.

My CD Mission

So ten years later, I feel like I am on something of a mission. Of course I am delighted at the success of our book and the impact that it has had on teams all around the world. I am also personally grateful for the impact that it has had on my career. I am now seen as an expert in this field and have travelled the world helping people and teams as a direct result of that literary success. However, my mission is not done.

I believe that CD matters because Software matters and CD is THE BEST WAY TO CREATE SOFTWARE with speed, efficiency and quality.

So thank you for your support over the years. I hope that you have enjoyed my book, and my other stuff.

I have some other things in the pipeline to, hopefully, help me with my mission, which is to help teams and individuals improve their skills, their techniques and, perhaps most important of all, their engineering approach to software development.

A New Book?

I am working on another book, in which I explore in some depth this idea of what “Engineering” should mean for the discipline of software development. What form would a genuine “engineering discipline for software” take? I am not good at predicting when I will finish books, but this one is progressing quite well so I am hoping that it will be published next year.

CD Online

My YouTube Channel

I had been very busy with my consultancy, and so in some sense the pandemic gave me the impetus, and the time, to do something that I had been thinking about for a long time.

I have begun a series of videos, published weekly (every Wednesday evening, UK time) on YouTube, in which I explore different aspects of, and different ideas that are prompted by, Continuous Delivery and its practice.

My “Continuous Delivery Channel” covers my thoughts and experiences on Continuous Delivery, DevOps, TDD, BDD and Software Development as an Engineering discipline. It is quite wide-ranging talking about the technical, cultural and organisational practices and impact of Continuous Delivery.

I have been very pleased with the growth of the channel so far, and naturally hope that it will continue to be interesting, and useful, to people.

Continuous Delivery Training Courses

I am also in the process of getting most of my training courses set up on-line. But that takes a bit longer, so I’ll be saying more about these later in the summer. My on-line training programme will include:

"Getting Started With Continuous Delivery"
"Anatomy of a Deployment Pipeline"
"TDD - Design Through Testing"
"ATDD - Stories to Executable Specifications"
"Leading Continuous Delivery"

I have plans for many more, but this is already a lot of work 🙂

My goal is to share the best ideas about how to build better software, faster. So if you have a particular interest that we can explore in one of my next YouTube videos, or if you have a particular training need, please let me know.

CD Mail-List

Finally, I have set up a mail-list via which I will share thoughts and keep people informed of any news. To celebrate the 10th anniversary of my book, and to say thank you to subscribers to my mail-list, I am running a competition: I am giving away a signed, first-edition copy of “Continuous Delivery”. Everyone on the email list is eligible for the draw, so if you haven’t already, please sign up.


Welcome to My YouTube Channel

I have recently decided to launch a YouTube Channel to complement this Blog.

My aim for the channel is to provide some insight into the techniques and practice of Continuous Delivery, explain some of my ideas on Software Engineering, and how we can start to work more like Engineers and gain advantage from that.

I also plan, from time to time, to indulge my general interest in software development, muse on what we can learn from science and how to apply that to software development, and finally to be a bit opinionated.

Take a look and if there are any topics that you are particularly interested in me covering please let me know.

So far I have published the following episodes:


Q&A from GOTO Copenhagen Session on Reactive Systems

I recently spoke at GOTO Copenhagen on the topic of Reactive Systems.

I will post a link to the video of my talk here when it is published.

I didn’t have time to answer all of the questions, so here are my answers to the questions asked…

Q: A typical question that arises when thinking about eventual consistency is user responsiveness. Imagine a user updating some property of something in the UI; that user often wants to see the result of that update immediately. It’s kind of a bad user experience to show them out-of-date information and a message “please refresh in some time to see the result of your action”. What is your view on that? How can we hide eventual consistency from the user, in the cases where we don’t want them to notice that it’s eventually consistent in the backend?

Q: How do you deal with asynchrony in the UI where users want immediate feedback of their action?

A: Eventual consistency may, maybe should, make you think about some different ways of showing the user what is going on, but there is also an assumption built into your question that this is going to be slow.

Imagine what is going on in a synch call across a process boundary. A request-response call to a remote server perhaps…

  1. Our client code needs to encode the call somehow.
  2. The request needs to be sent across the wire.
  3. Our thread needs to be blocked while we wait for a response.
  4. The server end will be triggered when the message arrives.
  5. The server will need to translate the message into something useful.
  6. We call the server code with the message.
  7. The server code formulates a response.
  8. The server code will need to translate the response into something to send…
  9. …and send it.
  10. Our client code will need to receive the message.
  11. Translate it into something useful.
  12. Wake up the client’s blocked sync thread.
  13. Call the client code with the response.
  14. Process the response.

Now think about how that would be different for an async communication. We would remove the steps to block the thread on the client and reactivate it. Instead we could imagine that thread being continually busy, in the simplest case, looping around looking for new messages.

So, along the line of any communication, there is less work to do, not more.
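To make the contrast concrete, here is a minimal, framework-free sketch of the async alternative (all names are illustrative): a single thread drains an inbox and emits replies as messages, with no thread ever parked waiting for a response.

```python
from collections import deque

# One thread, one loop: messages in, messages out. No blocked threads.
inbox = deque()
outbox = deque()

def handle(message):
    # Domain logic lives here; replies are just more messages.
    return {"reply_to": message["id"], "body": message["body"].upper()}

def run_once():
    """One turn of the event loop: drain whatever has arrived so far."""
    while inbox:
        outbox.append(handle(inbox.popleft()))

inbox.append({"id": 1, "body": "hello"})
inbox.append({"id": 2, "body": "world"})
run_once()
assert outbox[0] == {"reply_to": 1, "body": "HELLO"}
```

There is no encode/block/wake-up dance here: the thread is either doing useful work or looking for the next message.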

All of the highest performance systems in the world, that I am aware of, are built on top of async messaging for this reason. Telecoms, Trading, Real time control systems.

So for the vast majority of interactions, a user of an async system will get a better, rather than worse, response. In the tiny number of interactions when something is going wrong and responses are slowed, the user is seeing the truth: their invocation hasn’t finished yet, but they are not blocked from making progress elsewhere.

This does lead to a slightly different take on UI design, but it is, at least, only different rather than worse, and maybe a more accurate and more robust representation of the truth.

Q: Event-Driven or Message-Driven in 2020?
Q: What is the difference between events and messages?
Q: Should we be storing messages or event? Or both?

A: When we wrote the Reactive Manifesto https://www.reactivemanifesto.org/ we debated this a lot. “Event or Message”. We came down on the side of Message because an “Event” gives the impression that there is not necessarily any consumer of the event, whereas “Message” has a more obvious implication that something, somewhere, cares and is listening.

I think that this may be thought of as a bit like counting angels on the head of a pin, and personally I am fairly relaxed about the differences, but when you are trying to communicate ideas broadly it is sometimes useful to be a bit pedantic about the language that you choose.

Q: How do reactive frameworks relate to reactive systems?

A: I think that there is a relationship, but they are not the same. Reactive frameworks are largely focussed on stream processing at a programmatic level; Reactive Systems is more of an architectural stance.

I did say in my presentation that these ideas are kind of fractal though, so the async, event/message-based nature of both of these levels of granularity are common.

There are some details of the Reactive frameworks that I have seen that I dislike, as a matter of personal taste, as a programmer (Futures, for example). I see little advantage in trying to make async look like sync.

Taking a more architectural viewpoint and simply, at the level of a service, processing async messages as input and sending async messages as output results in simpler code. It may result in a little more typing, but the code will be simpler and so easier to follow.

The real advantage that I perceive in Reactive Systems is the separation of essential and accidental complexity. The code that I spend my day-to-day work on is inside the services. It is focused solely on the domain logic of my problem. Everything else is outside in the infrastructure. Reactive Programming probably offers the same effect if you think about it, but most of the code that I have seen doesn’t achieve that.

Q: Any good places to store events?

A: Ideally that is a problem for your infrastructure. Aeron, for example, has “Clustering” Support which allows you to preserve, and distribute, the event-log. When configured this way, it will record, and play-back, the stream of events for you.

But once you have the stream, you can do almost anything you like with it.

Q: When would a message-driven system be inappropriate, or just overkill?

A: I think that my answer to this splits into two.

On the one hand, this style of development is still reasonably niche. It has a long and extremely well-established history, but is still most widely used on fairly unusual problems: Trading, Telecoms, Real-Time systems and so on. I believe that it is MUCH more widely applicable than that, but because of that history the tooling is fairly niche too. Akka is probably the most fully-fledged offering. It is certainly Reactive. Personally, I think that there are some aspects of the Actor model in Akka that seem more complex than is really required, but it is a great place to start, with lots of examples and successful industrial and commercial applications.

On the other hand, as I said, there is something fairly fundamental at the level of Computer Science here. Async message passing between isolated nodes is a bit like the quantum physics of computing: it is the fundamental reality on which everything else is built. It is how processors work internally, it is how Erlang works, it is how Transputers worked in the 1980s, and it is how most of the seriously high-performance trading systems in the world work at some level.

Performance isn’t the only criterion though. I value this approach primarily for the separation of accidental and essential complexity. Distributed and concurrent systems are extremely complex things – always! This approach allows me to build complex systems more simply than any other approach that I know.

So I think that it should be MUCH more broadly applicable, but the current level of tooling and support means that I would probably choose to use it only when I know that the system will need to scale up to run on more than one computer, or needs to be VERY robust. For systems simpler than that, I may compromise on a more traditional approach 🙂

Q: Instead of back pressure couldn’t you automatically startup an extra component b?

A: Yes you can but you need to signal that need somehow, and that is what “Back-pressure” is for. It allows us to build systems that are better able to “sense” their need to scale elastically on demand.

Q: Why unbounded queue is bad pattern? How about Apache Kafka?

A: An unbounded queue is ALWAYS unstable. If you overload it, what happens next? To build resilient systems you must decide what you will do when demand exceeds supply (when the queue is full).

There are only three options:

  1. Discard excess messages.
  2. Increase resources to cope with demand.
  3. Signal that you can’t cope and slow-down the input at the sender.

Options 2 & 3 require the idea of “Back-Pressure” to get the message out to something else to either launch some more servers (elastically scale) or to slow input.

At the limit, given that resources are always finite (even in the cloud) you probably want to consider both 2 & 3 for resilient systems.

Kafka allows you to configure what to do when the back-pressure increases.
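A tiny sketch of option 3, using Python’s standard-library `queue` (the `offer` helper is my own illustrative name, not any framework’s API): a bounded queue that refuses work when full, which is exactly the back-pressure signal the sender needs.

```python
import queue

inbox = queue.Queue(maxsize=2)   # bounded: the queue can say "no"

def offer(msg):
    """Try to enqueue without blocking; a full queue is the back-pressure signal."""
    try:
        inbox.put_nowait(msg)
        return True
    except queue.Full:
        return False   # sender must now slow down, retry, shed load, or scale out

assert offer("order-1") and offer("order-2")
assert offer("order-3") is False   # demand exceeded supply, and we signalled it
```

An unbounded queue never returns `False`; it just grows until the process dies, which is why it is always unstable under sustained overload.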

Q: If you need to join two datasets, coming from two different streams, first stream – fast real-time, second – slowly changing, without persisting data on your storage, how would you recommend to do it? Any recommended patterns?

A: In the kind of stateful, single-threaded reactive system that I was describing this is a fairly simple problem. Imagine a stateful piece of code that represents your domain logic. Let’s imagine a book-store. I could have a service to process orders for books. I have lots of users and so the stream of orders is fast and, effectively, constant.

I may not choose to design it like this in the real-world, but for the sake of simplicity, let’s imagine that we check the price of the book as part of processing an order.

I am going to process orders and changes to the price of books on the same thread. This means that I can process different kinds of messages via my async input queue. When an event occurs to change the price of a book, interspersed with processing orders, as I begin to process that message, nothing else is, or can be, going on. Remember, this is all on a single thread, so the “ChangeBookPrice” message is in complete, un-contended control of the state of the Service.

So I have no technical problems, my only problems are related to the problem-domain. These are the sorts of problems that we want to be concentrating on!

So what should we do when we change the price of a book?

We could change the price and reject orders not at that price. We could change the price, but allow orders placed before we changed the price to be processed at the old price… and so-on.

I think that the simplicity of the safe, single-threaded, stateful, programming model combined with the separation of technical and domain concerns that it entails gives us greater focus on the problem at hand.
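A minimal sketch of that single-threaded model (the book-store and message names here are illustrative, not the real exchange’s code): one inbox, two message types, no locks, so the only decisions left are domain decisions.

```python
# One thread, one inbox, two message types. While a price change is being
# processed nothing else touches the state, so there is no locking at all.
class BookStore:
    def __init__(self):
        self.prices = {"cd-book": 30}
        self.orders = []

    def on_message(self, msg):
        if msg["type"] == "ChangeBookPrice":
            self.prices[msg["book"]] = msg["price"]
        elif msg["type"] == "PlaceOrder":
            # Domain decision: orders capture whatever the price is *now*.
            self.orders.append((msg["book"], self.prices[msg["book"]]))

store = BookStore()
for m in [{"type": "PlaceOrder", "book": "cd-book"},
          {"type": "ChangeBookPrice", "book": "cd-book", "price": 25},
          {"type": "PlaceOrder", "book": "cd-book"}]:
    store.on_message(m)

assert store.orders == [("cd-book", 30), ("cd-book", 25)]
```

Whether the second order should get the old or the new price is purely a business-rule choice; the threading model never enters the conversation.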

Q: Let’s say you have scalable components and a large history of events. How to deal with the history to recreate the state of that new component which just scaled up. Use snapshots to store an intermediate state of a component?

A: Yes, this is one of the complexities of this architectural approach. You get some wonderful properties, but it is complex at the point when messages change. The first thing to say is that in these kinds of architectures, the message store is the truth!

The first scenario, that you talk about, is what happens if you want to re-implement your service. Well, as long as the message protocol is consistent – go ahead, everything will still work. Since a message is the only way that you can transform the state of your service, as long as you can consistently replay the messages in order, your state, however it is represented internally, will be deterministic.

The problem comes when you want to change the messages. You have then got an asymmetry between what you have recorded and what you would like to play-back. When we built our exchange we coped with this in two different ways.

When we shut the system down we would take a “snapshot” of the state of a service. When the service was re-started it would be restarted by initializing it with the newest snapshot, and then by replaying any outstanding, post-snapshot, messages.

We then built some tools that allowed us to apply (and test) transformations on the snapshot. This was a bit complicated, but worked for us.

The other solution was to support multiple message versions at runtime and dynamically apply translations into the new form required by the service.

One more, common, pattern that we didn’t use much in our exchange was to support multiple versions of the same message, through different adaptors.
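The snapshot-plus-replay restart can be sketched like this (a toy event and state of my own invention, assuming that state changes only via messages): initialise from the snapshot, replay only the post-snapshot events, and you deterministically arrive at the same state as the live service.

```python
import copy

def apply(state, event):
    # The only way state changes is by applying a message.
    state[event["key"]] = state.get(event["key"], 0) + event["delta"]
    return state

log = [{"key": "balance", "delta": 10},
       {"key": "balance", "delta": 5},
       {"key": "balance", "delta": -3}]

# The live service folds every event, taking a snapshot after the second one.
state, snapshot, snapshot_index = {}, None, 1
for i, event in enumerate(log):
    state = apply(state, event)
    if i == snapshot_index:
        snapshot = copy.deepcopy(state)

# Restart: initialise from the snapshot, then replay post-snapshot events only.
restored = copy.deepcopy(snapshot)
for event in log[snapshot_index + 1:]:
    restored = apply(restored, event)

assert restored == state == {"balance": 12}
```

The asymmetry problem appears when the event schema changes: the recorded `log` no longer matches what `apply` expects, which is where snapshot transformation tools or runtime message translation come in.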

Q: How can random outcomes be reproducible? E.g. implementing a game with dice: rolling the die will have a result, but what if only the command is saved?

A: Fairly simply, you externalize the dice! Have a service outside of the game that generates the random event. Send that as a message. The game is now deterministic in terms of the sequence of messages.
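A sketch of that idea (the names are hypothetical): the dice roll happens outside the game and is recorded as a message, so the game itself becomes a pure, deterministic fold over the log.

```python
import random

def dice_service(rng):
    """The non-deterministic part lives outside the game and emits a message."""
    return {"type": "DiceRolled", "value": rng.randint(1, 6)}

def play(events):
    """The game itself: a pure, deterministic fold over the message log."""
    position = 0
    for event in events:
        if event["type"] == "DiceRolled":
            position += event["value"]
    return position

rng = random.Random()              # real randomness, outside the game
log = [dice_service(rng) for _ in range(3)]

# Replaying the recorded messages reproduces the game exactly, every time.
assert play(log) == play(log)
assert all(1 <= e["value"] <= 6 for e in log)
```

The randomness is recorded as data rather than re-generated, so replay, testing, and debugging all see the identical game.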

Q: What about eventual consistency of data? how do you resolve conflicts?

A: I think that broadly there are two strategies. You align your services with Bounded Contexts in the problem domain. You choose these, where you can, so that you don’t care about consistency between different services.

For example, if I am buying books from Amazon, the stuff that is in my shopping cart right now is unrelated to the history of my orders. Even once I have ordered the stuff in my cart, I don’t really care if it takes a second or two for the order history to catch up. So “eventual consistency” between my “Shopping Cart Service” and “Order History Service” doesn’t matter at all.

Where I need two distinct, distributed, services to be in-step I can take the performance overhead of achieving consensus. There are well-established distributed consensus protocols that will achieve this. RAFT is probably the best known at the moment. So you can apply RAFT to ensure that your services are in-step where they need to be.

If this sounds slower, it is, but it is no slower than any other approach: this is ALWAYS what you must do to achieve consistency. These are the same kinds of techniques that happen below the covers of more conventional, distributed synchronous approaches, e.g. distributed transactions in an RDBMS.

Q: How do you ensure ordering across multiple instances of the same component? So scaling up, without risking two instances reserving the same, but last, book in the inventory?

A: This is back to the idea of eventual consistency. There are two strategies:

  • Live with the eventual consistency: allow separate instances to accept an order for the book at the same time, but have a single “Inventory” service actually fulfil the order.

  • Use some distributed consistency protocol to coordinate the state of each place where books can be ordered.

Q: Aren’t reactive Actors a different thing altogether?

A: The stuff I was describing could be considered to be a simple actor style. It misses some of the things that are usually involved in other actor-based systems (e.g. Akka).

The fundamental characteristics though are the same. We have stateful bubbles of logic (actors) communicating exclusively via async messages.

Q: In the system you built – did you use a message bus?

A: Yes, we built our own infrastructure layered on top of a high-performance messaging system from 29 West.

Q: Should messages be sent to kafka or similar?

A: You can certainly implement Reactive Systems on top of Kafka.

Q: Why not just accept message 4 if 3 is missing? is order important?

A: Dropping messages is not a very sensible thing to do at the level of your infrastructure, though it may make sense within a particular problem domain.

The problem is, if my infrastructure just ignores the loss of message 3, then the state of my service processing the messages is now indeterminate. Imagine two services listening to the same stream of messages. One gets 3 the other doesn’t. If we don’t make the message order dependable our system is not deterministic.

If your problem domain allows you to ignore messages (perhaps they arrive too late and are no longer of interest, as in some trading systems), then you should deal with that in the code that deals with the problem domain, the implementation of the service, rather than in the infrastructure.

So the safest approach is to build the reliable messaging into the infra and deal with other special cases as part of the business problem.
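
In the infrastructure, that reliability usually shows up as gap detection on sequence numbers: the receiver buffers later messages and waits for (or requests) the missing one, rather than silently skipping it. A hedged Python sketch, with the retransmit request itself elided:

```python
class Resequencer:
    """Delivers messages to the service strictly in sequence order.
    On a gap, later messages are buffered until the missing one arrives
    (in a real system a retransmit request would be sent at that point)."""

    def __init__(self, deliver):
        self.deliver = deliver      # callback into the service
        self.expected = 1           # next sequence number we will deliver
        self.pending = {}           # out-of-order messages, keyed by seq

    def on_message(self, seq, payload):
        self.pending[seq] = payload
        # Drain everything that is now contiguous.
        while self.expected in self.pending:
            self.deliver(self.pending.pop(self.expected))
            self.expected += 1

delivered = []
r = Resequencer(delivered.append)
r.on_message(1, "a")
r.on_message(3, "c")   # gap: 2 is missing, so "c" is buffered, not delivered
r.on_message(2, "b")   # gap filled: "b" then "c" are delivered, in order
```

Every service listening to this stream sees the same messages in the same order, which is what keeps the system deterministic.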

Q: Persisting events sounds like a big overhead compared to a traditional synchronous call.

A: Yes, and doing it efficiently is important. However, if you have a system, of any kind, that requires state to be retrievable following a power outage, you have to store it somewhere. The mechanism that I described, persisting the message stream as a stream of events, is almost precisely what you would implement in the guts of a database system. All modern RDBMSs are based on the idea of processing a “transaction log”. This is the same thing, except that where, and when, we process the log is changed.

When building our exchange we did a lot of research into this aspect of our system’s performance. The trouble with something like a DB is that it is optimized for the general case. If you look carefully at the performance of the storage devices that we use to persist things, none of them, even SSDs, is optimized for predictable performance in random access. They work most efficiently if you can organize your access sequentially. We took advantage of that in the implementation of our message-log persistence, streaming to pre-allocated files, and so got predictable, consistent latency. Modern disks and SSDs are very good at high-bandwidth streaming, so we could outperform an RDBMS by several orders of magnitude.
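
The essence of that journaling approach can be sketched as an append-only, strictly sequential log. This is an illustrative Python sketch, not the exchange’s implementation (which pre-allocated files and was far more heavily optimized):

```python
import os
import tempfile

# Sketch of an append-only message journal that writes strictly
# sequentially - the access pattern that disks and SSDs handle best.

class Journal:
    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def append(self, message: bytes):
        # Length-prefixed record, appended sequentially; we never seek back.
        self.f.write(len(message).to_bytes(4, "big") + message)
        self.f.flush()
        os.fsync(self.f.fileno())   # durable before we acknowledge

    def replay(self):
        # Read the log back, record by record, in the order it was written.
        with open(self.path, "rb") as f:
            while header := f.read(4):
                yield f.read(int.from_bytes(header, "big"))

path = os.path.join(tempfile.mkdtemp(), "journal.log")
j = Journal(path)
j.append(b"order-1")
j.append(b"order-2")
```

Replaying the journal yields the messages in the order they were appended, which is exactly what the snapshot-plus-replay recovery described earlier needs.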

There is tech on the horizon that, I think, may be disruptive and so strengthen the case even more for the kind of Reactive Systems that I described: massive-scale, non-volatile RAM.

Q: Was it LMAX you were working for?

A: Yes, LMAX was the company where we built the exchange.

You can read a bit more about our exchange and its architecture here: https://martinfowler.com/articles/lmax.html

Q: “Service as a state machine” implies that the services should be stateful? Isn’t that added complexity? Thinking about changing the flows etc.

A: You have to have state somewhere, otherwise your code can’t do anything much.

Not all of your code needs to be stateful though. For the parts of your system that form the “system of record”, in this approach, those parts are implemented as “Stateful Services”.

If you want high performance you can do this using the in-memory state as the system of record, using the techniques that I described; that was how our exchange worked. For other, slower, systems your service could be stateful and backed by some more conventional data store if that makes sense.

Q: How would a single thread bookstore service handle an order coming in while it is still processing the previous order? Or alternatively, two simultaneous orders?

A: It would queue the incoming second order and process it once the BookStore had finished processing the first. However, because of the MASSIVE INEFFICIENCY of data sharing in concurrent systems, avoiding the locking is something like three orders of magnitude faster than tackling this as a problem in concurrency.
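
A toy sketch of that single-threaded queueing (the bookstore and order shapes are invented):

```python
from collections import deque

# Sketch: a single-threaded service drains its inbox one message at a
# time, so "simultaneous" orders are simply processed in arrival order
# and no locks are needed.

class BookStore:
    def __init__(self, stock):
        self.stock = stock          # title -> copies available
        self.inbox = deque()
        self.results = []

    def submit(self, order):
        self.inbox.append(order)    # a second order just queues up

    def run(self):
        # One message at a time: no data is ever shared between threads.
        while self.inbox:
            title = self.inbox.popleft()
            if self.stock.get(title, 0) > 0:
                self.stock[title] -= 1
                self.results.append((title, "confirmed"))
            else:
                self.results.append((title, "out-of-stock"))

store = BookStore({"Dune": 1})
store.submit("Dune")
store.submit("Dune")   # arrives while the first is still "in flight"
store.run()
# Only one of the two orders for the last copy can succeed.
```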

Q: How to effectively handle transactions (and rollback in case of fail) in event based system? And how to understand that transaction not finished?

A: In these kinds of systems the simplest solution is that a message defines the scope of a transaction. If you need broader consistency, use a distributed consensus protocol like RAFT.
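
One simple way to realize “a message defines the scope of a transaction” is to apply each message to a working copy of the state and commit only if the handler succeeds. A hedged Python sketch (the account/transfer domain is invented):

```python
import copy

# Sketch: each message is a transaction boundary. The handler works
# on a copy of the state; only a fully successful handler's result is
# committed, so a failure mid-message "rolls back" automatically.

def process(state, message, handler):
    working = copy.deepcopy(state)
    try:
        handler(working, message)
    except Exception:
        return state          # rollback: the original state is untouched
    return working            # commit: the whole message applied atomically

def transfer(state, msg):
    src, dst, amount = msg
    state[src] -= amount
    if state[src] < 0:
        raise ValueError("insufficient funds")
    state[dst] += amount

accounts = {"a": 10, "b": 0}
accounts = process(accounts, ("a", "b", 5), transfer)    # succeeds
accounts = process(accounts, ("a", "b", 100), transfer)  # fails, rolled back
```

A production system would avoid the deep copy for performance, but the boundary is the same: one message, one all-or-nothing state change.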

Q: How do you deal with the communication with mobile and web frontends (and the UX of it)? Websockets and other solutions always feel more complicated for many use cases.

A: My preferred approach for all UI is to deal with it as a full bi-directional async comms problem. So then you have to use something like Websockets to get full “push” to the remote UI.

Q: Can your code use different cores in the CPU? Or will the next instance of execution use the same core? Do you utilise all the cores?

A: Yes, the system is multi-core, but is “shared-nothing” between cores. We can achieve this through good separation of concerns. For example, one core may be dedicated to taking bytes off the network and putting them in the ring-buffer, another may be focussed on journaling the messages to disk, another on processing the business logic of the service, and so on.

You can read more about the LMAX Disruptor, which coordinated those activities, here:


…and see an interview with me and my friend Martin Thompson on the topic here:


Q: How would you make mission critical software asynchronous?

A: Most really mission-critical software is already asynchronous! Look at Telecoms!

Q: Do we have to use Reactive Frameworks (like RxJava) in a Reactive System?

A: No, see my earlier answer.

Q: Seriously, why not JS on server-side?

A: It was a cheap shot, but Javascript is an enormously inefficient use of a computer. One argument against it comes from the perspective of climate change.

Data Centres, globally, put more CO2 into the atmosphere than commercial aviation: something between 7 and 10% of global CO2 production. The kind of systems that I am describing are something like four or more orders of magnitude faster than most conventional Javascript systems.

Even if I am exaggerating, and we could improve performance by only a single order of magnitude, we could reduce global CO2 emissions by 9%!

We tend not to think about software in these terms, but perhaps we should!

I cannot think of any sphere of human activity that tolerates similar levels of inefficiency as software.

Q: How to measure the impact of Eventual Consistency on asynchronous Event-Driven Systems?

A: The term “eventual” is confusing; we are talking about computer speeds here. “Eventual”, under most circumstances, means faster than any human can detect. So in most cases the eventuality of the system doesn’t matter at the human scale. Where the system slows for some reason, you need to be able to cope with the fact that the data is not in step, but that is simply a reflection of the truth of ANY distributed system. The trade-off is ALWAYS between slower communications with consistency and faster communications with eventual consistency. The overhead for consistency is considerable, but it is ALWAYS considerable, even in synchronous systems.


Continuous Compliance

I work as an independent consultant advising organizations on how to improve their software development practice. This is a very broad theme. It touches on every aspect of software development from design, coding and testing, to organizational structure, leadership and culture.

My work is structured, perhaps unsurprisingly, around Continuous Delivery (CD). I believe that CD is important for a variety of reasons. It is an approach grounded in the ideas of “Lean Thinking”, it is based on the application of the Scientific Method to software development. It is driven through a rapid, high-quality, iterative, feedback-guided approach to everything that we do, giving us deeper insight into our work, our products and our customers.

All of this is powerful in its impact, but there is another dimension that matters a lot in certain industry sectors.

The majority of my clients work in regulated industries, Finance, Health-care, Gambling, Telecoms, Transport of different kinds and others.

My own background, as a developer and technical leader, was, in the later part of my career, in Finance – Exchanges and Trading Systems. Also highly regulated.

Nevertheless, when describing the Continuous Delivery approach to people I am regularly asked, “Yes, that all sounds very good, but it can’t possibly work in a regulated environment can it?”.

I have come to the opposite conclusion. I believe that CD is an essential component of ANY regulated approach. That is, I believe that it is not possible to implement a genuinely compliant, regulated system in the absence of CD!

Now, that is a strong statement, so what am I talking about?

What are the goals of Regulatory Compliance?

All of the regulatory regimes that I have seen are, in essence, focussed on two things:

1) Trying to encourage a professional, high-quality, safe approach to making change.

2) Providing an audit-trail to allow for some degree of oversight and problem-finding after a failure.

There is a third thing, but it is really secondary compared to these two: we need to be able to demonstrate the safety, quality and professionalism of our approach, and our ability to work in a traceable (audited) way, to regulators and auditors.

How does CD Help?

I believe that the highest-quality approach that we know of for creating software of any kind is a disciplined approach to CD. The evidence is on my side: https://amzn.to/2P2aHjv

So if our regulators require a professional, high-quality, safe approach to making change, the evidence says that they should be demanding CD (and structuring their regulations to encourage it!).

One of the core ideas in CD is the concept of the “Deployment Pipeline”, a channel through which all change destined for production flows. A Deployment Pipeline is used to validate, and reject changes that don’t look good enough. It is a platform, an organizing concept, and a falsification mechanism, for production-destined change. It is also the perfect vehicle for compliance.

All production change flows through the Deployment Pipeline. This means that, almost as a side-effect, we have access to all of the information associated with any change. That means that we can create a definitive, authoritative audit-trail.

(See links at end for more info on Deployment Pipelines & CD in general)

Figure 1 shows a diagram of an example Deployment Pipeline. Remember, there is no other route to production for any change.

Figure 1 – Example Continuous Delivery Deployment Pipeline

If we tie together our requirement-management systems with our Version Control System (VCS), through something as simple as a commit message tagged with the story, or bug, ID that this commit is associated with, then we have complete traceability. We can tell the story of any change from end-to-end.

We can answer any of the following questions (and many more):

  • “Who captured the need for this change?”
  • “Who wrote the tests?”
  • “Who committed changes associated with this piece of work?”
  • “Which tests were run?”
  • “This change was rejected; which check failed and rejected it?”
  • “Who was involved with any manual testing?”
  • “Who approved the release into production?”
  • “Which version of the OS, Database, programming language, etc was deployed and used?”
  • “Which version of the deployment script/tooling was used?”

All of this information is available as a side-effect of building a Deployment Pipeline. In fact it is quite hard to imagine a Pipeline that doesn’t give you access to this information. I sometimes describe one of the important properties of Deployment Pipelines as “providing a keyed search-space for all of the information associated with any production change”. This is Gold for people working in compliance, regulation and audit.

The Deployment Pipeline, when properly implemented, is in the perfect place to act as a platform for regulatory functions and data-collection. If we can mine this Gold, if we can identify the needs of the people working in these areas, we can implement behaviors, in the Pipeline, to support and enhance their efforts.

Here are a few examples from my own, real-world, experience…

  • Generate an automatic audit-trail for all production change.
  • Implement access-control to the Pipeline so that we can audit who did what.
  • Implement “compliance-gates” to automate rules
    e.g. “We require sign-off for release”:
    Solution: Use access-control credentials (and people’s roles) to automate “sign-offs”
  • Reject any commit that fails any test.
    (Most regulators *love* this idea when you explain it to them!)
  • In an emergency we may need to break a rule
    Solution: Allow manual override of rules
    e.g. “Reject any commit that fails a test”, but audit the decision and who made it.
    (Regulators love that too. They recognize that bad things sometimes happen, but want to see the decision-making)
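
A compliance gate of that shape can be sketched as follows. The roles, rule and audit format here are illustrative assumptions, not a prescription:

```python
# Sketch of a "compliance-gate" with an audited manual override.
# The roles, the rule and the audit format are invented for illustration.

audit_log = []

def compliance_gate(change, user, override=False):
    """Reject commits that fail any test, unless a permitted role
    explicitly overrides - in which case the decision is audited."""
    if change["tests_passed"]:
        audit_log.append((change["id"], "accepted", user["name"]))
        return True
    if override and user["role"] == "release-manager":
        # Emergency path: allowed, but who made the decision is recorded.
        audit_log.append((change["id"], "override-accepted", user["name"]))
        return True
    audit_log.append((change["id"], "rejected", user["name"]))
    return False

dev = {"name": "pat", "role": "developer"}
rm = {"name": "sam", "role": "release-manager"}

ok1 = compliance_gate({"id": "c1", "tests_passed": False}, dev)
ok2 = compliance_gate({"id": "c1", "tests_passed": False}, rm, override=True)
```

Every decision, automated or human, lands in the audit trail, which is exactly what regulators want to see.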

What Does It Take?

Assuming that you have a working Deployment Pipeline (creating one is outside the scope of this article; see the links below), the first practical step to implementing “Continuous Compliance” is the one I have already mentioned: connect your Pipeline, via commit messages, to your requirements system!

Use the IDs from Jira or Trello (or whatever) and tag every commit with a Bug or Story ID.

That should give you a key-based system that joins all of the information that you collect together and makes it searchable (and so amenable to automation, reporting, and tool-building).
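
As a toy illustration of that keyed join (all of the commit and story data here is invented):

```python
import re

# Toy sketch of the "keyed search-space": joining commits to stories
# via the ID embedded in each commit message. All data is invented.

stories = {"STORY-42": "Add gift-wrap option to checkout"}
commits = [
    {"sha": "a1b2c3", "author": "pat", "message": "STORY-42 add gift-wrap flag"},
    {"sha": "d4e5f6", "author": "sam", "message": "STORY-42 tests for gift-wrap"},
    {"sha": "778899", "author": "pat", "message": "STORY-7 unrelated fix"},
]

def trace(story_id):
    """Answer 'who committed changes associated with this piece of work?'"""
    pattern = re.compile(re.escape(story_id) + r"\b")
    return [c for c in commits if pattern.search(c["message"])]

related = trace("STORY-42")
authors = {c["author"] for c in related}
```

The same key lets you join in test results, deployments and approvals, which is what makes the audit trail searchable and reportable.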

The next step is to add access-control to Pipeline tools so that you can track human decision-making.

Continuous Delivery is defined as “working so that your software is always in a releasable state”. This does not eliminate the need for human decision-making. Where applicable and appropriate, capture the outcome of human decisions via the Pipeline tools.

The “Lean” part of CD means that we are trying to reduce the work associated with the process to a minimum. We want to eliminate “waste” wherever we find it, and so we need to be smart about the things that we do and maximize their value.

For example, regulation often says that we need to document changes to our production systems. I agree! However, I don’t want to waste my, and my teams’, time writing documents that will only ever be read by regulators. Instead I would like to find things that I must do to create useful software and do them in a way that allows me to use them for other purposes, like informing regulators. One way to think about this is we are trying to achieve regulatory compliance as a side-effect of how we work.

In order to design and develop software I must have some idea of what I am trying to achieve (a requirement of some kind), I must work to fulfil that need (write code of some kind) and I must check that the need is fulfilled (a test of some kind).

What if I could do only these things, but do them in a way that allowed me to use the information that I generate for more than only these things?

If my requirements are defined in a way that documents changes to the behavior of the system and why they are useful (sounds a bit like “User Stories”, doesn’t it?), and if I adopt some simple conventions in the way that I capture and organize this information, to aid automation, then I have descriptions of changes that contribute to, and make sense as, release notes. So I will be able to automate some of the documentation associated with a release in a regulated environment.

If I drive these requirements from examples of desirable behaviors of my system, they define those behaviors in a way that allows me to automate the examples and use them as specifications for the behavior of the system – Executable Specifications. These automated specifications (aka “Acceptance Tests”) can be used to structure the development activities. At the start of each new piece of work we begin by creating our “Executable Specifications”, then we practice TDD, in fine-grained form, to incrementally evolve a solution that meets these specifications.

These activities, combined, give us an extremely high-quality approach to developing solutions. If we record them they also provide us the “whys”, “whens” and “whos” that allow us to tell the story of the work done.

We can “grow” the system via many small, low-risk, audited, commits. Each change is traceable, audited and of very high quality. Each change is small and simple, verified by Continuous Integration, and so safer.

We can make a separate decision of when to release each, tiny, change into production and we will have an automated audit-trail of all of the actions and decisions that contributed to that release.

This approach is demonstrably, measurably, higher-quality and safer than any other that we know of.

All changes, whatever their nature, are treated in the same way. There are no special cases for bug-fixes or emergency fixes. No special “back-doors” to production. All production change flows through the same process and mechanism and so is traceable and verified to the level of testing that we decide to apply in our Pipeline.

How else could we minimize work?

Regulated industries often require various gate-keeping steps, sign-offs for example. Unfortunately the evidence is against these as a successful approach to improving quality and safety. In fact, the more complex approaches to gatekeeping, like “Change Approval Boards”, are negatively correlated with software quality! The more complex the compliance mechanisms around change, the lower the quality of the software. (See page 49 of the 2019 “State of DevOps Report”.)

Nevertheless, most regulatory frameworks were designed before this kind of analysis was available. Most regulatory frameworks were built on an assumption of a Waterfall style, gated process. So if we want to achieve “Continuous Compliance” in a real-world environment, we must cope with regulation that is not quite the right shape for this very different paradigm. That is OK, because this new paradigm is much more flexible than the old.

Over time I hope, and expect, regulation to adapt, to catch-up to these more effective ways of working. It is, after all, a better way to fulfil the underlying intent of any regulatory mechanism for software.

I believe that there have been some small moves, at the level of interpretation of regulation, in some industries. Over time I expect that the regulations themselves will change to assume, or encourage, CD, rather than only allow interpretations that permit it.

I have had success with regulators, and people working in compliance organizations, in several different industries by engaging with them and demonstrating that what I am trying to achieve is in their interest. By bringing them on-board with the change, and helping to solve the real problems that people in these roles regularly face, you can not just get approval to interpret the regulations in ways more amenable to CD, but you can gain allies who will work to help you.

Here are a few techniques that I have used and advised my clients to adopt:

Example: When you are being audited, assign developers to help the auditors. Their job is to help, to give the auditors all of the information that they need, but also to observe what is going on and to treat the audit as a chance to learn what the auditors really need. This is a requirements-gathering opportunity! Then take what you have learned and implement new checks, in your Pipeline to stop errors sneaking through. Improve the audit-trail so that a future auditor can more easily see what happened. Create new reports on your Pipeline-search-space to tell the story in a way that meets the needs of the auditor.

Example: If your regulatory framework requires a code review, how do you do that best and keep up the pace of work that makes CI (and CD) work best? In my experience Pair Programming, coupled with regular pair rotation, gives all of the benefits of code review, and more, and is acceptable to regulators as a demonstration that the code has been reviewed and that there is some independent oversight/verification of change.

Example: Your regulatory framework requires sign-off from a developer, operations person and tester before release. Use the access-control tools in your Pipeline to enforce this policy, and audit it.

Example: Regulation requires a separation of roles. Devs can’t access production, Ops can’t access Dev environments. Fine, I prefer to take it a step further. “No-one can access production!”. All production access is through automation, e.g. Infrastructure as Code, automated deployment, automated monitoring etc.

These are a few techniques that I have seen applied, and applied myself, in regulated environments. My experience, across the board, has been that regulators prefer these approaches, once they come to understand them, because they provide a better quality experience all around.

What Is Not Covered by CD?

Some regulatory regimes require significant documentation describing the architecture and design of the system as well as describing any “significant change” to its design.

I believe that these are another hang-over from Waterfall thinking. I think that the intent is that by asking for such documentation regulators are attempting to encourage people to think more carefully about change and to approach it with more caution.

I believe that a sophisticated approach to test-automation is a better approach. Nevertheless, current regulation usually requires documentation of the system architecture and significant changes to it.

I tend to approach this part of the problem in more conventional ways. Write the architecture documents as you always have, except try to ensure that the detail is not too precise. What you need is an accurate, but slightly vague, description of your system. For example, describe the principal services, perhaps how they communicate, and the main information flows and stores, but don’t go into the detail of implementation, code or schemas. Leave room for the system to evolve over time while still meeting the architectural description.

Try to agree, with your regulators, what “significant change” entails: what are they nervous of? They probably won’t tell you, or at least they won’t be very definite; it is not their job. However, what you are looking for is a way to ensure that the massive flood of changes that you want to apply (in a CD context) doesn’t count as significant.

Even these tiny, frequent changes will be audited, documented (by tests), reviewed (by pair-programming) and have things like (autogenerated) release-notes associated with them, but they won’t count as “significant” in the sense of requiring new documentation (beyond the automated tests and requirements).

Again, I hope, and expect, that regulation will change over time to allow these more effective forms of documentation to be used, instead of prose doing a poor job of describing some kind of design intent.

I am not against documentation that is useful in helping people to understand systems. I like to create and maintain a high-level description of the system architecture that aids people in navigating their way around the system. I am just not sure how this helps the goals of regulation, and I don’t want to be forced to document, in prose, every change to my production system. That is the role of automated tests, which do a better job: they are a more accurate description of the behavior of the code (they must be, because they passed), and they are necessary for other reasons beyond regulatory compliance, so I am going to create them anyway.


I have worked in regulated industries before and after I learned how to practice Continuous Delivery. All of my non-CD experience, including what I have observed in client organizations over several decades in consultancy roles, leads me to believe that in the absence of CD, regulatory compliance is honoured “more in the breach than the observance”. That is to say, most regulated organizations have a long list of “compliance breaches” that they hope, one day, to catch up on.

The usual organizational responses that I have observed are either to try to slow the pace of change to gain control (counter-productive, because slow, heavyweight processes are negatively correlated with quality), or to skate close to the edge and do the bare minimum to keep regulators happy. Neither of these is a desirable, or high-quality, outcome!

I have seen the CD approach remove compliance as an obstacle!

I have seen organizations move from taking weeks, sometimes months, to ensure that releases into production were “compliant” with regulation (and never making it), to being able to generate genuinely compliant release candidates multiple times per day, along with all of the documentation and approvals.

In fact, when working at LMAX on creating one of the highest-performance financial exchanges in the world, it was more difficult for us to release a change that wasn’t compliant than one that was. Our Deployment Pipeline enforced compliance, and so the only way we could avoid that was to break, or subvert, the Pipeline.

So when I say “I believe that CD is an essential component of ANY regulated approach. That is, I believe that it is not possible to implement a genuinely compliant, regulated system in the absence of CD!” I really do mean it.

More Info

Continuous Delivery (Book):

Rationale for CD (Talk):

Optimizing Deployment Pipelines (Talk):

Accelerate – The Science of Lean Software and DevOps (Book): https://amzn.to/2P2aHjv

Adopting CD at Siemens Healthcare (Article): https://www.infoq.com/articles/continuous-delivery-teamplay/
