DevOps can help reduce technical debt in some fundamental ways.
Continuous Delivery/Deployment
First, building a Continuous Delivery/Deployment pipeline, and automating the work of migration and deployment, will force you to clean up holes and inconsistencies in configuration and code deployment, as well as differences between development, test and production environments.
Automated Continuous Delivery and Infrastructure as Code also get rid of dangerous one-of-a-kind snowflakes and the configuration drift caused by making configuration changes and applying patches manually over time. This makes systems easier to set up and manage, and reduces the risk of an unpatched system becoming the target of a security attack or the cause of an operational problem.
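To make configuration drift concrete, here is a minimal sketch of the kind of check an Infrastructure as Code tool performs: compare the state declared in version control against what is actually on a server. The setting names, versions and the `find_drift` helper are all hypothetical and chosen only for illustration; real tools such as Puppet or Chef do far more than this.

```python
# Hypothetical desired state, as it might be declared in version control.
DESIRED = {"openssl": "3.0.13", "nginx": "1.24.0", "ntp_enabled": True}


def find_drift(desired, actual):
    """Return {setting: (desired_value, actual_value)} for every difference.

    A setting that is absent on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# A hypothetical "snowflake" server: one package was patched by hand to a
# different version, and an extra package was installed manually.
snowflake = {
    "openssl": "3.0.13",
    "nginx": "1.22.1",
    "ntp_enabled": True,
    "telnet": "0.17",
}

for setting, (want, have) in find_drift(DESIRED, snowflake).items():
    print(f"{setting}: expected {want!r}, found {have!r}")
```

When every server is built from the same declared state, a check like this should come back empty, which is the point of eliminating snowflakes.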
A CD pipeline also makes it easier, cheaper and faster to pay down other kinds of technical debt. With Continuous Delivery/Deployment, you can test and push out patches and refactoring changes and platform upgrades faster and with more confidence.
Positive Feedback
The Lean feedback cycle and Just-in-Time prioritization in DevOps ensure that you’re working on whatever is most important to the business. This means that bugs and usability issues and security vulnerabilities don’t have to wait until after the next feature release to get fixed. Instead, problems that impact operations or the users will get fixed immediately.
Teams that do Blameless Post-Mortems and Root Cause(s) Analysis when problems come up will go even further, and fix problems at the source and improve in fundamental and important ways.
But there’s a negative side to DevOps that can add to technical debt costs.
Erosive Change
Michael Feathers’ research has shown that constant, iterative change is erosive: the same code gets changed over and over, the same classes and methods become bloated (because it is naturally easier to add code to an existing method or a method to an existing class), structure breaks down and the design is eventually lost.
DevOps can make this even worse.
DevOps and Continuous Delivery/Deployment involve pushing out lots of small changes, running experiments and iteratively tuning features and the user experience based on continuous feedback from production use.
Many DevOps teams work directly on the code mainline, “branching in code” to “dark launch” code changes, while code is still being developed, using conditional logic and flags to skip over sections of code at run-time. This can make the code hard to understand, and potentially dangerous: if a feature toggle is turned on before the code is ready, bad things can happen.
Feature flags are also used to run A/B experiments and control risk on release, by rolling out a change incrementally to a few users to start. But the longer that feature flags are left in the code, the harder it is to understand and change.
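As a rough illustration of "branching in code", here is a minimal sketch of a feature flag with an incremental (percentage-based) rollout. The `FeatureFlags` class, the `new_checkout` flag and the discount logic are hypothetical, not taken from any particular flag library; the hashing scheme is just one simple way to put each user in a stable bucket.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-memory flag store: flag name -> rollout percentage."""

    def __init__(self, rollout_percent):
        # Percentage (0-100) of users who should see each feature.
        self._rollout = dict(rollout_percent)

    def is_enabled(self, flag, user_id):
        percent = self._rollout.get(flag, 0)  # unknown flags default to off
        # Hash flag + user so each user lands in a stable bucket from 0 to 99.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


def checkout_total(cart_total, user_id, flags):
    # The conditional below is the "branch in code": both code paths live on
    # the mainline until the flag is removed.
    if flags.is_enabled("new_checkout", user_id):
        return round(cart_total * 0.95, 2)  # hypothetical new behavior
    return cart_total  # existing behavior


flags = FeatureFlags({"new_checkout": 10})  # roll out to roughly 10% of users
print(checkout_total(100.0, "user-42", flags))
```

Every flag like this doubles the number of paths through the function it guards, which is why flags that outlive their experiment make the code harder to understand and change.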
There is a lot of housekeeping that needs to be done in DevOps: upgrading the CD pipeline and making sure that all of the tests are working; maintaining Puppet or Chef (or whatever configuration management tool you are using) recipes; disciplined, day-to-day refactoring; keeping track of features and options and cleaning them up when they are no longer needed, getting rid of dead code and trying to keep the code as simple as possible.
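Part of this housekeeping can be automated. The sketch below assumes that the team keeps a simple registry of flags with an owner and a creation date, and reports the ones that have been around longer than an agreed limit. The `FlagRecord` type, the flag names and the 90-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FlagRecord:
    """Hypothetical registry entry for one feature flag."""
    name: str
    owner: str
    created: date


def stale_flags(records, today, max_age_days=90):
    """Return the names of flags created more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(r.name for r in records if r.created < cutoff)


registry = [
    FlagRecord("new_checkout", "payments-team", date(2024, 1, 10)),
    FlagRecord("dark_mode", "web-team", date(2024, 5, 1)),
]

# Only the flag that is older than the limit is reported for cleanup.
print(stale_flags(registry, today=date(2024, 6, 1)))
```

A report like this could run as a step in the CD pipeline, so that old flags show up as routine cleanup work instead of being forgotten.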
Microservices and Technology Choices
Microservices are a popular architectural approach for DevOps teams.
This is because loosely-coupled Microservices are easier for individual teams to independently deploy, change, refactor or even replace.
And a Microservices-based approach provides developers with more freedom when deciding on language or technology stack: teams don’t necessarily have to work the same way, they can choose the right tool for the job, as long as they support an API contract for the rest of the system.
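The API contract is what keeps this freedom from turning into chaos. As a minimal sketch, a contract can be treated as a set of required fields and types that any implementation, in any language, must return. The `ORDER_CONTRACT` fields and the `contract_violations` helper below are hypothetical; real teams would more likely use a schema or consumer-driven contract testing tool.

```python
# Hypothetical contract for an order service response: field name -> type.
ORDER_CONTRACT = {"order_id": str, "total_cents": int, "currency": str}


def contract_violations(payload, contract):
    """Return a list of human-readable problems; an empty list means it conforms.

    Extra fields in the payload are allowed, so a service can add data
    without breaking existing consumers.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return problems


good = {"order_id": "A-1001", "total_cents": 2599, "currency": "USD", "note": "gift"}
bad = {"order_id": 42, "currency": "USD"}

print(contract_violations(good, ORDER_CONTRACT))
print(contract_violations(bad, ORDER_CONTRACT))
```

A team can rewrite its service in a different language or framework, and as long as a check like this still passes, the rest of the system should not need to care.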
In the short term there are obvious advantages to giving teams more freedom in making technology choices. They can deliver code faster, quickly try out prototypes, and teams get a chance to experiment and learn about different technologies and languages.
But Microservices “are not a free lunch”. As you add more services, system testing costs and complexity increase. Debugging and problem solving get harder. And as more teams choose different languages and frameworks, it’s harder to track vulnerabilities, harder to operate, and harder for people to switch between teams. Code gets duplicated because teams want to minimize coupling and it is difficult or impossible to share libraries in a polyglot environment. Data is often duplicated between services for the same reason, and data inconsistencies creep in over time.
Negative Feedback
There is a potentially negative side to the Lean delivery feedback cycle too.
Constantly responding to production feedback, always working on what’s most immediately important to the organization, doesn’t leave much space or time to consider bigger, longer-term technical issues, or to work on paying off deeper architectural and technical design debt that results from poor early decisions or incorrect assumptions.
Smaller, more immediate problems get fixed fast in DevOps. Bugs that matter to operations and the users can get fixed right away instead of waiting until all the features are done, and patches and upgrades to the run-time can be pushed out more often. This means that you can pay off a lot of debt before costs start to compound.
But behind-the-scenes, strategic debt will continue to add up. Nothing’s broke, so you don’t have to fix anything right away. And you can’t refactor your way out of it either, at least not easily. So you end up living with a poor design or an aging technology platform, slowly slowing down your ability to respond to changes, to come up with new solutions. Or forcing you to continue filling in security holes as they come up, or scrambling to scale as load increases.
DevOps can reduce technical debt. But only if you work in a highly disciplined way. And only if you raise your head up from tactical optimization to deal with bigger, more strategic issues before they become real problems.
Thanks for the well-balanced view. It seems that a mix of styles would be a good recipe for success. Perhaps a strategic order and a tactical order working in concert would work best.
I really enjoyed the article, and as philn5d said, it's nice to see a balanced piece talking realistically about pros and cons.
Really good post. It sounds like you have to have the right managers in place to ensure appropriate discipline and to continuously be managing the bigger picture.
Re: Microservices. Interesting post from Stefan Tilkov on Martin Fowler's site:
http://martinfowler.com/articles/dont-start-monolith.html
"If you decide to build things using a microservices approach, you need to be aware that while it will be a lot easier to make localized decisions in each individual part, it will be much harder to change the very boundaries that enable this. Refactoring in the small becomes easier, refactoring in the large becomes much harder."
I disagree.
Most often, tech debt doesn't get paid right away because paying it takes a significant amount of effort, and because non-technical stakeholders have too heavy a say in setting priorities and too shallow an understanding of what tech debt actually means in terms of cost. They simply get used to the idea that decreasing velocity is a fact of life, and don't understand that there's something that can be done about it.
With microservices, things change. For many of the services, there are no non-technical stakeholders. The team maintaining a microservice is very much in the position of a small business inside another business - they have to sell their microservice to service consumers - most of which are teams of developers, not product owners, sales or marketing people. If need be, you can rewrite the whole service during a single sprint. Thus, even if overall the effort to pay tech debt stays the same or even increases, it is a lot more granular, and therefore easier to pay in small, low risk chunks. Therefore, I think that paying tech debt is a lot simpler, even if not necessarily cheaper, with microservices.
But it doesn't in fact make the problem of code rot go away. It just transfers it to another level. Instead of hard to maintain monoliths you get complex service infrastructures, with many services on the brink of extinction, because their clients have migrated to new services providing the same functionality in a better way.
But at least it makes things more resilient. A large monolith with lots of tech debt is like an elephant with several diseased organs. A microservices infrastructure with obsolete services is like a herd with some old and sick animals. The sick organs will kill the entire elephant, whereas you can kill the obsolete microservices without killing the entire herd.