There are a number of challenges in successfully maintaining a system, especially a big business-critical system, a system that has been around for a while, that represents the work of many people over many years.
There are the technical challenges of safely managing change, minimizing technical and operational risk, recognizing and containing technical debt, understanding and working with code that you did not have a hand in writing (and whose author has long since left the company), testing and installing upgrades to ensure that the technology stack does not become obsolete, and keeping up with the changing security threat landscape. There is also the challenge of setting and maintaining a high level of quality, continuously reviewing and refactoring the design and implementation, and ensuring that you don’t let entropy set in and fall into the trap of having to re-write the system and lose all of the work that has already been done – what Joel Spolsky calls “the single worst strategic mistake that any software company can make”.
And there are challenges in managing the team, keeping the strongest possible team together for as long as possible, sustaining momentum and commitment and engagement over time, making the work interesting and worthwhile and important and fun.
As I described in a previous post, Everything I needed to know about Maintenance, I learned how a company could be successful over a long period of time maintaining and supporting the same software, continuously focusing on delivering new value to customers and pushing for technical excellence, all done by a small senior team following an incremental development approach. I have tried to apply these ideas at my current firm, where we have been supporting and maintaining a business-critical financial application for more than 3 years now.
Many of these same ideas and practices, and more, are captured in today’s agile development methods: XP, in particular, is an engineering and management framework that is especially well-suited for software maintenance. XP provides a foundation for maintenance through:
Frequent, small releases to production: always be delivering, creating a sense of accomplishment for the team and continuously delivering value to the customer, responding to change, and learning from feedback.
A constant focus on quality: “No defect is acceptable, each is an opportunity for the team to learn and improve”.
Making change safe through automated developer testing, building a regression safety net.
Continuous integration: always knowing that the code works.
Close contact with the customer (both the business side and operations).
Lightweight, continuous communications, with just enough documentation.
Refactoring: constantly improving code and design – critical in recognizing and reducing technical debt.
Slack and sustainable pace: giving people time to do their work, preventing burnout, and ensuring that you can fit in other work demands, especially important in maintenance because you can’t control requirements for support and firefighting.
Collective code ownership, skilling-up and skilling-out the team: in maintenance, by necessity sooner or later just about everyone is going to touch just about every piece of code.
Just enough design, breaking problems down and finding the simplest possible solution – this is a controversial aspect of XP when building enterprise systems from scratch, but it is the right approach in maintenance where you are dealing with incremental problems and the constraints of existing architecture and technology.
Transparency and respect – creating and maintaining an open, trusting and respectful environment within the team, with operations and with the customer.
Now, in our case, I don’t mean 100%, full-on, hardcore, literally by-the-book XP: I mean a dialed-down implementation of Extreme Programming, a less-Extreme Extreme Programming. As Kent Beck states in Extreme Programming Explained: Embrace Change:
“The values, principles and practices are there to provide guidance, challenge and accountability… The goal is successful and satisfying relationships and projects, not membership in the ‘XP Club’.”
We have followed his guidance to
“Experiment with XP using these practices as your hypotheses”
and we have adapted the ideas and principles of XP to our situation, our experience and way of working.
XP, by Kent Beck’s admission, is an integrated set of good engineering and management practices, dialed up to 10. In adapting and integrating the practices in XP for maintenance, we have dialed back in specific areas:
Pair Programming
All of our code changes are reviewed before being released. The idea behind pairing is that if code reviews are good, continuous code reviews are better. Like my friend Pete McBreen in his post Still Questioning Extreme Programming, I think that pairing makes sense in specific cases, especially in troubleshooting and in helping people new to the team, but it is not necessary all the time – our developers pair up when it makes sense.
Test First Development
We rely extensively on automated testing, testing early and testing often. Our developers choose to follow TFD or TDD practices, or write tests after the code is written, as they see fit, following the principle that
“code and testing can be written in either order… write tests in advance when possible”.
We put a lot of emphasis on testing, on automating developer testing – this is another area where XP is in agreement:
“in XP testing is as important as programming”.
This is a critical idea in maintenance, where even small changes to an existing system can have significant consequences.
Unlike some “pure” XP teams, we don’t rely only on the automated test suites: we have a senior team of testing specialists who conduct exploratory and destructive testing, system and integration testing, and operational testing; we schedule regular application penetration tests; we run system trials and simulations with the team involved in interactive, loosely structured “war games”; and we take advantage of technologies such as static analysis in our Continuous Integration environment. Our automated regression testing safety net is an important asset, but we find the more valuable, higher-risk problems through exploratory testing, reviews and the war games simulations.
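To make the idea of a regression safety net concrete, here is a minimal sketch of a characterization test, the kind of automated developer test that pins down what existing code does today before anyone changes it. It is an illustration only, not code from our system: the function legacy_fee, its fee rule and the recorded values are all hypothetical.

```python
import unittest


def legacy_fee(amount_cents: int) -> int:
    """Hypothetical legacy routine: 0.25% fee, rounded down, minimum 100 cents."""
    return max(100, amount_cents * 25 // 10000)


class LegacyFeeCharacterization(unittest.TestCase):
    # Each pair records what the existing code returns today,
    # not what a specification says it should return.
    RECORDED = [
        (0, 100),
        (39_999, 100),
        (40_000, 100),
        (40_400, 101),
        (1_000_000, 2_500),
    ]

    def test_behaviour_is_unchanged(self):
        for amount, expected in self.RECORDED:
            with self.subTest(amount=amount):
                self.assertEqual(legacy_fee(amount), expected)
```

Run with python -m unittest against the file that holds it. If a later refactoring changes any recorded result, the test fails and the developer has to decide whether the change in behaviour was intended.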
Incremental Delivery
We follow a model closer to the first edition description of XP, releasing to production every 2-3 weeks rather than holding to a 1-week cycle – extremely short, rapid cycling isn’t sustainable, and doesn’t allow enough time for the reviews and other checkpoints that we have found necessary and valuable.
The Development/Maintenance Problem
In many organizations there are “developers” and “maintainers”: a team of hotshots is hired to design and build the initial system, and once the “hard work” is done, they hand it off to the maintenance and support or “sustained engineering” crew: kids and old-timers and other misfits who don’t have what it takes (yet, or any more) to do “real software development”; or the work of maintenance and support is offshored to a team in India or Eastern Europe to save costs.
But this is not the case in XP, as Pete McBreen points out in Questioning Extreme Programming:
“The interesting thing about XP, however, is that it assumes that applications are never really going to be handed off to a separate maintenance team. The assumption is that after each incremental release, the customer will want more functionality and keep funding the development team… As such, there is never a need to hand the application over to a maintenance team; the original development team can continue to support the application indefinitely”.
This idea of keeping the team together, preserving the knowledge that has been built up over time, the team’s deep and shared understanding of the domain, their proven ability to deliver, is critical and fundamental. This is the real intellectual property, the real value that you have created. Eric Brechner at Microsoft explores this in his post on Sustained engineering idiocy, where he shows that keeping the development team engaged in maintenance and support builds accountability, creates a deeper understanding of the system and of the customer’s needs, and informs future development, using the feedback from support to improve the quality and reliability of future releases or future products. Eric provides some useful ideas on how to balance the requirements of maintenance and support against future development: structuring your team around a core with an evaluation team that investigates and triages issues, and using backlog management to feed fixes into the incremental development schedule.
Most of the work that will be done on a piece of software, up to 70% in some studies, is done during maintenance. It just makes sense to have your best people working on what is most important: protecting the investment that you and your customer made in building the software in the first place; supporting the ongoing business operations of your customers; and ensuring that you and your customers will continue to succeed in the future.