Thursday, March 22, 2012

Is Copy and Paste Programming really a problem?

Copy and Paste Programming – taking a copy of existing code in your project and repurposing it – violates coding best practices like Don’t Repeat Yourself (DRY). It’s one of the most cited examples of technical debt, a lazy way of working, sloppy and short-sighted: an antipattern that adds to the long term cost of keeping a code base alive.

But it’s also a natural way to get stuff done – find something that already works, something that looks close to what you want to do, take a copy and use it as a starting point. Almost everybody has done in at some point. This is because there are times when copy and paste programming is not only convenient, but it might also be the right thing to do.

First of all, let’s be clear what I mean by copy and paste. This is not copying code examples off of the Internet, a practice that comes with its own advantages and problems. By copy and paste I mean when programmers take a shortcut in reuse – when they need to solve a problem that is similar to another problem in the system, they’ll start by taking a copy of existing code and changing what they need to.

Early in design and development, copy and paste programming has no real advantage. The code and design are still plastic, this is your chance to come up with the right set of abstractions, routines and libraries to do what the system needs to do. And there’s not a lot to copy from anyways. It’s late in development when you already have a lot of code in place, and especially when you are maintaining large, long-lived systems, that the copy and paste argument gets much more complicated.

Why Copy and Paste?


Programmers copy and paste because it saves time. First, you have a starting point, code that you know works. All you have to do is figure out what needs to be changed or added. You can focus on the problem you are trying to solve, on what is different, and you only need to understand what you are going to actually use. You are more free to iterate and make changes to fit the problem in front of you – you can cleanup code when you need to, delete code that you don’t need. All of this is important, because you may not know what you will need to keep, what you need to change, and what you don’t need at all until you are deeper into solving the problem.

Copy and paste programming also reduces risk. If you have to go back and change and extend existing code to do what it does today as well as to solve your new problem, you run the risk of breaking something that is already working. It is usually safer and less expensive (in the short term at least) to take a copy and work from there.

What if you are building a new B2B customer interface that will be used by a new set of customers? It probably makes sense to take an existing interface as a starting point, reuse the scaffolding and plumbing and wiring at least and as much of the the business code as makes sense, and then see what you need to change. In the end, there will be common code used by both interfaces (after all, that’s why you are taking a copy), but it could take a while before you know what this code is.

Finding a common design, the right abstractions and variations to support different implementations and to handle exceptions can be difficult and time consuming. You may end up with code that is harder to understand and harder to maintain and change in the future – because the original design didn’t anticipate the different exceptions and extensions, and refactoring can only take you so far. You may need a new design and implementation.

Changing the existing code, refactoring or rewriting some of it to be general-purpose, shared and extendable, will add cost and risk to the work in front of you. You can’t afford to create problems for existing customers and partners just because you want to bring some new customers online. You’ll need to be extra careful, and you’ll have to understand not only the details of what you are trying to do now (the new interface), but all of the details of the existing interface, its behavior and assumptions, so that you can preserve all of it.

It’s naïve to think that all of this behavior will be captured in your automated tests – assuming that you have a good set of automated tests. You’ll need to go back and redo integration testing on the existing interface. Getting customers and partners who may have already spent weeks or months to test the software to retest it is difficult and expensive. They (justifiably) won’t see the need to go through this time and expense because what they have is already working fine.

Copying and pasting now, and making a plan to come back later to refactor or even redesign if necessary towards a common solution, is the right approach here.

When Copy and Paste makes sense


In Making Software’s chapter on “Copy-Paste as a Principled Engineering Tool”, Michael Godfrey and Cory Kapser explore the costs of copy and paste programming, and the cases where copy and paste make sense:
  1. Forking – purposely creating variants for hardware or platform variation, or for exploratory reasons.
  2. Templating –some languages don’t support libraries and shared functions well so it may be necessary to copy and paste to share code. Somewhere back in the beginning of time, the first COBOL programmer wrote a complete COBOL program – everybody else after that copied and pasted from each other.
  3. Customizing – creating temporary workarounds – as long as it is temporary.
  4. Microsoft’s practice of “clone and own” to solve problems in big development organizations. One team takes code from another group and customizes it or adapts it to their own purposes – now they own their copy. This is a common approach with open source code that is used as a foundation and needs to be extended to solve a proprietary problem.

When Copy and Paste becomes a Problem


When to copy and paste, and how much of a problem it will become over time, depends on a few important factors.

First, the quality of what you are copying – how understandable the code is, how stable it is, how many bugs it has in it. You don’t want to start off by inheriting somebody else’s problems.

How many copies have been made. A common rule of thumb from Fowler and Beck`s Refactoring book is “three strikes and you refactor”. This rule comes from recognizing that by making a copy of something that is already working and changing the copy, you’ve created a small maintenance problem. It may not be clear what this maintenance problem is yet or how best to clean it up, because only two cases are not always enough to understand what is common and what is special.

But the more copies that you make, the more of a maintenance problem that you create – the cost of making changes and fixes to multiple copies, and the risk of missing making a change or fix to all of the copies increases. By the time that you make a third copy, you should be able to see patterns – what’s common between the code, and what isn’t. And if you have to do something in three similar but different ways, there is a good chance that there will be a fourth implementation, and a fifth. By the third time, it’s worthwhile to go back and restructure the code and come up with a more general-purpose solution.

How often you have to change the copied code and keep it in sync – particularly, how often you have to change or fix the same code in more than one place.

How well you know the code, do you know that there are clones and where to find them? How long it takes to find the copies, and how sure you are that you found them all. Tools can help with this. Source code analysis tools like clone detectors can help you find copy and paste code – outright copies and code that is not the same but similar (fuzzier matching with fuzzier results). Copied code is often fiddled with over time by different programmers, which makes it harder for tools to find all of the copies. Some programmers recommend leaving comments as markers in the code when you make a copy, highlighting where the copy was taken from, so that a maintenance programmer in the future making a fix will know to look for and check the other code.

Copy and Paste programming doesn’t come for free. But like a lot of other ideas and practices in software development, copy and paste programming isn’t right or wrong. It’s a tool that can be used properly, or abused.

Brian Foote, one of the people who first recognized the Big Ball of Mud problem in software design, says that copy and paste programming is the one form of reuse that programmers actually follow, because it works.

It’s important to recognize this. If we’re going to Copy and Paste, let's do a responsible job of it.

No comments:

Site Meter