Tuesday, September 18, 2012

What are you doing for Application Security?

The SANS Institute is surveying companies to understand what tools and practices they are using to build security into applications, what their greatest risks and challenges are and how they are managing them. You can find the survey here.

Technical Debt – when do you have to pay it off?

There are 2 times to think about technical debt:

  1. When you are building a system and making trade-off decisions between what can be done now and what will need to be done “sometime in the future”.
  2. “Sometime in the future”, when you have to deal with those decisions and need to pay off that debt.

What happens when “sometime in the future” is now? How much debt is too much to carry? When do you have to pay it off?

How much debt is too much?

Every system carries some debt. There is always code that isn’t as clean or clear as it should be. Methods and classes that are too big. Third party libraries that have fallen out of date. Changes that you started in order to solve problems that went away. Design and technology choices that you regret making and would do differently if you had the chance.

But how much is this really slowing the team down? How much is it really costing you? You can try to measure whether technical debt is increasing over time by looking at your code base. Code complexity is one factor. There is a simple relationship between complexity and how hard code is to maintain, measured by the chance of introducing a regression when you make a change:

Complexity    Chance of a bad fix
1-10          5%
20-30         20%
>50           40%
100           60%

Complexity by itself isn’t enough. Some code is essentially complex, or accidentally complex, but if it doesn’t need to be changed it doesn’t add to the real cost of development. Tools like Sonar look at complexity along with other variables to assess the technical risk of a code base:

Cost to fix duplications + cost to fix style violations + cost to comment public APIs + cost to fix uncovered complexity (complex code that has less than 80% automated code coverage) + cost to bring complexity below threshold (splitting methods and classes)
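To make the arithmetic concrete, here is a minimal sketch of how a debt score like this adds up – the per-issue effort figures are made-up placeholders, not Sonar’s actual remediation costs:

```python
# Rough sketch of a Sonar-style technical debt estimate.
# The per-issue effort figures below are illustrative placeholders,
# not Sonar's actual defaults.

HOURS_TO_FIX_DUPLICATION = 2.0       # per duplicated block
HOURS_TO_FIX_VIOLATION = 0.1         # per style/rule violation
HOURS_TO_COMMENT_API = 0.2           # per undocumented public method
HOURS_TO_COVER_COMPLEXITY = 3.0      # per complex method with < 80% coverage
HOURS_TO_SPLIT_COMPLEX_UNIT = 4.0    # per over-complex method/class to split

def debt_in_hours(duplicated_blocks, violations, undocumented_apis,
                  uncovered_complex_methods, over_complex_units):
    """Estimated cost, in developer-hours, to pay the debt down."""
    return (duplicated_blocks * HOURS_TO_FIX_DUPLICATION
            + violations * HOURS_TO_FIX_VIOLATION
            + undocumented_apis * HOURS_TO_COMMENT_API
            + uncovered_complex_methods * HOURS_TO_COVER_COMPLEXITY
            + over_complex_units * HOURS_TO_SPLIT_COMPLEX_UNIT)

# Example numbers, to track release over release or compare between systems:
print(debt_in_hours(duplicated_blocks=40, violations=1200,
                    undocumented_apis=150, uncovered_complex_methods=25,
                    over_complex_units=18))
```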

This gives you some idea of technical debt costs that you can track over time or compare between systems. But when do you have to fix technical debt? When do you cross the line?

Deciding on whether you need to pay off debt depends on two factors:

  1. Safety / risk. Is the code too difficult or too dangerous to change? Does it have too many bugs? Capers Jones says that every system, especially big systems, has a small number of routines where bugs concentrate (the 20% of code that has 80% of the problems), and that cleaning up or rewriting this code is the most important thing that you can do to improve reliability as well as to reduce the long-term costs of running a system.
  2. Cost – real evidence that it is getting more expensive to make changes over time, because you’ve taken on too much debt. Is it taking longer to make changes or to fix bugs because the code is too hard to understand, or because it is too hard to change, or too hard to test?

Some teams take it as obvious that if you are slowing down, it must be because of technical debt. I don’t believe it is that simple.

There are lots of reasons for a team to slow down over time, as systems get bigger and older, reasons that don’t have anything to do with technical debt. As systems get bigger and are used by more customers in more ways, with more features and customization, the code will take longer to understand, changes will take longer to test, you will have more operational dependencies, more things to worry about and more things that could break, more constraints on what you can do and what risks you can take on. All of this has to slow you down.

How do you know that it is technical risk that is slowing you down?

A team will slow down when people have to spend too much time debugging and fixing things – especially fixing things in the same part of the system, or fixing the same things in different parts of the system. When you see the same bugs or the same kind of bugs happening over and over, you know that you have a debt problem. When you start to see more problems in production, especially problems caused by regressions or manual mistakes, you know that you are over your head in debt. When you see maintenance and support costs going up – when everyone is spending more time on upgrades and bug fixing and tuning than they are on adding new features, you're running in circles.

The 80:20 rule for paying off Technical Debt

Without careful attention, all code will get worse over time, but whatever problems you do have are going to be worse in some places than others. When it comes to paying back debt, what you care about most are the hot spots:

  • Code that is complex and
  • Code that changes a lot and
  • Code that is hard to test and
  • Code that has a history of bugs and problems.

You can identify these problem areas by reviewing check-in history, mining your version control system (the work that Michael Feathers is doing on this is really cool) and your bug database, through static analysis checks, and by talking with developers and testers.
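The version-control half of that analysis can be surprisingly simple. Here is a minimal sketch that just counts how many commits have touched each file in a Git repository – it assumes a local Git checkout and shells out to git log; cross the result against complexity metrics and your bug history to find the real hot spots:

```python
# Count how many commits touched each file in a Git repository --
# a crude first cut at finding change hot spots.
import subprocess
from collections import Counter

def change_counts(repo_path="."):
    log = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    # Every non-blank line is a file path touched by some commit.
    return Counter(line for line in log.splitlines() if line.strip())

if __name__ == "__main__":
    for path, count in change_counts().most_common(20):
        print(f"{count:5d}  {path}")
```

Whatever shows up at the top of a list like this, and is also complex, hard to test and has a bug history, is a hot spot.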

This is the code that you have to focus on. This is where you get your best return on investment from paying down technical debt. Everything else is good hygiene – it doesn't hurt, but it won’t win the game either. If you’re going to pay down technical debt, pay it down smart.

Thursday, September 13, 2012

Can you get by without estimating? Should you try?

Estimating remains one of the hardest problems in software development. So hard in fact that more people lately are advocating that we shouldn’t bother estimating at all.

David Anderson, the man behind Kanban, says that we should stop estimating, and that estimates are a waste of time. In his case study about introducing Kanban ideas at Microsoft, one of the first steps that they took to improve a team’s productivity was to get them to stop estimating and start focusing instead on prioritizing work and getting the important work done.

Then you have experts like Ron Jeffries saying things like

I believe that most estimation is waste and that it is more common to use estimation as a replacement for proper steering, and to use it as a whip on the developers, than it is to use it for its only valid purpose in Release Planning, which is more like "decide whether to do this project" than "decide just how long this thing we just thought of is going to take, according to people who don't as yet understand it or know how they'll do it”

and

Estimation is clearly "waste". It's not software…If estimation IS doing you some good, maybe you should think about it as a kind of waste, and try to get rid of it.

And, from others on the “If you do bother estimating, there’s no point in putting a lot of effort into it” theme:

Spending effort beyond some minutes to make an estimate "less wrong" is wasted time. Spending effort calculating the delta between estimates and actuals is wasted time. Spending effort training, working and berating people to get "less wrong" estimates is wasted time and damaging to team performance.

In “Software estimation considered harmful?” Peter Seibel talks about a friend running a startup, who found that it was more important to keep people focused and motivated on delivering software as quickly as possible. He goes on to say

If the goal is simply to develop as much software as we can per unit time, estimates (and thus targets), may be a bad idea.

He bases this on a 1985 study in Peopleware which showed that programmers were more productive when working against their own estimates than estimates from somebody else, but that people were most productive on projects where no estimates were done at all.

Seibel then admits that maybe “estimates are needed to coordinate work with others” – so he looks at estimating as a “tool for communication”. But from this point of view, estimates are an expensive and inefficient way to communicate low-quality information – because of the cone of uncertainty, all estimates contain variability and error anyways.

What’s behind all of this?

Most of this thinking seems to come out of the current fashion of applying Lean to everything, treating anything that you do as potential waste and eliminating waste wherever you find it. It runs something like: Estimating takes time and slows you down. You can’t estimate perfectly anyways, so why bother trying?

A lot of this talk and many of the examples focus on startups and other small-team environments where predictability isn’t as important as delivering – where it’s more important to get something done than to know when everything will be done or how much it will cost.

Do you need to estimate or not?

I can accept that estimates aren’t always important in a startup – once you’ve convinced somebody to fund your work anyways.

If you’re firefighting, or dealing with some other kind of emergency, there’s not much point in stopping to estimate either – when it doesn’t matter how much something costs, all you care about is getting whatever you have to do done as soon as possible.

Estimating isn’t always important in maintenance – the examples where Kanban is being followed without estimating are in maintenance teams. This is because most maintenance changes are small by definition - maintenance is usually considered to be fixing bugs and making changes that take less than 5 days to complete. In order to really know how long a change is going to take, you need to review the code to know what and where to make changes. This can take up to half of the total time of making the change – and if you’re already half way there, you might as well finish the job rather than stopping and estimating the rest of the work. Most of the time, a rule of thumb or placeholder is a good enough estimate.

In my job, we have an experienced development team that has been working on the same system for several years. Almost all of the people were involved in originally designing and coding the system and they all know it inside-out.

The development managers triage work as it comes in. They have a good enough feel for the system to recognize when something looks big or scary, when we need to get some people involved upfront and qualify what needs to get done, work up a design or a proof of concept before going further.

Most of the time, developers can look at what’s in front of them, and know what will fit in the time box and what won’t. That’s because they know the system and the domain and they usually understand what needs to be done right away – and if they don’t understand it, they know that right away too. The same goes for the testers – most of the time they have a good idea of how much work testing a change or fix will take, and whether they can take it on.

Sure, sometimes people make mistakes and can’t get done what they thought they could, and we have to delay something or back it out. But spending a little more time on analysis and estimating upfront probably wouldn’t have changed this. It’s only when they get deep into a problem – when they’ve opened the patient up and there’s blood everywhere – that they realize the problem is a lot worse than they expected.

We’re not getting away without estimates. What we’re doing is taking advantage of the team’s experience and knowledge to make decisions quickly and efficiently, without unnecessary formality.

This doesn't scale of course. It doesn’t work for large projects and programs with lots of inter-dependencies and interfaces, where a lot of people need to know when certain things will be ready. It doesn’t work for large teams where people don’t know the system, the platform, the domain or each other well enough to make good quick decisions. And it’s not good enough when something absolutely must be done by a drop dead date – hard industry deadlines and compliance mandates. In all these cases, you have to spend the time upfront to understand and estimate what needs to get done, and probably re-estimate again later as you understand the problem better. Sometimes you can get along without estimates. But don’t bet on it.

Tuesday, September 11, 2012

How to Cheat at Application Security

Developers need to know a lot in order to build secure applications. Some of this is good software engineering and defensive design and programming – using (safe) APIs properly, carefully checking for errors and exceptions, adding diagnostics and logging, and never trusting anything from outside of your code (including data and other people’s code). But there are also lots of technical details about security weaknesses and vulnerabilities in different architectures and platforms and technology-specific risks that you have to understand and that you have to make sure that you deal with properly. Even appsec specialists have trouble keeping up with all of it.
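As one small example of the “never trust anything from outside of your code” habit, here is a minimal sketch of allow-list input validation in Python – the field name and format rules are invented for illustration:

```python
# Allow-list input validation: accept only input that matches what you expect,
# reject everything else. The account-id format here is a made-up example.
import re

ACCOUNT_ID_RE = re.compile(r"^[A-Z]{2}\d{6}$")  # e.g. "AB123456" (hypothetical format)

def parse_account_id(raw):
    if not isinstance(raw, str) or not ACCOUNT_ID_RE.fullmatch(raw):
        raise ValueError("invalid account id")
    return raw

print(parse_account_id("AB123456"))  # accepted
# parse_account_id("AB123456'; DROP TABLE accounts;--")  # raises ValueError
```

Habits like this cover the software engineering half. The platform-specific weaknesses and technology-specific risks are the part that is hard to keep in your head.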

This is where OWASP’s Cheat Sheets come in. They provide a clear explanation of security problems, and tools and patterns and practical steps that you can follow to prevent them or solve them.

There are more than 30 cheat sheets available today, on everything from how to handle authentication in web apps to using HTML5 safely to what iOS developers should look out for when developing secure mobile apps.

Some of the cheat sheets are easy for developers to understand and use right away. For example, the cheat sheets on common security problems like SQL injection and CSRF explain what these vulnerabilities are, and what works and what doesn’t to protect from them. Simple and practical advice from people who know.

There are also cheat sheets on basic development problems and requirements that you might think you already understand – things that seem straightforward, but that need to be done carefully and correctly to make sure that your system is secure. Cheat sheets on how to do logging securely, the right way to use parameterized queries (prepared statements), how to properly implement a Forgot Password feature, and on Session Management. Make sure that you read the cheat sheet on Input Validation – there’s a lot more to doing it right than you think.
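To pick just one item from that list, the whole point of parameterized queries is that user input is bound as data and never spliced into the SQL string. A minimal sketch using Python’s built-in sqlite3 module – the cheat sheet covers the equivalent APIs for other languages and databases:

```python
# Parameterized query (prepared statement): user input is bound as data,
# so it can never change the structure of the SQL statement.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))

user_supplied = "alice' OR '1'='1"  # a typical injection attempt

# Wrong: string concatenation lets the input rewrite the query.
# query = "SELECT email FROM users WHERE username = '" + user_supplied + "'"

# Right: the ? placeholder keeps the input as a plain value.
rows = conn.execute(
    "SELECT email FROM users WHERE username = ?", (user_supplied,)
).fetchall()
print(rows)  # [] -- the injection attempt matches nothing
```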

Then there are cheat sheets on harder, uglier technical problems like secure cryptographic storage or what you have to do to avoid XSS. XSS is so ugly that there is also a second cheat sheet that tries to explain the problem and solutions in a simpler way; and another cheat sheet just on DOM-based XSS prevention; and a technical cheat sheet on XSS filter evasion to help test for XSS vulnerabilities.
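The core defence running through the XSS cheat sheets is the same idea: encode untrusted data for the context where it is rendered. Here is a minimal sketch of the simplest case – HTML body content – using Python’s standard library; attribute, JavaScript, URL and CSS contexts each need their own encoding rules, which is exactly what the cheat sheets walk through:

```python
# Escape untrusted data before putting it into HTML element content.
# This covers only the HTML body context; other contexts (attributes,
# JavaScript, URLs, CSS) need their own encoding rules.
from html import escape

untrusted = '<script>alert("xss")</script>'

page = "<p>Hello, {}</p>".format(escape(untrusted, quote=True))
print(page)
# <p>Hello, &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```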

The OWASP Cheat Sheets are shortcuts that take you straight to the explanation of specific problems and how to solve them, checklists that you can follow without demanding that you understand everything about appsec. It’s OK. Go ahead and cheat.

Tuesday, September 4, 2012

Devops and Maintenance go together like Apple Pie and Ice Cream

One of the things I like about devops is that it takes on important but neglected problems in the full lifecycle of a system: making sure that the software is really ready to go into production, getting it into production, and keeping it running in production.

Most of what you read and hear about devops is in online startups – about getting to market faster and building tight feedback loops with Continuous Delivery and Continuous Deployment.

But devops is even more important in keeping systems running – in maintenance and sustaining engineering and support. Project teams working on the next new new thing can gloss over the details of how the software will actually run in production, how it will be deployed and how it should be hardened. If they miss something, the problems won’t show up until the system starts to get used by real customers for real business under real load – which can be months after the system is launched, by which time the system might already be handed over to a sustaining engineering team to keep things turning.

This is when priorities change. The system always has to work. You can’t ignore production – you’re dragged down into the mucky details of what it takes to keep a system running. The reality is that you can’t maintain a system effectively without understanding operational issues and without understanding and working with the people who operate and support the system and its infrastructure.

Developers on maintenance teams and Ops are both measured on:
  • System reliability and availability
  • Cycle time / turnaround on changes and fixes
  • System operations costs
  • Security and compliance
Devops tools and practices and ideas are the same ones that people maintaining a system need:
  • Version control and configuration management to track everything that you need to build and test and deploy and run the system
  • Fast and simple and repeatable build and deployment to make changes safe and cheap
  • Monitoring and alerting and logging to make support and troubleshooting more effective (see the logging sketch after this list)
  • Developers and operations working together to investigate and solve problems and to understand and learn together in blameless postmortems, building and sharing a culture of trust and transparency
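On the logging point, the biggest single win for a team that both builds and supports a system is log output that people and monitoring tools can actually search. A minimal sketch of structured (JSON) logging using nothing but Python’s standard library – the logger name and the fields chosen are placeholders, not a prescription:

```python
# Structured (JSON) application logging: one machine-readable line per event,
# easy for both developers and ops to search, graph and alert on.
import json
import logging
import sys
import time

class JsonFormatter(logging.Formatter):
    def format(self, record):
        event = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            event["exception"] = self.formatException(record.exc_info)
        return json.dumps(event)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("orders")  # placeholder logger name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("order accepted")
try:
    1 / 0
except ZeroDivisionError:
    log.exception("order failed")
```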

Devops isn’t just for online startups

Devops describes the reality that maintenance and sustaining engineering teams would be working in if they could. An alternative to late nights trying to get another software release out and hoping that this one will work; and to fire fighting in the dark; and to ass covering and finger pointing; and to filling out ops tickets and other bullshit paperwork. A reason to get up in the morning.

The dirty secret is that, as developers, most of us will spend most of our careers maintaining software – so more of us should learn more about devops and start living it.