Monday, October 29, 2012

Ridin’ that AppSec Bull: OWASP AppSec USA in Austin

OWASP held its annual USA conference on application security last week in Austin. The conference was well run and well attended: more than 800 people, lots of developers and infosec experts. Here’s what I learned:

The Web Started off Broken

JavaScript, the browser Same Origin Policy, and what eventually became the DOM were all created in about 10 days by one guy back in 1995 while Netscape was fighting for its life against Microsoft and IE. This explains a lot.

Fundamental decisions were made without enough time and thought – the results are design weaknesses that make CSRF and XSS attacks possible, as well as other problems and threats that we still face today.

The Web is Broken Today

We’re continuing to live with these problems because the brutal zero sum competitive game between browser vendors stops any of them from going back and trying to undo early decisions – nobody can afford to risk “breaking the web” and losing customers to a competitor. Browser vendors have to keep moving forward, adding new features and capabilities and more complexity on a broken foundation.

But it’s worse than just this. Apps are being hacked because of simple, stupid and sloppy mistakes that people keep making today, not just naive mistakes that were made more than 10 years ago. Most web apps (and now mobile apps) are being broken through SQL Injection even though this is easy for everyone to understand and prevent. Secure password storage is another fundamental thing that people keep doing wrong. We have to get together, make sure developers understand these basic problems, and get them fixed. Now.
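To show how small the SQL Injection fix usually is, here is a minimal sketch using Python's built-in sqlite3 module. The users table and the find_user and make_demo_db helpers are made up for illustration; the point is only that user input is passed as a bound parameter and never concatenated into the SQL text.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn


def find_user(conn, username):
    # The ? placeholder makes the driver treat username strictly as data,
    # so quotes or SQL keywords inside it cannot change the query.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchall()
```

With this in place a classic injection string such as ' OR '1'='1 is just an odd username that matches no rows.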

The Web is going to stay Broken

Even crypto, done properly (which is asking a lot of most people), won’t stay safe for long. It’s becoming cheap enough and easy enough to rent enough Cloud resources to break long-established crypto algorithms. We need to stay aware of the shifting threats against crypto algorithms, and become what Michael Howard calls “crypto agile”.
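One small, concrete reading of “crypto agile”, applied to the password storage problem mentioned earlier, is to store the algorithm name and cost factor next to each hash so that both can be raised later without breaking existing records. The sketch below assumes PBKDF2 from Python's standard hashlib; the record format, the policy constants and the function names are hypothetical, not something presented at the conference.

```python
import hashlib
import hmac
import os

# Hypothetical current policy; raise these as hardware gets cheaper.
CURRENT_ALGORITHM = "sha256"
CURRENT_ITERATIONS = 200_000


def hash_password(password, algorithm=CURRENT_ALGORITHM,
                  iterations=CURRENT_ITERATIONS):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        algorithm, password.encode("utf-8"), salt, iterations)
    # Keep the algorithm and cost with the hash so they can change later.
    return f"pbkdf2${algorithm}${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    scheme, algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if scheme != "pbkdf2":
        return False
    candidate = hashlib.pbkdf2_hmac(
        algorithm, password.encode("utf-8"),
        bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def needs_rehash(stored):
    # True when a record was created under an older, weaker policy.
    _, algorithm, iterations, _, _ = stored.split("$")
    return algorithm != CURRENT_ALGORITHM or int(iterations) < CURRENT_ITERATIONS
```

A login handler could call needs_rehash after a successful verify_password and quietly re-hash the password under the current policy.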

HD Moore showed that the Cloud can also be used for Internet-wide reconnaissance, not just for scanning for vulnerabilities or inconsistencies in different parts of the web. This is opening up the entire web to researchers and attackers to find new correlations and patterns and attack points. With the resources of the Cloud, “$7 own the Internet hacks” are now possible.

Then there are new languages and development platforms like HTML5, which provides a rich set of capabilities to web and mobile developers, including audio and video, threads in JavaScript, client-side SQL, client-side virtual file systems, and WebSockets – making the browser into “a mini-OS”. This also means that the attack surface of HTML5 apps has exploded, opening up new attack vectors for XSS and CSRF, and lots of other new types of attacks. From a security viewpoint, HTML5 is really, truly, deeply scary.

But some things are Getting Fixed

There are some things being done to make the web ecosystem safer, like improvements to JavaScript, ADsafe and the like.

Content Security Policy

But the most important and useful thing that I learned about is Content Security Policy, an attempt to fix fundamental problems in the Same Origin Policy by letting companies define a whitelist of domains that browsers should consider to be valid sources of executable scripts.

CSP was mentioned in a lot of the talks as a new way of dealing with problems as well as finding out what is happening out there in the web, even helping to make HTML5 safer. Twitter is already using Content Security Policy to detect violations, so, with some caveats, it works. CSP won’t work for every situation (consumer mashups for example), it doesn’t support inline JavaScript, and it only works in the latest browsers, and then only if developers and administrators know about it and use it properly. I am sure that people will find attacks to get around it. But it is a simple, understandable and practical way for enterprise app developers to close off, or at least significantly reduce the risk of, XSS attacks – something that developers could actually figure out and might actually use.
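To make the whitelist idea concrete, here is a rough sketch of building a policy header on the server. The helper name build_csp_header, the example CDN host and the /csp-report path are assumptions for illustration. The directive names (default-src, script-src, report-uri) and the Content-Security-Policy-Report-Only header come from the CSP specification; some browsers at the time of writing still expect a vendor-prefixed header name instead.

```python
def build_csp_header(script_sources, report_uri=None, report_only=False):
    """Return a (header_name, header_value) pair for a script whitelist."""
    # 'self' allows scripts from the page's own origin; everything else
    # must be listed explicitly.
    sources = " ".join(["'self'"] + list(script_sources))
    directives = ["default-src 'self'", f"script-src {sources}"]
    if report_uri:
        # Browsers POST a violation report here when the policy is broken.
        directives.append(f"report-uri {report_uri}")
    name = ("Content-Security-Policy-Report-Only" if report_only
            else "Content-Security-Policy")
    return name, "; ".join(directives)
```

Starting with report_only=True is one way to detect violations without blocking anything, and then switch to enforcement once the reports are clean.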

I agree with one of the speakers that CSP is “maybe the single most important thing we've done for security in the last several years” – it was worth going to the conference just to learn more about it. Now I'm looking forward to a Cheat Sheet from OWASP on how people can use Content Security Policy to protect their web sites from XSS.

Devops

Another key theme at the conference was how Devops – getting developers, operations and infosec working together, using common tools to deliver software faster and in smaller batches – can help make applications more secure. Through better communication, collaboration and automation, better configuration management and deployment and run-time checking and monitoring, and faster feedback to and from developers, you can prevent more problems from happening and find and fix problems much faster.

There was a Devops keynote, a Devops panel, and several case studies: Twitter, Etsy, Netflix and Mozilla showed what was possible if you have the discipline, talent, culture, tools, money and management commitment to do this. Of course for the rest of us who don’t work at Web 2.0 firms with more money than God, the results and constraints and approach will be different, but Devops can definitely help break down the cultural and organizational and information walls between AppSec and development. If you’re in AppSec, this is something to get behind and help with; don’t get in the way.

Other Things Seen and Overheard

You apparently need a beard to work on the manly and furry AppSec team at Twitter.

If you’re asking executives for funding, don’t say “Defence in Depth” – they know this just means wasting more money. Talk about “compensating controls” instead.

“The most reliable, effective way of injecting evil code is buying an ad”.

There was a lot more. Austin was an excellent venue: friendly people, great restaurants, cool bats (I like bats), lots of good places to chill out and enjoy the nice weather. And the conference was great. I'm already looking forward to next year.

Tuesday, October 23, 2012

You can’t Refactor your way out of every Problem

Refactoring is a disciplined way to clarify, retain or restore the design of a system as you make changes, and to help clean up and correct the mistakes and mess that we all make as we work, to clear away the evidence of false starts and changes in direction and backtracking and to help fill in gaps and misunderstandings.

As a colleague of mine has pointed out, you can get a lot out of even the most simple and obvious refactoring changes: eliminating duplication, changing variable and method names to be more meaningful, extracting methods, simplifying conditional logic, replacing a magic number with a named constant. These are easy things to do, and will give you a big return in understandability and maintainability.
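A tiny before-and-after sketch of a few of these: a magic number replaced with a named constant, names made meaningful, and a duplicated expression extracted into its own function. The tax rate and function names are invented for the example.

```python
# Before: a magic number, vague names, and a duplicated expression.
def calc(a, b):
    return a * b + a * b * 0.13


# After: same behaviour, but the intent is readable.
SALES_TAX_RATE = 0.13  # hypothetical rate, for illustration only


def subtotal(unit_price, quantity):
    return unit_price * quantity


def total_with_tax(unit_price, quantity):
    amount = subtotal(unit_price, quantity)
    return amount + amount * SALES_TAX_RATE
```

Nothing about what the code does has changed, which is what makes this kind of change cheap and safe to do as you go.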

But refactoring has limitations – there are some problems that refactoring won’t solve.

Refactoring can’t help you if the design is fundamentally wrong

Some people naively believe that you can refactor your way out of any design mistake or misunderstanding – and that you can use refactoring as a substitute for upfront design. This assumes that you will be able to immediately recognize mistakes and gaps from customer feedback and correct the design as you are developing.

But it can take a long time, usually only once the system is being used in the real world by real customers to do real things, before you learn how wrong you actually were, how much you missed and misunderstood, exceptions and edge cases and defects piling up before you finally understand (or accept) that no, the design doesn't hold up, you can’t just keep on extending it and patching what you have – you need a different set of abstractions or a different architecture entirely.

Refactoring helps you make course corrections. But what if you find out that you've been driving the entire time in the wrong direction, or in circles?

Barry Boehm, in Balancing Agility and Discipline, explains that starting simple and refactoring your way to the right answer sometimes falls down:

“Experience to date also indicates that low-cost refactoring cannot be depended upon as projects scale up. The most serious problems that arise with simple design are problems known as ‘architecture breakers’. These highly expensive problems can occur when early, simple design decisions result in foreseeable changes that cause breakage in design beyond the ability of refactoring to handle.”

This is another argument in the “Refactor or Design” holy war over how much design should be / needs to be done upfront and how much can be filled in as you go through incremental change and refactoring.

Deep Decisions

Many design ideas can be refined, elaborated, iterated and improved over time, and refactoring will help you with this. But some early decisions on approach, packaging, architecture, and technology platform are too fundamental and too deep to change or correct with refactoring.

You can use refactoring to replace in-house code with standard library calls, or to swap one library for another – doing the same thing in a different way. Making small design changes and cleaning things up as you go with refactoring can be used to extend or fill in gaps in the design and to implement cross-cutting features like logging and auditing, even access control and internationalization – this is what the XP approach to incremental design is all about.

But making small-scale design changes and improvements to code structure, extracting and moving methods, simplifying conditional logic and getting rid of case statements isn’t going to help you if your architecture won’t scale. It won’t help if you chose the wrong approach (like SOA), or the wrong application framework (J2EE with Enterprise Java Beans, any multi-platform UI framework, any of the early O/R mapping frameworks – remember the first release of TopLink? – or something that you rolled yourself before you understood how the language actually worked), or the wrong language (if you found out that Ruby or PHP won’t scale). It won’t help if you built on a core platform middleware technology that proves to be unreliable, that doesn’t hold up under load, or that has been abandoned. And it won’t help if you designed the system for the wrong kind of customer and need to change pretty much everything.

Refactoring to Patterns and Large Refactorings

Joshua Kerievsky’s work on Refactoring to Patterns provides higher-level composite refactorings to improve – or introduce – structure in a system, by properly implementing well-understood design patterns such as factories and composites and observers, replacing conditional logic with strategies and so on.

Refactoring to Patterns helps with cleaning up and correcting problems like

“duplicated code, long methods, conditional complexity, primitive obsession, indecent exposure, solution sprawl, alternative classes with different interfaces, lazy classes, large classes, combinatorial explosions and oddball solutions”.
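As a rough illustration of one of these composite refactorings, replacing conditional logic with strategies, here is a sketch in Python. The shipping example, the rates and the names are all hypothetical; Kerievsky's book works through Java examples in far more depth.

```python
# Before: every new shipping method means another branch in this function.
def shipping_cost_conditional(method, weight_kg):
    if method == "ground":
        return 5.0 + 1.0 * weight_kg
    elif method == "air":
        return 10.0 + 2.5 * weight_kg
    raise ValueError(f"unknown shipping method: {method}")


# After: each rule is its own strategy, looked up by name.
def ground(weight_kg):
    return 5.0 + 1.0 * weight_kg


def air(weight_kg):
    return 10.0 + 2.5 * weight_kg


SHIPPING_STRATEGIES = {"ground": ground, "air": air}


def shipping_cost(method, weight_kg):
    try:
        strategy = SHIPPING_STRATEGIES[method]
    except KeyError:
        raise ValueError(f"unknown shipping method: {method}") from None
    return strategy(weight_kg)
```

Adding a new method now means adding one function and one dictionary entry, without touching the existing rules.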

Lippert and Roock’s work on Large Refactorings explains how to take care of common architectural problems in and between classes, packages, subsystems and layers, doing makeovers of ugly inheritance hierarchies and reducing coupling between modules and cleaning up dependency tangles and correcting violations between architectural layers – the kind of things that tools like Structure 101 help you to see and understand.

They have identified a set of architectural smells and refactorings to correct them:

  • Smells in dependency graphs: Visible dependency graphs, tree-like dependency graphs, cycles between classes, unused classes
  • Smells in inheritance hierarchies: Parallel inheritance hierarchies, list-like inheritance hierarchy, inheritance hierarchy without polymorphic assignments, inheritance hierarchy too deep, subclasses without redefinitions
  • Smells in packages: Unused packages, cycles between packages, too small/large packages, packages unclearly named, packages too deep or nesting unbalanced
  • Smells in subsystems: Subsystem overgeneralized, subsystem API bypassed, subsystem too small/large, too many subsystems, no subsystems, subsystem API too large
  • Smells in layers: Too many layers, no layers, strict layers violated, references between vertically separate layers, upward references in layers, inheritance between protocol-oriented layers (coupling).
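Some of these smells, such as cycles between classes or packages, can be checked mechanically. The sketch below is a plain depth-first search over a dependency map; the find_cycle function and the package names are made up for illustration, and this is not how Structure 101 or the book's authors implement their analysis.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None if acyclic.

    dependencies maps each package name to the packages it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in dependencies.get(node, ()):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: we closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(dependencies):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Run against a map extracted from imports or build files, a non-empty result points at exactly the packages that need to be untangled.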

Composite refactorings and large refactorings raise refactoring to higher levels of abstraction and usefulness, and show you how to identify problems on your own and how to come up with your own refactoring patterns and strategies.

But refactoring to patterns or even large-scale refactoring still isn't enough to unmake or remake deep decisions or change the assumptions underlying the design and architecture of the system. Or to salvage code that isn't safe to refactor, or worth refactoring.

Sometimes you need to rewrite, not refactor

There is no end of argument over how bad code has to be before you should give up and rewrite it rather than trying to refactor your way through it.

The best answer seems to be that refactoring should always be your first choice, even for legacy code that you didn’t write and don’t understand and can’t test (there is an entire book written on how and where to start refactoring legacy apps).

But if the code isn’t working, or is so unstable and so dangerous that trying to refactor it only introduces more problems, if you can’t refactor or even patch it without creating new bugs, or if you need to refactor too much of the code to get it into acceptable shape (I’ve read somewhere that 20% is a good cut-off, but I can’t find the reference), then it’s time to declare technical bankruptcy and start again. Rewriting the code from scratch is sometimes your only choice. Some code shouldn't be – or can’t be – saved.

"Sometimes code doesn't need small changes—it needs to be tossed out so that you can start over. If you find yourself in a major refactoring session, ask yourself whether instead you should be redesigning and reimplementing that section of code from the ground up." Steve McConnell, Code Complete

You can use refactoring to restore, repair, clean up or adapt the design or even the architecture of a system. Refactoring can help you to go back and make corrections, reduce complexity, and help you fill in gaps. It will pay dividends in reducing the cost and risk of ongoing development and support.

But refactoring isn’t enough if you have to reframe the system – if you need to do something fundamentally different, or in a fundamentally different way – or if the code isn’t worth salvaging. Don’t get stuck believing that refactoring is always the right thing to do, or that you can refactor yourself out of every problem.

Wednesday, October 17, 2012

Should you care about Conway's Law?

Conway’s Law says that

“organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.” [emphasis mine]

This was an assertion made in the 1960s based on a small study which has now become a truism in software development (it’s fascinating how much of what we do and think today is based on data that is 50 or more years old). There are lots of questions to ask about Conway’s Law. Should we believe it – is there evidence to support it? How important is the influence of the structure of the team that designed and built the system compared to the structure of the team that continued to change and maintain it for several years – are initial decisions more or less important? What happens as the organization structure changes over time – are these changes reflected in the structure of the code? What organization structures result in better code, or is it better to have no organization structure at all?

Conway's Law and Collective Code Ownership

Conway’s Law is sometimes used as an argument for a “Whole Team” approach and “Collective Code Ownership” in Agile development. The position taken is that systems that are designed by teams structured around different specializations are of lower quality (because they are artificially constrained) than systems built by teams of “specialized generalists” or “generalizing specialists” who share responsibilities and the code (in small Scrum teams for example).

Communications Structure and Seams

First it is important to understand that the argument in Conway’s Law is not necessarily about how organizations are structured. It is about how people inside an organization communicate with each other – whether and how they talk to each other and share information, the freedom and frequency and form of that communication, and whether it is low-bandwidth and formal/structured or high-bandwidth and informal. It’s about the “social structure” of an organization.

There are natural seams that occur in any application architecture, as part of decomposition and assignment of responsibilities (which is what application architecture is all about). Examples include client and server separation (UI and UX work is quite different from what needs to be done on the server, and is often done with completely different technology), API boundaries with outside systems, data management, reporting, transaction management and workflow. These are different kinds of problems that require different skills to solve.

The useful argument that Conway’s Law makes is that unnatural seams, unnecessary complexity, misunderstandings and disconnects will appear in the system where people don’t communicate with each other effectively.

Conway’s Corollary

Much more interesting is what Conway’s Law means to how you should structure your development organization. This is Conway’s Corollary:

“A software system whose structure closely matches its organization’s communication structure works better (defined broadly) than a subsystem whose structure differs from its organization’s communication structure.”

“Better” means higher productivity for the people developing and maintaining the system, through more efficient communication and coordination, and higher quality.

In Making Software, Christian Bird at Microsoft Research (no relation) explains how important it is that an organization’s “social structure” mirrors the architecture of the system that they are building or working on. He walks through a study on the relationship between development team organization structure and post-release defects, in this case the organization that built Microsoft Windows Vista. This was a very large project, with thousands of developers, working on tens of millions of LOC. The study found that organization structure was a better indicator of software quality than any attributes of the software itself. The more complex the organization, the more coordination required, the more chances for bugs (obvious, but worth verifying). What is most important is “geographic and structural congruence” – work that is related should be done by people who are working closely together (also obvious, and now we have data to prove it).

Conway's Corollary and Collective Code Ownership

Conway’s Corollary argues against the “Collective Code Ownership” principle in XP where everyone can and should work on any part of the code at any time. The Microsoft study found that where developers from different parts of the organization worked on the same code, there were more bugs. It was better to have a team own a piece of code, or at the very least act as a gatekeeper and review all changes. Work is best done by the people (or person) who understand the code the most.

Making Organizational Decisions

A second study of 5 OSS projects was also interesting, because it showed that even in Open Source projects, people naturally form teams to work together on logically related parts of a code base.

The lessons from Conway's Corollary are that you should delay making decisions on organization until you understand the architectural relationships in a system; and that you need to reorganize the team to fit as the architecture changes over time. Dan Pritchett even suggests that if you want to change the architectural structure of a system, you should start by changing the organization structure of the team to fit the target design – forcing the team to work together to “draw the new architecture out of the code”.

Conway’s Law is less important and meaningful than people believe. Applying the argument to small teams, especially co-located Agile teams where people are all working closely together and talking constantly, is effectively irrelevant.

Conway’s Corollary however is valuable, especially for large, distributed development organizations. It’s important for managers to ensure that the structure of the team is aligned with the architectural structure of the system – the way it is today, or the way you want it to be.

Thursday, October 11, 2012

Bad Things Happen to Good Code

We need to understand what happens to code over time and why, and what a healthy, long-lived code base looks like. What architectural decisions have the most lasting impact, and what decisions made early will make the most difference over the life of a system?

Forces of Compromise

Most of the discussion around technical debt assumes that code degrades over time because of sloppiness and lazy coding practices and poor management decisions, by programmers who don’t know or don’t care about what they are doing or who are forced to take short-cuts under pressure. But it’s not that simple. Code is subject to all kinds of pressures and design compromises, big and small, in the real world.

Performance optimization trade-offs can force you to bend the design and code in ways that were never expected. Dealing with operational dependencies and platform quirks and run-time edge cases also adds complexity. Then there are regulatory requirements – things that don’t fit the design and don’t necessarily make sense but that you have to do anyway. And customization: customer-specific interfaces and options and custom workflow variants and custom rules and custom reporting, all to make someone important happy.

Integration with other systems and API lock-in and especially maintaining backwards compatibility with previous versions can all make for ugly code. Michael Feathers, who I think is doing the most interesting and valuable work today in understanding what happens to code and what should happen to code over time, has found that code around APIs and other hard boundaries becomes especially messy – because some interfaces are so hard to change, this forces programmers to do extra work (and workarounds) behind the scenes.

All of these forces contribute to making a system more complex, harder to understand, harder to change and harder to test over time – and harder to love.

Iterative Development is Erosive

In Technical Debt, Process and Culture, Feathers explains that “generalized entropy in software systems” is inevitable, the result of constant and normal wear and tear in an organization. As more people work on the same code, the design will naturally deteriorate as each person interprets the design in their own way and makes their own decisions on how to do something. What’s interesting is that the people working with this code can’t see how much of the design has been lost because their familiarity with the code makes it appear to be simpler and clearer than it really is. It’s only when somebody new joins the team that it becomes apparent how bad things have become.

Feathers also suggests that highly iterative development accelerates entropy, and that code which is written iteratively is qualitatively different than code in systems where the team spent more time in upfront design. Iterative development and maintenance tend to bias towards the existing structure of the system, meaning that more compromises will end up being made.

Iterative design and development involves making a lot of small mistakes, detours and false starts as you work towards the right solution. Testing out new ideas in production through A/B split testing amplifies this effect, creating more options and complexity. As you work this way some of the mistakes and decisions that you make won’t get unmade – you either don’t notice them, or it’s not worth the cost. So you end up with dead abstractions and dead ends, design ideas that aren't meaningful any more or are harder to work with than they should be. Some of this will be caught and corrected later in the course of refactoring, but the rest of it becomes too pervasive and expensive to justify ripping out.

Dealing with Software Sprawl

Software, at least software that gets used, gets bigger and more complicated over time – it has to, as you add more features and interfaces and deal with more exceptions and alternatives and respond to changing laws and regulations. Capers Jones’s analysis shows that the size of the code base for a system under maintenance will increase by between 5% and 10% per year. Our own experience bears this out – the code base for our systems has doubled in size in the last 5 years.
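As a quick check on what these rates add up to, here is a small sketch that assumes steady compound growth (a simplification; real code bases grow in fits and starts). The function names are made up for the example.

```python
def compound_growth(rate, years):
    """Size multiplier after growing by `rate` per year for `years` years."""
    return (1 + rate) ** years


def implied_annual_rate(factor, years):
    """Steady yearly rate that would produce `factor` growth over `years`."""
    return factor ** (1 / years) - 1
```

Under that assumption, 5% a year compounds to roughly 1.28x over 5 years and 10% a year to roughly 1.61x, while doubling in 5 years works out to about 15% a year, at or somewhat above the top of that range.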

As the code gets bigger it also gets more complex – code complexity tends to increase an average of between 1% and 3% per year. Some of this is real, essential complexity – not something that you can wish your way out of. But the rest is due to how changes and fixes are done.

Feathers has confirmed by mining code check-in history (Discovering Startling Things from your Version Control System) that most systems have a common shape or “power curve”. Most code is changed only infrequently or not at all, but the small percentage of methods and classes in the system that are changed a lot tend to get bigger and more complex over time. This is because it is

“easier to add code to an existing method than to add a new method and easier to add another method to an existing class than to add a new class.”

The key to keeping a code base healthy is disciplined refactoring of this code, taking the time to come up with new and better abstractions, and preventing the code from festering and becoming malignant.
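To find the frequently changed code in your own system, you can do a crude version of this kind of mining yourself. The sketch below counts how often each file shows up in the output of git log --name-only --pretty=format: (one path per line, commits separated by blank lines). It works at the file level only, whereas Feathers looks at methods and classes, and the function names are hypothetical.

```python
from collections import Counter


def change_counts(log_text):
    """Count how many commits touched each path in the git log output."""
    return Counter(
        line.strip() for line in log_text.splitlines() if line.strip())


def hotspots(log_text, top=10):
    """Return the `top` most frequently changed paths with their counts."""
    return change_counts(log_text).most_common(top)
```

The files at the top of this list are the ones where disciplined refactoring is most likely to pay off.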

There is also one decision upfront that has a critical impact on the future health of a code base. Capers Jones has found that the most important factor in how well a system ages is, not surprisingly, how complex the design was in the beginning:

“The rate of entropy increase, or the increase in cyclomatic complexity, seems to be proportional to the starting value. Applications that are high in complexity when released will experience much faster rates of entropy or structural decay than applications put into production with low complexity levels.” Capers Jones, The Economics of Software Quality

Systems that were poorly designed only get worse – but Jones has found that systems that were well-designed can actually get better over time.
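If you want a rough starting value for your own code, cyclomatic complexity is easy to approximate: one plus the number of decision points. The sketch below does this for Python source using the standard ast module. It is an approximation under my own choice of which nodes count as decisions, not the measure Jones used, and real tools handle more cases.

```python
import ast

# Node types treated as decision points in this approximation.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.IfExp, ast.comprehension)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity of a piece of Python source."""
    tree = ast.parse(source)
    complexity = 1  # a single straight-line path to start with
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths, one per extra operand.
            complexity += len(node.values) - 1
    return complexity
```

Tracking this number per module at each release is one way to see whether complexity is creeping up over time.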