Tuesday, July 29, 2014

Devops isn't killing developers – but it is killing development and developer productivity

Devops isn't killing developers – at least not any developers that I know.

But Devops is killing development – or at least the way that most of us think we are supposed to build and deliver software. Agile loaded the gun. Devops is pulling the trigger.

Flow instead of Delivery

A sea change is happening in the way that software is developed and delivered. Large-scale waterfall software development projects gave way to phased delivery and Spiral approaches, and then to smaller teams delivering working code in time boxes using Scrum or other iterative Agile methods. Now people are moving on from Scrum to Kanban, and to One-Piece Continuous Flow with immediate and Continuous Deployment of code to production in Devops.

The scale and focus of development continue to shrink, and so does the time frame for making decisions and getting work done: from phases and milestones and project reviews, to sprints and sprint reviews, to Lean controls over WIP limits and task-level optimization. The size of deliverables keeps shrinking too: from what a project team could deliver in a year, to what a Scrum team could get done in a month or a week, to what an individual developer can get working in production in a couple of days or a couple of hours.

The definition of “Done” and “Working Software” changes from something that is coded and tested and ready to demo to something that is working in production – now (“Done Means Released”).

Continuous Delivery and Continuous Deployment replace Continuous Integration. Rapid deployment to production doesn't leave time for manual testing or for manual testers, which means developers are responsible for catching all of the bugs themselves before code gets to production – or for doing their testing in production and trying to catch problems as they happen (aka “Monitoring as Testing”).

Because Devops brings developers much closer to production, operational risks become more important than project risks, and operational metrics become more important than project metrics. System uptime and cycle time to production replace Earned Value or velocity. The stress of hitting deadlines is replaced by the stress of firefighting in production and being on call.

Devops isn't about delivering a project or even delivering features. It’s about minimizing lead time and maximizing flow of work to production, recognizing and eliminating junk work and delays and hand offs, improving system reliability and cutting operational costs, building in feedback loops from production to development, standardizing and automating steps as much as possible. It’s more manufacturing and process control than engineering.

Devops kills Developer Productivity too

Devops also kills developer productivity.

Whether you try to measure developer productivity by LOC or Function Points or Feature Points or Story Points or velocity or some other measure of how much code is written, less coding gets done because developers are spending more time on ops work and dealing with interruptions, and less time writing code.

Time spent learning about the infrastructure and the platform, understanding how it is set up and making sure that it is set up right. Building Continuous Delivery and Continuous Deployment pipelines and keeping them running. Helping ops investigate and resolve issues, responding to urgent customer requests and questions, looking into performance problems, monitoring the system to make sure that it is working correctly, helping to run A/B experiments, pushing changes and fixes out… all of this takes time away from development and pre-empts thinking about requirements and designing and coding and testing (the work that developers are trained to do and are good at).

The Impact of Interruptions and Multi-Tasking

You can’t protect developers from interruptions and changes in priorities in Devops, even if you use Kanban with strict WIP limits, even in a tightly run shop – and you don’t want to. Developers need to be responsive to operations and customers, react to feedback from production, jump on problems and help detect and resolve failures as quickly as possible. This means everyone, especially your most talented developers, needs to be available for ops most if not all of the time.

Developers join ops on call after hours, which means carrying a pager (or being chased by Pager Duty) after the day’s work is done. Time gets wasted on support calls for problems that turn out not to be real problems. Long nights and weekends go to fire fighting, tracking down production issues and helping to recover from failures – and then developers come in tired the next day to spend more time on incident dry runs, testing failover and roll-forward and roll-back recovery, and participating in post mortems and root cause analysis sessions when something goes wrong and the failover or roll-forward or roll-back doesn’t work.

You can’t plan for interruptions and operational problems, and you can’t plan around them. Which means developers will miss their commitments more often. Then why make commitments at all? Why bother planning or estimating? Use just-in-time prioritization instead to focus in on the most important thing that ops or the customer need at the moment, and deliver it as soon as you can – unless something more important comes up and pre-empts it.

As developers take on more ops and support responsibilities, multi-tasking and task switching – and the interruptions and inefficiency that come with them – increase, fracturing time and destroying concentration. This has an immediate drag on productivity, and a longer-term impact on people’s ability to think and to solve problems.

Even the Continuous Deployment feedback loop itself is an interruption to a developer’s flow.

After a developer checks in code, running unit tests in Continuous Integration is supposed to be fast, a few seconds or minutes, so that they can keep moving forward with their work. But deploying immediately to production means running through a more extensive set of integration tests and systems tests and other checks in Continuous Delivery (more tests and more checks take more time), then executing the steps through to deployment, and then monitoring production to make sure that everything worked correctly, and jumping in if anything goes wrong. Even if most of the steps are automated and optimized, all of this takes extra time and takes the developer’s attention away from working on code.

Optimizing the flow of work in and out of operations means sacrificing developer flow, and slowing down development work itself.

Expectations and Metrics and Incentives have to Change

In Devops, the way that developers (and ops) work changes, and the way that they need to be managed changes. It’s also critical to change expectations and metrics and incentives for developers.

Devops success is measured by operational IT metrics, not by meeting project delivery goals of scope, schedule and cost, not by meeting release goals or sprint commitments, or even meeting product design goals:

  • How fast can the team respond to important changes and problems: Change Lead Time and Cycle Time to production instead of delivery milestones or velocity
  • How often do they push changes to production (which is still the metric that most people are most excited about – how many times per day or per hour or minute Etsy or Netflix or Amazon deploy changes)
  • How often do they make mistakes - Change / Failure ratio
  • System reliability and uptime – MTBF and especially MTTD and MTTR (a rough calculation of a couple of these metrics is sketched below)
  • Cost of change – and overall Operations and Support costs
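
These numbers fall out of simple arithmetic over deployment and incident records. Here is a minimal sketch of how the change/failure ratio and MTTR might be calculated – the record types and sample data are hypothetical, not tied to any particular monitoring tool:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class OpsMetrics {

    // Hypothetical records: a deployment either succeeded or caused a failure,
    // and an incident runs from detection to recovery.
    record Deployment(Instant at, boolean causedFailure) {}
    record Incident(Instant detected, Instant recovered) {}

    // Change/Failure ratio: the fraction of changes pushed to production that caused a failure.
    static double changeFailureRate(List<Deployment> deploys) {
        long failures = deploys.stream().filter(Deployment::causedFailure).count();
        return (double) failures / deploys.size();
    }

    // MTTR: mean time from detecting an incident to recovering from it.
    static Duration meanTimeToRecover(List<Incident> incidents) {
        long totalSeconds = incidents.stream()
                .mapToLong(i -> Duration.between(i.detected(), i.recovered()).toSeconds())
                .sum();
        return Duration.ofSeconds(totalSeconds / incidents.size());
    }

    public static void main(String[] args) {
        List<Deployment> deploys = List.of(
                new Deployment(Instant.parse("2014-07-01T10:00:00Z"), false),
                new Deployment(Instant.parse("2014-07-01T14:00:00Z"), true),
                new Deployment(Instant.parse("2014-07-02T09:00:00Z"), false),
                new Deployment(Instant.parse("2014-07-02T16:00:00Z"), false));
        List<Incident> incidents = List.of(
                new Incident(Instant.parse("2014-07-01T14:05:00Z"),
                             Instant.parse("2014-07-01T14:35:00Z")));

        System.out.printf("Change failure rate: %.0f%%%n", changeFailureRate(deploys) * 100);
        System.out.println("MTTR: " + meanTimeToRecover(incidents));
    }
}
```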

Devops is more about Ops than Dev

As more software is delivered earlier and more often to production, development turns into maintenance. Project management is replaced by incident management and task management. Planning horizons get much shorter – or planning is replaced by just-in-time queue prioritization and triage.

With Infrastructure as Code, Ops become developers, designing and coding infrastructure and infrastructure changes, thinking about reuse and readability and duplication and refactoring, technical debt and testability, and building on TDD to implement TDI (Test Driven Infrastructure). They become more agile and more Agile, making smaller changes more often, spending more time programming and less on paperwork.

And developers start to work more like ops. Taking on responsibilities for operations and support, putting operational risks first, caring about the infrastructure, building operations tools, finding ways to balance immediate short-term demands for operational support with longer-term design goals.

None of this will be a surprise to anyone who has been working in an online business for a while. Once you deliver a system and customers start using it, priorities change, and everything about the way that you work and plan has to change too.

This way of working isn't necessarily better for developers, or worse. But it is fundamentally different from how many developers think and work today. More frenetic and interrupt-driven. At the same time, more disciplined and more Lean. More transparent. More responsibility and accountability. Less about development and more about release and deployment and operations and support.

Developers – and their managers – will need to get used to being part of the bigger picture of running IT, which is about much more than designing apps and writing and delivering code. This might be the future of software development. But not all developers will like it, or be good at it.

Thursday, July 17, 2014

Trust instead of Threats

According to Dr. Gary McGraw’s groundbreaking work on software security, up to half of security mistakes are made in design rather than in coding. So it’s critical to prevent – or at least try to find and fix – security problems in design.

For the last 10 years we’ve been told that we are supposed to do this through threat modeling (aka architectural risk analysis) – a structured review of the design or architecture of a system from a threat perspective to identify security weaknesses and come up with ways to resolve them.

But outside of a few organizations like Microsoft, threat modeling isn’t being done at all, or at best is only being done on an inconsistent basis.

Cigital’s work on the Build Security In Maturity Model (BSIMM), which looks in detail at application security programs in different organizations, has found that threat modeling doesn't scale. Threat modeling is still too heavyweight, too expensive, too waterfally, and requires special knowledge and skills.

The SANS Institute’s latest survey on application security practices and tools asked organizations to rank the application security tools and practices they used the most and found most effective. Threat modeling was second last.

And at the 2014 RSA Conference, Jim Routh at Aetna, who has implemented large-scale secure development programs in 4 different major organizations, admitted that he has not yet succeeded in injecting threat modeling into design anywhere “because designers don’t understand how to make the necessary tradeoff decisions”.

Most developers don’t know what threat modeling is, or how to do it, never mind practice it on a regular basis. With the push to accelerate software delivery, from Agile to One-Piece Continuous Flow and Continuous Deployment to production in Devops, the opportunities to inject threat modeling into software development are disappearing.

What else can we do to include security in application design?

If threat modeling isn’t working, what else can we try?

There are much better ways to deal with security than threat modelling... like not being a tool.
JeffCurless, comment on a blog post about threat modeling

Security people think in terms of threats and risks – at least the good ones do. They are good at exploring negative scenarios and what-ifs, discovering and assessing risks.

Developers don’t think this way. For most of them, walking through possibilities, things that will probably never happen, is a waste of time. They have problems that need to be solved, requirements to understand, features to deliver. They think like engineers, and sometimes they can think like customers, but not like hackers or attackers.

In his new book on Threat Modeling, Adam Shostack says that telling developers to “think like an attacker” is like telling someone to think like a professional chef. Most people know something about cooking, but cooking at home and being a professional chef are very different things. The only way to know what it’s like to be a chef and to think like a chef is to work for some time as a chef. Talking to a chef or reading a book about being a chef or sitting in meetings with a chef won’t cut it.

Developers aren’t good at thinking like attackers, but they constantly make assertions in design, including important assertions about dependencies and trust. This is where security should be injected into design.

Trust instead of Threats

Threats don’t seem real when you are designing a system, and they are hard to quantify, even if you are an expert. But trust assertions and dependencies are real and clear and concrete. Easy to see, easy to understand, easy to verify. You can read the code, or write some tests, or add a run-time check.
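
For instance, a run-time check on trust assumptions can be as plain as a guard where a request enters your code. A minimal sketch – the names (TransferEndpoint, AuthService, AccountService) are hypothetical stand-ins, not any particular framework:

```java
// Hypothetical guard at a trust boundary: data and callers from outside the
// boundary are not trusted until the assumptions about them have been checked.
interface AuthService { User requireAuthenticatedUser(String sessionToken); }
interface AccountService { void transfer(String accountId, long amountCents); }
interface User { boolean ownsAccount(String accountId); }

class TransferEndpoint {

    private final AuthService auth;
    private final AccountService accounts;

    TransferEndpoint(AuthService auth, AccountService accounts) {
        this.auth = auth;
        this.accounts = accounts;
    }

    void transfer(String sessionToken, String accountId, String amountParam) {
        // "Users are assumed to be authenticated" - verify it, don't assume it.
        User user = auth.requireAuthenticatedUser(sessionToken);

        // "Data inside a trust boundary is assumed to be valid" - validate it
        // as it crosses the boundary, using a whitelist rather than a blacklist.
        if (!accountId.matches("[A-Z0-9]{8}")) {
            throw new IllegalArgumentException("invalid account id");
        }
        long amountCents;
        try {
            amountCents = Long.parseLong(amountParam);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("invalid amount");
        }
        if (amountCents <= 0 || amountCents > 100_000_00L) {
            throw new IllegalArgumentException("amount out of range");
        }

        // "Commands are assumed to have been authorized" - enforce it at the boundary.
        if (!user.ownsAccount(accountId)) {
            throw new SecurityException("not authorized for this account");
        }

        accounts.transfer(accountId, amountCents);
    }
}
```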

Reviewing a design this way starts off the same as a threat modeling exercise, but it is much simpler and less expensive. Look at the design at a system or subsystem-level. Draw trust boundaries between systems or subsystems or layers in the architecture, to see what’s inside and what’s outside of your code, your network, your datacenter:

Trust boundaries are like software firewalls in the system. Data inside a trust boundary is assumed to be valid, commands inside the trust boundary are assumed to have been authorized, users are assumed to be authenticated. Make sure that these assumptions are valid. And make sure to review dependencies on outside code. A lot of security vulnerabilities occur at the boundaries with other systems, or with outside libraries because of misunderstandings or assumptions in contracts.
OWASP Application Threat Modeling

Then, instead of walking through STRIDE or CAPEC or attack trees or some other way of enumerating threats and risks, ask some simple questions about trust:

Are the trust boundaries actually where you think they are, or think they should be?

Can you trust the system or subsystem or service on the other side of the boundary? How can you be sure? Do you know how it works, what controls and limits it enforces? Have you reviewed the code? Is there a well-defined API contract or protocol? Do you have tests that validate the interface semantics and syntax?

What data is being passed to your code? Can you trust this data – has it been validated and safely encoded, or do you need to take care of this in your code? Could the data have been tampered with or altered by someone else or some other system along the way?

Can you trust the code on the other side to protect the integrity and confidentiality of data that you pass to it? How can you be sure? Should you enforce this through a hash or an HMAC or a digital signature or by encrypting the data?
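
As one example, protecting data integrity across a boundary with an HMAC takes only a few lines using the standard javax.crypto API. A minimal sketch – key management and the message format are left out and would need to be worked out for a real system:

```java
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class IntegrityCheck {

    // Compute an HMAC-SHA256 tag over the message using a shared secret key.
    static byte[] sign(byte[] key, byte[] message) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(message);
    }

    // On the receiving side of the boundary, recompute the tag and compare.
    // MessageDigest.isEqual is used instead of Arrays.equals to avoid a timing side channel.
    static boolean verify(byte[] key, byte[] message, byte[] tag) throws Exception {
        return MessageDigest.isEqual(sign(key, message), tag);
    }
}
```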

Can you trust the user’s identity? Have they been properly authenticated? Is the session protected?

What happens if an exception or error occurs, or if a remote call hangs or times out – could you lose data or data integrity, or leak data? Does the code fail open or fail closed?
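
The fail open / fail closed difference is easy to see in code. A minimal sketch, with a hypothetical remote authorization check standing in for whatever your system actually calls across the boundary:

```java
// Hypothetical remote authorization check made across a trust boundary.
interface AuthorizationClient {
    boolean isAllowed(String userId, String action) throws Exception; // may hang or time out
}

class DocumentService {

    private final AuthorizationClient authz;

    DocumentService(AuthorizationClient authz) {
        this.authz = authz;
    }

    boolean mayDelete(String userId) {
        try {
            return authz.isAllowed(userId, "document:delete");
        } catch (Exception e) {
            // Fail closed: if the check can't be completed, deny the action.
            // Returning true here instead would quietly fail open.
            return false;
        }
    }
}
```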

Are you relying on protections in the run-time infrastructure or application framework or language to enforce any of your assertions? Are you sure that you are using these functions correctly?

These are all simple, easy-to-answer questions about fundamental security controls: authentication, access control, auditing, encryption and hashing, and especially input data validation and input trust, which Michael Howard at Microsoft has found to be the cause of half of all security bugs.

Secure Design that can actually be done

Looking at dependencies and trust will find – and prevent – important problems in application design.

Developers don’t need to learn security jargon, come up with attacker personas, build catalogs of known attacks and risk-weighting matrices, figure out how to use threat modeling tools, know what a cyber kill chain is, or understand the relative advantages of asset-centric threat modeling over attacker-centric or software-centric modeling.

They don’t need to build separate models or hold separate formal review meetings. Just look at the existing design, and ask some questions about trust and dependencies. This can be done by developers and architects in-phase as they are working out the design or changes to the design – when it is easiest and cheapest to fix mistakes and oversights.

And like threat modeling, questioning trust doesn’t need to be done all of the time. It’s important when you are in the early stages of defining the architecture or when making a major design change, especially a change that makes the application’s attack surface much bigger (like introducing a new API or transitioning part of the system to the Cloud). Any time that you are doing a “first of”, including working on a part of the system for the first time. The rest of the time, the risks of getting trust assumptions wrong should be much lower.

Just focusing on trust won’t be enough if you are building a proprietary secure protocol. And it won’t be enough for high-risk security features – although you should be trying to leverage the security capabilities of your application framework or a special-purpose security library to do this anyways. There are still cases where threat modeling should be done – and code reviews and pen testing too. But for most application design, making sure that you aren’t misplacing trust should be enough to catch important security problems before it is too late.

Wednesday, July 9, 2014

10 things you can do as a developer to make your app secure: #10 Design Security In

There’s more to secure design and architecture than properly implementing Authentication, Access Control and Logging strategies, and choosing (and properly using) a good framework.

You need to consider and deal with security threats and risks at many different points in your design.

Adam Shostack’s new book on Threat Modeling explores how to do this in detail, with lots of exercises and examples on how to look for and plug security holes in software design, and how to think about design risks.

But some important basic ideas in secure design will take you far:

Know your Tools

When deciding on the language(s) and technology stack for the system, make sure that you understand the security constraints and risks that your choices will dictate. If you’re using a new language, take time to learn how to write code properly and safely in that language. If you’re programming in Java, C, C++ or Perl, check out CERT’s secure coding guidelines for those languages. If you're writing code on iOS, read Apple's Secure Coding Guide. For .NET, review OWASP's .NET Security project.

Look for static analysis tools like Findbugs and PMD for Java, JSHint for Javascript, OCLint for C/C++ and Objective-C, Brakeman for Ruby, RIPS for PHP, Microsoft's static analysis tools for .NET, or commercial tools that will help catch common security bugs and logic bugs in coding or Continuous Integration.

And make sure that you (or ops) understand how to lock down or harden the O/S and to safely configure your container and database (or NoSQL data) manager.

Tiering and Trust

Tiering or layering, and trust in design are closely tied together. You must understand and verify trust assumptions at the boundaries of each layer in the architecture and between systems and between components in design, in order to decide what security controls need to be enforced at these boundaries: authentication, access control, data validation and encoding, encryption, logging.

Understand when data or control crosses a trust boundary: to/from code that is outside of your direct control. This could be an outside system, or a browser or mobile client or other type of client, or another layer of the architecture or another component or service.

Thinking about trust is much simpler and more concrete than thinking about threats. And easier to test and verify. Just ask some simple questions:

Where is the data coming from? How can you be sure? Can you trust this data – has it been validated and safely encoded? Can you trust the code on the other side to protect the integrity and confidentiality of data that you pass to it? Do you know what happens if an exception or error occurs – could you lose data or data integrity, or leak data? Does the code fail open or fail closed?

Before you make changes to the design, make sure that you understand these assumptions and make sure that the assumptions are correct.

The Application Attack Surface

Finally, it’s important to understand and manage the system’s Attack Surface: all of the ways that attackers can get in, or get data out, of the system, all of the resources that are exposed to attackers. APIs, files, sockets, forms, fields, URLs, parameters, cookies. And the security plumbing that protects these parts of the system.

Your goal should be to try to keep the Attack Surface as small as possible. But this is much easier said than done: each new feature and new integration point expands the Attack Surface. Try to understand the risks that you are introducing, and how serious they are. Are you creating a brand new network-facing API or designing a new business workflow that deals with money or confidential data, or changing your access control model, or swapping out an important part of your platform architecture? Or are you just adding yet another CRUD admin form, or just one more field to an existing form or file? In each case you are changing the Attack Surface, but the risks will be much different, and so will the way that you need to manage these risks.

For small, well-understood changes the risks are usually negligible – just keep coding. If the risks are high enough you’ll need to do some abuse case analysis or threat modeling, or make time for a code review or pen testing.

And of course, once a feature or option or interface is no longer needed, remove it and delete the code. This will reduce the system’s Attack Surface, as well as simplifying your maintenance and testing work.

That’s it. We’re done.

The 10 things you can do as a developer to make your app secure: thinking about security in architectural layering and technology choices, including security in requirements, taking advantage of other people’s code by using frameworks and libraries carefully, making sure that you implement basic security controls and features like Authentication and Access Control properly, protecting data privacy, logging with security in mind, and dealing with input data and stopping injection attacks, especially SQL injection.

This isn’t an exhaustive list. But understanding and dealing with these important issues in application security – including security when you think about requirements and design and coding and testing, knowing more about your tools and using them properly – is work that all developers can do, and will take you a long, long way towards making your system secure and reliable.

Monday, July 7, 2014

10 things you can do as a developer to make your app secure: #9 Start with Requirements

To build a secure system, you should start thinking about security from the beginning.

Legal and Compliance Constraints

First, make sure that everyone on the team understands the legal and compliance requirements and constraints for the system.

Regulations will drive many of the security controls in your system, including authentication, access control, data confidentiality and integrity (and encryption), and auditing, as well as system availability and reliability.

Agile teams in particular should not depend only on their Product Owner to understand and communicate these requirements. Compliance restrictions can impose important design constraints which may not be clear from a business perspective, as well as assurance requirements that dictate how you need to build and test and deliver the system, and what evidence you need to show that you have done a responsible job.

As developers you should try to understand what all of this means to you as early as possible. As a place to start, Microsoft has a useful and simple guide (Regulatory Compliance Demystified: An Introduction to Compliance for Developers) that explains common business regulations including SOX, HIPAA, and PCI-DSS and what they mean to developers.

Tracking Confidential Data

The fundamental concern in most regulatory frameworks is controlling and protecting data.

Make sure that everyone understands what data is private/confidential/sensitive and therefore needs to be protected. Identify and track this data throughout the system. Who owns the data? What is the chain of custody? Where is the data coming from? Can the source be trusted? Where is the data going to? Can you trust the destination to protect the data? Where is the data stored or displayed? Does it have to be stored or displayed? Who is authorized to create it, see it, change it, and do these actions need to be tracked and reviewed?

The answers to these questions will drive requirements for data validation, data integrity, access control, encryption, and auditing and logging controls in the system.
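
One way to keep these answers from getting lost is to make the classification explicit in the code that handles the data. A minimal sketch – the names and classification levels are hypothetical, not taken from any particular regulation or framework:

```java
// Hypothetical data classification, made explicit so that handling rules
// (masking, auditing, encryption) can be enforced and tested consistently.
enum Classification { PUBLIC, INTERNAL, CONFIDENTIAL, REGULATED }

interface AuditLog { void record(String event); }

record CustomerField(String name, Classification classification, String value) {

    // Only show what the classification allows, and track access to sensitive data.
    String displayValue(AuditLog audit, String viewer) {
        if (classification == Classification.CONFIDENTIAL
                || classification == Classification.REGULATED) {
            audit.record(viewer + " viewed " + name);
            return mask(value);
        }
        return value;
    }

    private static String mask(String v) {
        return v.length() <= 4 ? "****" : "****" + v.substring(v.length() - 4);
    }
}
```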

Application Security Controls

Think through the basic functional application security controls: Authentication, Access Control, Auditing – all of which we’ve covered earlier in this series of posts. Where do these controls need to be added? What security stories need to be written? How will these controls be tested?

Business Logic Can be Abused

Security also needs to be considered in business logic, especially in multi-step application workflows that deal with money or other valuable items, handle private or sensitive information, or perform command and control functions. Features like online shopping carts, online banking account transactions, user password recovery, bidding in online auctions, online trading and root admin functions are all potential targets for attack.

The user stories or use cases for these features should include exceptions and failure scenarios (what happens if a step or check fails or times out, or if the user tries to cancel or repeat or bypass a step?) and requirements derived from "abuse cases" or “misuse cases”. Abuse cases explore how the application's checks and controls could be subverted by attackers or how the functions could be gamed, looking for common business logic errors including time of check/time of use and other race conditions and timing issues, insufficient entropy in keys or addresses, information leaks, failure to prevent brute forcing, failing to enforce workflow sequencing and approvals, and basic mistakes in input data validation and error/exception handling and limits checking.
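
To make a couple of these mistakes concrete, here is a minimal sketch of enforcing workflow sequencing and a brute-force limit in a password recovery flow – the class and step names are hypothetical, and a real implementation would also need to handle things like code expiry and storage:

```java
// Hypothetical password-recovery flow: each step checks that the previous step
// actually completed, and attempts are limited to slow down brute forcing.
class PasswordRecovery {

    enum Step { STARTED, CODE_SENT, CODE_VERIFIED }

    private static final int MAX_ATTEMPTS = 5;

    private Step step = Step.STARTED;
    private int failedAttempts = 0;
    private String expectedCode;

    void sendCode(String code) {
        this.expectedCode = code;
        this.step = Step.CODE_SENT;
    }

    boolean verifyCode(String submitted) {
        // Enforce sequencing: can't verify a code that was never sent.
        if (step != Step.CODE_SENT) {
            throw new IllegalStateException("recovery code has not been sent");
        }
        // Enforce a limit: lock the flow after too many bad guesses.
        if (failedAttempts >= MAX_ATTEMPTS) {
            throw new IllegalStateException("too many attempts - start over");
        }
        if (!expectedCode.equals(submitted)) {
            failedAttempts++;
            return false;
        }
        step = Step.CODE_VERIFIED;
        return true;
    }

    void resetPassword(String newPassword) {
        // Enforce sequencing: only allow a reset after the code has been verified.
        if (step != Step.CODE_VERIFIED) {
            throw new IllegalStateException("code must be verified first");
        }
        // ... the hypothetical call to actually change the password would go here
    }
}
```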

This isn’t defence-against-the-dark-arts black hat magic, but getting this stuff wrong can be extremely damaging. For some interesting examples of how bad guys can exploit small and often just plain stupid mistakes in application logic, read Jeremiah Grossman’s classic paper “Seven Business Logic Flaws that put your Website at Risk”.

Make time to walk through important abuse cases when you’re writing up stories or functional requirements, and make sure to review this code carefully and include extra manual testing (especially exploratory testing) as well as pen testing of these features to catch serious business logic problems.

We’re close to the finish line. The final post in this series is coming up: Design and Architect Security In.

Wednesday, July 2, 2014

10 things you can do as a developer to make your app secure: #8 Leverage other people's Code (Carefully)

As you can see from the previous posts, building a secure application takes a lot of work.

One shortcut to secure software can be to take advantage of the security features of your application framework. Frameworks like .NET and Rails and Play and Django and Yii provide lots of built-in security protection if you use them properly. Look to resources like OWASP’s .NET Project and .NET Security Cheat Sheet, the Ruby on Rails Security Guide, the Play framework Security Guide, Django’s security documentation, How to write secure Yii applications, Apple’s Secure Coding Guide, or the Android security guide for developers for framework-specific security best practices and guidelines.

There will probably be holes in what your framework provides, which you can fill in using security libraries like Apache Shiro, or Spring Security, or OWASP’s comprehensive (and heavyweight) ESAPI, and special purpose libraries like Jasypt or Google KeyCzar and the Legion of the Bouncy Castle for crypto, and encoding libraries for XSS protection and protection from other kinds of injection.
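
Used properly, these libraries make the safe thing the easy thing. For example, output encoding for XSS protection is a one-liner with the OWASP Java Encoder – a minimal sketch, assuming the org.owasp.encoder library is on your classpath:

```java
import org.owasp.encoder.Encode;

public class CommentView {

    // Encode untrusted data for the context it is written into (an HTML body here),
    // instead of trying to filter out "dangerous" characters yourself.
    static String renderComment(String untrustedComment) {
        return "<p>" + Encode.forHtml(untrustedComment) + "</p>";
    }

    public static void main(String[] args) {
        // The angle brackets and quotes come out encoded, so the browser treats
        // the comment as text instead of executing it as script.
        System.out.println(renderComment("<script>alert('xss')</script>"));
    }
}
```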

Keep frameworks and libraries up to date

If you are going to use somebody else’s code, you also have to make sure to keep it up to date. Over the past year or so, high-profile problems including a rash of serious vulnerabilities in Rails in 2013 and the recent OpenSSL Heartbleed bug have made it clear how important it is to know all of the Open Source frameworks and libraries that you use in your application (including in the run-time stack), and to make sure that this code does not have any known serious vulnerabilities.

We’ve known for a while that popular Open Source software components are also popular (and easy) attack targets for bad guys. And we’re making it much too easy for the bad guys.

A 2012 study by Aspect Security and Sonatype looked at 113 million downloads of the most popular Java frameworks (including Spring, Apache CXF, Hibernate, Apache Commons, Struts and Struts2, GWT, JSF, Tapestry and Velocity) and security libraries (including Apache Shiro, Jasypt, ESAPI, BouncyCastle and AntiSamy). They found that 37% of this software contained known vulnerabilities, and that people continued to download obsolete versions of software with well-known vulnerabilities more than ¼ of the time.

This has become a common enough and serious enough problem that using software frameworks and other components with known vulnerabilities is now in the OWASP Top 10 Risk list.

Find Code with Known Vulnerabilities and Patch It - Easy, Right?

You can use a tool like OWASP’s free Dependency Check or commercial tools like Sonatype CLM to keep track of Open Source components in your repositories and to identify code that contains known vulnerabilities.

Once you find the problems, you have to fix them – and fix them fast. Research by White Hat Security shows that serious security vulnerabilities in most Java apps take an average of 91 days to fix once they are found. That’s leaving the door wide open for way too long, almost guaranteeing that bad guys will find their way in. If you don't take responsibility for this code, you can end up making your app less secure instead of more secure.

Next: let’s go back to the beginning, and look at security in requirements.