Thursday, April 30, 2015

Can DevOps(Sec) make Software more Secure?

There was a lot of talk at RSA this year about DevOps and security: DevOpsSec or DevSecOps or Rugged DevOps or whatever people want to call it. This included a full-day seminar on DevOps before the conference opened and several talks and workshops throughout the conference which tried to make the case that DevOps isn’t just about delivering software faster, but making software better and more secure; and that DevOps isn't just for the Cloud, but that it can work in the enterprise.

Rugged DevOps

The Rugged DevOps story is based on a few core ideas:

Delivering smaller changes, more often, reduces complexity. Smaller, less complex changes are easier to code and test and review, and easier to troubleshoot when something goes wrong. And this should result in safer and more secure code: less complex code has fewer bugs, and code that has fewer bugs also has fewer vulnerabilities.

If you’re going to deliver code more often, you need to automate and streamline the work of testing and deployment. A standardized, repeatable and automated build and deployment pipeline, with built-in testing and checks, enables you to push changes out much faster and with much more confidence, which is important when you are trying to patch a critical vulnerability.

And using an automated deployment pipeline for all changes – changes to application code and configuration and changes to infrastructure – provides better change control. You know what was changed, by who and when, on every system, and you can track all changes back to your version control system.

But this means that you need to re-tool and re-think how you do deployment and configuration management, which is why so many vendors – not just Opscode and Puppet Labs, but classic enterprise vendors like IBM – are so excited about DevOps.

The DevOps Security Testing Problem

And you also need to re-tool and re-think how you do testing, especially system testing and security testing.

In DevOps, with Continuous Delivery or especially Continuous Deployment to production, you don’t have a “hardening sprint” where you can schedule a pen test or in-depth scans or an audit or operational reviews before the code gets deployed. Instead, you have to do your security testing and checks in-phase, as changes are checked-in. Static analysis engines that support incremental checking can work here, but most other security scanning and testing tools that we rely on today won’t keep up.

Which means that you’ll need to write your own security tests. But this raises a serious question. Who’s going to write these tests?

Infosec? There’s already a global shortage of people who understand application security today. And most of these people – the ones who aren’t working at consultancies or for tool vendors – are busy doing risk assessments and running scans and shepherding the results through development to get vulnerabilities fixed, or maybe doing secure code reviews or helping with threat modeling in a small number of more advanced shops. They don’t have the time or often the skills to write automated security tests in Ruby or whatever automated testing framework that you select.

QA? In more and more shops today, especially where Agile or DevOps methods are followed, there isn't anybody in QA, because manual testers who walk through testing checklists can’t keep up, so developers are responsible for doing their own testing.

When it comes to security testing, this is a problem. Most developers still don’t have the application security knowledge to understand how to write secure code, which means that they also don’t understand enough about security to know what security tests need to be written. And writing an automated attack in Gauntlt (and from what I can tell, more people are talking about Gauntlt than writing tests with it) is a lot different than writing happy path automated unit tests in JUnit or UI-driven functional tests in Selenium or Watir.

So we shouldn’t expect too much from automated security testing in DevOps. There’s not enough time in a Continuous Delivery pipeline to do deep scanning or comprehensive fuzzing especially if you want to deploy each day or multiple times per day, and we won’t get real coverage from some automated security tests written in Gauntlt or Mittn.

But maybe that’s ok, because DevOps could force us to change the way that we think about and the way that we do application security, just as Agile development changed the way that most of us design and build applications.

DevOpsSec – a Forcing Factor for Change

Agile development pushed developers to work more closely with each other and with the Customer, to understand real requirements and priorities and to respond to changes in requirements and priorities. And it also pushed developers to take more responsibility for code quality and for making sure that their code actually did what it was supposed to, through practices like TDD and relentless automated testing.

DevOps is pushing developers again, this time to work more closely with operations and infosec, to understand what’s required to make their code safe and resilient and performant. And it is pushing developers to take responsibility for making their code run properly in production:

“You build it, you run it”
Werner Vogels, CTO Amazon

When it comes to security, DevOps can force a fundamental change in how application security is done today, from "check-then-fix" to something that will actually work: building security in from the beginning, where it makes the most difference. But a lot of things have to change for this to succeed:

Developers need better appsec skills, and they need to work more closely with ops and with infosec, so that they can understand security and operational risks and understand how to deal with them proactively. Thinking more about security and reliability in requirements and design, understanding the security capabilities of their languages and frameworks and using them properly, writing more careful code and reviewing code more carefully.

Managers and Product Owners need to give developers the time to learn and build these skills, and the time to think through design and to do proper code reviews.

Infosec needs to become more iterative and more agile, to move out front, so that they can understand changing risks and threats as developers adopt new platforms and new technologies (the Cloud, Mobile, IoT, …). So that they can help developers design and write tools and tests and templates instead of preparing checklists – to do what Intuit calls “Security as Code”.

DevOps isn’t making software more secure – not yet. But it could, if it changes the way that developers design and build software and the way that most of us think about security.

Wednesday, April 15, 2015

Backdoors, Sabotage or Just Plain Stupidity

Someone on your development team, or a contractor or a consultant, or one of your sys admins, or a bad guy who stole one of these people’s credentials, might have put a backdoor, a logic bomb, a Trojan or other “malcode” into your application code. And you don’t know it.

How much of a real problem is this? And how can you realistically protect your organization from this kind of threat?

The bad news is that it can be difficult to find malcode planted by a smart developer, especially in large legacy code bases. And it can be card to distinguish between intentionally bad code and mistakes.

The good news is that according to research by CERT’s Insider Threat Program less than 5% of insider attacks involve someone intentionally tampering with software: (for a fascinating account of real-world insider software attacks, check out this report from CERT). h Which means that most of us are in much greater danger from sloppy design and coding mistakes in our code and in the third party code that we use, than we are from intentional fraud or other actions by malicious insiders.

And the better news is that most of the work in catching and containing threats from malicious insiders is the same work that you need to do to catch and prevent security mistakes in coding. Whether it is sloppy/stupid or deliberate/evil, you look for the same things, for what Brenton Kohler at Cigital calls “red flags”:

  1. Stupid or small accidental or “accidental” mistakes in security code such as authentication and session management, access control, or in crypto or secrets handling
  2. Hard-coded URLs or IPs or other addresses, hard-coded user-ids and passwords or password hashes or keys in the code or in configuration. Potential backdoors for insiders, whether they were intended for support purposes or not, are also holes that could be exploited by attackers
  3. Test code or debugging code or diagnostics
  4. Embedded shell commands
  5. Hidden commands, hidden parameters and hidden options
  6. Logic mistakes in handling money (like penny shaving) or risk limits or managing credit card details, or in command or control functions, or critical network-facing code
  7. Mistakes in error handling or exception handling that could leave the system open
  8. Missing logging or missing audit functions, and gaps in sequence handling
  9. Code that that is overly tricky, or unclear or that just doesn’t make sense. A smart bad guy will probably take steps to obfuscate what they are trying to do, and anything that doesn’t make sense should raise red flags. Even if this code isn’t intentionally malicious, you don’t want it in your system
  10. Self-modifying code. See above.

Some of these issues can be found through static analysis. For example, Veracode explains how some common backdoors can be detected by scanning byte code.

But there are limits to what tools can find, as Mary Ann Davidson at Oracle, in a cranky blog post from 2014 points out:

"It is in fact, trivial, to come up with a “backdoor” that, if inserted into code, would not be detected by even the best static analysis tools. There was an experiment at Sandia Labs in which a backdoor was inserted into code and code reviewers told where in code to look for it. They could not find it – even knowing where to look."

If you’re lucky, you might find some of these problems through fuzzing, although it’s hard to fuzz code and interfaces that are intentionally hidden.

The only way that you can have confidence that your system is probably free of malcode – in the same way that you can have confidence that your code is probably free of security vulnerabilities and other bugs – is through disciplined and careful code reviews, by people who know what they are looking for. Which means that you have to review everything, or at least everything important: framework and especially security code, protocol libraries, code that handles confidential data or money, …

And to prevent programmers from colluding, you should rotate reviewers or assign them randomly, and spot check reviews to make sure that they are being done responsibly (that reviews are not just rubber stamps), as outlined in the DevOps Audit Defense Toolkit.

And if the stakes are high enough, you may also need eyes from outside on your code, like the Linux Foundations’s Core Infrastructure Initiative is doing, paying experts to do a detailed audit of OpenSSL, NTP and OpenSSH.

You also need to manage code from check-in through build and test to deployment, to ensure that you are actually deploying what you checked-in and built and tested, and that code has not been tampered with along the way. Carefully manage secrets and keys. Use checksums/signatures and change detection tools like OSSEC to watch out for unexpected or unauthorized changes to important configs and code.

This will help you to catch malicious insiders as well as honest mistakes, and attackers who have somehow compromised your network. The same goes for monitoring activity inside your network: watching out for suspect traffic to catch lateral movement should catch bad guys regardless of whether they came from the outside or the inside.

If and when you find something, the next problem is deciding if it is stupid/sloppy/irresponsible or malicious/intentional.

Cigital’s Kohler suggests that if you have serious reasons to fear insiders, you should rely on a small number of trusted people to do most of the review work, and that you try to keep what they are doing secret, so that bad developers don’t find out and try to hide their activity.

For the rest of us who are less paranoid, we can be transparent, shine a bright light on the problem from the start.

Make it clear to everyone that your customers, shareholders and regulators require that code must be written responsibly, and that everybody’s work will be checked.

Include strict terms in employment agreements and contracts for everyone who could touch code (including offshore developers and contractors and sys admins) which state that they will not under any circumstances insert any kind of time bomb, backdoor or trap door, Trojan, Easter Egg or any kind of malicious code into the system – and that doing so could result in severe civil penalties as well as possible criminal action.

Make it clear that all code and other changes will be reviewed for anything that could be malcode.

Train developers on secure coding and how to do secure code reviews so that they all know what to look for.

If everyone knows that malcode will not be tolerated, and that there is a serious and disciplined program in place to catch this kind of behavior, it is much less likely that someone will try to get away with it – and even less likely that they will be able to get away with it.

You can do this without destroying a culture of trust and openness. Looking out for malcode, like looking out for mistakes, simply becomes another part of your SDLC. Which is the way it should be.

Tuesday, April 7, 2015

Towards Compliance as Code

Infrastructure as Code is fundamental to DevOps. Automating the work of setting up and maintaining systems infrastructure. Making it defined, efficient, testable, auditable and standardized.

For the many of us who work in regulated environments, we need more. We need Compliance as Code.

Take regulatory constraints and policies and compliance procedures and the processes and constraints that they drive, and wire as much of this as possible into automated workflows and tests. Making it defined, efficient, testable, auditable and standardized.

DevOps Audit Defense Toolkit

Some big steps towards Compliance as Code are laid out in the Devops Audit Defense Toolkit, a freely-available document which explains how compliance requirements such as separation of duties between developers and operations, and detecting/preventing unauthorized changes, can be met in a DevOps environment, using some common, basic controls:

  1. Code Reviews. All code changes must be peer reviewed before check-in. Any changes to high-risk code must be reviewed a second time by an expert. Reviewers check code and tests for functional and operational correctness and consistency. They look for coding and design mistakes and gaps, operational dependencies, for back doors and for security vulnerabilities. Which means that developers must be trained and guided in how to do reviews properly. Peer reviews also ensure that changes can’t be pushed without at least one other person on the team understanding what is going on.
  2. Static analysis. Static analysis is run on all changes to catch security bugs and other problems. Any violations of coding rules will break the build.
  3. Automated testing is done in Continuous Integration/Continuous Delivery – unit and integration testing, and security testing. The Audit Toolkit assumes that developers follow TDD to ensure a high level of test coverage. All tests must pass.
  4. Traceability of all changes back to the original request, using a ticketing system like Jira (you can’t just use index cards on a wall to describe stories and throw them out when you are done).
  5. Operations checks/asserts after deployment and startup, and feedback from operations monitoring and especially from production failures. Metrics and post mortem review findings are used to drive improvements to testing and instrumentation, as well as deeper changes to policy definition, training and hiring – see John Allspaw’s presentation Ops Meta-Metrics: The Currency you use to pay for Change, from Velocity 2010, on how this can be done.
  6. All changes to code and infrastructure definitions, including bug fixes and patches, are deployed through the same automated, auditable Continuous Delivery pipeline.

A starting point

The DevOps Audit Defense Toolkit provides a starting point, an example to build on. You can add your own rules, checks, reviews, tests, and feedback loops.

It is also a work in progress. There are a few important problems still to be worked out:

Major Changes

The Audit Toolkit describes how standard changes can be handled in Continuous Delivery: small, well-defined, low-impact changes that are effectively pre-approved. Operations and management are notified as these changes are deployed (the changes are logged, information is displayed on screens and included in reports), but there is no upfront communication or coordination of these changes, because it shouldn’t be necessary. Developers can push changes out as soon as they are ready, and they get deployed immediately after all reviews and tests and other checks pass.

But the Audit Toolkit is silent on how to manage larger scale changes, including changes to data and databases, changes to interfaces with other systems, changes required to comply with new laws and regulations, major new customer features and technical upgrades. Changes that are harder to rollout, that have wider impact and higher risk, and require much more coordination. Which is, of course, the stuff that matters most.

You need clear and explicit hand-offs to operations and customer service for larger changes, so that all stakeholders understand the dependencies and risks and impact on how they work so that they can plan ahead. This can still be done in a DevOps way, but it does require meetings and planning, and some project management and paperwork. As an example, see how Etsy manages feature launches.

You also need to ensure that the policies for defining which changes are small enough and simple enough to be pre-approved, and for deciding which code changes are high risk and need additional review, are reasonable and unambiguous and consistent. You need to do frequent reviews to ensure that these policies are rigorously followed and that people don’t misunderstand or try to get away with pushing higher-risk, non-standard changes through without management/CAB oversight and explicit change approval.

Done properly, this means that the full weight of change control is only brought to bear when it is needed – for changes that have real operational or business risk. Then you want to find ways to minimize these risks, to break changes down into smaller pieces, to simplify, streamline and automate as much of the work required as possible, leveraging the same testing and delivery infrastructure.

Security testing

There is a lot of attention to responsible security testing in the Audit Toolkit. Because changes are made incrementally and iteratively, and pushed out automatically, you’ll need tools and tests that work automatically, incrementally and iteratively. Which is unfortunately not how most security tools work, and not how most security testing is done today.

There aren’t that many organizations using tools like Gauntlt or BDD-Security to write higher-level automated security tests and checks as part of Continuous Integration or Continuous Delivery. Most of us depend on dynamic and static scanners and fuzzers that can take hours to run and require manual review and attention, or expensive, time-consuming manual pen tests. This clearly can’t be done on every check-in.

But as more teams adopt Agile and now DevOps practices, the way that security testing is done is also changing, in order to keep up. Static analysis tools are getting speedier, and many tools can provide feedback directly to developers in their IDEs, or work against incremental change sets. Dynamic testing tools and services are becoming more scriptable and more scalable and simpler to use, with open APIs.

Interactive security testing tools like Contrast or Quotium Seeker can catch security errors at run-time as the system is being tested in Continuous Integration/Delivery. And companies like Signal Sciences are working on new ways to do agile security for online systems. But this is new ground: there’s still lots of digging and hoeing that needs to be done.

Do developers need access to production?

The Audit Toolkit assumes that developers will have read access to production logs, and that they may also need direct access to production in order to help with troubleshooting and support. Even if you restrict developers to read only access, this raises concerns around data privacy and confidentiality.

And what if read access is not enough? What if developers need to make a hot fix to code or configuration that can’t be done through the automated pipeline, or repair production data? Now you have problems with separation of duties and data integrity.

What should developers be able to do, what should they be able to see? And how can this be controlled and tracked? If you are allowing developers in production, you need to have solid answers for these questions.

Continuous Deployment or Continuous Delivery?

The Audit Toolkit makes the argument that with proper controls in place, developers should be able to push changes directly out to production when they are ready – provided that these changes are low-risk and only if the changes pass through all of the reviews and tests in the automated deployment pipeline.

But this is not something that you have to do or even can do – not because of compliance constraints necessarily, but because your business environment or your architecture won’t support making changes on the fly. Continuous Delivery does not have to mean Continuous Deployment. You can still follow disciplined Continuous Delivery through to pre-production, with all of the reviews and checks in place, and then bundle changes together and release them when it makes sense.

Selling to regulators and auditors

You will need to explain and sell this approach to regulators and auditors – to lawyers or wanna-be lawyers. Convincing them – and helping them – to look at code and logs instead of legal policies and checklists. Convincing them that it’s ok for developers to push low-risk, pre-approved changes to production, if you want to go this far.

Just as beauty is in the eye of the beholder, compliance is in the opinion of the auditor. They may not agree with or understand what you are doing. And even if one auditor does, the next one may not. Be prepared for a hard sell, and for set backs.

Disciplined, Agile and Lean

The DevOps Audit Defense Toolkit describes a disciplined, but Agile/Lean approach to managing software and system changes in a highly regulated environment.

This is definitely not easy. It’s not lightweight. It takes a lot of engineering discipline. And a lot of investment in automation and in management oversight to make it work.

But it’s still Agile. It supports the rapid pace and iterative, incremental way that development teams want to work today. And Lean. Because all of the work is clearly laid out and automated wherever possible. You can map the value chains and workflows, measure delays and optimize, review and improve.

Instead of detailed policies and procedures and checklists that nobody can be sure are actually being followed, you have automated delivery and deployment processes that you exercise all of the time, so you know they work. Policies and guidelines are used to drive decisions, which means that they can be simpler and clearer and more practical. Procedures and checklists are burned into automated steps and controls.

This could work. It should work. And it’s worth trying to make work. Instead of compliance theater and tedious and expensive overhead, it promises that changes to systems can be made simpler, more predictable, more efficient and safer. That’s something that’s worth doing.