Thursday, March 13, 2014

Application Security – Can you Rely on the Honeymoon Effect?

At this year’s RSA conference in San Francisco I learned from Dave Mortman about some interesting research which supports the Devops and Agile arguments that continuous, incremental, iterative changes can be made safely: a study by the MIT Lincoln Laboratory (Milk or Wine: Does Software Security Improve with Age?) and The Honeymoon Effect, by Sandy Clark at the University of Pennsylvania.

These studies show that most software vulnerabilities are foundational (introduced from the start of development up to the first release), the result of early decisions and not of later incremental changes. And there is a “honeymoon period” after software is released, before bad guys understand it well enough to find and exploit vulnerabilities. Which means the more often that you release software changes, the safer your system could be.

Understanding the Honeymoon Effect

Research on the honeymoon period, the time “after the release of a software product (or version) and before the discovery of the first vulnerability”, seems to show that finding security vulnerabilities is “primarily a function of familiarity with the system”. Software security vulnerabilities aren't like functional or reliability bugs, which are mostly found soon after release, slowing down over time:

“…we would expect attackers (and legitimate security researchers) who are looking for bugs to exploit to have the easiest time of it early in the life cycle. This, after all, is when the software is most intrinsically weak, with the highest density of “low hanging fruit” bugs still unpatched and vulnerable to attack. As time goes on, after all, the number of undiscovered bugs will only go down, and those that remain will presumably require increasing effort to find and exploit.

But our analysis of the rate of the discovery of exploitable bugs in widely-used commercial and open-source software, tells a very different story than what the conventional software engineering wisdom leads us to expect. In fact, new software overwhelmingly enjoys a honeymoon from attack for a period after it is released. The time between release and the first 0-day vulnerability in a given software release tends to be markedly longer than the interval between the first and second vulnerability discovered, which in turn tends to be longer than the time between the second and the third…”

It may take a while for attackers to find the first vulnerability, but then it gets progressively easier – because attackers use information from previous vulnerabilities to find the next ones, and because the more vulnerabilities they find, the more confident they are in their ability to find even more (there's blood in the water for a reason).

This means that software may actually be safest when it should be the weakest:

“when the software is at its weakest, with the ‘easiest’ exploitable vulnerabilities still unpatched, there is a lower risk that this will be discovered by an actual attacker on a given day than there will be after the vulnerability is fixed!”

Code Reuse Shortens your Honeymoon

Clark’s team also found that re-use of code shortens the honeymoon, because this code may already be known to attackers:

“legacy code resulting from code-reuse [whether copy-and-paste or using frameworks or common libraries] is a major contributor to both the rate of vulnerability discovery and the numbers of vulnerabilities found…

We determined that the standard practice of reusing code offers unexpected security challenges. The very fact that this software is mature means that there has been ample opportunity to study it in sufficient detail to turn vulnerabilities into exploits.”

In fact, reuse of code can lead to “less than Zero day” vulnerabilities – software that is already known to be vulnerable before your software is released.

Leveraging Open Source frameworks and libraries and copying-and-pasting from code that is already working obviously saves time and reduces development costs, and helps developers to minimize technical risks, including security risks – it should be safer to use a special-purpose security library or the security features of your application framework than to try to solve security problems on your own. But this also brings its own set of risks, especially the danger of using popular software components with known vulnerabilities – software that attackers know and can easily exploit on a wide scale. This means that if you’re going to use Open Source (and just about everybody does today), you need to put in proactive controls to track what code is being used and make sure that you keep it up to date.
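The tracking control comes down to something simple: keep a manifest of the components you actually ship and check it against an advisory feed on every build. A minimal sketch of the idea (the component names, versions and advisory data below are illustrative, not a real feed):

```python
# Sketch: track third-party components and flag any with known vulnerabilities.
# The manifest and advisory data here are made up for illustration; in practice
# you would feed in your real build manifest and a CVE/advisory feed.

# Components your build actually uses (name -> version)
manifest = {
    "struts": "2.3.15",
    "commons-collections": "3.2.1",
    "guava": "16.0",
}

# Known-vulnerable versions, e.g. pulled from an advisory feed (hypothetical data)
known_vulnerable = {
    ("struts", "2.3.15"): "CVE-2013-2251 (remote code execution)",
}

def audit(manifest, known_vulnerable):
    """Return the components in the manifest that have known vulnerabilities."""
    findings = []
    for name, version in sorted(manifest.items()):
        advisory = known_vulnerable.get((name, version))
        if advisory:
            findings.append((name, version, advisory))
    return findings

for name, version, advisory in audit(manifest, known_vulnerable):
    print(f"{name} {version}: {advisory}")
```

Failing the build on any finding is the simplest way to make the control proactive rather than a report nobody reads.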

Make the Honeymoon Last as Long as you can

One risk of Agile development and Devops is that security can’t keep up with the rapid pace of change - at least not the way that most organizations practice security today. But if you’re moving fast enough, the bad guys might not be able to keep up either. So speed can actually become a security advantage:

“Software that was changed more frequently had a significantly longer median honeymoon before the first vulnerability was discovered.”

The idea of constant change as protection is behind Shape Shifter, an interesting new technology which constantly changes attributes of web application code so that attackers, especially bots, can’t get a handle on how the system works or execute simple automated attacks.

But speed of change isn't enough by itself to protect you, especially since a lot of the changes that developers make don’t materially affect the Attack Surface of the application – the points in the system that an attacker can use to get into (or get data out of) an application. Changes like introducing a new API or file upload, or a new user type, or modifying the steps in a key business workflow like an account transfer function could make the system easier or harder to attack. But most minor changes to the UI, or behind-the-scenes changes to analytics and reporting and operations functions, don't factor in.
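One cheap way to tell which releases matter is to diff the set of externally reachable entry points between versions and flag any change for review. A sketch of the idea (the route names are invented):

```python
# Sketch: flag releases that change the attack surface by diffing the set of
# externally reachable entry points between two versions. Route names here
# are illustrative; in practice you would extract them from routing config
# or framework metadata.

release_1 = {"GET /login", "POST /login", "GET /account", "POST /transfer"}
release_2 = {"GET /login", "POST /login", "GET /account", "POST /transfer",
             "POST /api/v2/upload"}   # new file-upload API in this release

def attack_surface_delta(old, new):
    """Return (added, removed) entry points between two releases."""
    return sorted(new - old), sorted(old - new)

added, removed = attack_surface_delta(release_1, release_2)
if added or removed:
    print("Attack surface changed; security review needed:", added, removed)
```

A release that adds nothing to this list is exactly the kind of low-risk change that keeps the honeymoon going; a new endpoint is the kind that deserves a closer look.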

The honeymoon can’t last forever anyway: it could be as long as 3 years, or as short as 1 day. If you are stupid or reckless or make poor technology choices or bad design decisions, it won’t take the bad guys that long to find the first vulnerability, regardless of how often you fiddle with the code, and it will only get worse from there. You still have to do a responsible job in design and development and testing, and carefully manage code reuse, especially use of Open Source code – whatever you can to make the honeymoon last as long as possible.

Wednesday, March 5, 2014

Appsec and Devops at RSA 2014

At last week’s RSA security conference in San Francisco the talk was about how the NSA is compromising the security of the Internet as well as violating individual privacy, the NSA and RSA (did RSA take the money or not), the NSA and Snowden and how we all have to take insider threats much more seriously, Apple’s SSL flaw (honest mistake, or plausibly deniable back door?) and how big data is rapidly eroding privacy and anonymity on the Internet anyway (with today’s – or at least tomorrow’s – analytic tools and enough Cloud resources, everyone’s traces can be found and put back together). The Cloud is now considered safe – or at least as safe as anywhere else. Mobile is not safe. Bitcoin is not safe. Critical infrastructure is not safe. Point of Sale systems are definitely not safe – and neither are cars or airplanes or anything in the Internet of Things.

I spent most of my time on Appsec and Devops security issues. There were some interesting threads that ran through these sessions:

Third Party Controls

FS ISAC’s recent paper on Appropriate Software Security Control Types for Third Party Service and Product Providers got a lot of play. It outlines a set of controls that organizations (especially enterprises) should require of their third party service providers – and that anyone selling software or SaaS to large organizations should be prepared to meet:

  1. vBSIMM - a subset of BSIMM to evaluate a provider’s secure development practices
  2. Open Source Software controls – it’s not enough for an organization to be responsible for the security of the software that they write; they also need to be responsible for the security of any software that they use to build their systems, especially Open Source software. Sonatype has done an especially good job of highlighting the risk of using Open Source software, with scary data like “90% of a typical application is made up of Open Source components”, “2/3 of developers don’t know what components they use” and “more than 50% of the Global 500 use vulnerable Open Source components”.
  3. Binary Static Analysis testing – it’s not enough that customers ask how a provider secures their software; they should also ask for evidence, by demanding independent static analysis testing of the software using Veracode or HP Fortify on Demand.

Threat Modeling

There was also a lot of talk about threat modeling in secure development: why we should do it, how we could do it, why we aren’t doing it.

In an analysis of results from the latest BSIMM study on software security programs, Gary McGraw at Cigital emphasized the importance of threat modeling / architectural risk analysis in secure development. However, he also pointed out that it does not scale: while 56 of the 67 firms in the BSIMM study that have application security programs in place conduct threat modeling, they limit it to security features only.

Jim Routh, now CISO at Aetna (and who led the FS ISAC working group above while he was at JPMC), has implemented successful secure software development programs in 4 different organizations, but admitted that he failed at injecting threat modeling into secure development in all of these cases, because of how hard it is to get designers to understand the necessary tradeoff decisions.

Adam Shostack at Microsoft outlined a pragmatic and understandable approach to threat modeling. Ask 4 simple questions:

  1. What are you building?
  2. What can go wrong?
  3. What are you going to do about the things that can go wrong?
  4. Did you do a good job of 1-3?

Asking developers to “think like an attacker” is like asking developers to “think like a professional chef” or “think like an astronaut”. They don’t know how. They still need tools to help them. These tools (like STRIDE, Mitre’s CAPEC, and attack trees) are described in detail in his new book Threat Modeling: Designing for Security which at over 600 pages is unfortunately so detailed that few developers will get around to actually reading it.
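Tools like STRIDE make question 2 ("what can go wrong?") mechanical rather than imaginative: you walk every element of the design against each threat category and ask whether it applies. A minimal sketch of that structure (the components listed are invented examples):

```python
# Sketch: structure "what can go wrong?" by enumerating Microsoft's STRIDE
# threat categories against each element of a design. The components below
# are invented; a real model would come from a data flow diagram.

STRIDE = [
    "Spoofing", "Tampering", "Repudiation",
    "Information disclosure", "Denial of service", "Elevation of privilege",
]

components = ["browser -> web server", "web server -> database"]

def enumerate_threats(components, categories=STRIDE):
    """Produce the raw list of (component, category) questions to work through."""
    return [(c, cat) for c in components for cat in categories]

for component, category in enumerate_threats(components):
    print(f"{component}: what {category.lower()} threats apply here?")
```

The value is that developers answer a bounded checklist instead of having to "think like an attacker" from a blank page.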

Devops, Unicorns and Rainbows

A panel on Devops/Security Myths Debunked looked at security and Devops, but didn’t offer anything new. The same people and the same stories, the same Unicorns and rainbows: thousands of deploys per day at Amazon, developer self-service static analysis at Twitter, Netflix’s Chaos Monkey… If Devops is taking the IT operations world by storm, there should be more leaders by now, with new success stories outside of big Cloud providers or social media.

The Devops story when it comes to security is simple. More frequent, smaller changes are better than less frequent big changes, because small changes are less likely to cause problems, and are easier to understand, manage and test, and easier to debug, fix or roll back if something goes wrong. To deploy more often you need standardized, automated deployment – which is not just faster but also simpler and safer and easier to audit, and which makes it faster and simpler and safer to push out patches when you need to. Developers pushing changes to production does not mean developers signing on in production and making changes to production systems (see automated deployment above).

Dave Mortman tried to make it clear that Devops isn’t about, or shouldn’t only be about, how many times a day you can deploy changes: it should really be about getting people talking together and solving problems together. But most of the discussion came back to speed of development, reducing cycle time, cutting deployment times in half – speed is what is mostly driving Devops today. And as I have pointed out before, infosec (the tools, practices, ways of thinking) is not ready to keep up with the speed of Agile and Devops.

Mobile Insecurity

The state of mobile security has definitely not improved over the past couple of years. One embarrassing example: the RSA conference mobile app was hacked multiple times by security researchers in the days leading up to and during the conference.

Dan Cornell at Denim Group presented some research based on work that his firm has done on enterprise mobile application security assessments. He made it clear that mobile security testing has to cover much more than just the code on the phone itself. Only 25% of serious vulnerabilities were found in code on the mobile device. More than 70% were in server code, in the web services that the mobile app called (web services that “had never seen adversarial traffic”), and the remaining 5% were in third party services. So it’s not the boutique of kids that your marketing department hired to write the pretty mobile front end – it’s the enterprise developers writing web services who are your biggest security problem.

Denim Group found more than half of serious vulnerabilities using automated testing tools (58% vs 42% found in manual pen testing and reviews). Dynamic analysis (DAST) tools are still much less effective in testing mobile than for web apps, which means that you need to rely heavily on static analysis testing. Unfortunately, static analysis tools are extremely inefficient - for every serious vulnerability found, they also report hundreds of false positives and other unimportant noise. We need better and more accurate tools, especially for fuzzing web services.

Tech, Toys and Models

Speaking of tools... The technology expo was twice the size of previous years. Lots of enterprise network security and vulnerability management and incident containment (whatever that is) solutions, governance tools, and application firewalls and NG firewalls and advanced NNG firewalls, vulnerability scanners, DDOS protection services, endpoint solutions, a few consultancies and some training companies. A boxing ring, lots of games, coffee bars and cocktail bars, magicians, a few characters dressed up like Vikings, some slutty soccer players – the usual RSA expo experience, just more of it.

Except for log file analysis tools (Splunk and LogRhythm) there was no devops tech at the tradeshow. There were lots of booths showing off Appsec testing tools (including all of the major static analysis players) and a couple of interesting new things:

Shape Shifter, from startup Shape Security, is a new kind of application firewall technology for web apps: “a polymorphic bot wall” that dynamically replaces attributes of HTML or Javascript with random strings to deter automated attacks. Definitely cool; it will be interesting to see how effective it is.
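The underlying polymorphic idea can be shown in a few lines. This is a sketch of the general technique – rewriting stable form-field names to random per-response aliases so scripted attacks can't rely on fixed attribute names – not a description of how Shape Shifter is actually implemented:

```python
# Sketch of a "polymorphic" rewrite: give each named form field a random
# alias on every response, keeping a server-side mapping to translate the
# submitted POST back. Illustrative only; not Shape Shifter's real design.
import secrets

def polymorph(html, fields):
    """Replace each named field with a random alias; return (html, mapping)."""
    mapping = {}
    for field in fields:
        alias = "f_" + secrets.token_hex(4)   # fresh random name per response
        mapping[alias] = field
        html = html.replace(f'name="{field}"', f'name="{alias}"')
    return html, mapping

page = '<input name="username"><input name="password">'
rewritten, mapping = polymorph(page, ["username", "password"])
# The server keeps `mapping` (e.g. in the session) to decode the POST,
# while a bot scripted against name="username" finds nothing to target.
```

Since the aliases change on every response, a replayed or pre-scripted request is useless, which is exactly what breaks simple automated attacks.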

Denim Group showed off a major upgrade to ThreadFix, a tool that maps software vulnerabilities found through different testing methods including automated dynamic analysis, static analysis and manual testing. This gives you the same kind of vulnerability management capabilities that you get with enterprise appsec testing suites: hybrid mapping of the attack surface with drill down to line of code, tracking metrics across applications and portfolios, integration with different bug tracking systems and the ability to create virtual patches for WAFs, but in a cross-vendor way. You can correlate findings from HP Fortify static analysis testing and dynamic testing results from IBM Appscan, or from a wide range of other tools and security testing platforms, including Open Source tools like Findbugs or Brakeman or ZAP and Arachni and Skipfish. And ThreadFix is Open Source itself (available for free or in a supported enterprise version).

The best free giveaway at the conference was viaProtect, an app from mobile security firm viaForensics that lets you check the security status of your iOS or Android device. Although it is just out of beta and still a bit wobbly, it provides a useful summary of the security status of your mobile devices, and if you choose to register, you can track the status of multiple devices on a web portal.

Another year, another RSA

This was my third time at RSA. What has changed over that time? Compliance, not fear of being hacked, is still driving infosec: “attackers might show up; auditors definitely will”. But most organizations have accepted that they will be hacked soon, or have been hacked but just haven’t found out yet, and are trying to be better prepared for when the shit hits the fan. More money is being poured into technology and training and consulting, but it’s not clear what’s making a real difference. It’s not money or management that’s holding many organizations back from improving their security programs – it’s a shortage of infosec skills.

With a few exceptions, the general quality of presentations wasn't as good as previous years – disappointing and surprising given how hard it is to get a paper accepted at RSA. There wasn't much that I had not already heard at other conferences like OWASP Appsec or couldn't catch in a free vendor-sponsored webinar. The expert talks that were worth going to were often over-subscribed. But at least the closing keynote was a good one: Stephen Colbert was funny, provocative and surprisingly well-informed about the NSA and related topics. And the parties – if getting drunk with security wonks and auditors is your idea of crazy fun, nothing, but nothing, beats RSA.

Monday, February 3, 2014

Data Privacy and Security in ThoughtWorks Radar, Sort of

Once or twice a year the thought leaders at ThoughtWorks, including their Chief Scientist and author Martin Fowler, get together and produce a Radar report listing software development techniques and technologies (tools, platforms, languages and frameworks) that they think are interesting, and that they think other developers should be interested in too. Unlike analyses from Gartner, the Radar only includes things that ThoughtWorks teams have actually tried and seen work, or tried and seen not work, or are trying and think might work.

The Radar is always a good read, a way to keep up with the latest fashions, and is an especially good resource on practices and tools for mobile, Web and Cloud development projects, and Open Source tools and platforms for automated testing and build and deployment.

ThoughtWorks was a pioneer in continuous build and Continuous Integration, and in devops: ideas and tools for Continuous Deployment and Continuous Delivery have been included in the Radar going back to 2009, and ThoughtWorks has an entire practice built around Continuous Delivery.

And now, maybe because they were shamed into this by Matt Konda at Jemurai Security, ThoughtWorks have included data privacy and application security in the latest Radar, although in an unfortunately obscure and limited way.

Data Privacy – Assess Datensparsamkeit

There are four rings in the ThoughtWorks Radar:

  • Adopt (ThoughtWorks feels strongly that everyone should be doing this)
  • Trial (worth pursuing, but maybe start off carefully)
  • Assess (try it out, it might work, at least learn something about it)
  • Hold (proceed with caution – i.e., you should probably not do/use this, or if you are doing/using this you should probably stop doing/using this)

Concerns for Data Privacy were added to the Jan 2014 Radar. The idea is sound:

“only store as much personal information as is absolutely required for the business or applicable laws… If you never store the information, you do not need to worry about someone stealing it.”

But the way it was presented was unfortunate. Data privacy was added as a Radar blip in the early stage “Assess” try-it-out ring, and with a cute but obscure label (“Datensparsamkeit”) taken from German privacy legislation.

This is a recognized good practice, demanded by many regulations. Why is this in “Assess”, and why is it hidden under a German name?
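Whatever it's called, the practice itself is easy to apply mechanically: whitelist the fields a record actually needs before it is stored, and drop everything else. A minimal sketch (the field names are invented):

```python
# Sketch: Datensparsamkeit / data minimization as a whitelist applied before
# persisting a record. Field names are invented; the point is that anything
# not explicitly required is dropped rather than stored "just in case".

REQUIRED_FIELDS = {"order_id", "item", "shipping_postcode"}

def minimize(record, required=REQUIRED_FIELDS):
    """Keep only the fields the business actually needs to store."""
    return {k: v for k, v in record.items() if k in required}

submitted = {
    "order_id": 1001,
    "item": "book",
    "shipping_postcode": "94107",
    "birth_date": "1980-01-01",   # not needed for this transaction: never stored
    "ip_address": "203.0.113.7",  # not needed: never stored
}

stored = minimize(submitted)
```

Data that is never stored can't be stolen in a breach, which is the whole argument in one line of code.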

Application Security – Hold Ignoring OWASP Top 10

ThoughtWorks has now recognized that security is important:

“Barely a week goes by without the IT industry being embarrassed by yet another high profile loss of data, leak of passwords, or breach of a supposedly secure system.”

The way that this report works, people should stop doing what is in the “Hold” ring, and focus most of their attention on what is in the “Adopt” ring because these are proven, key technologies and practices that are worth following. Instead of asking developers to Adopt secure design and development practices, they've added security as a “first-class concern during software construction” by putting “Ignoring OWASP Top 10” in the Hold ring.

Like “Assess Datensparsamkeit”, “Hold Ignoring OWASP Top 10” won’t make a lot of sense to most developers, unless they take extra time to read and understand more on their own.

Oh Well, at least this is something for now

Although this could have been done in a much more understandable and straightforward way, at least this Radar shows that ThoughtWorks is actively thinking about security and privacy in their projects, and that they think that other developers should too. The ThoughtWorks Radar will reach a different (and probably bigger) audience than most software security-focused publications, including developers who have never heard of the OWASP Top 10 or Datensparsamkeit, so this is a good thing.

All of this is likely to be temporary, however, because of the attention-deficit way that the Radar works. ThoughtWorks only lists things that they currently find interesting in each report. A few practices and technologies stay on the Radar for a while as they move from Assess to Trial to Adopt (if they prove to be key) or Hold (if they don’t work out), or because they are fundamental to the way that ThoughtWorks teams work (like evolutionary architecture and continuous build and automated testing). But most ideas and tools drop off the Radar often and quickly, as ThoughtWorkers move on to the next shiny new thing.

So, for the moment at least, security and privacy will get some extra attention from ThoughtWorks and the developers that they influence.

Past Radars

If you are interested in following the changing ideas, cool tools and recent fashions in software development highlighted in the Radar, here are links going back to 2009:

Jan 2014 (the Radar discussed in this post)

May 2013

October 2012

March 2012

July 2011

January 2011

August 2010

April 2010

January 2010

November 2009

Thursday, January 30, 2014

Small Projects and Big Programs

The Standish Group’s CHAOS 2013 Report has some interesting things to say about what is driving software development project success today. More projects are succeeding (39% in 2012, up from 29% in 2004), mostly because projects are getting smaller (which also means more projects done using Agile, since small projects are the sweet spot for Agile):

“Very few large projects perform well to the project management triple constraints of cost, time, and scope. In contrast to small projects, which have more than a 70% chance of success, a large project has virtually no chance of coming in on time, on budget, and within scope… Large projects have twice the chance of being late, over budget, and missing critical features…. A large project is more than 10 times more likely to fail outright, meaning it will be cancelled or will not be used because it outlived its useful life prior to implementation.”

So don’t run large projects. Of course it’s not that simple: many problems, especially in enterprises, are much too big to be solved by small teams in small projects. Standish Group says that you can get around this if you “Think Big, Act Small”:

“…there is no need for large projects… any IT project can be broken into a series of small projects that could also be done in parallel if necessary.”

Anything that can be done in one big project can obviously be done in a bunch of smaller projects. You can make project management simple – by pushing the hard management problems and risks up to the program level.

Program Management isn't Project Management

Understanding and orchestrating work across multiple projects isn’t as simple as breaking a big project down into small projects and running them independently. Managing programs, managing horizontally across projects, is different than managing projects. There are different risks, different problems to be solved. It requires different skills and strengths, and a different approach.

PMI recognizes this, and has a separate certification for Program Managers (PgMP). Program management is more strategic than project management. Program Managers are not just concerned with horizontal and cross-project issues, coordinating and managing interdependencies between projects – managing at scale. They are also responsible for understanding and helping to achieve business goals, for managing organizational risks and political risks, and they have to take care of financing and contracts and governance: things that Agile coaches running small projects don’t have to worry much about (and that Agile methods don’t help with).

Agile and Program Management

The lightweight tools and practices that you use to successfully coach an Agile team won’t scale to managing a program. Program management needs all of those things that traditional, plan-driven project management is built on. More upfront planning to build a top-down roadmap for all of the teams to share: project teams can’t be completely free to prioritize work and come up with new ideas on the fly, because they have to coordinate handoffs and dependencies. Architecture and technology strategy. More reporting. Working with the PMO. More management roles and more management. More people to manage. Which = more politics.

Johanna Rothman talks a little bit about program management in her book Managing Your Project Portfolio, and has put up a series of blog posts on Agile and Lean Program Management as work in progress for another book she is writing on program management and Agile.

Rothman looks at how to solve problems of organization in programs using the Scrum of Scrums hierarchy (teams hold standups, then Scrum Masters have their own standups together every day to deal with cross-program issues). Because this approach doesn't scale to handle coordination and decision making and problem solving in larger programs, she recommends building loose networks between projects and teams using Communities of Practice (a simple functional matrix in which architects, testers, analysts, and especially the Product Owners in each team coordinate continuously with each other across teams).

Rothman also looks at the problems of coordinating work on the backlog between teams, and evolving architecture, and how Program Managers need to be Servant Leaders and not care what teams do or how they do it, only about the results.

Rothman believes that Program Managers should establish and maintain momentum from the beginning of the program. Rather than taking time upfront to initiate and plan (because, who actually needs to plan a large, complex program?!), get people learning how to work together from the start. Release early, release often, and keep work in progress to a minimum – the larger the program, the less work in progress you should have. Finally she describes some tools that you could use to track and report progress and provide insight into a program’s overall status, and explains how and why they need to be different than the tools used for projects.

There are some ideas here that make sense and would probably work, and some others that don’t - like skipping planning.

Get Serious about Program Management

A more credible and much more comprehensive approach for managing large programs in large organizations would be one of the heavyweight enterprise Agile hybrids: the Scaled Agile Framework (SAFe) or Disciplined Agile Delivery which take Agile ideas and practices and envelop them inside a structured, top-down governance-heavy process/project/program/portfolio management framework based on the Rational Unified Process. But now you’re not trying to manage and coordinate small, simple Agile projects any more, you’re doing something quite different, and much more expensive.

The most coherent and practical framework I have seen for managing programs is laid out in Executing Complex Programs, a course offered by Stanford University, as part of its Advanced Project Management professional development certificate.

This framework covers how to manage distributed cross-functional and cross-organizational teams in global environments; managing organizational and political and logistical and financial risks; and modeling and understanding and coordinating the different kinds of interdependencies and interfaces between projects and teams (shared constraints and resources, APIs and shared data, hand-offs and milestones and drop-dead dates, experts and specialists…) in large, complex programs. The course explores case studies using different approaches, some Agile, some not, some in high reliability / safety critical and regulated environments. This should give you everything that you need to manage a program effectively.

You can and should make projects simpler and smaller – which means that you’ll have to do more program management. But don’t try to get by at the program level with improvising and iterating and leveraging the same simple techniques that have worked with your teams. Nothing serious gets done outside of programs. So take program management seriously.

Thursday, January 23, 2014

Can you Learn and Improve without Agile Retrospectives? Of course you can…

Retrospectives – bringing the team together on a regular basis to examine how they are working and identify where and how they can improve – are an important part of Agile development.

Scrum and “Inspect and Adapt”

So important that Schwaber and Sutherland burned retrospectives into Scrum at the end of every Sprint, to make sure that teams will continuously Inspect and Adapt their way to more effective and efficient ways of working.

End-of-Sprint retrospectives are now commonly accepted as the right way to do things, and are one of the more commonly followed practices in Agile development. VersionOne’s latest State of Agile Development survey says that 72% of Agile teams are doing retrospectives.

Good Retrospectives are Hard Work

Good retrospectives are a lot of work.

For the leader/Coach/Scrum Master who needs to sell them to the team – and to management – and build a safe and respectful environment to hold the meetings and guide everyone through the process properly.

For the team, who need to take the time to learn and understand together, act on what they've learned, and then follow up and actually get better at how they work.

So hard that there are several books written just on how to do retrospectives (Agile Retrospectives: Making Good Teams Great, The Retrospective Handbook, Getting Value out of Agile Retrospectives), as well as several chapters about retrospectives in other books on Agile, and retrospective websites (including one just on how to make retrospectives fun) and a wiki and at least one prime directive for running retrospectives, and dozens of blog posts with suggestions and coaching tips and alternative meeting formats and collaborative games and tools and techniques to help teams and coaches through the process, to energize retrospectives or re-energize them when teams lose momentum and focus.

Questioning the need for Retrospectives

Because retrospectives are so much work, some people have questioned how useful running retrospectives each Sprint really is, whether they can get by without a retrospective every time, or maybe without doing them at all.

There are good and bad reasons for teams to skip – or at least want to skip – retrospectives.

Because not everyone works in a safe environment where people trust and respect each other, retrospectives can be dangerous and alienating, a forum for finger pointing and blame and egoism.

Because they don’t result in meaningful change, because the team doesn’t act on what they find – or aren’t given a chance to – and so the meetings become a frustrating and pointless waste of time, rehashing the same problems again and again.

Because the real problems that they need to solve in order to succeed are larger problems that they don’t have the authority or ability to do anything about, and so the meetings become a frustrating and pointless waste of time….

Because the team is under severe time pressure, they have to deliver now or there may not be a chance to get better in the future.

Because the team is working well together, they've “inspected and adapted” their way to good practices and don’t have any serious problems that have to be fixed or initiatives that are worth spending a lot of extra time and energy on, at least for now. They could keep on trying to look for ways to get even better, or they could spend that time getting more work done.

Inspecting and Adapting – without Regular Retrospectives

Regular, frequent retrospectives can be useful – especially when you are first starting off in a new team on a new project. But once the team has learned how to learn, the value that they can get from retrospectives will decline.

This is especially the case for teams working in rapid cycles, short Sprints every 2 weeks or every week or sometimes every few days. As the Sprints get shorter, the meetings need to be shorter too, which doesn’t leave enough time to really review and reflect. And there’s not enough time to make any meaningful changes before the next retrospective comes up again.

At some point it makes good sense to stop and try something different. Are there other ways to learn and improve that work as well, or better than regular team retrospective meetings?

XP and Continuous Feedback

Retrospectives were not part of Extreme Programming as Kent Beck et al defined it (in either the first or second edition).

XP teams are supposed to follow good engineering (at least coding and testing) practices and work together in an intelligent way from the beginning – it should be enough to follow the rules of XP, and fix things when they are broken.

XP relies on built-in feedback loops: TDD, Continuous Integration and continuous testing, pair programming, frequently delivering small releases of software for review. The team is expected to learn from all of this feedback, and improve as they go. If tests fail, or they get negative feedback from the Customer, or find other problems, they need to understand what went wrong, why, and correct it.

Devops and Continuous Delivery/Deployment

Delivering software frequently, or continuously, to production pushes this one step further. If you are delivering working software to real customers on a regular basis, you don’t need to ask the team to reflect internally, to introspect – your customers will tell you if you are doing a good job, and where you need to improve:

Are you delivering what customers need and want? Is it usable? Do they like it?

Is the software quality good – or at least good enough?

Are you delivering fast enough?

By understanding and acting on this feedback, the team will improve in ways that make a real difference.

Root Cause Analysis

If and when something goes seriously wrong in testing or production or within the team, call everyone together for an in-depth review and carefully step through Root Cause Analysis to understand what happened, why it happened, and what you need to change to prevent problems like this from happening again, and put together a realistic plan to get better.

Reviews like this, where the team works together to confront serious problems in a serious way and genuinely understand them and commit to fixing them, are much more important than a superficial 2-hour meeting every couple of weeks. These can be – and often are – make or break situations. Handled properly, this can pull teams together and make them much stronger. Never waste a crisis.

Kanban and Micro-Optimization

Teams following Kanban are constantly learning and improving.

By making work visible and setting work limits, they can immediately detect delays and bottlenecks, then get together and correct them. This micro-optimization at the task level, always tuning and fixing problems as they come up, might seem superficial, but the results are immediate (recognizing and correcting problems as soon as they come up makes more sense than waiting until the next scheduled meeting), and small improvements are all that many teams are actually able to make anyways.
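The mechanics are simple enough to sketch in a few lines of code. This is a toy illustration, not any particular tool: the column names and WIP limits are made up, and the point is only that an over-limit column is flagged the moment it happens, not at the next scheduled meeting.

```python
# Hypothetical Kanban board: each column has a WIP limit; exceeding a
# limit signals a bottleneck that the team should swarm on right away.
WIP_LIMITS = {"in-progress": 3, "review": 2, "ready-to-deploy": 2}

def bottlenecks(board: dict) -> list:
    """Return the columns whose card count exceeds their WIP limit."""
    return [col for col, limit in WIP_LIMITS.items()
            if len(board.get(col, [])) > limit]
```

A board with three cards stuck in review would immediately show `review` as a bottleneck, while the same three cards in progress would not.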

Take advantage of audits and reviews

In large organizations and highly regulated environments, audits and other reviews (for example security penetration tests) are a fact of life. Instead of trying to get through them with the least amount of effort and time wasted, use them as valuable learning opportunities. Build on what the auditors or reviewers ask for and what they find. If they find something seriously missing or wrong, treat it as a serious problem, understand it and correct it at the source.

Moving Beyond Retrospectives

There are other ways to keep learning and improving, other ways to get useful feedback, ways that can be as effective or more effective and less expensive than frequent retrospectives, from continuous checking and tuning to deep dives if something goes wrong.

You can always schedule regular retrospective meetings if the circumstances demand it: if quality or velocity start to slide noticeably, or conflicts arise in the team, or if key people leave, or there’s been some other kind of shock, a sudden change in direction or priorities that requires everyone to work in a much different way, and start learning all over again.

But don’t tie people down and force them to go through a boring, time-wasting exercise because it’s the “right way to do Agile”, or turn retrospectives into a circus because it’s the only way you can keep people engaged. Find other, better ways to keep learning and improving.

Thursday, January 16, 2014

How much can Testers help in Appsec?

It’s not clear how much of a role QA – which in most organizations means black box testers who do manual functional testing or write automated functional acceptance tests – can or should play in an Application Security program.

Train QA, not Developers, on Security

At RSA 2011, Caleb Sima asserted that training developers in Appsec is mostly a waste of time (“Don’t Teach Developers Security”).

Because most developers won’t get it; they don’t have the time to worry about security even if they do get it; and turnover in most development teams is so high that if you train them, they aren’t likely to be around long enough to make much of a difference.

Sima suggests starting with QA instead. Because testers are paid to find where things break, and Appsec gives them more broken things to find.

Instead of putting a test team through general Appsec training, he recommends taking a more targeted, incremental approach.

Start with a security scan or pen test. Have a security expert review the results and identify the 1 or 2 highest risk types of vulnerabilities found, problems like SQL injection or XSS.

Then get that security expert to train the testers on what these bugs are about and how to find them, and help them to explain the bugs to development. Developers will also learn something about security by working through these bugs. When all of the highest priority bugs are fixed, then train the test team on the next couple of important vulnerabilities, and keep going.
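To make this concrete, here is a minimal sketch of the kind of check a tester trained on XSS and SQL injection might automate. Everything in it is illustrative: the probe string and the error signatures are a tiny sample, not an exhaustive (or authoritative) list, and the function only inspects a response body that the tester has already fetched.

```python
# Illustrative probe and signatures only - a real scanner uses many more.
XSS_PROBE = "<script>alert('xss-probe')</script>"
SQL_ERROR_SIGNATURES = [
    "you have an error in your sql syntax",   # MySQL
    "unclosed quotation mark",                # SQL Server
    "sqlite3.operationalerror",               # SQLite
]

def looks_vulnerable(response_body: str) -> list:
    """Return suspected vulnerability types found in a response body:
    a reflected XSS probe, or a leaked database error message."""
    findings = []
    body = response_body.lower()
    if XSS_PROBE.lower() in body:
        findings.append("reflected-xss")
    if any(sig in body for sig in SQL_ERROR_SIGNATURES):
        findings.append("sql-error-leak")
    return findings
```

The value of the exercise is less the script than the understanding: a tester who knows why a reflected probe or a raw database error matters can explain the bug to developers in their own terms.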

Unfortunately, this won't work...

This approach is flawed in a couple of important ways.

First, it doesn’t address the root cause of software security problems: developers making security mistakes when designing and writing software. It’s a short-term bandage.

And in the short term, there is a fundamental problem with asking the QA team to take a leadership role in application security: most testers don’t understand security, even after training.

A recent study by Denim Group assessed how much developers and QA understood about application security before and after they got training. Only 22% of testers passed a basic application security test after finishing security training.

Testing is not the same as Pen Testing

This is disappointing, but not surprising. A few hours, or even a few days, of security training can’t make a black box functional QA tester into an application security pen tester. Nick Coblentz points out that some stars will emerge from security training. Some testers, like some developers, will “get” the White Hat/Black Hat stuff and fall in love with it, and make the investment in time to really get good at it. However, these people probably won’t stay in testing anyway – there's too much demand for talented Appsec specialists today.

But most testers won’t get good at it. Because it’s not their job. Because there are too many technical details to understand about the architecture and platform and languages for people who are just as likely to have a degree in Art History as Computer Science, who are often inexperienced, and already overworked. And these details are important – in Appsec, making even small mistakes, and missing small mistakes, matters.

Cigital has spent a lot of time helping setup Appsec programs at different companies and studying what works for these companies. They have found that:

Involving QA in software security is non-trivial... Even the "simple" black box Web testing tools are too hard to use.

In order to scale to address the sheer magnitude of the software security problem we've created for ourselves, the QA department has to be part of the solution. The challenge is to get QA to understand security and the all-important attackers' perspective. One sneaky trick to solving this problem is to encapsulate the attackers' perspective in automated tools that can be used by QA. What we learned is that even today's Web application testing tools (badness-ometers of the first order) remain too difficult to use for testers who spend most of their time verifying functional requirements…

Software [In]security: Software Security Top 10 Surprises

But there’s more to Security Testing than Pen Testing

There’s more to security testing than pen testing and running black box scans. So Appsec training can still add value even if it can’t make QA testers into pen testers.

Appsec training can help testers to do a better job of testing security features and the system’s privacy and compliance requirements: making sure that user setup and login and password management work correctly, checking that access control rules are applied consistently, reviewing audit and log files to make sure that activities are properly recorded, and tracing where private and confidential data are used and displayed and stored.
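Checking that access control rules are applied consistently is one of these jobs that rewards automation. This sketch assumes a hypothetical role/endpoint matrix: the tester encodes what each role is supposed to reach, then compares it against what each role can actually reach, so gaps show up everywhere, not just on the screens someone happened to visit.

```python
# Hypothetical expected access matrix: role -> endpoints it may reach.
EXPECTED = {
    "viewer": {"/reports"},
    "trader": {"/reports", "/orders"},
    "admin":  {"/reports", "/orders", "/admin/users"},
}

def check_access(observed: dict) -> list:
    """Compare observed access per role against the expected matrix.
    Returns (role, endpoint, problem) tuples for every inconsistency."""
    problems = []
    for role, allowed in EXPECTED.items():
        reachable = observed.get(role, set())
        for endpoint in reachable - allowed:
            problems.append((role, endpoint, "should be denied"))
        for endpoint in allowed - reachable:
            problems.append((role, endpoint, "should be allowed"))
    return problems
```

Both directions matter: an endpoint a role can reach but shouldn’t is a security bug, and one it should reach but can’t is a functional bug – the same check catches both.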

And Appsec training can give testers a license to test the system in a different way.

Most testers spend most of their time verifying correctness: walking through test matrices and manual checklists, writing automated functional tests, focused on test coverage and making sure that the code conforms to the specification, or watching out for regressions when the code is changed. A lot of this is CYA verification. It has to be done, but it is expensive and a poor use of people and time. You won’t find a lot of serious bugs this way unless the programmers are doing a sloppy job. As more development teams adopt practices like TDD, where developers are responsible for testing their own code, having testers doing this kind of manual verification and regression will become less useful and less common.

This kind of testing is not useful at all in security, outside of verifying security features. You can’t prove that a system is secure, that it isn’t vulnerable to injection attacks or privilege escalation or other attacks by running some positive tests. You need to do negative testing until you are satisfied that the risks of a successful exploit are low. You still won’t know that the system is secure, only that it appears to be “secure enough”.

Stepping off of the Happy Path

It’s when testers step off of the tightly-scripted happy path and explore how the system works that things get more interesting. Doing what ifs. Testing boundary conditions. Trying things that aren’t in the specification. Trying to break things, and watching what happens when they break.

Testing high-risk business logic like online shopping or online trading or online banking functions in real-world scenarios, testing in pairs or teams to check for timing errors or TOC/TOU problems or other race conditions or locking problems, injecting errors to see what happens if the system fails half way through a transaction, interrupting the workflow by going back to previous steps again or trying to skip the next step, repeating steps two or three or more times. Entering negative amounts or really big numbers or invalid account numbers. Watching for information leaks in error messages. Acting unpredictably.
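A sketch of what “negative amounts or really big numbers” looks like in practice, for an amount field on a hypothetical transfer function. The validator here is a stand-in for the real business logic under test; the hostile values are a starter list, not a complete fuzzing corpus.

```python
import re

# Values that should all be rejected by any sane amount field.
HOSTILE_AMOUNTS = [
    "-1", "0", "0.001", "9999999999999999", "1e308",
    "NaN", "", " ", "1;DROP TABLE accounts", "0x7fffffff",
]

def validate_amount(raw: str) -> bool:
    """Example validator: accept a positive decimal amount up to
    1,000,000 with at most 2 decimal places."""
    if not re.fullmatch(r"\d{1,7}(\.\d{1,2})?", raw):
        return False
    return 0 < float(raw) <= 1_000_000

def run_negative_tests() -> list:
    """Every hostile value should be rejected; return any that slip through."""
    return [v for v in HOSTILE_AMOUNTS if validate_amount(v)]
```

Any value that comes back from `run_negative_tests()` is a bug report waiting to be written – and the same list can be replayed against every money field in the system.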

This is the kind of testing that can be done better by a QA tester who understands the domain and how the system works than by a pen tester on a short-term engagement. As long as they are willing to do a little hacking.

It shouldn’t take a lot of training, or add a lot to the cost of testing, to get testers doing some exploratory and negative testing in the high-risk areas of an application. A lot of important security bugs (and functional bugs and usability problems) can be found testing this way – bugs that can’t be found by walking through test checklists, or by running vulnerability scanners and dynamic analysis tools. Application Security training should reinforce to testers – and developers, and managers – how important it is to do this kind of testing, and that the bugs found this way are important to fix.

Moving from functional testing of security features to edge and boundary condition testing and “adversarial testing” is the first major step that QA teams need to take in playing a role in Application Security, according to Cigital’s Build Security in Maturity Model: From there, some QA teams may go on to integrate black box security testing tools, and possibly other more advanced security testing tools and practices.

The real value of Security Testing

But even if you can farm some security testing out to QA, you’ll still need to rely on security experts. You need someone who really understands the tools and the technical issues, who has spent a lot of time hacking, who understands security risks and who is paid to keep up with the changing threat landscape. Someone who is at least as good or better than whoever you expect to be attacking you.

This might mean relying on full-time security experts in your own organization, or contracting outside consultants to do pen tests and other reviews, or taking advantage of third party on-demand security testing platforms from companies like WhiteHat, Qualys, Veracode, HP and IBM.

The important thing is to take the results of all of this testing – whether it’s done by QA or a pen tester or by a third party testing service – and act on it.

Developers – and managers – need to understand what these bugs mean and why they need to be fixed and how to fix them properly, and more importantly how to prevent them from happening in the future. The real point of testing is to provide information back to development and management, about risks, about where the design looks weak, or where there are holes in the SDLC that need to be plugged. Until we can move past test-then-fix-then-test-then-fix… to stopping security problems upfront, we aren’t accomplishing anything important. Which means that testers and developers and managers all have to understand security a lot better than they do today. So teach developers security. And testers. And managers too.

Thursday, January 9, 2014

Developers working in Production. Of course! Maybe, sometimes. What, are you nuts?

One of the basic ideas in Devops is that developers and operations should share responsibility for designing systems, for implementing them and keeping them running. Developers should be on call in case something goes wrong, and be the one to fix whatever breaks. Because the person who wrote the code is often the only one who knows how it really works. And because of the moral hazard argument: if programmers are held fully accountable for the work that they do, they will be incented to do a better job, instead of writing garbage and handing it off to somebody else.

But this means that developers need some kind of access to production. How much access developers need, how often, and how this can be made safe, are important questions that have to be answered.

Hire wicked smart people and give them all access to root.
Unnamed devops evangelist, Is Devops Subversive?

If you ask whether developers should have access to production you’ll find that people fall into one of 3 camps:

Yeah, sure, of course – who else is going to support the system?

This is a simple decision for online startups, where there’s often nobody else to install, configure and support the application anyway.

As these organizations grow, developers often continue to stay closely involved in deployment, support and application operations, and in some cases, still play a primary role, especially in shops that heavily leverage cloud infrastructure (think Netflix).

Read my lips: Never Ever! Are you out of your freakin’ mind?

Question: Should developers have access to production?

Answer: Not only no, but hell no.

kperrier, Slashdot: Should Developers Have Access to Production

The situation is much different in large enterprises and government organizations, where walls have been built up between development and operations for many different reasons. It’s not just mergers and acquisitions and inertia and internal politics and protectionism that made this happen. It’s also SOX and PCI and HIPAA and GLBA and other overlapping regulations and privacy rules, and ITIL and COBIT and ISOxxx and CMMI and other IT governance frameworks, and internal and external auditors enforcing separation of duties and need-to-know access limitations in order to ensure the integrity and confidentiality of system data.

The same rules also apply to leaner-and-meaner Devops shops. For example, at Etsy (a Devops leader), PCI DSS compliant functions are managed and supported by a different team in a different way from the rest of their online systems: while developers have R/O access to a lot of production “data porn” (metrics and graphs and logs), they do not have access to production databases; there are more requirements for activity logging; a push to QA is handled in a clearly different way than a push to production; and all changes to production must be tracked and approved through a ticketing system.

And there’s also the problem of shared infrastructure: the same networks and servers and databases and other parts of the stack may be used by many different applications and different business units. Developers of course only understand the applications that they are working on and are only familiar with the simplified test configurations that they use day-to-day – they may not know about other systems and their shared dependencies, and could easily make changes that break these systems without being aware of the risks.

In case of emergency, break glass

Most organizations fall somewhere in between a Noops web startup in the cloud and a legacy-bound enterprise weighted down by too much governance and management politics. Operations is usually run separately, management is still accountable to regulators and auditors, but most people understand and recognize the need for developers to help out, especially when something goes wrong.

When the shit has indeed and truly hit the fan, developers – although usually only senior developers or team leads – are brought in to help troubleshoot and recover. Their access is temporary, maybe using a “fire id” extracted from a vault, then locked down again as soon as they are done. Developers are often paired up with an operations buddy who does most of the driving, or at least watches them carefully when the developer has to take the wheel.

Question: Should developers have access to production?

Answer: Everyone agrees that developers should never have access to production… Unless they’re the developer, in which case it’s different.

SatanicPuppy, Slashdot: Should Developers Have Access to Production

Problems in production can be fixed much faster if developers can see the logs, stack traces and core dumps and look at production data when something goes wrong. Giving at least some developers read access to production logs and alerts and monitors – enough to recognize that something has gone wrong and to figure out what needs to be fixed – makes sense.

Sometimes really bad things happen and all that matters is getting the system back up and running as quickly as possible. You want the best people you can find working on the problem, and this includes developers. You’ll need their help with diagnosis and deciding what options are safest to take for roll back or roll forward, putting in an emergency fix or workaround, and data repair and reconciliation. Everyone will need to check later to make sure that any temporary fixes or workarounds are implemented properly, checked-in and redeployed.

When you run incident management fire drills, make sure that developers are included. And developers should also be included in incident postmortem reviews, even if they weren't part of the incident management team, because this is an important opportunity to learn more about the system and to improve it. But if you have developers firefighting in production more than almost never, then you’re doing almost everything wrong.

Debugging in production?

Some problems, intermittent failures and timing-related problems and heisenbugs, only happen in production and can’t be reproduced in test – or at least not without a lot of time, expense and luck. To debug these problems a developer may need to examine the run-time state of the system when the problem happens. But these problems should be the exception, not the rule. Debugging in production opens up security problems (exposing private data in memory) and run-time risks that developers and Ops both need to be aware of.

Question: Should developers have access to production?

Answer: Whenever an error occurs that I can’t replicate in a dev environment, I'm always SO tempted to hop into prod and start adding in some output statements... Yeah, it’s probably a good thing I don’t have access to prod.

Enderjsy, Slashdot: Should Developers Have Access to Production

Deploying to production?

Auditors will tell you that the people who write the code cannot be the same people who deploy it in production. But some developers will tell you that they need to take care of deployment, because Ops won’t understand all the steps involved, or at least that they need to manually check that all of the config changes were made correctly, and to run the data conversion and check that it worked, and to make sure that the right code was installed in the right places. If this is how your deployment is done, you’re doing it wrong.

And you’re doing a lot of things wrong if Ops won’t trust development enough to push changes out at all:

Most times, when I see devs screwing with production it's either a "hero" coder who is way too good to use best practices, or a situation in which the environment is so hostile that the "best" solution seems to be breaking the rules.

I once did some contract work for a company where the QA and testing process took a minimum of two weeks for the most trivial changes, and where the admins on the production servers refused to deploy things like security patches without a testing period that ran close to a month. The devs there had a hundred tricks for sneaking their code into production, and linking production code to the development servers in an attempt to meet their productivity goals.

SatanicPuppy, Slashdot: Should Developers Have Access to Production

Testing in production?

The only testing that has to be done in production is A/B split testing to see what features customers like or don’t like. You should not need to test in production to see if something works – that’s what test environments are for – except maybe when you are deploying and launching a system for the first time, or some limited integration test cases with other systems that can’t be reached from a test environment. Or load testing done with Ops – a lot of shops can’t afford to have a test environment sized big enough for real load testing.

Making Production Safe for (and from) Developers

Whether developers should have production access (and how much access you can allow them) also depends on how much developers can be trusted to be careful and responsible with the systems and with customer data. It’s inconsistent that while organizations will trust developers to write the software that runs in production, they won’t trust them with the production system. But development and production are different worlds.

Most developers lack the necessary situational awareness. They are used to experimenting and trying things to see what happens. I've seen smart, experienced developers do dangerous things in production without realizing it while they are deep into problem solving. Developers should be scared of working in production. Not too scared to think, but scared enough to think before they act. They need to understand the risks, and be held to the same duties of care as anyone in Ops.

You can spend a lot of time breaking down the wall between development and Ops, only to see it built back up overnight (much thicker and higher too) the first time that a developer blows away a production database when they thought they were in test, or kills the wrong process or hot deploys the wrong version of code or deletes the wrong config file and causes a widespread outage. Make sure that test and development environments are firewalled from production so that it isn’t possible for anything running in test to touch production through hard-wired links. Make it clear to developers when they are in production. Force them to make a jump: open a tunnel, sign on with a different id and password, see a different prompt.
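The “jump” can be enforced in tooling, not just in process. A minimal sketch, assuming a hypothetical host inventory and confirmation variable: a support script that refuses to touch a production host unless the operator explicitly acknowledges where they are.

```python
# Hypothetical production inventory - in real life this would come from
# your configuration management system, not a hard-coded set.
PRODUCTION_HOSTS = {"db-prod-01", "app-prod-01"}

def guard_environment(target_host: str, env: dict) -> str:
    """Return 'ok' if it is safe to proceed; raise if the target is
    production and the operator has not explicitly acknowledged it."""
    if target_host in PRODUCTION_HOSTS:
        if env.get("I_KNOW_THIS_IS_PRODUCTION") != "yes":
            raise RuntimeError(
                f"{target_host} is PRODUCTION: "
                "set I_KNOW_THIS_IS_PRODUCTION=yes to proceed")
    return "ok"
```

The same idea works for shell prompts, terminal colors and jump hosts: the mechanism matters less than making the developer stop and consciously cross the line.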

With great power comes great responsibility

Nobody supporting an app should need – or even want – root access for day-to-day support and troubleshooting. Developers should only be granted the access that they need and no more, so that they can’t do things they shouldn't do, they can’t see things that they shouldn't see, and so that they can’t cause more damage than you can afford.

At the Velocity Conference in 2009, John Allspaw and Paul Hammond explained how important and useful it is for developers to have access to production - but that most of this access can be and should be limited:

Allspaw: “I believe that ops people should make sure that developers can see what’s happening on the systems without going through operations… There’s nothing worse than playing phone tag with shell commands. It’s just dumb.”

“Giving someone [i.e., a developer] a read-only shell account on production hardware is really low risk. Solving problems without it is too difficult.”

Hammond: “We’re not saying that every developer should have root access on every production box.”

Developers who need access to the system should be given a read-only account that allows them to monitor the run-time – logs and metrics. Then force them to make another jump to gain whatever command or write access they need to do admin functions or help with repair and recovery.

One problem is that a lot of systems aren’t designed with fine grained access control at the admin level: there’s an admin user (that owns the application and can see and do everything needed to setup and run the system) and there’s everybody else. It can be painful to break out the application and the environment ownership structure and permissioning scheme and separate read-only monitoring access from support and control functions, to setup sudo privilege escalation rules, and to track and manage all of the user accounts properly.

And none of this works if you aren’t properly protecting confidential and private data or other data that somebody could use for their own benefit: tokenizing or masking or encrypting data so it can’t be read, hashing critical data to make sure that it hasn’t been tampered with, and making sure that confidential data isn’t written to logs or temporary files.
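As a sketch of what masking and tokenizing can look like at the logging layer (the regex and the salt handling here are illustrative, not a complete data protection scheme): card-number-like values are masked down to their last four digits before anything is written, and account identifiers are replaced with a stable hash so support staff can still correlate log lines without seeing the real id.

```python
import hashlib
import re

# Illustrative pattern: runs of 13-16 digits that look like card numbers.
CARD_RE = re.compile(r"\b\d{13,16}\b")

def mask_for_log(message: str) -> str:
    """Mask card-number-like digit runs, keeping only the last 4 digits."""
    return CARD_RE.sub(
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], message)

def account_token(account_id: str, salt: str = "per-system-salt") -> str:
    """Stable pseudonym for an account id, safe to write to logs.
    A real system would manage the salt as a secret, not a default arg."""
    return hashlib.sha256((salt + account_id).encode()).hexdigest()[:12]
```

Putting this in one shared logging wrapper, rather than trusting every developer to remember it at every call site, is what makes read-only production access defensible to an auditor.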

You also have to make sure that you can track what everyone in production does, what they looked at and what they changed through auditing in the application, database and OS; and track changes to important files (including the code) using a detective change control tool like OSSEC.

All of these checks and safeties also make it safer for developers, as well as for Ops, and will hopefully be enough to keep the auditors satisfied.

Try to make it work

There are advantages to having developers working in production besides getting their help with support and troubleshooting.

The more time that developers spend working in production on operations issues with operations staff, the more that they will learn about what it takes to design and build a real-world system. Hopefully they will take this and design better, more resilient systems with more transparency and more consideration for support and admin requirements.

And having developers share responsibility for making the software work and support it, proving that they care and helping out, will go a long way to breaking down the wall of confusion between operations and development.

It’s not a simple thing to do. It might not even be possible in your organization – at least not in your lifetime. You need to understand and balance the risks and advantages. You need to understand the political and governance constraints and how to deal with them. You need to put in the proper safeguards. And you need to make sure that you stay onside of compliance and regulations. But you’re leaving too much on the table if you don’t try.
