Tuesday, July 7, 2015

Don’t Blame Bad Software on Developers – Blame it on their Managers

There’s a lot of bad software out there. Unreliable, insecure, unsafe and unusable. It’s become so bad that some people are demanding regulation of software development and licensing software developers as “software engineers” so that they can be held to professional standards, and potentially sued for negligence or malpractice.

Licensing would ensure that everyone who develops software has at least a basic level of knowledge and an acceptable level of competence. But licensing developers won’t ensure good software. Even well-trained, experienced and committed developers can’t always build good software. Because most of the decisions that drive software quality aren’t made by developers – they’re made by somebody else in the organization.

Product managers and Product Owners. Project managers and program managers. Executive sponsors. CIOs and CTOs and VPs of Engineering. The people who decide what’s important to the organization, what gets done and what doesn’t, and who does it – what problems the best people work on, what work gets shipped offshore or outsourced to save costs. The people who do the hiring and firing, who decide how much money is spent on training and tools. The people who decide how people are organized and what processes they follow. And how much time they get to do their work.

Managers – not developers – decide what quality means for the organization. What is good, and what is “good enough”.

Management Mistakes

As a manager, I’ve made a lot of mistakes and bad decisions over my career. Short-changing quality to cut costs. Signing teams up for deadlines that couldn’t be met. Giving marketing control over schedules and priorities, trying to squeeze in more features to make the customer or a marketing executive happy. Overriding developers and testers who told me that the software wouldn’t be ready, that they didn’t have enough time to do things properly. Letting technical debt add up. Insisting that we had to deliver now or never, and that somehow we would make it all right later.

I’ve learned from these mistakes. I think I know what it takes to build good software now. And I try to hold to it. But I keep seeing other managers make the same mistakes. Even at the world’s biggest and most successful technology companies, at organizations like Microsoft and Apple.

These are organizations that control their own destinies. They get to decide what they will build and when they need to deliver it. They have some of the best engineering talent in the world. They have all the good tools that money can buy – and if they need better tools, they just write their own. They’ve been around long enough to know how to do things right, and they have the money and scale to accomplish it.

They should write beautiful software. Software that is a joy to use, and that the rest of us can follow as examples. But they don’t even come close. And it’s not the fault of the engineers.

Microsoft Quality

Problems with software quality at Microsoft are so long-running that “Microsoft Quality” has become a recognized term for software that is just barely “good enough” to be marginally accepted – and sometimes not even that good.

Even after Microsoft became a dominant, global enterprise vendor, quality has continued to be a problem. A 2014 Computerworld article, “At Microsoft, quality seems to be job none”, complains about serious quality and reliability problems in early versions of Windows 10. But Windows 10 is supposed to represent a sea change for Microsoft under their new CEO, a chance to make up for past mistakes, to do things right. So what’s going wrong?

The culture and legacy of “good enough” software has been in place for so long that Microsoft seems to be trapped, unable to improve even when they have recognized that good enough isn’t good enough anymore. This is a deep-seated organizational and cultural problem. A management problem. Not an engineering problem.

Apple’s Software Quality Problems

Apple sets itself apart from Microsoft and the rest of the technology field, and charges a premium based on its reputation for design and engineering excellence. But when it comes to software, Apple is no better than anyone else.

From the epic public face plant of Apple Maps, to constant problems in iTunes and the App Store, problems with iOS updates that fail to install, data lost somewhere in iCloud, serious security vulnerabilities, error messages that make no sense, and baffling inconsistencies and restrictions on usability, Apple’s software too often disappoints in fundamental and embarrassing ways.

And like Microsoft, Apple management seems to have lost its way:

I fear that Apple’s leadership doesn’t realize quite how badly and deeply their software flaws have damaged their reputation, because if they realized it, they’d make serious changes that don’t appear to be happening. Instead, the opposite appears to be happening: the pace of rapid updates on multiple product lines seems to be expanding and accelerating.

I suspect the rapid decline of Apple’s software is a sign that marketing is too high a priority at Apple today: having major new releases every year is clearly impossible for the engineering teams to keep up with while maintaining quality. Maybe it’s an engineering problem, but I suspect not — I doubt that any cohesive engineering team could keep up with these demands and maintain significantly higher quality.

Marco Arment, Apple has lost the functional high ground, 2015-01-04

Recent announcements at this year’s WWDC indicate that Apple is taking some extra time to make sure that their software works. More finish, less flash. We’ll have to wait and see whether this is a temporary pause or a sign that management is starting to understand (or remember) how important quality and reliability actually are.

Managers: Stop Making the Same Mistakes

If companies like Microsoft and Apple, with all of their talent and money, can’t build quality software, how are the rest of us supposed to do it? Simple. By not making the same mistakes:

  1. Putting speed-to-market and cost in front of everything else. Pushing people too hard to hit “drop dead dates”. Taking “sprints” literally: going as fast as possible, not giving the team time to do things right or a chance to pause and reflect and improve.

    We all have to work within deadlines and budgets, but in most business situations there’s room to make intelligent decisions. Agile methods and incremental delivery provide a way out when you can’t negotiate deadlines or cost, and don’t understand or can’t control the scope. If you can’t say no, you can say “not yet”. Prioritize work ruthlessly and make sure that you deliver the important things as early as you can. And because these things are important, make sure that you do them right.

  2. Leaving testing to the end. Which means leaving bug fixing to after the end. Which means delivering late and with too many bugs.

    Disciplined Agile practices all depend on testing – and fixing – as you code. TDD even forces you to write the tests before the code – see the short sketch after this list. Continuous Integration makes sure that the code works every time someone checks in. Which means that there is no reason to let bugs build up.

  3. Not talking to customers, not testing ideas out early. Not learning why they really need the software, how they actually use it, what they love about it, what they hate about it.

    Deliver incrementally and get feedback. Act on this feedback, and improve the software. Rinse and repeat.

  4. Ignoring fundamental good engineering practices. Pretending that your team doesn’t need to do these things, or can’t afford to do them, or doesn’t have time to do them, even though we’ve known for years that doing things right will help to deliver better software faster.

    As a Program Manager or Product Owner or a Business Owner you don’t need to be an expert in software engineering. But you can’t make intelligent trade-off decisions without understanding the fundamentals of how the software is built, and how software should be built. There’s a lot of good information out there on how to do software development right. There’s no excuse for not learning it.

  5. Ignoring warning signs.

    Listen to developers when they tell you that something can’t be done, or shouldn’t be done, or has to be done. Developers are generally too willing to sign on for too much, to reach too far. So when they tell you that they can’t do something, or shouldn’t do something, pay attention.
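
To make the testing point (item 2) concrete, here is a minimal, hypothetical sketch of test-first development – the class and the pricing rule are made up, but the workflow is the point: write the failing test first, then the code, and let Continuous Integration run the whole suite on every check-in so bugs can’t pile up.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Test-first: this test is written before DiscountCalculator exists, and Continuous
// Integration runs it (with every other test) on each check-in, so regressions show up immediately.
public class DiscountCalculatorTest {

    @Test
    public void ordersOverOneHundredDollarsGetTenPercentOff() {
        DiscountCalculator calculator = new DiscountCalculator();

        // 10% off a $120 order leaves $108 to pay
        assertEquals(108.0, calculator.priceAfterDiscount(120.0), 0.001);
    }
}

// The simplest implementation that makes the test pass, written after the test.
class DiscountCalculator {

    double priceAfterDiscount(double orderTotal) {
        return orderTotal > 100.0 ? orderTotal * 0.9 : orderTotal;
    }
}
```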

And when you make mistakes – which you will – learn from them; don’t waste them. When something goes wrong, get the team to review it in a retrospective or run a blameless post mortem to figure out what happened and why, and how you can get better. Learn from audits and pen tests. Take negative feedback from customers seriously. This is important, valuable information. Treat it accordingly.

As a manager, the most important thing you can do is to not set your team up for failure. That’s not asking for too much.

Wednesday, June 24, 2015

Top 10 Lists for Designing and Writing Secure and Safe Software

If you care about writing secure code, you should know all about these Top 10 lists:

OWASP Top 10

The OWASP Top 10 is a community-built list of the 10 most common and most dangerous security problems in online (especially web) applications. Injection flaws, broken authentication and session management, XSS and other nasty security bugs.

These are problems that you need to be aware of and look for, and that you need to prevent in your design and coding. The Top 10 explains how to test for each kind of problem to see if your app is vulnerable (including common attack scenarios), and basic steps you can take to prevent each problem.

If you’re working on mobile apps, take time to understand the OWASP Top 10 Mobile list.

IEEE Top Design Flaws

The OWASP Top 10 is written more for security testers and auditors than for developers. It’s commonly used to classify vulnerabilities found in security testing and audits, and is referenced in regulations like PCI-DSS.

The IEEE Center for Secure Design, a group of application security experts from industry and academia, has taken a different approach. They have come up with a Top 10 list that focuses on identifying and preventing common security mistakes in architecture and design.

This list includes good design practices such as: earn or give, but never assume trust; identify sensitive data and how they should be handled; understand how integrating external components changes your attack surface. The IEEE’s list should be incorporated into design patterns and used in design reviews to try and deal with security issues early.

OWASP Proactive Controls

IEEE’s approach is principle-based – a list of things that you need to think about in design, in the same way that you think about things like simplicity and encapsulation and modularity.

The OWASP Proactive Controls, originally created by security expert Jim Manico, is written at the developer level. It is a list of practical, concrete things that you can do as a developer to prevent security problems in coding and design. How to parameterize queries, and encode or validate data safely and correctly. How to properly store passwords and to implement a forgot password feature. How to implement access control – and how not to do it.

It points you to Cheat Sheets and other resources for more information, and explains how to leverage the security features of common languages and frameworks, and how and when to use popular, proven security libraries like Apache Shiro and the OWASP Java Encoder.
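
For example, here is a rough sketch (not taken from the Proactive Controls document itself) of two of these controls in Java – parameterizing a database query with JDBC, and encoding untrusted output with the OWASP Java Encoder. The table and column names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.owasp.encoder.Encode;

public class CustomerLookup {

    // Parameterized query: the user-supplied email is bound as a parameter,
    // never concatenated into the SQL string, so it can't change the structure of the query.
    public String findCustomerName(Connection db, String email) throws SQLException {
        String sql = "SELECT name FROM customers WHERE email = ?";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setString(1, email);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }

    // Output encoding: encode untrusted data for the HTML context it will be rendered in,
    // using the OWASP Java Encoder, to prevent Cross-Site Scripting (XSS).
    public String renderGreeting(String customerName) {
        return "<p>Hello, " + Encode.forHtml(customerName) + "</p>";
    }
}
```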

Katy Anton and Jason Coleman have mapped all of these lists against each other (the OWASP Top 10, the OWASP Proactive Controls and the IEEE design flaws), showing how the OWASP Proactive Controls implement safe design practices from the IEEE list and how they prevent or mitigate OWASP Top 10 risks.

You can use these maps to look for gaps in your application security practices, in your testing and coding, and in your knowledge, to identify areas where you can learn and improve.

Monday, June 22, 2015

What does DevOps mean to Developers?

A post that I wrote for JavaWorld, “DevOps for Developers: A New Agility”, describes how DevOps changes the way that developers work, and what they need to know to succeed.

Wednesday, June 17, 2015

Does DevOps Reduce Technical Debt – or Make it Worse?

DevOps can help reduce technical debt in some fundamental ways.

Continuous Delivery/Deployment

First, building a Continuous Delivery/Deployment pipeline – automating the work of migration and deployment – will force you to clean up holes and inconsistencies in configuration and code deployment, and inconsistencies between development, test and production environments.

And automated Continuous Delivery and Infrastructure as Code get rid of dangerous one-of-a-kind snowflakes and the configuration drift caused by making configuration changes and applying patches manually over time. Which makes systems easier to set up and manage, and reduces the risk of an un-patched system becoming the target of a security attack or the cause of an operational problem.

A CD pipeline also makes it easier, cheaper and faster to pay down other kinds of technical debt. With Continuous Delivery/Deployment, you can test and push out patches and refactoring changes and platform upgrades faster and with more confidence.

Positive Feedback

The Lean feedback cycle and Just-in-Time prioritization in DevOps ensure that you’re working on whatever is most important to the business. This means that bugs and usability issues and security vulnerabilities don’t have to wait until after the next feature release to get fixed. Instead, problems that impact operations or the users will get fixed immediately.

Teams that do Blameless Post-Mortems and Root Cause(s) Analysis when problems come up will go even further, and fix problems at the source and improve in fundamental and important ways.

But there’s a negative side to DevOps that can add to technical debt costs.

Erosive Change

Michael Feathers’ research has shown that constant, iterative change is erosive: the same code gets changed over and over, the same classes and methods become bloated (because it is naturally easier to add code to an existing method or a method to an existing class), structure breaks down and the design is eventually lost.

DevOps can make this even worse.

DevOps and Continuous Delivery/Deployment involve pushing out lots of small changes, running experiments, and iteratively tuning features and the user experience based on continuous feedback from production use.

Many DevOps teams work directly on the code mainline, “branching in code” to “dark launch” code changes, while code is still being developed, using conditional logic and flags to skip over sections of code at run-time. This can make the code hard to understand, and potentially dangerous: if a feature toggle is turned on before the code is ready, bad things can happen.

Feature flags are also used to run A/B experiments and to control risk on release, by rolling out a change incrementally to a few users to start. But the longer feature flags are left in the code, the harder the code becomes to understand and change.
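
Here is a minimal sketch of what branching in code looks like, assuming a hypothetical checkout service and a simple flag store – a real system would read flags from configuration or a flag service, and would delete the flag and the dead path once the rollout is complete.

```java
import java.util.Set;

// "Branching in code": both the old and new checkout paths ship to production,
// and a feature flag decides at run-time which one a given customer sees.
// All names here are illustrative, not from any particular flag library.
public class CheckoutService {

    private final Set<String> customersInNewFlow;   // stand-in for a real flag/config service

    public CheckoutService(Set<String> customersInNewFlow) {
        this.customersInNewFlow = customersInNewFlow;
    }

    public String checkout(String customerId, double orderTotal) {
        if (customersInNewFlow.contains(customerId)) {
            return newCheckout(customerId, orderTotal);    // dark-launched path, still being rolled out
        }
        return legacyCheckout(customerId, orderTotal);     // existing, proven path
    }

    private String newCheckout(String customerId, double orderTotal) {
        return "new-receipt-for-" + customerId;
    }

    private String legacyCheckout(String customerId, double orderTotal) {
        return "legacy-receipt-for-" + customerId;
    }
}
```

Every flag like this is another condition to test and another piece of scaffolding to clean up later, which is why the housekeeping below matters.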

There is a lot of housekeeping that needs to be done in DevOps: upgrading the CD pipeline and making sure that all of the tests are working; maintaining Puppet or Chef (or whatever configuration management tool you are using) recipes; disciplined, day-to-day refactoring; keeping track of features and options and cleaning them up when they are no longer needed; getting rid of dead code and trying to keep the code as simple as possible.

Microservices and Technology Choices

Microservices are a popular architectural approach for DevOps teams.

This is because loosely-coupled Microservices are easier for individual teams to independently deploy, change, refactor or even replace.

And a Microservices-based approach provides developers with more freedom when deciding on language or technology stack: teams don’t necessarily have to work the same way, they can choose the right tool for the job, as long as they support an API contract for the rest of the system.

In the short term there are obvious advantages to giving teams more freedom in making technology choices. They can deliver code faster, quickly try out prototypes, and teams get a chance to experiment and learn about different technologies and languages.

But Microservices “are not a free lunch”. As you add more services, system testing costs and complexity increase. Debugging and problem solving gets harder. And as more teams choose different languages and frameworks, it’s harder to track vulnerabilities, harder to operate, and harder for people to switch between teams. Code gets duplicated because teams want to minimize coupling and it is difficult or impossible to share libraries in a polyglot environment. Data is often duplicated between services for the same reason, and data inconsistencies creep in over time.

Negative Feedback

There is a potentially negative side to the Lean delivery feedback cycle too.

Constantly responding to production feedback, always working on what’s most immediately important to the organization, doesn’t leave much space or time to consider bigger, longer-term technical issues, or to work on paying off the deeper architectural and technical design debt that results from poor early decisions or incorrect assumptions.

Smaller, more immediate problems get fixed fast in DevOps. Bugs that matter to operations and the users can get fixed right away instead of waiting until all the features are done, and patches and upgrades to the run-time can be pushed out more often. Which means that you can pay off a lot of debt before costs start to compound.

But behind-the-scenes, strategic debt will continue to add up. Nothing’s broken, so you don’t have to fix anything right away. And you can’t refactor your way out of it either, at least not easily. So you end up living with a poor design or an aging technology platform, gradually slowing down your ability to respond to changes and to come up with new solutions. Or forcing you to keep filling in security holes as they come up, or scrambling to scale as load increases.

DevOps can reduce technical debt. But only if you work in a highly disciplined way. And only if you raise your head up from tactical optimization to deal with bigger, more strategic issues before they become real problems.

Friday, June 5, 2015

Software Architecture in DevOps

A new book by Len Bass, Ingo Weber and Liming Zhu “DevOps: A Software Architect’s Perspective”, part of the SEI Series in Software Engineering, looks at how DevOps affects architectural decisions, and a software architect’s role in DevOps.

The authors focus on the goals of DevOps: to get working software into production as quickly as possible while minimizing risk, balancing time-to-market against quality.

“DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while insuring high quality”
These fundamental practices are:
  1. Engaging operations as a customer and partner, “a first-class stakeholder”, in development. Understanding and satisfying requirements for deployment, logging, monitoring and security in development of an application.
  2. Engaging developers in incident handling. Developers taking responsibility for their code, making sure that it is working correctly, helping (often taking the role of first responders) to investigate and resolve production problems.
    This includes the role of a “reliability engineer” on every development team, someone who is responsible for coordinating downstream changes with operations and for ensuring that changes are deployed successfully.
  3. Ensuring that all changes to code and configuration are done using automated, traceable and repeatable mechanisms – a deployment pipeline.
  4. Continuous Deployment of changes from check-in to production, to maximize the velocity of delivery, using these pipelines.
  5. Infrastructure as Code. Operations provisioning and configuration through software, following the same kinds of quality control practices (versioning, reviews, testing) as application software.

Culture and collaboration between developers and operations, shared values and organizational issues – the softer people side of DevOps – are considered only insofar as they are factors that could affect delivery velocity (time-to-market) or quality.

Cloud Architecture and Microservices

As a reference for architects, the book focuses on architectural considerations for DevOps. It walks through how Cloud-based systems work, virtualization concepts and especially microservices.

While DevOps does not necessarily require making major architectural changes, the authors argue that most organizations adopting DevOps will find that a microservices-based approach, as pioneered at organizations like Netflix and Amazon, minimizes dependencies between different parts of the system and between different teams, and so minimizes the time required to get changes into production – the first goal of DevOps.

Conway’s Law also comes into play here. DevOps work is usually done by small agile cross-functional teams solving end-to-end problems independently, which means that they will naturally end up building small, independent services:

“Having an architecture composed of small services is a response to having small teams.”

But there are downsides and costs to a microservice-based approach.

As Martin Fowler and James Lewis point out, microservices introduce many more points of failure. Which means that resilience has to be designed and built into each service. Services cannot trust their clients or the other services that they call out to. You need to add defensive checking on data, anticipate failures of other services, implement time-outs and retries, and provide fallback alternatives or safe default behaviors if another service is unavailable. You also need to design your service to minimize the impact of failure on other services, and to make it easier and faster to recover/restart.
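
As a rough illustration of those defensive patterns (the service URL, timeout values and fallback are all assumptions, not something prescribed by the book), a client calling another microservice might look something like this:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Time-outs, bounded retries and a safe fallback for a call to another (hypothetical)
// microservice. Real systems usually get this from a shared library (circuit breakers,
// back-off, bulkheads) rather than hand-rolling it in every client.
public class RecommendationClient {

    private static final int TIMEOUT_MILLIS = 500;
    private static final int MAX_ATTEMPTS = 3;

    public String recommendationsFor(String customerId) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                URL url = new URL("http://recommendations.internal/api/v1/" + customerId);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(TIMEOUT_MILLIS);   // don't wait forever to connect...
                conn.setReadTimeout(TIMEOUT_MILLIS);      // ...or to read the response
                if (conn.getResponseCode() == 200) {
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                        return in.lines().collect(Collectors.joining("\n"));
                    }
                }
            } catch (Exception e) {
                // swallow and retry; a real client would log and back off between attempts
            }
        }
        // Safe default: an empty recommendation list, so this service keeps working
        // (in degraded mode) even when the recommendations service is down.
        return "[]";
    }
}
```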

Microservices also increase the cost and complexity of end-to-end system testing. Run-time performance and latency degrade due to the overhead of remote calls. And monitoring and troubleshooting in production can be much more complicated, since a single action often involves many microservices working together – at LinkedIn, for example, a single user request may chain to as many as 70 services.

DevOps in Architecture: Monitoring

In DevOps, monitoring becomes a much more important factor in architecture and design, in order to meet operations requirements.

The chapter on monitoring explains what you need to monitor and why, DevOps metrics, challenges in monitoring systems under continuous change, monitoring microservices and monitoring in the Cloud, and common log management and monitoring tools for online systems.

Monitoring also becomes an important part of live testing in DevOps (Monitoring as Testing), and plays a key role in Continuous Deployment. The authors look at common kinds of live testing, including canaries, A/B testing, and Netflix’s famous Simian Army in terms of passive checking (Security Monkey, Compliance Monkey) and active live testing (Chaos Monkey and Latency Monkey).

DevOps in Architecture: Security

Security is another important cross-cutting concern in software architecture addressed in this book. It looks at security fundamentals, including how to identify threats (using Microsoft’s STRIDE model) and the resources that need to be protected, the CIA triad (confidentiality, integrity and availability), identity management, and access controls. It provides an overview of the security controls in NIST 800-53, and common security issues with VMs and in Cloud architectures (specifically AWS).

In DevOps, security needs to be wired into Continuous Deployment:

  1. Enforcing that all changes to code and configuration are made through the Continuous Deployment pipeline
  2. Including security testing at different stages of the Continuous Deployment pipeline
  3. Securing the pipeline itself, including the logs and the artifacts

Security checks also need to be part of monitoring (such as Netflix’s Compliance Monkey and Security Monkey).

Continuous Deployment Pipeline and Gatekeepers

Developers – and architects – have to take responsibility for building their automated testing and deployment pipelines. The book explains how Continuous Deployment leverages Continuous Integration, and common approaches to code management and test automation. And it emphasizes the role of gatekeepers along the pipeline – manual decisions or automated checks at different points to determine if it is ok to go forward, from development to testing to staging to live production testing and then to production.
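
As a purely illustrative example of an automated gatekeeper (the metrics source, threshold and promotion decision are assumptions of mine, not taken from the book), a canary check before promoting a release to the rest of the fleet might look like this:

```java
import java.util.List;

// Automated gatekeeper sketch: before promoting a canary build to the rest of production,
// compare its error rate (pulled from the monitoring system) against a fixed threshold.
// The threshold and the shape of the metrics data are illustrative assumptions.
public class CanaryGate {

    private static final double MAX_ERROR_RATE = 0.01;   // allow at most 1% failed requests

    /** Returns true if the canary's sampled error rates are low enough to continue the rollout. */
    public boolean okToPromote(List<Double> errorRateSamples) {
        if (errorRateSamples.isEmpty()) {
            return false;   // no data is not evidence that the canary is healthy
        }
        double average = errorRateSamples.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(1.0);
        return average <= MAX_ERROR_RATE;
    }
}
```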

DevOps and Modern Software Architecture

“DevOps: A Software Architect’s Perspective” does a good job of explaining common DevOps practices, especially Continuous Deployment, in a development context rather than an operations one. It also looks at contemporary issues in software architecture, including virtualization and microservices.

It is less academic than Bass’s other book “Software Architecture in Practice”, and emphasizes the importance of real-world operations concerns like reliability, security and transparency (monitoring and live checks and testing) in architecture and deployment.

This is a book written mostly for enterprise software architects and managers who want to understand more about DevOps and Continuous Deployment and Cloud services.

If you’re already deep into DevOps and working with microservices in the Cloud, you probably won’t find much new here.

But if you are looking at how to apply DevOps at scale, or how to migrate legacy enterprise systems to microservices and the Cloud, or if you are a developer who wants to understand operations and what DevOps will mean to you and your job, this is worth reading.

Friday, May 22, 2015

Appsec: The gaps between Builders and Defenders

This year's SANS Institute State of Application Security Survey, which I worked on with Eric Johnson and Frank Kim, looks at the gaps between Builders (the people who design and develop software) and Defenders (application security and information security professionals and operations).

We found that more developers - and managers - are coming to understand the risks and costs of insecure software, and are taking security more seriously. And defenders are doing a better job of understanding software development and how to work with developers. But there's still a long way to go.

Developers still need better skills in secure software development and a better understanding of application security risks. And time to learn and apply these skills. Defenders are trying to catch up with developers and Lean/Agile development, injecting security earlier into requirements and design, leveraging automated tools and services to accelerate security testing. But they are coming up against organizational and communications silos, and managers who put marketing priorities (features and time-to-market) ahead of everything else.

More than 1/3 of the organizations surveyed are looking at secure DevOps as a way to help bridge these gaps, break down the silos and bring development and security together. This is going to require some serious changes to how application security and development are done, but it offers a new hope for secure software.

You can read the detailed report of the survey results here.

Friday, May 8, 2015

DevOps is Killing Maintenance. Let’s Celebrate.

DevOps probably isn't killing developers.

But it is changing how people think about development - from running projects to a focus on building and running services. And more importantly, DevOps is killing maintenance, or sustaining engineering, or whatever managers want to call it. And that’s something that we should all celebrate.

High-bandwidth collaboration and rapid response to change in Agile put a bullet in the head of offshore development done by outsourced CMMI Level 5 certified development factories. DevOps – by extending collaboration between development teams and operations teams, by increasing the velocity of delivery to production (up to hundreds or even thousands of times per day), and by using real feedback from production to drive development priorities and design decisions – has pulled the plug on the sick idea that maintenance should be done by sustaining engineering teams, offshored together with the help desk to somewhere far away in order to get access to cheap talent.

Agile started the job. DevOps can finish it

While large companies were busy finding offshore development and testing partners, Agile changed the rules of the game on them.

Offshoring coding and testing work made sense in large-scale waterfall projects with lots of upfront planning and detailed specs that could be handed off from analysts to farms of programmers and testers.

But the success of Agile adoption in so many organizations, including large enterprises, made outsourcing or offshoring development work less practical and less effective. Instead of detailed analysis and documented hand-offs, Agile teams rely on high-bandwidth face-to-face collaboration with each other and especially with the Customer, and rapid iteration and feedback. Everything happens faster. Customers change priorities and requirements. Developers respond and build and deliver features faster.

Time-intensive and people-intensive work like manual testing and reviews is replaced with automated testing and static analysis in Continuous Integration, pair programming, and continuous review and improvement.

In this dynamic world, it doesn’t make sense to try to shovel work offshore. You have to give up too much in return for saving on staff costs. Teleconferencing across time zones and cultures, “virtual team rooms” using webcams, remote pair programming over Skype… these are all poor compromises that lead to misunderstandings, inefficiencies and mistakes. Sure you can do offshore Agile development, but just because you can do something doesn’t mean that it is a good idea.

DevOps is going to finish the job

In DevOps, with Continuous Delivery and Continuous Deployment, changes happen even faster. Cycle times and response times get shorter, from months or weeks to days or hours. And feedback cycles are extended from development into production, letting the entire IT organization experiment and learn and improve from real customer use.

Developers collaborate even more, not just with each other and with customers, but with operations too, in order to make sure that the system is setup correctly and running optimally. This can’t be done effectively by development and operations teams working in different time zones. And it doesn’t need to be.

We all know how outsourcing has played out. In the name of efficiency we sliced out non-strategic parts of core IT and farmed them out to other companies, whether offshore or domestic. CIOs loved it because of the budgetary benefits. Meanwhile, it sparked a thousand conversations about what outsourcing meant for IT, the US economy, individual careers, and the relationship between people and businesses.

But it turned out that we took outsourcing too far. It makes sense for some functions, but it can also mean losing control over management, quality, and security, among other things. Now we're seeing a lot of those big contracts being pulled back, and the word of the day is insourcing.

InformationWeek, DevOps: The New Outsourcing

Blurred Lines

DevOps intentionally blurs the lines between developers and operations, between coding and support. Engineering is engineering. Project work gets broken down into piece work: individual features or fixes or upgrades that can be completed quickly and pushed into production as soon as possible. Development work is prioritized together with operations and support tasks. What matters is whatever is important to the business, whatever is needed for the system to run. If the business needs something fixed now, your best people are fixing it, instead of giving it to some kids or shipping it overseas.

In DevOps, developers are accountable for making sure that their code works in production:

You build it, you run it

Which means making sure that the code gets into production, monitoring to make sure that it is working correctly, diagnosing and fixing any problems if something breaks.

New features, changes, fixes, upgrades, support work, deployment… everything is done by the same people, working together. Which means that maintenance and support gets the same management focus as new development. Which means that nobody is stuck in a dead-end job sustaining a dead-end system. Which means that customers get better results, when they need them.

Except for enterprise legacy systems on life support, maintenance as most of us think of it today should die soon, thanks to DevOps. That alone makes DevOps worth adopting.
