Building Real Software: software development

Showing posts with label software development. Show all posts

Thursday, October 17, 2013

Programming: Thinking or Typing, Thinking and Typing

“If you don’t think carefully, you might think that programming is just typing statements in a programming language.”
Ward Cunningham, in the Forward to The Pragmatic Programmer

Software development – design, solving problems, coming up with optimal new algorithms, learning a new language, refactoring messy code into something tight and elegant – requires hard thinking.

When you’re trying to do something that you've never done before – or nobody has ever done before. Or when you've done it before but you sure as hell aren't going to make the same mistakes again and you need time to think your way to a better way. Or when you’re trying to understand code that somebody else wrote so you can change it, or when you’re hunting down an ugly bug. All of this can take a lot of time, but in the end you won’t have a lot of code to show for it.

Then there’s all the other work in development – work that requires a lot of typing, and not as much thinking. When it’s clear what you need to do and how you need to do it, but you have an awful lot of code to pound out before the job is done. You've done it before and just have to do it again: another script, another screen, another report, and another after that. Or where most of the thinking has already been done for you: somebody has prepared the wireframes and told you exactly how the app will look and feel and flow, or speced out the API in detail, so all you need to do is type it in and try not to make too many mistakes.

Debugging is thinking. Fixing the bug and getting the fix tested and pushed out is mostly typing. Early stage design and development, technical spikes to check out technology and laying out the architecture, is hard thinking. Doing the 3rd or 4th or 100th screen or report is typing. UX design and prototyping: thinking. Pounding out CRUD maintenance and config screens: typing. Coming up with a cool idea for a mobile app is thinking. Getting it to work is typing. Solving common business problems requires a lot of typing. Optimizing business processes through software requires hard thinking.

Someone who is mostly thinking and someone who is just typing are doing very different kinds of work and need to be managed in different ways.

Sometimes Programming is just Typing

“We are typists first, and programmers second.”
Jeff Atwood, Programming Horror

Many business applications are essentially shallow. Lots of database tables and files with lots of data elements and lots of data, and lots of CRUD screens and lots of reports that are a lot like a lot of other screens and reports, and lots of integration work with lots of fields to be mapped between different points and then there are compliance constraints and operational dependencies to take care of. Long lists of functional requirements, lots of questions to ask to make sure that everyone understands the requirements, lots of details to remember and keep track of.

Banking, insurance, government, accounting, financial reporting and billing, inventory management and ERP systems, CRM systems, and back-office applications and other book-keeping and record-keeping systems are like this. So are a lot of web portals and online shops. Some maintenance work – platform upgrades and system integration work and compliance and tax changes – is like this too.

You’re building a house or a bridge or a shopping mall, or maybe renovating one. Big, often sprawling problems that may be expensive to solve. A lot of typing that needs to be done. But it’s something that’s been done many times before, and the work mostly involves familiar problems that you can solve with familiar patterns and proven tools and ways of working.

"I saw the code for your computer program yesterday. It looked easy. It’s just a bunch of typing. And half of the words were spelled wrong. And don’t get me started on your over-use of colons."

The Pointy Haired Boss sees some actual code

Once the design is in place most of the work is in understanding and dealing with all of the details, and managing and coordinating the people to get all of that code out of the door. This is classic project/program management: budgeting and planning, tracking costs and changes and managing hand offs. It’s about logistics and scale and consistency and efficiency, keeping the work from going off of the rails.

Think Think Think

Other problems, like designing a gaming engine or a trading algorithm, or logistics or online risk management or optimizing real-time control systems, require a lot more thinking than typing. These systems have highly-demanding non-technical requirements (scalability, real-time performance, reliability, data integrity and accuracy) and complex logic, but they are focused on solving a tight set of problems. A few smart people can get their head around these problems and figure most of them out. There’s still typing that needs to be done, especially around the outside, the framing and plumbing and wiring, but the core of the work is often done in a surprisingly small amount of code – especially after you throw away the failed experiments and prototypes.

This is where a lot of the magic in software comes from – the proprietary or patented algorithms and the design insight that lies at the heart of a successful system. The kind of work that takes a lot of research and a lot of prototyping, problem solving ability, and real technical chops or deep domain knowledge, or both.

Typing and Thinking are different kinds of work

Whether need to do a lot of typing or mostly thinking dictates how many people you need and how many people you want on the team, and what kind of people you want to do the work. It changes how people work together and how you have to manage them. Typing can be outsourced. Thinking can’t. You need to recognize what problems can be solved with typing and what can’t, and when thinking turns to typing.

Thinking work can and should be done by small teams of specialists working closely together – or by one genius working on their own. You don’t need, or want, a lot of people while you are trying to come up with the design or think through a hard problem, run experiments and iterate. Whoever is working on the problem needs to be immersed in the problem space, with time to explore alternatives, chances to make mistakes and learn and to just stare out the window when they get stuck.

This is where fundamental mistakes can be made: architecture-breaking, project-killing, career-ending errors. Picking the wrong technology platform. Getting real-time tolerances wrong. Taking too long to find (or never finding) nasty reliability problems. Picking the wrong people or trying to solve the wrong problem. Missing the landing spot.

Managing this kind of work involves getting the best people you can find, making sure that they have the right information and tools, keeping them focused, looking out for risks from outside, and keeping problems out of their way.

Thinking isn’t predictable. There’s no copy-and-paste because there’s nothing to copy and paste from. You can’t estimate it, because you don’t know what you don’t know. But you can put limits on it – try to come up with the best solution in the time available.

Typing is predictable. You can estimate it – and you have to. The trick is including all of the things that need to be typed in, and accounting for all of the little mistakes and variances along the way – because they will add up quickly. Sloppiness and short cuts, misunderstanding the requirements, skipping testing, copy-and-paste, the kinds of things which add to costs now and in the future.

Typing is journeyman work. You don’t need experts, just people who are competent, who understand the fundamentals of the language and tools and who will be careful and follow directions and who can patiently pound out all the code that’s needed – although a few senior developers can out-perform a much larger team, at least until they get bored. Managing a bunch of typists requires a different approach and different skills: you need to be a politician and diplomat, a logistician, a standards setter, an administrator and an economist. You’re managing project risks and people risks, not technical risks.

Over time, projects change from thinking to typing – once most of the hard “I am not sure what we need to do or how we’re doing to do it” problems are solved, once the unknowns are mostly known, now it’s about filling in the details and getting things running.

The amount of typing that you need to do expands as you get more customers and have to deal with more interfaces to more places and more customizations and more administrivia and support and compliance issues. The system keeps growing, but most of the problems are familiar and solvable. There’s lots of other code to look at and learn from and copy from. You need people who can pick up what’s going on and who can type fast.

Thinking and Typing

Thinking and typing are both important parts of software development.

In “Programming is Not Just Typing”, Brendan Enrick explains that the reason that Pair Programming works is because it lets people focus on typing and thinking at the same time:

Both guys are thinking, but about different things. One developer has the keyboard at any given time and keeps in his head the path he is on. (This guy is concerned with typing speed.) He types along the path thinking about the code he is currently writing not the structure of the app, but the code he is typing right now. For a short time his typing speed matters.

The programmer in the pair who is not actively typing is spending all of his time thinking. He keeps in his head the path that the typist is taking, not concerned with the syntax of the programming language. That is the other guy thinking about the language syntax. The one sitting back without the keyboard is the guide. He must make sure that the pair stays on the right path using the most efficient route to success.

There’s more to being a good developer than typing - and there's more to typing than just being able to press some keys. It means being good at the fundamentals: knowing the language well enough, knowing your tools and how to use them, knowing how to navigate through code, knowing how to write code – as well as being fast at the keyboard. Mastering the mechanics, knowing the tools and being able to type fast, so that you're fluent and fluid, are all essential for succeeding as a developer. Don’t diminish the importance of typing. And don’t let typing – not being able to type – get in the way of thinking.

Thursday, February 2, 2012

Source Code is an Asset, Not a Liability

Some people have tried to argue that source code is a liability, not an asset. Apparently this “is now widely accepted”
and

“this is a very strong idea that has a lot of impact across the IT industry and in the way developers view and perform their day-to-day work”.

Really?

The argument, as far as I can follow it, is that while engineers are paid to help design and build bridges and power plants, as developers we’re paid to “deliver business value”, and…

“Source code is merely the necessary evil that’s required to create value”

Source code, the software that we create, is only a means to and end. The software itself has no value, or worse it has negative value, because it creates a drag on your ability to innovate and move forward. The more code that you have, the higher your maintenance costs will be, therefore…

“… the best code of all is the code that's never written.”

Michael Feathers, who has a lot of smart things to say about source code, joined in on this discussion. In The Carrying-Cost of Code he says that

“code is inventory. It is stuff lying around and it has substantial cost of ownership. It might do us good to consider what we can do to minimize it.”

He goes so far as to suggest a goofy thought experiment where “every line of code written disappears exactly three months after it is written”. The point of this would be to get developers and the business to understand that the “costs of carrying code are real, but no one accounts for them”.

Feathers reinforces the valid points about the drag that unmaintained or poorly maintained legacy code has on companies. Writing less code to solve a problem is a good thing – it’s (usually) more efficient and (usually) costs less to maintain a smaller code base. And yes there is a necessary cost to maintaining software and working with existing software and changing it.

But none of this changes the fact that software is an asset

If you build and operate a power plant or a bridge, you have to maintain it – just like software. And like a bridge or a power plant, a newer, more modern, better-designed, more efficient and simpler asset is better than a big, old, complicated, expensive-to-maintain one.

The “software is a liability” argument seems to be that it’s not the software that’s the asset, it’s the “features and options” – the capabilities that the software provides. This is like saying that it’s not the power plant (which a company spent millions of dollars to design and engineer) that’s a valuable asset to a company, it’s the energy that it generates. It’s not the bridge – it’s the ability to drive over water. It’s not the airplane, it’s the ability to fly.

Pretending that software has no value in itself is silly. Try explaining this to accountants (don’t depreciate the airplane, depreciate the ability to fly!) and IP lawyers and to investors who buy software companies for their IP. They all understand that software and the ideas embodied in it are valuable and need to be treated as assets. The ideas themselves are only worth so much, even if they’re patented. But the ideas realized in software, actualized and proven and ready to be used or (better) already being used – that’s where the real value is. And this is the value that needs to be maintained and preserved.

Software is more valuable than other assets

The important difference between software and other assets is that software is much more plastic than other engineering work. Software is “soft” – it can be changed easily and inexpensively and quickly. This makes software more strategically valuable than “hard” assets like a building because software can be continuously adapted and renewed in response to changing situations, and transformed to create new business opportunities.

Software has to be changed to stay useful. The problem is NOT that we HAVE TO maintain software and change it to do things that it was never intended to do, to work in ways that it was never designed to, to do things that we couldn’t imagine a few years ago. This is the opportunity that software gives us – that we CAN do this. This is why Software is Eating the World.

Monday, July 11, 2011

Developing and Testing in the Cloud

There’s a lot of hype around “the Cloud” and what it can do.

One of the things that I am interested in is Cloud solutions that can help small software companies, and especially to kickstart software startups. Good tools that development teams can take advantage of to build and test their own stuff, without all of the hassle and expense of internal IT, buying and provisioning their own gear, setting up the tools and systems and networks and finding someone who understands all of it well enough to do it right and to keep it running. I want to find Cloud-based technology that can do for software teams what Salesforce.com does for customer-service businesses, so that software teams can get started off quickly, and doing things right from the start.

Software teams need a core set of shared tools and capabilities:

Source code management / version control
Bug tracking
Collaboration and shared documentationBuild and Continuous Integration
Source code scanning and static analysis
Unit and functional testing
System, load and stress testing
Code deployment.

Can the Cloud provide all of this in an effective way?

Managing and Building Code in the Cloud

I keep running into people using GitHub to host their source code repositories. Not just people working on Open Source projects, but companies using GitHub to manage commercial projects in hosted private repositories, a service used by a number of startups. In GitHub, developers have a platform for managing code with the Git distributed version control system, and they get access to a good set of tools for a development team, especially distributed teams: admin functions to control permissioning, wikis, a bug tracking system and an online code review tool.

One alternative to GitHub is BitBucket which uses the Mercurial DVCS. Like GitHub, BitBucket can be used to manage Open Source and private projects, and it looks like it offers a similar set of management capabilities and tools. GitHub is free for Open Source projects and cheap for small teams while BitBucket’s pricing is based on the number of users, small teams (up to 5 users) are free for open or closed source.

Another On Demand SCM platform built on Mercurial is Fog Creek Software’s Kiln which integrates nicely with Fog Creek’s bug tracking and planning system FogBugz.

Atlassian, which bought BitBucket last year also offers Jira Studio a comprehensive hosted development platform centered on a Subversion code repository integrated with the rest of Atlaissian’s strong development toolset: Jira bug tracking, FishEye for code searching, Confluence wiki, Greenhopper for Agile team planning, Elastic Bamboo for build and Continuous Integration on Amazon EC2 and Crucible for online code review. That’s almost everything that a team needs (all that is missing is static analysis and a functional and load testing platform). Presumably the same integrated development tool support will extend to BitBucket over time, although there are already some overlaps between BitBucket and Jira Studio.

And there’s CloudBees Dev@cloud which offers a choice of secure Git, SVN or Maven repositories, and Continuous Integration through a hosted Jenkins server.

Spring Source’s Code2Cloud might be a good option especially for Java / Spring developers when it is ready – the details are fuzzy at the moment.

Google Code is cool, but for now is only available for Open Source projects.

For enterprises, IBM recently announced Smart Business Development and Test Cloud and IBM Smart Business Development and Test on the IBM Cloud (also known as SmartCloud Enterprise - I’m not sure what the difference is between them, I got too worn out reading through the marketing speak) , with all sorts of IBM technology and partner tools. This stuff isn’t for the faint of heart, or the small of wallet.

Continuous Integration in the Cloud

Continuous Integration is a problem that development teams need to solve before they get too far along, and setting up a CI server and keeping it running isn’t trivial. It’s another good fit for the elastic On Demand model of the Cloud, paying for more infrastructure only when you need it.

Besides Atlassian’s hosted end-to-end platform, and CloudBees’ Jenkins as a Service which can be used to build code hosted on a repository accessible through the Internet, there are a few options for CI in the Cloud (courtesy of Pascal Thivent on Stack Overflow):

For Open Source projects there is CodeBetter based on the TeamCity build server. For proprietary projects there is Mike CI which is hosted on Amazon EC2, and includes interfaces to Subversion, Git and Mercurial repositories; and CI Foundry. However, I don’t know how real these services are. Mike CI was down the last couple of times I went to check on it, and CI Foundry’s home page includes this not-very-convincing testimonial:

What people are saying...
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris quis ligula vel ligula varius cursus tristique eget sem. Mauris sagittis, ante id imperdiet auctor, sem mi condimentum neque, quis feugiat quam dui dictum risus.
- Chris Read
Continuous Deployment expert

Thivent suggests that you can roll-your- own CI solution on Amazon EC2 or maybe use something like “CI in a box”. Using an On Demand platform like EC2 makes sense for CI, rather than provisioning your own gear: it’s a bit more work for you to do the setup, but you can rely on Amazon’s infrastructure to reduce costs and simplify operations.

Bug Tracking in the Cloud

There are some decent and reasonably priced Cloud-based bug tracking systems suitable for small teams, including Fog Creek Software’s FogBugz and Atlassian’s Jira Hosted option both of which are also included in their hosted code management suites.

Testing in the Cloud

The Cloud has a lot of promise for app testing, especially for large-scale performance and load testing. You’re building a cool new Web platform that needs to support 10,000 or 100,000 concurrent sessions according to your business plan. Before your investors put more money in, they want to see just how real your software is, how far away you are from delivering something that could work. Of course you don’t have the money or time or IT skills to build a large-scale test center of your own at this point, and there’s a lot of waste in doing this – you need a lot of gear, but you won’t need to use all of it very often.

You can spin up a test farm on EC2 surprisingly quickly, even a big one – and as long as you don’t need to run the tests for a long time and you don’t ask for too high a service level, it won’t cost you much at all. You pay for what you need when you need it.

To run stress tests you need more than just the app deployed in a farm. You also need a scalable stress test harness to drive load scenarios and to measure the results. There are some viable options available in the Cloud today: SOASTA with its CloudTest solution for load testing and extreme stress testing, scaling from smallish sites to really big; HP Load Runner in the Cloud running on EC2; JMeter in the Cloud; LoadStorm and BrowserMob.

There are also some other interesting test capabilities, including technology from Sauce Labs which lets you run Selenium tests on Web apps in the Cloud or manually test your app against different Cloud-based instantly-available browsers, all with video records of failed tests and other problems found. Simple, useful and cool.

A different take on Cloud-based testing that could work for startups and smaller companies is crowd-sourced testing services like uTest, where you get people “in the Cloud” to test your app (web apps, desktop apps and mobile apps), including functional coverage and regression testing (you give them tests, they run them and maybe help improve them), exploratory testing, usability testing, load testing and test planning and management. Like other Cloud-based On Demand solutions, you pay for work when you need it. uTest has a community of something like 40,000 testers available, although it’s not clear how many of them are much good.

Mob4Hire is another crowd-sourced solution specifically for mobile apps – they offer functional testing and usability testing and other services like market research.

Static Analysis in the Cloud

Some of the leading static analysis suppliers have also gone to the Cloud. HP offers Fortify on Demand to scan code for security vulnerabilities, and IBM has Rational Appscan OnDemand which includes assistance from an experienced team to help you interpret the results and plan remediation. Both of these solutions are targeted more towards the enterprise than small teams.

Veracode, which runs binary code analysis for security vulnerabilities and now also offers web vulnerability scanning, is only available through the Cloud. They offer a free trial scan of small Java apps for XSS and SQL Injection flaws (two of the most common and fatal web application vulnerabilities).

For startups that care about building reliable and secure apps (and most of them should, especially Web 2.0 startups and mobile app developers) it probably makes more sense to look at simpler developer IDE-based tools like Findbugs and Google’s CodePro Analytix (both free) or Klocwork Solo for Java which is priced per user.

The Tradeoffs

Managing your code, building it and testing it in the Cloud has a number of clear advantages:

Faster time to market – you can get access to the infrastructure and tools quickly, with minimal hassle and setup time.
Savings on cost and Capex – especially for load testing – you don’t have to pay upfront, you only have to pay for what you use when you use it, so you can better manage your cash flow.
Convenience – you don’t have to provision and operate the gear yourself and add to your operational headaches.
Access to specialist skills – your Cloud provider will know more about these technologies and how to solve a specific problem much better than you can afford to, so you don’t need to contract or hire gurus to help solve problems or make sure everything is setup properly, or waste time figuring it out yourself.
Service levels – they’ll have more people to help keep things running, to make sure code and data are backed up, systems are cleaned up and monitored and patched. They’re less likely to lose your code and data than you are.

When it comes to security, the common argument for working with a Cloud-based provider is that they will be more responsible and serious about security than most of their customers will be or can be. Because the of economies of scale, they can invest in doing things right, and because they have to be secure to compete. GitHub for example looks like they have a good security program in place and they are hosted by Rackspace so the data center, network and servers will be setup and taken care of much better than a small company, especially a startup, can afford to or would know how to do.

The concern with using a Cloud platform like this comes down to some fundamental points:

1. It’s a shared platform, used by a lot of companies. The more successful the platform is, the more customers and interesting and valuable data / IP that they manage, the more attractive a target they are to bad guys. A small company by itself is an easy target, but not especially interesting or easy to find. But a platform that holds data for a lot of small, innovative companies? Yummy….[pronounced in a deep, growly voice with a foreign accent]

2. You take on some important risks any time that you outsource something that is core and critical to your business. Your data / code / customer base is more important to you than it is to whoever you outsource this responsibility to. If something goes badly wrong, if you can’t get an important change or an important bug fix out to a customer because the Cloud service isn’t working, or if the integrity or confidentiality of your code is compromised, they will be sad, and they will lose a customer (you)... but you may go out of business.

You can ask for, and pay for, good SLAs - but Amazon’s major EC2 outage earlier this year showed that even the best providers can’t meet their SLA commitments. While Amazon did a lot of things right in handling and recovering from that outage, it affected a lot of customers for too long.

3. There’s more to a secure online platform than using SSL and network firewalls and running in a good data center - pretending otherwise is at best naïve. As a customer, you need to be confident in how the Cloud services provider designed and built the software architecture and platform to protect the confidentiality and integrity of your IP and data, in their multi-tenant partitioning scheme and software security controls, and in their SDLC. Security vulnerabilities in application code are a serious source of risk to businesses on the Web, and most companies still do a poor job of building apps in a secure way.

DropBox, a Cloud-based document storage facility is an example of a platform that seemed to be secure and confidential, until somebody started to look deeper into it, and they continue to run into problems with basic security issues Unfortunately, none of the hosted platforms provide any kind of statement on their secure SDLC that I could find, on what steps they take to ensure that the platform is designed and implemented safely.

Testing in the Cloud, especially load testing for Web apps, looks to be a no-brainer – the business case is too compelling to ignore. Many of the other tools have a lot of promise: something like Atlassian’s end-to-end hosted studio would save a lot of time and trouble for a small company, and the upfront cost savings are real.

It comes down to how much trust you can put into your Cloud provider and their SLAs for availability and support, and whether in the end you can afford to trust your core IP to somebody else. 37 Signals, which builds and operates its own Cloud solutions for project management and document sharing, doesn’t think it is a good idea:

We host all the source code for our applications internally for obvious security reasons. That’s not to say Github’s private repository hosting isn’t a good option, especially if you want a hassle-free setup. It’s just not for us.
JH 22 Aug 08

For some companies, especially startups, IP embodied in their code is critically important – it may be all that they have. Ironically, it’s these companies that are most likely to use a Cloud platform like GitHub or Jira Studio or BitBucket. Deciding whether to trust your future to the Cloud is not an easy decision to make. But as the technology continues to get better, it’s a question that is worth asking.

Monday, November 15, 2010

Construx Software Executive Summit 2010

I spent a few days last week in Seattle at the Software Executive Summit hosted by Construx Software Builders. This is a small conference for senior software development managers: well organized, world class keynote speakers, and the opportunity to share ideas in supportive and open roundtables with smart and experienced people who are all focused on similar challenges, some (much) bigger, some smaller. It was good to reflect and reset, and a unique chance to spend time with some of the top thinkers in the field, swapping stories over lunch with Fred Brooks, chatting over drinks with Tim Lister and Tom DeMarco. This was my second time at this conference, and based on these experiences, I would highly recommend it to other managers looking for new ideas.

Key takeaways for me:

A lot of people are concerned that we (software developers in general) continue to do a poor job upfront, on requirements and architecture and design. And that the push to Agile methods, on collaborative and emergent design, isn’t helping: if anything, the problem is getting worse. We are getting better and faster at developing and delivering mediocre software.

The push for offshoring of programming and testing work continues to come mostly from the top. Even with the economic downturn, companies are having problems filling positions locally; and having problems keeping resources overseas, with turnover rates as high as 35% annually in India. The companies who are successful with offshoring make long-term commitments with outsourcing firms; invest in a lot of back-and-forth travel; rely on strong and hard-working onshore technical managers from their outsourcing partners to coordinate with their offshore teams; spend a lot of time on documentation and reviews; and spend more time and money on building and maintaining relationships. There can be real advantages at scale: for smaller companies I don’t see the return.

Some companies have backed away from adopting Agile methods, or are running into problems with Agile projects, because the team is unwilling or unable to commit to a product roadmap or rough project definition. Customers and sponsors are frustrated, because they don’t understand what they can expect or when they can expect it. Executives don’t want to see demos: they want to know when the project will be delivered. Working in an Agile way can’t prevent you from giving the customer what they need to make business decisions and meet business commitments. This is another argument for adapting Agile methods, for doing at least some high-level design and roadmap planning upfront, agreeing on the shape of what needs to be done and on priorities before getting to work.

Productivity: there is no clear way to define productivity or compare it across teams doing different kinds of work. Data on velocity or earned value is useful of course on bigger projects to see trends or catch problems: you want to know if you are getting faster or slower over time. Tracking time on maintenance work (bug fixes, small changes) can help to hilight technical debt or other problems: but what if most of the work that you do is maintenance? We need better ways to answer simple management questions: Am I getting my money’s worth? Is this group well-managed?

Two of the keynote presentations looked at innovation, and two others explored related problems in design: design integrity, dealing with complexity in design. Better decisions are made by small teams, the smaller the better: the ideal team size may be just two people (Holmes and Watson). It's important to give people a safe place and time to think – uninterrupted, unstructured time, preferably time blocked off together so that people can collaborate and play off of each other's ideas. If people are always on the clock and always focused on delivery, always dealing with short-term issues, they won’t have the opportunity to come up with new ways of looking at things and better ways of working, and worse they may lose track of the bigger picture, forget to look up, forget what’s important.

Technical debt: it’s our fault, not the business’s fault. It’s our house, so keep it clean. We have to find time to do the job right. One company uses the idea of a “tech tax”, a small (10%) overhead amount that’s included in all work for upgrades and refactoring and re-engineering.

The importance of building business cases for technical changes like re-architecture. Remember that business cases don’t have to be solid; they just need to be reasonable. Make sure that you have some data on cost and risk, and tie your case into the larger business strategy.

As a leader, you need to take responsibility. You have to deal with ambiguity: that's your job. Don't be a victim. Make a decision and execute.

Tuesday, July 13, 2010

Developers just don’t go to security conferences

OWASP has just announced their 2010 US Appsec conference. It looks like an interesting opportunity to explore the state of the art in software security. Last year, some of the leaders of this conference were concerned about how few software developers showed up for the sessions. I expect the same will happen this year: the audience will be made up of a self-referencing group of security specialists and consultants, and a handful of developers and managers who are looking to understand more about software security. And that, I think, is as it should be.

Travel and education budgets are tight. Developers and managers need to choose carefully where to spend their company’s money and time – or their own. Where can they get the most information for their own work, where can they meet people who will help them solve problems or move their careers forward?

Security experts like Jeremiah Grossman are right: developers don’t understand security, they aren’t taking ownership for building secure software, it’s not important to them.

What’s important to software developers? Delivery: if we don’t deliver, we fail. Check out the major software development conferences, the events that attract senior developers, architects, development managers, test managers, project managers. They are all about delivery, how to deliver faster, better, cheaper: Agile methods, understanding Lean/Kanban principles applied to software development, leadership and collaboration and communication and effecting organizational change, managing distributed teams and global development, getting requirements right, getting design right, tracking whether the project is on target, metrics, continuous integration, continuous delivery, continuous deployment, TDD and BDD, refactoring and improving code quality, improving the user experience, newer and better development platforms and languages and tooling.

With the notable exception of the recent NFJS UberConf, there isn’t any serious coverage of secure software development, secure SDLCs, software security problems at software development conferences. The question shouldn’t be why there are so few developers attending a software security conference. The question should be why there is so little coverage of software security at software development conferences and in the other places where developers and managers get their information: in the development-focused books and blogs and seminars.

Building Bridges

I’ve posted before about my concern about the gap between the software development and software security communities. But there is a way forward. Around 10 years ago, the development and testing community were far apart in values and goals; testing was inefficient and was seen as “somebody else’s problem”. But Agile development made it important for developers to test early and often, made it important for developers to understand testing and code quality, to find better and more efficient ways to test. Testing is cool now – and more than that, it’s expected. Developers look to professional testers for help in improving their own testing, and to find bugs that they don’t understand how to find, through exploratory testing and integration testing and more advanced testing techniques. The development community is taking responsibility for testing their own work, for the quality of their work. And I believe that software development, and software, are both better for it.

But software security is still “somebody else’s problem”. This needs to change. The solution isn’t to try to entice developers to attend security conferences. It's not to force certification in secure development through SANS or ISC. It’s not passive attempts to “infect” development managers with vague ideas about being "Rugged" that are supposed to somehow change how developers build software. And holding software producers liable for their mistakes, while clearly showing the frustration of security specialists, and while making for a provocative sound bite, is not likely to happen either.

The solution is to make secure software development a problem for software developers, a problem that we need to solve ourselves. Engage leaders in the wider software development community: the people who spend their lives thinking about and writing about better ways to build better software; the people who help shape the values and priorities of the development community; the people who help developers and managers decide where to spend time and effort and money. And help them to understand the problems and how serious these problems are, convince them that they need to be involved, convince them that we need to include security in software development, and ask them to help convince the rest of us.

Engage people like Steve McConnell, and Martin Fowler, and Kent Beck, Uncle Bob Martin, Michael Feathers, the NFJS uber geeks, Joel Spolsky and Mike Atwood, Scrum advocates like Mike Cohn and Ken Schwaber, Lean evangelists like Tom and Mary Poppendieck, David Anderson on Kanban, agile project managers like Johanna Rothman, and leaders from the software testing community like James Bach and Jonathan Kohl. And ask them who else they think should be engaged because they’ll know better who can make a difference.

Invite them to come in and work with the best of the software security community, to understand the challenges and issues in building secure software, ask them to consider how to “Build Security In” to software development. And with their help, maybe we will see software security problems owned by the people responsible for building software, and those problems solved with the help and guidance of experts in the security community in a supportive and collaborative way. Otherwise I am afraid that we will continue to see the communities drift apart, the gap between our priorities growing ever wider. And software, and software development, will not be the better for it.

Monday, May 24, 2010

Fast or Secure Software Development - but you can't have both

There is a lot of excitement in the software development community around Lean Software Development: applying Japanese manufacturing principles to software development to reduce waste and eliminate overhead, to streamline production, to get software out to customers and get their feedback as quickly as possible.

Some people are going so far as to eliminate review gates and release overhead as waste: “iterations are muda”. The idea of Continuous Deployment takes this to the extreme: developers push software changes and fixes out immediately to production, with all the good and bad that you would expect from doing this.

Continuous Deployment has been followed to success at IMVU and Wordpress and Facebook and other Web 2.0 companies. The CEO of Automattic, the team behind Wordpress, recently bragged about their success in following Continuous Deployment:

“The other day we passed product release number 25,000 for WordPress.com. That means we’ve averaged about 16 product releases a day, every day for the last four and a half years!”

I am sure that he is not proud of their history of security problems however, which you can read about here, here, here, here, here and elsewhere.

And Facebook? You can read about how they use Continuous Deployment practices to push code out to production several times a day. As for their security posture, Facebook has "faced" a series of severe security and privacy problems and continues to run into them, as recently as last week.

I’ve ranted before about the risks that Continuous Deployment forces on customers. Continuous Deployment is based on the rather naïve assumption that if something is broken, if you did something wrong, you will know right away: either through your automated tests or by monitoring the state of production, errors and changes in usage patterns, or from direct customer feedback. If it doesn’t look like it’s working, you roll it back as soon as you can, before the next change is pushed out. It’s all tied to a direct feedback loop.

Of course it’s not always that simple. Security problems don’t show up like that, they show up later as successful exploits and attacks and bad press and a damaged brand and upset customers and the kind of mess that Facebook is in again. I can’t believe that the CEO of Facebook appreciates getting this kind of feedback on his company's latest security and privacy problems.

Now, maybe the rules about how to build secure and reliable software don’t, and shouldn’t, all apply to Web 2.0, as Dr. Boaz Gelbord proposes:

“Facebook are not fools of course. You don't build a business that engages every tenth adult on the planet without honing a pretty good sense for which way the wind is blowing. The company realizes that it is under no obligation to provide any real security controls to its users.

Maybe to be truly open and collaborative, you are obliged to make compromises on security and data integrity and confidentiality. Some of these Web 2.0 sites like Facebook are phenomenally successful, and it seems that most of their customers don’t care that much about security and privacy, and as long as you haven’t been foolish enough to use tools like Facebook to support your business in a major way, maybe that’s fine.

And I also don’t care how a startup manages to get software out the door. If Continuous Deployment helps you get software out faster to your customers, and your customers are willing to help you test and put up with whatever problems they find, if it gives you a higher chance of getting your business launched, then by all means consider giving it a try.

Just keep in mind that some day you may need to grow up and take a serious look at how you build and release software – that the approach that served you well as a startup may not cut it any more.

But let’s not pretend that this approach can be used for online mission-critical or business-critical enterprise or B2B systems, where your system may be hooked up to dozens or hundreds of other systems, where you are managing critical business transactions. Enterprise systems are not a game:

“I understand why people would think that a consumer internet service like IMVU isn't really mission critical. I would posit that those same people have never been on the receiving end of a phone call from a sixteen-year-old girl complaining that your new release ruined their birthday party. That's where I learned a whole new appreciation for the idea that mission critical is in the eye of the beholder.”

This is a joke right?

But seriously, I get concerned when thoughtful people in the development community, people like Kent Beck and Michael Feathers start to explore Continuous Deployment Immersion and zero-length iterations. These aren’t kids looking to launch a Web 2.0 site, they are leaders who the development community looks to for insight, for what is important and right in building software.

There is a clear risk here of widening the already wide disconnect between the software development community and the security community.

On one side we have Lean and Continuous Deployment evangelists pushing us to get software out faster and cheaper, reducing the batch size, eliminating overhead, optimizing for speed, optimizing the feedback loop.

On the other side we have the security community pleading with us to do more upfront, to be more careful and disciplined and thoughtful, to invest more in training and tools and design and reviews and testing and good engineering, all of which adds to the cost and time of building software.

Our job in software development is to balance these two opposing pressures: to find a way to build software securely and efficiently, to take the good ideas from Lean, and from Continuous Deployment (yes, there are some good ideas there in how to make deployment more automated and streamlined and reliable), and marry them with disciplined secure development and engineering practices. There is an answer to be found, but we need to start working on it together.

Saturday, March 13, 2010

Secure Software Development: Awareness Training

I’ve been looking for a good basic awareness course on software security as an introduction for new developers and as a refresher for the rest of the team. A course that covers the important bases, and reinforces fundamental issues in building secure software. I prefer online training for something like this: if you have the discipline, online and on-demand training works because it is easier to schedule, people can go as fast or as slow as they like, and of course it is less expensive.

Stanford Software Security Foundations

Last year I took the Software Security Foundations course, the first course in Stanford University’s Advanced Computer Security certificate program:6 courses, all offered online, on-demand.

Foundations is a basic introduction to software security for developers and technical managers. This course was designed by (and principally delivered by)
Neil Daswani a former security program manager at Google, a Stanford alumni, and now a principal at Dasient, a web computer security startup.

The course is based on Mr. Daswani’s book Foundations of Computer Security: What Every Programmer Needs to Know: it is useful to have a copy of the book handy while going through the course. Foundations is a day’s worth of lectures made available as online videos for a 3-month period, with slides that can be downloaded for printing, and an exam at the end to ensure that you aren’t lazy about following the material. It covers the fundamentals of software security, including:

Security Principles

Authentication, authorization, access control, confidentiality, data integrity, non-repudiation, risk management, and secure system design principles including least privilege, fail-safe and defense-in-depth. Good, but no real surprises here.

Secure Programming

Buffer overflows, SQL injection, password management, cross site request forgery, cross site scripting, and common mistakes in crypto. There was good coverage of SQL injection and cross domain security threats, particularly XSRF. However I found the explanation of Cross Site Scripting confusing, even after reading through the section on cross domain security in the book – it’s not a straightforward problem, ok, but it can be explained in a simpler way. The examples and problems are biased slightly towards C/C++ programmers, but this should not present a problem for programmers working in other environments. It covers most of the important bases with the exception of input validation, which needs more attention.

Introduction to Cryptography

A walkthrough of symmetric encryption and public key cryptography, and then a brief discussion of advanced research in cryptography from Stanford Professor Dan Boneh, an expert in applied cryptography. The explanation of cryptographic primitives was especially lucid, and, well beautiful: I didn’t know that cipher block chaining could be beautiful, but it is. Some of the more advanced issues in cryptography are covered at a cursory level only (not a lot of it will stick with you if you aren’t already familiar with this subject), and you are referred to other courses offered in the program if you want to get a good, basic understanding of how to use cryptography and secure protocols.

SANS Software Security Awareness

In January I checked out the On Demand Software Security Awareness course from the SANS Institute.

I have had success with other courses from SANS before, classroom and On Demand. In particular the course SANS Security Leadership Essentials for Managers is excellent: a challenging and exhaustive 6-day course covering pretty much everything a technical manager would need to know about IT security from technical, project management, risk management, strategic, and legal and ethical perspectives.

Software Security Awareness is a short, 3-hour survey course, offered online as recorded audio lectures and a slide show. SANS prints and binds a copy of the slides and couriers a copy to you shortly after you register for the course. The course covers:

- vulnerability/patch cycle
- security architecture
- principles of software security
- security design
- implementation (coding and deployment)
- input issues
- code review and security testing.

Unfortunately, the course does not start out strong: the walkthrough of the vulnerability/patch cycle is supposed to help build a case for secure software development, but it takes too long to make too fine a point, and it’s hard to stay interested.

The next sections on architecture, principles of software security, and design are confused by a weakness in organization. It’s not clear where architecture ends and design begins, and there is unnecessary repetition between the sections. It would flow better if the discussion started with foundational principles of software security (which are well covered in the course), then proceeding through architecture and then design. Architecture should cover security requirements, risk analysis and threat modeling to determine “how much security is enough”, defense-in-depth, attack surface analysis and minimizing attack surface, complexity and economy of mechanism, cost considerations, and layering and defining trust zones. Some of these issues are covered in the discussion on architecture, some in design, some in both, and some not at all.

The risk assessment discussion surveys different risk modeling techniques, including Microsoft’s STRIDE and DREAD, OCTAVE and others. It includes a recommendation for SALSA, an initiative which SANS seems to have started together with another vendor back in 2007 and which has gone nowhere since. SALSA doesn’t belong with the other established models.

The section on design covers security requirements (again) and some general motherhood on good design practices, then briefly touches on authentication, authorization and permissioning, non-repudiation and auditing, and data confidentiality, but doesn’t explore any of these important areas further in the context of design. This is one area where the Stanford course is much stronger.

The discussion of implementation is good, starting with an explanation of Gary McGraw’s 7 Kingdoms and then a long list of implementation-specific issues, which may or may not apply to your system, but anyways offers some concrete guidance which may catch the attention of programmers.

Then an excellent, brief exploration of input handling problems, recognizing the importance of not trusting data. And a (too) brief discussion of cryptography in the context of storing confidential data.

Finally, one of the strongest parts of the course, on security code review practices and security testing. Starting with an explanation of why and how to do code reviews (including formal inspections and tool-assisted code reviews), and references to language-specific checklists. Then a discussion of static analysis and a survey of different tools: a bit out of date, missing Microsoft’s SDL tools for example, but a decent overview of available static analysis tools. Then a good survey of testing techniques from a security perspective, including fuzzing and pen testing and risk-based testing, and finally a discussion of attack surface analysis (which more properly belongs in the architecture section).

You have 4 months to complete the 3 hour course, which is more than adequate. Because this was a beta of a new course and a new online delivery model, there was no assessment included.

Comparison of the courses

It was interesting to see that, except for a common focus on basic secure software principles, the courses took different, almost complementary approaches to introducing secure software development.

The SANS course was much more broad and current, with more references to resources like OWASP and frameworks and tools, and more on secure SDLC practices. There were some notable gaps in the SANS course, in architecture and especially design, and it suffers from some structural weaknesses, but it covers many of the bases, and important concepts such as secure error handling, failing safe, input handling, and risk management and threat modeling techniques. With a few improvements in structure and content around architecture and design, this would be an excellent course. As it stands, if I can get the developers past the introduction on the vulnerability cycle, they should learn some useful things.

The Stanford course is more than twice as long and more focused, but doesn’t touch on secure SDLC practices such as reviews and security testing, or risk assessment and threat modeling. It digs deeper into specific security implementation problems such as XSRF, XSS, password management, and SQL injection; and it is also stronger in its coverage of basic secure design, and especially cryptography. This course would be of special interest to developers who want to go on with the full Advanced Security Program at Stanford.

Neither of these courses is perfect, but they are both professionally delivered, and offer good value for the money.

Tuesday, March 2, 2010

Continuously Putting Your Customers at Risk

Over the past year or so there has been some buzz around Continuous Deployment: immediately and constantly pushing out new features directly to customers. This practice is followed by companies such as Facebook, Flickr and IMVU, where apparently programmers push changes out to production up to 50 times per day.

Continuous integration and rapid deployment have a number of advantages, and continuous deployment appears to maximize value from these ideas: immediate and continuous feedback from customers, and cutting deployment and release costs way down, eliminating overheads and manual checks.

A profile on Extreme Agility at Facebook describes how the company’s small development team deploys rapid updates to production:

“…Facebook developers are encouraged to push code often and quickly. Pushes are never delayed and applied directly to parts of the infrastructure. The idea is to quickly find issues and their impacts on the rest of the system and surely fixing any bugs that would result from these frequent small changes.”

But there are some fundamental and serious challenges with continuous deployment. Success depends on a few key factors:

a comprehensive and fast automated test suite to catch mistakes, especially regressions;
customers that are willing to let you test in production;
an architecture that catches and isolates failures, preventing problems from chaining or cascading across the cluster;
a disciplined, proven, robust deployment model.

Automated Testing

At IMVU, all changes are run through an automated test suite, which executes on a cluster of test servers, before deploying to production.

"So what magic happens in our test suite that allows us to skip having a manual Quality Assurance step in our deploy process? The magic is in the scope, scale and thoroughness.”

The author goes on to say that

“We have around 15k test cases, and they’re run around 70 times a day. That’s a million test cases a day.”

Ummm, actually, no it isn't. That’s 15k test cases a day. Run 70 times [and no explanation why the tests are 70 times per day, since we’re talking about pushing out 50 changes per day, but anyways…]. You could run 15k test cases a million times and it would still be 15k test cases, in the same way that running 1 test case a million times is still only 1 test case.

A regression suite of 15,000 automated unit and functional tests sounds impressive – of course what is much more important than the number of tests is the quality of tests. We’re a small shop, and we run more than 15,000 automated tests as part of our continuous integration environment, and we also run static analysis checks on all code, and we do peer code reviews, and manual testing including operations tests, exploratory and destructive tests, system trials, stress and performance testing, and end-to-end integration tests before we push to production.

From subsequent statements, it seems clear that most of the changes deployed this way at IMVU at least are trivial, bug fixes and minor modifications: schema changes, for example, are made out of band (they take 2 days to rollout to production at IMVU). So we can assume that the scope, and therefore, risk of any one change can be contained. This is backed up by a comment made by the author at 50 Deployments A Day and the Perpetual Beta:

“when working on new features (not fixing bugs, not refactoring, not making performance enhancements, not solving scalability bottlenecks, etc), we’ll have a controlled deliberate roll out plan that involves manual QE checks along the way, as well as a gradual roll0out and A/B testing.”

I’m not sure why you wouldn’t have a controlled roll-out plan for solving scalability bottlenecks, but let’s assume that he was referring to minor tweaks to the code or configuration, say increasing the size of a resource pool or something.

Testing in Production

After hearing about the approach followed by IMVU last year, a couple of exploratory testing experts, Michael Bolton and James Bach, spent a few minutes trying out IMVU’s system. They, not surprisingly, found a lot of problems without making much of an effort:

“Yes folks, you can deploy 50 times a day. If you don’t care bout the quality of what you’re deploying…”

The writer from IMVU admits that they have a lot of bugs:

“continuous deployment lets you write software *regression free*, it sure doesn’t gift you high quality software.”

Ignore the “*regression free*” claim which assumes that the test suite will always catch *all* regressions. Continuous Deployment essentially concedes the job of testing to your customers: you do some superficial reviews and regression tests, and leave the real work of finding problems to your customers. I can appreciate that this might be acceptable to some customers, for example people participating in online communities or online games, as they trade off the inconvenience of occasional glitches against the chance to try out cool new features quickly.

There’s nothing wrong with running experiments, trying out new ideas with part of your customer base through A/B split testing, seeing what sticks and what doesn’t. But this doesn’t mean you need to, or should, deploy every change directly to production. Again, if your customers are trusting you with financial transactions or sensitive personal information, you would be irresponsible if you took this approach, even if, like Facebook, you only push out changes incrementally to a small number of clients at a time.

Failure Isolation and Fail Safe

If you are going to continually roll changes, and anticipate that some of these changes will fail, the architecture of the system needs to isolate and contain failures, using internal firewalling, fast-fail techniques, timeouts and retries and so on to reduce the likelihood of a failure chaining through layers or cascading across servers and taking the cluster down.

Unfortunately, this doesn’t seem to be the case, at least in the example described here, where Alex, a programmer, is preparing to deploy code containing a 1-character typo which can cause a failure cascade and take out the site:

“Alex commits. Minutes later warnings go off that the cluster is no longer healthy. The failure is easily correlated to Alex’s change and her change is reverted. Alex spends minimal time debugging, finding the now obvious typo with ease. Her changes still caused a failure cascade, but the downtime was minimal."

So the development team is not only conceding that they cannot write good code, and that they are incapable of doing a decent job testing their work, but also that they cannot take the steps to fail safe at an architectural level and minimize the scope of any failures that they cause. This is not the same as conceding, as Google does, that at massive scale, failures will inevitably happen, and you will have to learn how to deal with them. This is simply giving up, and pushing risk out to customers, again.

A Deployment Model that Works

The deployment model at IMVU is cool: of course, if you are going to do something 50 times per day, you should be pretty good at it.

“The code is rsync’d out to the hundreds of machines in our cluster. Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. A symlink is switched on a small subset of the machines throwing the code live to its first few customers. A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back. If not, then it gets pushed to 100% of the cluster and monitored in the same way for another five minutes. The code is now live and fully pushed.”

This rollback model assumes that problems will be found in the first minute, or few minutes of operation. It does not account for race conditions, deadlocks and other synchronization faults, or time-dependent problems or statistical bugs or downstream integration conflicts or intermittent problems which might not show up for hours or even days, by which time dozens or hundreds of other changes have been applied. Good luck finding out where the problem came from by then.

The rollback approach also assumes that all that needs to be done to fix the problem is to rollback the code. It does not account for what needs to be done to track down and repair broken transactions and corrupt data, which might be ok in an online gaming environment, but would be CTO-suicide in a real system.

And you wonder why some web sites have serious software security issues?

Let’s put aside the arguments above about reliability and quality and responsibility, and just look at the problem of security: building secure software in an environment where developers push each change immediately to production.

First, try to imagine someone detecting unauthorized changes with all of this going on. Was every one of those changes intended and authorized? Would you be able to detect an attack in the middle of all of this noise?

Then there’s the problem of building secure software, which is hard enough even if you follow good practice, especially if you are moving fast. There has been a lot of work over the past couple of years defining how developers can build secure software in an agile way. Microsoft’s SDL for Agile maps secure software design and development practices to agile development models like Scrum, and cuts secure software development controls and practices down to fit rapid incremental development.

But with Continuous Deployment, at least as described so far, there is no time or opportunity to do even a minimal set of security checks and reviews before software changes are pushed out by developers.

It’s bad enough to build insecure software out of ignorance. But by following continuous deployment, you are consciously choosing to push out software before it is ready, before you have done even the minimum to make sure it is safe. You are putting business agility and cost savings ahead of protecting the integrity or privacy of customer data.

Continuous deployment sounds cool. In a world where safety and reliability and privacy and security aren’t important, it would be fun to try. But like a lot of other developers, I live in the real world. And I need to build real software.

Wednesday, February 10, 2010

And now we need to be "Rugged"

A new initiative for secure software development, for Rugged Software Development was announced this week at a SANS conference. Rugged Software is

a value system for writing secure software

defined by some smart people in the application security industry.

Presumably the Rugged Software initiative is attempting to duplicate the success of the agile software movement, coming with its own Rugged Software Manifesto:

I am rugged… and more importantly, my code is rugged.

and so on.

The agile development movement was successful because it was driven by and for the people who actually build software: by programmers, for programmers. By smart, experienced programmers, people like Kent Beck and Ward Cunningham who built software for a living and were really good at it, and who were searching together for ways to solve the problems that programmers face in software development, problems that mattered to programmers. It came from inside the software development community, and set out to put programmers effectively back in charge of building software, to make better software, to make the making of software better.

And agile development, at least at the beginning, was cool, counter-culture: agile developers were sticking it to the man, doing what was right, subverting big upfront design and top-down planning and by-the-book project management and so on. It was certain to create a following…. and unfortunately, eventually to become an institutionalized Methodology subsidized by tool vendors and consultants, but that’s another story for another day.

According to one of the founders of the Rugged Software initiative

Getting the secure software development message to the masses won't be easy, and the plan is to get some initial support and momentum from the application security industry.

However well-intentioned and necessary, it looks like another set of ideas and values being imposed from outside on people who are busy building software. We already have other application security initiatives: Cigital's Build Security In and its maturity model for the enterprise, Microsoft’s SDL for the Microsoft community at least, OpenSAMM and other initiatives from OWASP, and half-baked ideas from the InfoSec community like SALSA.

And now we have Rugged Software Development.

To succeed, the initiative needs support and momentum not just from the application security community, but more importantly from the software development community – from the people who actually build software.

Fair enough, these smart and well-intentioned and hard working InfoSec guys are asking for input and participation from the development community. So after being challenged to “walk the walk" I signed up for the Rugged Software forums, blogs, lists and…. Well, there’s the announcement and some trade press coverage. And that Manifesto about ruggedness, and an empty blog and an empty forum. That’s it, that's all I have been able to find so far.

So, I guess I was walking too fast. I will wait and see if there is a real opportunity here, a chance for an initiative that speaks to, and for, the software development community, something that has a real chance to succeed.

Saturday, February 6, 2010

Real Resources for Software Development Managers

With all of the blogs and books and training programs and even a few magazines still available on software development, agile methods and software project management, there is surprisingly little material that is of real value to a software development manager: information that is thoughtful, current and grounded in real experience.

A lot of what you can find is noise. Blog postings by enthusiastic kids who have finished a project or two using Scrum and are now self-made experts on team dynamics and Agile development. Someone pushing a Project 2.0 collaboration tool or yet another “Agile Software Development using…” or “Agile Project Management…” book, or blogs going over (and over) the same ground, or tiresome evangelism for “Big A” Agile training and consulting.

It’s difficult to find useful information in all of this for software development managers who want to dig deeper. I’ve put together a list here of the resources that I have found useful, and find myself going back to.

Construx Software Builders

Construx is a consultancy specializing in software development management, founded by Steve McConnell, a leading thinker on software engineering and best practices in software development, and author of some of the definitive books on software development: Rapid Development and Code Complete.I learned a lot about good software engineering and software development management from these books. While Code Complete, a guide to writing good, clean code, was significantly revised in 2004, Rapid Development is showing its age, and needs to be updated to take into account XP and Scrum and other new ideas in software development. But it is still the best overview available of SDLCs and the risks and success factors in software projects.

Construx offers a wide range of consulting services including organizational reviews and project reviews and software due diligence reviews for acquisitions. They also offer an excellent set of training programs on software project management and software engineering. Some of the courses that my team and I have attended include:

Code Complete Essentials on the basics of good software development

Master Class on Estimating Software Projects

Developer Testing Bootcamp

Professional Tester Bootcamp

and 10x Software Engineering an excellent course on improving software development results for experienced managers.

You can also get access to white papers and posters including their famous list of Classic Mistakes in software development; and CXOne, a lightweight framework for managing software projects, with templates, samples and checklists.

Once each year or so, Construx holds an Executive Software Summit for experienced software development managers, CTOs and other senior people interested in improving how software is built. This is head and shoulders above the other software management conferences that I have attended.

Agile Development Resources

Scrum seems to have won the Agile development methodology wars over XP and DSDM and Crystal and whatnot, fundamentally because it is much easier to understand and follow (and also, unfortunately, easier for teams to build sloppy software faster). So, until Lean Software Development or some other new idea establishes the next wave you should make sure to understand Scrum, even if you don’t swallow it whole.

Certified Scrum Master training is not expensive, it doesn’t take long, and you’ll leave with a decent understanding of the method and its values and driving principles. Make sure to get a good teacher: I went straight to the source, Ken Schwaber. His book Agile Project Management with Scrum summarizes what you’ll learn from the course and is a good resource for follow-up.

Of the Agile / Scrum community blogs, the most valuable that I have found is Mike Cohn’s Succeeding with Agile and I would recommend his book of the same name if you are serious about understanding and implementing Scrum.

Pragmatic Programmer’s Bookshelf

There are a lot of good books on software development in the Pragmatic Programmer’s Bookshelf including a set of special interest for software development managers:

Manage It! by Johanna Rothman, is a simple but excellent book on incremental and iterative development practices, scheduling and estimating and risk management for small and mid-size projects and teams. This is the best, most practical book I have found on applying lightweight, agile development methods, and while it leans towards Scrum it is not dependent on any single methodology. This book also introduces program management and project portfolio management, which is explored in more detail in Manage Your Project Portfolio: useful if you’re new to the problems of managing small programs using lightweight techniques.

If you are serious about program and portfolio management, and you have the time and money, I strongly recommend the professional program on Advanced Project Management at Stanford University which you can attend online or on campus. This is a world-class program on aligning strategy with execution, managing customers, understanding and exercising power and influence, and coordinating and planning integrated programs and project portfolios.

Johanna Rothman is also the co-author (together with Esther Derby) of Behind Closed Doors, an introductory but good book on coaching and mentoring developers and managing teams. It offers practical advice and good reminders on issues like managing priorities, the value of 1-on-1 meetings, how to deal with technical people, how to give feedback – a leadership resource, but written specifically for software development managers and team leads.

Ship It! by Jared Richardson, Will Gwaltney, Jr, outlines the basics of building and deploying software in an agile context: the use of tools for source code control and continuous integration, useful strategies for adopting automated unit testing, basic engineering practices, small team leadership techniques, fundamentals of incremental development, common problems faced.
Of the books in this series, this is the most basic / introductory, but it is worth a quick read for a framework for iterative, incremental development.

Release It! by Michael T. Nygard, is an excellent resource on technical architecture for distributed, web-based (especially Java) systems – a high-level view of the challenges that your team will face building real systems, patterns and anti-patterns for stability and scalability, and how to engineer a system for real world operations. It is the “hardest” of these books, and will be useful to your architects and Dev/Ops team. You can follow Michael’s current work at his blog, Wide Awake Developers.

Although not part of the Pragmatic Programmer’s suite, you also need to read The Visible Ops Handbook, a simple, practical introduction to IT systems change management and release management.

Software Project Management Resources

I wrote earlier about Scott Berkun’s Making Things Happen, a reissue of The Art of Project Management. This is a practical, focused and well-written book on basic issues in software project management, on leadership and communications and especially on execution, based on his experience managing programs at Microsoft.

The Mythical Man Month, by Fred Brooks – yes, it really is worth reading and re-reading if you haven’t read it in a while.

As a counterbalance to all of the agile and small team collaboration stuff, I enjoy following Herding Cats. This blog is all about large-scale, high-risk, high-complexity, safety-critical projects and large-scale programs like manned space flight and nuclear power and weapons systems. It offers a completely different and provocative perspective on project management problems, and is a good resource on complexity and risk management.

Leadership Resources for Software Development Managers

Cornell University offers an excellent online program on High Performance Leadership covering change management, leadership, negotiation and coaching.

Another excellent resource on leadership is the Center for Creative Leadership, which offers inexpensive live or pre-recorded webinars.

The American Management Association and its Canadian equivalent, the Canadian Management Center, provide excellent training programs on management and leadership.

The only leadership blog that I follow consistently is Great Leadership. It’s a bit cheerleadie at times, but it it is thoughtful and has good links to other leadership resources.

The general leadership books that I have found useful and worth going back to are Difficult Conversations and Getting to Yes both developed out of the Program on Negotiation at Harvard Law School. Its Program on Negotiation for Senior Executives is an excellent course, and definitely worth taking for development managers (don’t be put off by the “for Senior Executives” label).

The Best of the Software Development Manager Blogs

Of the many other software development management blogs and forums, there are only a couple that I follow regularly:

Joel on Software of course: it's well-written and provocative, and focuses on small-scale software development and on how to run a software business. Unfortunately, for the last few months the focus has been on one of Joel’s latest ventures, Stackoverflow: a useful and free resource for problem solving for developers. While this has been good for Stackoverflow, it has not made for especially interesting reading. Here’s hoping that Joel gets back to subjects of wider interest soon.

Hard Code by Erich Brechner, the head of the best practices group at Microsoft. While this is written for Microsoft’s internal developers, most of the issues and problems that he explores apply generally to the software development community, and it is entertaining and real.

I’ll add to this resource list as I find other useful information.

Saturday, January 23, 2010

Software Project Manager's Bridge to Agility

The Software Project Manager’s Bridge to Agility by Michele Sliger and Stacia Broderick sets out to map the PMBOK project management practice framework to an Agile practice framework (essentially Scrum) for managing software development projects.

The book introduces Agile project management concepts, then explains how the PMBOK practices for Integration Management, Scope Management, Time Management, Cost Management, Quality Management, Human Resources Management, Communications Management, Risk Management and Procurement Management are realized in an Agile project management approach. This analysis is necessarily repetitive as Agile project management is essentially simple, solving a broad set of different problems using the same set of basic principles and techniques: a flat, collaborative team structure where the project manager primarily acts as a coach and facilitator; controlling scope and quality and managing risk through time-boxing and frequent reviews; maintaining a close working relationship with the customer through its representative product owner; and lightweight (just enough and no more) documentation.

The authors are certified PMPs and Agile consultants so they have a foot in both worlds, although it is clear where their bias (and their source of incomes) lays. The book that they have written is not a bad book – it offers an interesting analysis of two essentially different approaches to managing software projects, and reinforces the value of fundamental Agile development principles. But it falls short in a number of key areas:

Like too much writing on Agile management, the book offers a simplistic and naïve view of the scope of a project manager's responsibilities, and minimizes especially the need for preparation and planning and design. I suppose my concern here is not so much with the book, as with the weakness of the approach advocated: but to act as a bridge between the PMBOK and Agile management, the book should address the needs and concerns of project managers, especially for projects that are not trivial. In the Agile framework described in the book, there is an unfortunate lack of upfront planning and preparation: it essentially consists of some vague work on the part of the project manager to secure sponsorship for the project, and an upfront vision meeting where the team (who, naturally at this point, likely knows nothing at all about the domain or the problems to be solved) meets with the customer and together lays out the high-level boundaries of the project and the priorities and features (the Backlog), and agrees on how to describe the project’s vision in a few sentences (an Elevator Statement).

This raises a number of concerns of course, but one of the most fundamental is a bootstrapping problem: if you haven’t prepared, done some research and upfront analysis, if you don’t know anything about what you are being asked to try to build, then how do you decide who should be on the team? The fundamental assumption here is that the team is an initial constraint: you get a small group of cross-trained generalized specialists who can handle any kind of problem with minimal upfront work, and make do with that team. For simple business applications, I am sure this could work. For complex science or engineering problems that demand specific domain or technical (or both) experience to even understand what is going on and what the customer is talking about, never mind delivering what the customer needs…. umm, no.

Like others in the Agile community, the authors allow for an “Iteration Zero”: some brief upfront work to setup the environment, for ramp-up and discovery, approvals, design discussions, maybe some training for the team in the domain and tools. But the authors encourage the team to deliver working software in this first (zeroeth actually) iteration, again short changing time for upfront analysis and discovery.

I am a strong advocate of incremental, iterative design and development, and I fully appreciate the concerns of wasting time and resources in what Steve McConnell calls the “fuzzy front end” of a project. But rushing through the initial analysis and discovery and technical planning and integration planning steps before you get a chance to understand something about the domain and the technical requirements and the preparedness of the team would be suicide if you were managing a project that was part of a large program, or an embedded software development project, or building a safety-critical system, or really anything more than a trivial business system or maybe a web portal: and even here you are dependent on the team’s experience and preparedness, their technical competence to fill in the necessary gaps and catch up.

The Agile approach described in the book for project estimating and scheduling is fundamentally weak as well: there is no attempt (at least that I could find) to create a high-level schedule for the project. At the start of the project, in the project vision meeting

the scope is defined a very high level. It is not uncommon to leave the vision meeting with only a dozen or so features identified…

In the section on Time Management, the book explains how estimates are made “at the strategic level” – the level of release planning – through abstract story point estimates, although the method by which the team translates story points into useful measures of time and cost is not made clear (yes I know how it works, but you won’t understand it from this book). A further weakness here is the approach taken to create the estimates: the team picks a feature, estimates it in an abstract way based on their understanding of its complexity, then goes on to estimate the complexity of other features relative to the first one. Of course, the first feature is the least likely to be estimated correctly, as the team has the least amount of information available when preparing that estimate. All of the other estimates are calibrated to this first estimate, amplifying its inaccuracy.

The only time that the team does any substantial estimating is when they are working through the details at the start of each iteration inside a release. Although there is a reference to the team using a “prioritized, estimated product backlog” in release planning, it is not clear when these estimates are created, or how the project schedule itself is defined – i.e., how the team arrives at a date by which the project is scheduled to be completed.

Now I understand and agree that estimates done in initial planning of a project are notional only, that there is a wide variance in accuracy of these estimates, and that the accuracy of estimates will improve over time. This is commonly understood in software development, inside and outside of the Agile community. But there needs to be some attempt at estimating the scope of delivery as understood at the beginning so that the project can be chartered, so that money and people and supporting resources can be committed; and, if this is more than a trivial software development project, so that requirements of other teams working on related projects or programs can be understood, integration points and dependencies defined, and commitments made. This is a fault of the book and not of the method: there is no clear tie-in from project initiation to the planning of each release in the book, although there must be one.

The book also assumes a narrow scope for the project manager’s, and the team’s, responsibilities and concerns: that they are focused only on developing the software, and don’t need to be concerned with integration and coordination with other projects or programs. The authors do allow for an “Iteration H”, a bracketing iteration at the end of the project for “hardening” the system: preparing documentation, packaging, training operations and so on – oh and of course leave time for celebration. Ah, if project management was only that simple. While the book makes a brief reference to the need for the PM to take care of external dependencies, it does not explain how these are managed within the Agile context – as far as I can see, the answer is: they are not.

I was hoping that the book would offer some ideas on how to manage Agile projects with strict contracting models and fixed constraints. Unfortunately again, this concern is disregarded:

“Agile projects have the ability to always be on time, and within budget, if the scope is flexible”.

Umm, stating the obvious, yes this is true. But if you are not delivering what the customer needs, or asked for in the contract, when they need it, what is the point? The book doesn’t offer guidance to the PM on how to reconcile iterative delivery and elastic scope with the cost and time and other business constraints of the contract, and other constraints in a larger program: in any real project, the work of other teams is dependent on your team delivering an expected scope of work by an expected date. Instead, the book assumes that the team working with the customer’s product owner will be able to resolve changes in scope collaboratively, and presumably that there will be no need to worry about contractual terms and commitments and other business issues. In this world nobody appears to be accountable for managing the tie-in back to the contract: as long as working software is delivered, everyone will be happy and everyone will be paid.

It is disappointing, but not a surprise, that the authors don’t offer ideas on how to effectively manage fixed-price, fixed-schedule projects using Agile techniques. Ken Schwaber, one of the authors of the Scrum method, admits in Agile Project Management with Scrum that he does not know of a way to deal with this problem either, and suggests trying a different contracting model based on essentially conceding upfront that you won’t be able to meet the customer’s requirements, and selling them instead on an iterative, collaborative business model. If you are able to stay in business trying this, good on ya mate.

The book does address Risk Management, primarily relying on the intrinsic risk management practices in iterative, incremental development. I agree that iterative, incremental development in short time-boxes, coupled with solid engineering practices, significantly reduces many kinds of project risks – this is why we follow this approach. The book does go further to describe a basic model for those teams who decide that they need to do explicit risk management – reviewing risks and maintaining a Risk Board for each iteration. This is a good approach, but does not take into account larger or longer-term risks: the team is focused on dealing with risks and technical issues immediately in front of them.

My biggest concern with this book is that it takes such a fundamentally simplistic and unnecessarily critical view of “traditional plan-driven projects”. In these projects quality is short changed; project managers only go through the motions on risk management; and of course the planning process does not allow for iteration – all projects except for Agile projects follow a pure waterfall model and attempt to plan everything upfront in detail and fail manage to this plan. Where does this come from? Almost all large projects that I am aware of are managed through spiral or phased development SDLCs, and include prototyping and other iterative techniques. Rolling wave planning has been a recognized best practice in project management for a long time. It is absurd to take the position that PMs managing large, complex projects and programs are not aware of, and not following, these practices and that the only alternative is to follow Scrum.

It’s not clear to me who the audience for this book is. Project Managers (specifically PMPs) who don’t understand Agile methods? But how is this possible with all of the noise about Agile methods, and taking into account that Agile methods are essentially and purposefully simple – you can learn the Scrum approach in an afternoon, and a Certified Scrum Master course involves only a couple days of training.

The naïve and critical view that the authors take of planning and project management will not convince PMPs who manage large or complex projects that they will be more successful following Agile techniques. Or is the audience inexperienced managers who are trying to chose between the PMBOK and Agile methods, not understanding that they can (and should, and in fact if they are managing anything of any complexity, must) do both?