Friday, March 8, 2013

Book Review: The Phoenix Project

Everyone who attended the “Rugged Devops” panel at RSA this year received a free copy of The Phoenix Project (by Gene Kim, Kevin Behr and George Spafford – the authors of Visible Ops), the fictional story of the education and transformation of an IT manager, an IT organization, and eventually of an entire company.

I'm not sure why they wrote The Phoenix Project as a novel. But they did. So I’ll review it first as a piece of storytelling, and then look at the messaging inside.

The reason that I don’t like didactic fiction is that so much of it is so poorly written – generally forced and artificial. I was pleasantly surprised by The Phoenix Project. The first half of the novel tells a story about an IT manager forced into taking on responsibility for saving his company. Our well-meaning hero is an ex-marine sergeant (for some reason unclear to me, the hero, the CEO, and even the mysterious guru all have a military background) with an MBA but without any ambition except to quietly provide for his family. He has been successfully running his own little part of the IT group – so successfully that he is dramatically promoted to take over all of IT operations (and so successfully that his own group is never mentioned again in the story; it seems to run on auto-pilot without him). For a successful manager, our hero knows alarmingly little about how the rest of the IT organization works, or about the business, and so is unprepared for his new responsibility. He reluctantly accepts the big job, and then regrets it immediately as he realizes what a shit storm he has walked into.

It’s a compelling narrative that draws you in and seems realistic even with the stock characters: the sociopathic SOB CEO, the unpopular everything-by-the-book CISO, the Software Development Manager who only cares about hitting deadlines (but not about delivering software that works), the Machiavellian Marketing executive, and the autistic genius that the entire IT organization of several hundred people depends on to get all the important stuff done or fixed.

As a pure piece of storytelling, things start to unravel in the second half with the emergence of the IT / Lean Manufacturing guru – when the story ends and the devops fable begins. From this point on, the plot depends on several miraculous conversions: everyone except the marketing exec sees the light and transforms literally overnight. They even start to groom themselves nicely and dress better. Lifetime friendships are made over a few drinks, and everyone learns to trust and share: there’s a particularly cringe-inducing management meeting in which people bare their souls and weep. Conflicts and problems disappear magically, usually because of the guru’s intervention – including an unnecessary scene where the guru shows up at a crucial audit meeting and helps convince the partner of the audit firm (an old buddy of his) that there aren't any real problems with the company’s messed-up IT controls (“these aren't the droids you’re looking for”).

But the real heroes are the people running the manufacturing group: the one part of this spectacularly mismanaged company that somehow functions perfectly is a manufacturing plant where everyone can go to learn about Kanban and Lean process management and so on. Without the help of the smug and smart-alecky guru – who apparently helped create this manufacturing success – and his tiresome Zen master teaching approach (and sometimes even with his help), our hero is unable to understand what’s wrong or what to do about it. He doesn't understand anything about demand management, how to schedule and prioritize project work, that firefighting is actual work, how to get out of firefighting mode, how to recognize and manage bottlenecks in workflow, or even how important it is to align IT with business priorities (where did he get that MBA anyway?). Luckily, the factory is right there for him to learn from, if he only opens his eyes and mind.

What will you learn from The Phoenix Project?

The other problem with this storytelling approach is that it takes so damn long to get to the point – it’s a 338-page book with about 50 pages of meat. Like Goldratt's The Goal (which is referenced a couple of times in this book), The Phoenix Project leads the reader to understanding through a kind of detective story. You have to be patient and watch as the hero stumbles, goes down blind alleys, ignores obvious clues, and only with help eventually figures out the answers. Unfortunately, I'm not a patient reader.

This is a gentle introduction to Lean IT and devops. If you've read anything on Kanban and devops you won’t find anything surprising, although the authors do a good job of explaining how Lean manufacturing concepts can be applied to managing IT work. The ideas covered in the book are standard Lean and Theory of Constraints stuff, with a little of David Anderson’s Kanban and some devops – especially Continuous Deployment (as originally described by John Allspaw) and Continuous Delivery.

The guru’s lessons are mostly about visualizing, understanding and limiting demand: that you should stop taking on more work until you can get things actually working, so that you’re not spending all of your time task-switching and firefighting; that you need to identify workflow bottlenecks and protect or optimize around them; that reducing batch size in development improves control and speeds up feedback; that in order to do this you have to work on simplifying and standardizing deployment; and how valuable it is to get developers and operations to work together.

My complaints aren't with the ideas – I buy into a lot of Devops and agree that Kanban and Lean have a lot to offer to IT ops and support teams (although I'm not sold on Kanban by itself for development, certainly not at scale). But I was disappointed with the unrealistic turnaround in the second half of the book. It’s all rainbows and unicorns at the end. IT, the business, development and security all start working together seamlessly. Management is completely aligned. Performance problems? No problem – just go to the Cloud. And they even bring in the famous Chaos Monkey in the last couple of pages just because.

Spoiler Alert: Everything goes so well in a few months that the company is back on track, plans to outsource IT and to break up the company are cancelled, the selfish head of marketing is canned, and our hero is promoted to CIO and put on the fast track to corporate second in command. Sorry: nothing this bad gets that good that easily. It is a fable after all, and too hard to swallow.

The Phoenix Project is a unique book – when was the last time that you read an actual novel about IT management?! It was worth reading, and if it introduces devops and Lean ideas to more people in IT, The Phoenix Project will have accomplished something useful. But it’s not a book that you can use to get things done. There are lessons and recipes and patterns, but they take work to pull out of the story. There’s no index and no good way to go back to find things that you thought were useful or interesting. So I am looking forward to the Devops Cookbook: something practical that hopefully explains how these ideas can work in businesses that don’t all look like Facebook and Twitter.

Thursday, March 7, 2013

Peer reviews for security are a waste of time?

At this year’s RSA conference, one of the panels questioned whether software security is a waste of time. A panellist, John Viega, said a few things that I agreed with, and a lot that I didn't. Especially that

“peer reviews for security are a waste of time.”

This statement is wrong on every level.

Everyone should know by now that code reviews find real bugs – even informal, lightweight code reviews.

“Reviews catch more than half of a product’s defects regardless of the domain, level of maturity of the organization, or lifecycle phase during which they were applied.” – What We Have Learned About Fighting Defects

Software security vulnerabilities are like other software bugs – you find them through testing or through reviews. If peer code reviews are a good way to find bugs, why would they be a waste of time for finding security bugs?

There are only a few developers anywhere who write security plumbing: authentication and session management, access control, password management, crypto and secrets handling. Or other kinds of plumbing like the data access layer and data encoding and validators that also have security consequences. All of this is the kind of stuff that should be handled in frameworks anyway – if you’re writing this kind of code, you better have a good reason for doing it and you better know what you are doing. It’s obviously tricky and high-risk high-wire work, so unless you’re a team of one, your code and decisions need to be carefully reviewed by somebody else who is at least as smart as you are. If you don’t have anyone on the team who can review your work, then what the hell are you doing trying to write it in the first place?

Everybody else has to be responsible for writing good, defensive application code. Their responsibilities are:

  • Make sure their code works – that the logic is correct
  • Use the framework properly
  • Check input data and return values
  • Handle errors and exceptions correctly
  • Use safe routines/APIs/libraries
  • Be careful with threading and locking and synchronization

A good developer can review this code for security and privacy requirements: making sure that you are masking or encrypting – or even better, not storing – PII and secrets, that actions are audited, and that access control rules are properly followed. And they can review the logic and workflow, look for race conditions, check data validation, and make sure that error handling and exception handling are done right and that you are using frameworks and libraries carefully.
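
None of this needs a security specialist to check. As a minimal sketch (the class, method and validation rules here are all invented for illustration), this is the kind of defensive code a peer reviewer should be confirming is actually there:

```java
import java.math.BigDecimal;
import java.util.Objects;

public class AccountService {

    // Check input data up front instead of trusting the caller.
    public void transfer(String fromAccount, String toAccount, BigDecimal amount) {
        Objects.requireNonNull(fromAccount, "fromAccount is required");
        Objects.requireNonNull(toAccount, "toAccount is required");
        if (amount == null || amount.signum() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (fromAccount.equals(toAccount)) {
            throw new IllegalArgumentException("source and destination must differ");
        }
        // ... do the transfer, checking return values and handling
        // failures explicitly instead of swallowing exceptions
    }
}
```

A reviewer who walks through code like this asking “what happens on bad input, on failure, on concurrent access?” is doing security review, whether or not anyone calls it that.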

This is what code reviews are for. To look for and find coding problems. If you find these problems – and code reviews are one of the most effective ways of doing this – your code will be safer and more secure. So I don’t get it. Why are peer reviews for security a waste of time?

Tuesday, March 5, 2013

Appsec at RSA 2013

This was my second time at the RSA conference on IT security. Like last year, I focused on the appsec track, starting with a half-day mini-course for developers on how to write secure applications, presented by Jim Manico and Eoin Keary representing OWASP. It was a well-attended session. Solid, clear guidance from people who really do understand what it takes to write secure code. They explained why relying on pen testing is never going to be enough (your white hat pen tester gets 2 weeks a year to hack your app; the black hats get 52 weeks a year), and covered all of the main problems, including password management (secure storage and forgot-password features), how to protect the app from clickjacking, proper session management, and access control design. They showed code samples (good and bad) and pointed developers to OWASP libraries and Cheat Sheets, as well as other free tools.

We have to solve XSS and SQL Injection

They spent a lot of time on XSS (the most common vulnerability in web apps) and SQL injection (the most dangerous). Keary recommended that a good first step for securing an app is to find and fix all of the SQL injection problems: SQL injection is easy to see and easy to fix (change the code to use prepared statements with bind variables), and getting this done will not only make your app more secure, it also proves your organization’s ability to find security problems and fix them successfully.
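
The fix really is mechanical, which is part of why it makes such a good first step. A minimal JDBC sketch of the before and after (the table and column names are invented):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CustomerLookup {

    // Vulnerable: user input is concatenated into the SQL text, so input
    // like  ' OR '1'='1  changes the meaning of the statement.
    public boolean customerExistsUnsafe(Connection conn, String name) throws SQLException {
        String sql = "SELECT 1 FROM customers WHERE name = '" + name + "'";
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return rs.next();
        }
    }

    // Fixed: a prepared statement with a bind variable keeps the input
    // as data - it is never parsed as part of the SQL.
    public boolean customerExists(Connection conn, String name) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(
                "SELECT 1 FROM customers WHERE name = ?")) {
            stmt.setString(1, name);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```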

SQL injection and XSS kept coming up throughout the conference. In a later session, Nick Galbreath looked deeper into SQL injection attacks and what developers can do to detect and block them. By researching thousands of SQL injection attacks, he found that attackers use constructs in SQL that web application developers rarely use: unions, comments, string and character functions, hex number literals and so on. By looking for these constructs in SQL statements you can easily identify if the system is being attacked, and possibly block the attacks. This is the core idea behind database firewalls like Green SQL and DB Networks, both of which exhibited their solutions at RSA.
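
To make the idea concrete, here is a toy sketch of that kind of check – not Galbreath’s actual technique, and far cruder than what a real database firewall does, but it shows the principle of flagging SQL constructs that application code rarely generates:

```java
import java.util.List;
import java.util.Locale;

public class SqlAttackHeuristic {

    // Illustrative token list only; a real detector tokenizes the SQL
    // properly and works hard to avoid false positives.
    private static final List<String> RARE_IN_APP_SQL = List.of(
            " union ", "--", "/*", " char(", " concat(", " 0x");

    public static boolean looksLikeAttack(String sql) {
        String s = " " + sql.toLowerCase(Locale.ROOT) + " ";
        return RARE_IN_APP_SQL.stream().anyMatch(s::contains);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAttack(
                "SELECT name FROM users WHERE id = 1 UNION SELECT pass FROM users --"));
        // prints: true
    }
}
```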

On the last day of the conference, Romain Gaucher from Coverity Research asked “Why haven’t we stamped out SQL Injection and XSS yet?”. He found through a static analysis review of several code bases that while many developers are trying to stop SQL injection by using parameterized queries, it’s not possible to do this in all cases. About 15% of SQL code could not be parameterized properly – or at least it wasn't convenient for the developers to come up with a different approach. Gaucher also reinforced how much of a pain in the butt it is trying to protect an app from XSS:

“XSS is not a single vulnerability. XSS is a group of vulnerabilities that mostly involve injection of tainted data into various HTML contexts.”
It’s the same problem that Jim Manico explained in the secure development class: in order to prevent XSS you have to understand the context and do context-sensitive encoding, and hope that you don’t make a mistake. To help make this problem manageable, in addition to libraries available from OWASP, Coverity has open sourced a library to protect Java apps from XSS and SQL injection.
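
To show what context-sensitive encoding means in practice, here is a small sketch using the OWASP Java Encoder, one of the OWASP libraries mentioned above. The point is that the same untrusted value needs a different encoder for each place it lands:

```java
import org.owasp.encoder.Encode;

public class XssContexts {
    public static void main(String[] args) {
        String input = "\"><script>alert(1)</script>";

        // Same untrusted value, a different encoder for each output context:
        String htmlBody  = "<p>" + Encode.forHtml(input) + "</p>";
        String attribute = "<input value=\"" + Encode.forHtmlAttribute(input) + "\">";
        String script    = "var q = '" + Encode.forJavaScript(input) + "';";
        String url       = "/search?q=" + Encode.forUriComponent(input);

        System.out.println(htmlBody);
        System.out.println(attribute);
        System.out.println(script);
        System.out.println(url);
    }
}
```

Pick the wrong encoder for the context and you still have XSS – which is exactly why it keeps surviving.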

The Good

While most of the keynotes offered a chance to catch up on email, the Crypto Panel was interesting. Chinese research into crypto is skyrocketing. Which could be a good thing. Or not. I was interested to hear Dan Boneh at Stanford talk more about the research that he has done into digital certificate handling and SSL outside of browsers. His team found that in almost all cases, people who try to do SSL certificate validation in their own apps do it wrong.
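
The classic anti-pattern that this kind of research keeps turning up looks something like the sketch below: an all-trusting trust manager, usually pasted in to make a certificate error go away during development, which quietly disables the protection TLS is supposed to provide:

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class BrokenTlsValidation {

    // Anti-pattern: accept any certificate chain from anyone.
    static final TrustManager[] TRUST_EVERYTHING = {
        new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        }
    };

    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, TRUST_EVERYTHING, null); // don't do this
        // The safe choice: ctx.init(null, null, null) lets the platform
        // validate the server's certificate chain against its trust store.
    }
}
```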

Katie Moussouris at Microsoft presented an update on ISO standards work for vulnerability handling. ISO 30111 lays out a structured process for investigating, triaging and resolving software security vulnerabilities. There were no surprises in the model - the only surprise is that the industry actually needs an ISO standard for the blindingly obvious, but it should set a good bar for people who don't know where to start.

Jeremiah Grossman explained that there are two sides to the web application security problem. One half is weaknesses in web sites: SQL injection, lousy password handling, mistakes in access control. The other half is attacks that exploit fundamental problems in browsers: attacks that try to break out of the browser – which browser vendors put a lot of effort into containing through sandboxing and anti-phishing and anti-malware protection – and attacks that stay inside the browser but compromise data there, like XSS and CSRF, which get no attention from browser vendors, so it’s up to application developers to deal with them.

Grossman also presented some statistics on the state of web application security, using data that White Hat Security collects from its customer base. Their customers are representative of more mature organizations that already do regular security testing of their apps, but even so the results are encouraging. The average number of vulnerabilities per app continues to decline year on year. SQL injection is now the 14th most common vulnerability, found in only 7% of tested apps – although more than 50% of web apps are still vulnerable to XSS, for the reasons discussed above.

Gary McGraw from Cigital agreed that as an industry, software is getting better. Defect density is going down (not as fast as it should be, but real progress is being made), but the software security problem isn't going away because we are writing a lot more code, and more code inevitably means more bugs. He reiterated that we need to stay focused on the fundamentals – we already know what to do, we just have to do it.

“The time has come to stop looking for new bugs to add to the list. Just fix the bugs.”

Another highlight was the panel on Rugged Devops, which continued a discussion that started at OWASP Appsec late last year, and covered pretty much the same ground: how important it is to get developers and operations working together to make software run in production safely, that we need more automation (testing, deployment, monitoring), and how devops provides an opportunity to improve system security in many ways and should be embraced, not resisted by the IT security community.

The ideas are based heavily on what Etsy and Netflix and Twitter have done to build security into their rapid development/deployment practices. I agreed with ½ of the panel (Nick Galbreath and David Mortman, who have real experience in software security in Devops shops) almost all of the time, and disagreed with the other ½ of the panel most of the rest of the time. There’s still too much hype over continuously deploying changes 10 or 100 or 1000 times a day, and over the Chaos Monkey. Etsy moved to Continuous Deployment multiple times per day because they couldn’t properly manage their release cycles – that doesn't mean that everyone has to do the same thing or should even try. And you probably do need something like Chaos Monkey if you’re going to trust your business to infrastructure as unreliable as AWS, but again, that’s not a choice that you have to make. There’s a lot more to devops – it’s unfortunate that these ideas get so much of the attention.

The Bad and the Ugly

There was only one low point for me – a panel with John Viega, formerly of McAfee, and Brad Arkin from Adobe, called “Software Security: a Waste of Time”.

Viega started off playing the devil’s advocate, asserting that most people should do nothing for appsec, it’s better and cheaper to spend their time and money on writing software that works and deal with security issues later. Arkin disagreed, but unfortunately it wasn't clear from the panel what he felt an organization should do instead. Both panellists questioned the value of most of the tools and methods that appsec relies on. Neither believed that static analysis tools scale, or that manual security code audits are worth doing. Viega also felt that “peer reviews for security are a waste of time”. Arkin went on to say:

“I haven’t seen a Web Application Firewall that’s worth buying, and I've stopped looking”
“The best way to make somebody quit is to put them in a threat modelling exercise”
“You can never fuzz and fix every bug”
Arkin also argued against regulation, citing the failure of PCI to shore up security for the retail space – ignoring that the primary reason many companies even attempt to secure their software is that PCI requires them to take some responsible steps. But Arkin at least does believe that secure development training is important and that every developer should receive some security training. Viega disagreed, and felt that training only matters for a small number of developers who really care.

This panel was like a Saturday Night Live skit that went off the rails. I couldn't tell when the panellists were being honest or when they were ironically playing for effect. This session lived up to its name, and really was a waste of time.

The Toys

This year’s trade show was even bigger than last year’s, with overflow space across the hall. There were no race cars or sumo wrestlers at the booths this year, and fewer strippers (ahem, models) moonlighting (can you call it "moonlighting" if you're doing it during the day?) as booth bunnies, although there was a guy dressed like Iron Man and way too many carnival games.

This year’s theme was something to do with Big Data in security, so there were lots of expensive analytics tools for sale. For appsec, the most interesting thing that I saw was Cigital Secure Assist, a plug-in for different IDEs that provides fast feedback on security problems in code (Java, .NET or PHP) every time you open or close a file. The Cigital staff were careful not to call this a static analysis tool (they’re not trying to compete with Fortify or Coverity or Klocwork), but what excited me was the quality of the feedback, the small client-side footprint, and that they intend to make it available to developers for direct purchase over the web at a very reasonable price point, which means that this could finally be a viable option for smaller development shops that want to take care of security issues in code.

All in all, a good conference and a rare opportunity to meet so many smart people focused on IT security. I still think for pure appsec that OWASP’s annual conference is better, but there's nothing quite like RSA.

Thursday, February 21, 2013

Hardening Sprints - in Portuguese

My post on Hardening Sprints has been translated into Portuguese at the Brazilian iMasters site. If your Portuguese is good, you can read the post here.

Wednesday, February 20, 2013

A Bug is a Terrible Thing to Waste

Some development teams, especially Agile teams, don’t bother tracking bugs. Instead of using a bug tracking system, when testers find a bug, they talk to the developer and get it fixed, or they write a failing test that needs to be fixed and add it to the Continuous Integration test suite, or if they have to, they write up a bug story on a card and post it on the wall so the team knows about it and somebody will commit to fixing it.
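
For teams that take the failing-test route, the mechanics look something like this (a JUnit sketch; the class under test and the bug are invented for illustration):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Invented class with an invented bug: a discount over 100% drives
// the total below zero instead of flooring it at zero.
class OrderTotals {
    double applyDiscount(double total, double rate) {
        return total - total * rate; // bug: no floor at zero
    }
}

class OrderTotalsBugTest {

    // The test documents the bug, fails in CI until somebody fixes
    // applyDiscount, and then guards against the bug coming back.
    @Test
    void discountShouldNeverProduceANegativeTotal() {
        double result = new OrderTotals().applyDiscount(100.00, 1.10);
        assertEquals(0.00, result, 0.001);
    }
}
```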

Other teams live by their bug tracking systems, using tools like Jira or Bugzilla or FogBugz to record bugs as well as changes and other work. There are arguments to be made for both of these approaches.

Arguments for tracking bugs – and for not tracking bugs

In Agile Testing: A Practical Guide for Testers and Agile Teams, Lisa Crispin and Janet Gregory examine the pros and cons of using a defect tracking system.

Using a system to track bugs can be the only effective way to manage problems for teams who can’t meet face-to-face – for example, distributed teams spread across different time zones. It can also be useful for teams who have inherited open bugs in a legacy system, and a necessary evil for teams who are forced to track bugs for compliance reasons. The information in a bug database is a potential knowledge base for testers and developers joining the team – they can review bugs that were found before in the area of the code that they are working on to understand what problems they should look out for. And bug data can be used to collect metrics and create trends on bugs – if you think that bug metrics are useful.

But the Lean/Agile view is that using a defect tracking system mostly gets in the way and slows people down. The team should stay focused on finding bugs, fixing them, and then forgetting about them. Bugs are waste, and everything about them is waste – dead information, and dead time that is better spent delivering value. Worse, using a defect tracking system prevents testers and developers from talking with each other, and encourages testers to take a “Quality Police” mindset. Without a tool, people have to talk to each other, and have to learn to play nicely together.

This is a short-term, tactical point of view, focused on what is needed to get the software out the door and working. It’s project thinking, not product thinking.

Bugs over the Long Term

But if you’re working on a system over a long time like we are, if you're managing a product or running a service, you know that it’s not that simple. You can’t just look at what’s in front of you and where you want to be in a year or more. You also have to look back, at the work that was done before, at problems that happened before, at decisions that were made before, to understand why you are where you are today and where you may be heading in the future.

Because some problems never go away. And other problems will come back unless you do something to stop them. And you’ll find out that other problems which you thought you had licked never really went away. The information from old bugs – what happened and what somebody did to fix them (or why they couldn't fix them), which workarounds worked (and which didn't) – can help you understand and deal with the problems that you are seeing today, and help you keep improving the system and how you build it and keep it running.

Because you should understand the history of changes and fixes to the code if you’re going to change it. If you like the way the code is today, you might want to know how and why it got this way. If you don’t like it, you’ll want to know how and why it got this way – it’s arrogant to assume that you won’t make the same mistakes or be forced into the same kinds of situations. Revision control will tell you what was changed, when, and by whom; the bug tracking system will tell you why.

Because you need to know where you have instability and risk in the system. You need to identify defect-dense or error-prone code – code that contains too many bugs, code that is costing you too much to maintain and causing too many problems, code that is too expensive to keep running the way it is today. Code that you should rewrite ASAP to improve stability and reduce your ongoing costs. But you can’t identify this code without knowing the history of problems in the system.
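
You don’t need fancy analytics to get started – even a crude count of bugs per component from a tracker export will point at the hotspots. A trivial sketch (the CSV layout here is an assumption for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class BugHotspots {

    // Count bugs per component from a tracker export laid out as
    // id,component,status,... (an invented format).
    public static Map<String, Integer> countByComponent(Path csv) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : Files.readAllLines(csv)) {
            String[] fields = line.split(",");
            if (fields.length > 1) {
                counts.merge(fields[1].trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Print the ten buggiest components, worst first.
        countByComponent(Path.of("bugs.csv")).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```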

Because you may need to prove to auditors or regulators and customers and investors that you are doing a responsible job of testing and finding bugs and fixing them and getting the fixes out.

And because you want to know how effective the team is in finding, fixing and preventing bugs. Are you seeing fewer bugs today? Or more bugs? Are you seeing the same kinds of bugs – are you making the same mistakes? Or different mistakes?

Do you need to track every bug?

As long as bugs are found early enough, there’s little value in tracking them. It’s when bugs escape that they need to be tracked: bugs that the developer didn't find right away on their own, or in pairing, or through the standard automated checks and tests that are run in Continuous Integration.

We don’t log:

  • defects found in unit tests or other automated tests – unless for some reason the problem can’t or won’t be fixed right away;
  • problems found in peer reviews – unless something in the review is considered significant and can’t be addressed immediately. Or a problem is found in a late review, after testing has already started, and the code will need to be retested. Or the reviewer finds something wrong in code that wasn't changed, an old bug – it’s still a problem that needs to be looked at, but we may not be prepared to deal with it right now. All problems found in external reviews, like a security review or an audit, are logged;
  • static analysis findings – most of the problems caught by these tools are simple coding mistakes that can be seen and fixed right away, and there’s also usually a fair amount of noise (false positives) that has to be filtered out. We run static analysis checks and review them daily, and only log findings if we agree that the finding is real but the developer isn't prepared to fix it immediately (which almost never happens, unless we’re running a new tool against an existing code base for the first time). Many static analysis tools have their own systems for tracking findings anyway, so we can always go back and review outstanding issues later;
  • bugs found when developers and testers decide to pair together to test changes early in development, when they are mostly exploring how something should work – we don’t usually log these bugs unless they can’t be / won’t be fixed (can’t be reproduced later for example).

A Bug is a Terrible Thing to Waste

We log all other bugs regardless of whether they are found in production, in internal testing, partner testing, User Acceptance Testing, or external testing (such as a pen test). Because most of the time, when software is handed to a tester, it’s supposed to be working. If the tester finds bugs, especially serious ones, then this is important information to the tester, to the developer, and to the rest of the team. It can highlight risks. It can show where more testing and reviews need to be done. It can highlight deeper problems in the design, a lack of understanding that could cause other problems.

If you believe that testing provides important information not just about the state of your software, but also on how you are designing and building it – then everyone needs to be able to see this information, and understand it over time. Some problems can’t be seen or fully understood right away, or in 1-week or 2-week sprint-sized chunks. It can take a while before you recognize that you have a serious weakness in the design or that something is broken in your approach to development or in your culture. You’ll need to experience a few problems before you start to find relationships between them and before you can look for their root cause. You’ll need data from the past in order to solve problems in the future.

Tracking bugs isn't a waste if you learn from bugs. Throwing the information on bugs away is the real waste.

Wednesday, February 13, 2013

Releasing more often drives better Dev and better Ops

One of the most important decisions that we made as a company was to release less software, more often. After we went live, we tried to deliver updates quarterly, because until then we had followed a staged delivery lifecycle to build the system, with analysis and architecture upfront, and design and development and testing done in 3-month phases.

But this approach didn't work once the system was running. Priorities kept changing as we got more feedback from more customers, too many things needed to be fixed or tuned right away, and we had to deal with urgent operational issues. We kept interrupting development to deploy interim releases and patches and then re-plan and re-plan again, wasting everyone’s time and making it harder to keep track of what we needed to do. Developers and ops were busy getting customers on board and firefighting, which meant we couldn't get changes out when we needed to. So we decided to shorten the release cycle from 3 months to 1 month, and then shorten it again to 3 weeks and then 2 weeks, making the releases smaller and more focused and easier to manage.

Smaller, more frequent releases change how Development is done

Delivering less but more often, whether you are doing it to reduce time-to-market and get fast feedback in a startup, or to contain risk and manage change in an enterprise, forces you to reconsider how you develop software. It changes how you plan and estimate and how you think about risks and how you manage risks. It changes how you do design, and how much design you need to do. It changes how you test. It changes what tools people need, and how much they need to rely on tools.

It changes your priorities. It changes the way that people work together and how they work with the customer, creating more opportunities and more reasons to talk to each other and learn from each other. It changes the way that people think and act – because they have to think and act differently in order to keep up and still do a good job.

Smaller, more frequent releases change how Development and Ops work together

Changing how often you release and deploy will also change how operations works and how developers and operations work together. There’s not enough time for heavyweight release management and change control with lots of meetings and paperwork. You need an approach that is easier and cheaper. But changing things more often also means more chances to make mistakes. So you need an approach that will reduce risk and catch problems early.

Development teams that release software once a year or so won’t spend a lot of time thinking about release and deployment and operations stuff in general, because they don’t have to. But if they’re deploying every couple of weeks, if they’re constantly having to push software out, then it makes sense for them to take the time to understand what production actually looks like and make deployment – and roll-back – easier on them and easier on ops.

You don’t have to automate everything to start – and you probably shouldn't until you understand the problems well enough. We started with checklists and scripting and manual configuration and manual system tests. We put everything under source control (not just code), and then started standardizing and automating deployment and configuration and roll-back steps, replacing manual work and checklists with automated, audited commands and health checks. We've moved away from manual server setup and patching to managing infrastructure with Puppet. We’re still aligning test and production so that we can test more deployment steps more often with fewer production-specific changes. We still don’t have a one-button deploy and maybe never will, but release and deployment today is simpler and more standardized and safer and much less expensive.
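
To give a feel for the kind of health check that replaces a manual checklist item, here is a minimal sketch – an illustration only, not our actual checks; the endpoint URL and the 200-means-healthy convention are assumptions:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class PostDeployHealthCheck {

    // Hit a health endpoint after a deploy and fail loudly if the
    // service isn't up, so the deploy script can stop or roll back.
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://myapp.example.com/health");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.setRequestMethod("GET");

        int status = conn.getResponseCode();
        if (status != 200) {
            System.err.println("Health check FAILED: HTTP " + status);
            System.exit(1); // non-zero exit fails the deploy script
        }
        System.out.println("Health check passed");
    }
}
```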

Deployment is just the start

Improving deployment is just the start of a dialogue that can extend to the rest of operations. Because they’re working together more often, developers and ops will learn more about each other and start to understand each other’s languages and priorities and problems.

To get this started, we encouraged people to read Visible Ops and sent ops and testers and some of the developers and even managers on ITIL Foundation training so that we all understood the differences between incident management and problem resolution, and how to do RCA, and the importance of proper change management – it was probably overkill but it made us all think about operations and take it seriously. We get developers and testers and operations staff together to plan and review releases, to support production, and to do RCA whenever we have a serious problem, and we work together to figure out why things went wrong and what we can do to prevent them from happening again. Developers and ops pair up to investigate and solve operational problems, to improve how we design and roll out new architecture, how we secure our systems, and how we set up and manage development and test environments.

It sounds easy. It wasn't. It took a while, and there were disagreements and problems and backsliding, like any time you fundamentally change the way that people work. But if you do this right, people will start to create connections and build relationships, and eventually trust and transparency across groups – which is what Devops is really about.

You don’t have to change your organization structure or overhaul the culture – in most cases, you won’t have this kind of mandate anyway. You don’t have to buy into Continuous Deployment or even Continuous Delivery, or infrastructure as code, or use Chef or Puppet or any other Devops tools – although tools do help.

Once you start moving faster – from deploying once a year, to every few months, to once a month – and as your organization’s pace accelerates, people will change the way that they work because they have to.

Today the way that we work, and the way that we think about development and operations, is much different and definitely healthier. We can respond to business changes and to problems faster, and at the same time our reliability record has improved dramatically. We didn't set out to “be Agile” – it wasn't until we were on our way to shorter release cycles that we looked more closely at Scrum and XP and later Kanban to see how these methods could help us develop software. And we weren't trying to “do Devops” either – we were already down the path to better dev and ops collaboration before people started talking about these ideas at Velocity and wherever else. All we did was agree as a company to change how often we pushed software into production. And that has made all the difference.

Wednesday, February 6, 2013

Code and Code Reviews: What’s in a Name?

In a code review a developer needs to look at the code from two different perspectives:

  1. Correctness. Is the code logically correct, does it do what it is supposed to do? Will it hold up in the real world? Is it safe? Does it handle errors and exceptions? Does it check for bad input parameters and return values? Is it secure? And where performance is important, is it efficient?
  2. Maintainability. Can I understand this code well enough to maintain it, could I change it safely myself? Is it readable and consistent? Is the logic too complex, are the pieces too big? Is it unit tested and if not can it be unit tested? This is where a reviewer looks out for too much copying and pasting, whether hand-rolled code could be done using standard libraries or language features instead, and adherence to style guidelines and standards.

It’s obvious that using good names for classes and methods and variables is important in making code understandable (if you can’t understand it, you can’t tell if it is doing what it is supposed to do) and easier to maintain. Books like Clean Code and Code Complete have entire chapters on proper naming. But even good, experienced developers have a hard time coming up with the right abstractions and good, meaningful, intention-revealing names for them. It’s especially hard if they’re working on code that they don’t know that well.

Bad names cause reviewers to stumble, or to make the wrong assumptions about what the code is doing. There are lame names – lazy, ambiguous, generic names that don’t help the reader understand what’s happening in the code. Then there are names which are misleading or just plain wrong – names which used to be right, but aren't now, because the logic changed but the name didn't. You’re reading through code and it calls postPayment, but postPayment doesn't just post a payment any more – it does a lot more now, or worse, it doesn't post a payment at all.
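
Here is what that kind of drift can look like, as a hypothetical sketch built around the postPayment example (all of the names are invented):

```java
import java.math.BigDecimal;

public class PaymentService {

    static class Payment {
        String accountId;
        BigDecimal amount;
    }

    // The name promises one thing; the method now does four. A reviewer
    // reading a call to postPayment() will assume far less is happening
    // than actually is.
    public void postPayment(Payment payment) {
        applyDiscounts(payment);      // added later
        updateLoyaltyPoints(payment); // added later
        postToLedger(payment);        // the original job
        emailReceipt(payment);        // added later
    }

    private void applyDiscounts(Payment p) { /* ... */ }
    private void updateLoyaltyPoints(Payment p) { /* ... */ }
    private void postToLedger(Payment p) { /* ... */ }
    private void emailReceipt(Payment p) { /* ... */ }
}
```

In review, the fix is to push for a rename, or to break the method up so that each piece can keep an honest name.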

Focusing on naming becomes more important over time as more code gets changed more often. The design changes, responsibilities change, often incrementally and subtly. Code gets added, then later other code gets deleted or moved. The programmer making the latest change just wants to focus on what they’re doing, fix the code and get out, and doesn't look at the bigger picture, doesn't notice that the meaning of the code has changed or become less clear because of what they have just done.

Reviewers need to be on the watch for these kinds of problems.

But naming isn't just about making it easier for somebody else to read the code and maintain it – or even making it easier for you to read it and change it yourself when you come back to it later. As a colleague of mine has pointed out, if someone can’t come up with a good name for a method or class, it’s a sign that they are having problems understanding what they are working on. So in reviewing code, a bad name is more than just a distraction or an irritation – it’s a warning sign that there could be more problems in the code, that the developer might not have understood the design well enough to make the change correctly, or that they weren't paying close enough attention to what they were working on, and may have made other mistakes that you – the reviewer – need to be on the lookout for.

Focusing on naming sometimes seems fussy and nitpicky, another battle in the endless “style wars” that end in hand-to-hand fighting over bracket placement and indentation. But good naming is much more important than aesthetics or standardization. It’s about making sure that code is working correctly, and that it will stay that way.
