Sunday, May 3, 2009

Everything I needed to know about Maintenance...

So much has been written about software development: there are good books and blogs on software engineering, agile methods, design patterns, requirements engineering, software development lifecycles, testing, project management. But so little has been written about how most of us spend most of our careers in software: essentially maintaining and supporting legacy software, software that has already been built.

I learned most of what I needed to know about successful software maintenance and support a long time ago, when I worked at a small and successful software development company. At Robelle Solutions Technology we developed high-quality technical tools, mostly for other programmers: an IDE (although we didn’t call it an IDE back then, we called it an editor – but it was much more than that), database tools, and an email system, which were used by thousands of customers worldwide. I worked there in the early 1990s as a technical support specialist – heck, I was half of the global technical support team at the time. Besides the two of us on support, there were three programmers building and maintaining the products, another developer building and supporting our internal tools, a technical writer (who moonlighted as a science fiction writer), and a small sales and administrative team. The developers worked from home – the rest of us worked out of a horse ranch in rural BC for most of the time I was there. The company succeeded because of strong leadership, talent, a close-knit team built on trust and respect, clear communications, and a focus on doing things right.

Looking back now I understand better what we did that worked so well, and how it can be applied today, with the same successful results.

We worked in small, distributed teams, using lightweight, iterative design and development methods; building and delivering working software to customers every month, responding to customer feedback immediately and constantly improving the design and quality of the product. Large changes were broken down into increments and delivered internally on a monthly basis. All code was peer reviewed. The developers were responsible for running their own set of tests, then code was passed on to the support team for more testing and reviews, and then on to customers for beta testing at the end of the monthly timebox. Back then we called this “Step by Step”.

Incremental design and delivery in timeboxes is perfectly suited to maintenance. Except for emergency patch releases, enhancement requests and bug fixes from the backlog can be prioritized, packaged up and delivered to customers on a regular basis. Timeboxing provides control and builds momentum for releases, and customers see continuous value.

We maintained a complete backlog of change requests, bug reports, and our product roadmaps in a beautiful, custom-built issue management system that was indexed for text lookup as well as all of the standard key searching: by customer, product, date, engineer, priority, issue type. Using this we could quickly and easily find information about a customer’s issue, check to see if we had run into something similar before, and see whether we had a workaround available or a fix scheduled in development. When a new version was ready, we could identify the customers who had asked for fixes or changes and see if they were interested in a pre-release for testing. There are issue management systems today that still don’t have the capabilities we had back then.
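
As a rough sketch of the idea in modern terms (the record fields mirror the searches described above; the product names and data are invented for illustration, not Robelle’s):

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Issue:
        customer: str
        product: str
        opened: date
        engineer: str
        priority: int
        issue_type: str    # "bug", "enhancement", ...
        description: str

    backlog = [
        Issue("Acme", "Editor", date(1992, 3, 5), "dl", 1, "bug",
              "editor crashes opening files larger than 64K"),
        Issue("Globex", "DBTool", date(1992, 4, 2), "nf", 2, "enhancement",
              "add sorting on multiple keys"),
    ]

    # Standard key search: all priority-1 bugs against one product.
    editor_p1 = [i for i in backlog
                 if i.product == "Editor" and i.priority == 1
                 and i.issue_type == "bug"]

    # Free-text lookup: have we run into something like this before?
    similar = [i for i in backlog if "crash" in i.description.lower()]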

Technical debt was carefully and continuously managed: of course we didn’t know about technical debt back then either. One of the developers’ guiding principles was that any piece of code could be around for years: in fact, some of the same code is still in use today, more than 25 years after the original products were developed! If you had to live with code for a long time, you’d better be happy with it, and be able to change it quickly and confidently. The programmers were careful in everything that they wrote or changed, and all code was peer reviewed, checking for clarity and consistency, encapsulation, error handling, portability, and performance optimization where that was important. The constraints of delivering within a timebox also pushed the development team to come up with the simplest design and implementation possible, knowing that in some cases the solution might have to be thrown away entirely or rewritten in a later increment because it missed the requirement.

The same principles and practices apply today, taking advantage of improvements in engineering methods and technology, including automated unit testing, continuous integration, static analysis tools and refactoring capabilities in the IDE. All of this helps ensure the quality of the code; allows you to make changes with confidence; and helps you avoid falling into the trap of having to rewrite a system because you can no longer understand it or safely change it.
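
To make the static analysis point concrete, here is a toy example – mine, not from any particular tool – of the kind of latent defect these tools catch mechanically, long before a maintainer trips over it:

    # The shared mutable default is a classic latent bug in Python: the
    # list is created once, at function definition time, and is then
    # reused and mutated by every call that relies on the default.
    def add_tag(tag, tags=[]):
        tags.append(tag)
        return tags

    add_tag("urgent")           # ['urgent']
    add_tag("billing")          # ['urgent', 'billing'] -- surprise

    # The fix that most Python static analyzers will push you toward:
    def add_tag_fixed(tag, tags=None):
        if tags is None:
            tags = []
        tags.append(tag)
        return tags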

We knew our products in detail: on the support desk, we tested each release and worked with the developers and our technical writer to write and review the product documentation (which was updated in each timebox), and we used all of our own products to run our own business, although we didn’t push them to the same limits as our customers did. Once or twice a year we could take a break from support and work in development for a timebox, getting a chance to do research or help with a part of the product that we felt needed improvement. All of the developers worked on the support desk a few times a year, getting a chance to hear directly from their customers, to understand the kinds of problems they were facing and the improvements they wanted, and to think about how to improve the quality of the products and how to make troubleshooting and debugging customer problems easier.

Since we delivered to customers as often as once per month, we had a well-practiced, well-documented release process, with release and distribution checklists, release notes, updated manuals, and install instructions that were carefully tested in different environments. All of this was developed iteratively, and constantly improved as we found problems, new tools or new ideas. Today teams can take advantage of the ITIL release management practice framework, and books like Release It! and Visible Ops, to build effective release management processes.

I have learned a lot since working at Robelle, but sometimes I find that I am relearning old lessons; only now truly understanding the importance and value of these practices, and how they can be applied and improved on today.

Friday, April 17, 2009

OpenSAMM shows a way

We have done a lot of work over the past 3 years to develop an effective software security program. We began working with Cigital’s Touchpoint model in 2006, starting with an internal vulnerability assessment. Touchpoints, described in Software Security: Building Security In by Gary McGraw, outline the required practices for secure software development. But to get real, applied value from the touchpoint model you have to break your own trail, or get consulting help from Cigital.

We brought consultants in from Secure Software (now part of Fortify) and Cigital’s software security practice to establish a baseline for the system and our organization, including a detailed tool-assisted code review and architecture assessment. With Cigital’s help we built a secure SDLC roadmap for the team, trained everyone in defensive coding, developed secure coding and code review guidelines, and added static analysis tools into our continuous build process.

Working with Cigital definitely kickstarted our security program. Cigital has some smart guys, and they know their stuff. However, their approach became more heavyweight over time: their engagement model included assigning a practice manager and a project manager to what were otherwise small engagements, adding to cost and overhead while contributing little value. Their consulting model is clearly targeted towards bigger companies, with bigger consulting budgets and longer time horizons. We needed practical, concrete, immediate feedback; tools and deliverables that we could understand and use right away; and we worked hard with their team to get this.

As part of our security program we also work with another expert consulting firm, Foundstone Professional Services, who we contract for application vulnerability assessments and penetration tests. They have an efficient and well-defined engagement model, they are professional and thorough, and they’re fast. We get high-quality, clear and actionable feedback from their pen testing team – the results of Foundstone’s work not only validate the security of a release, they also provide a health check on the overall security posture of our team.

Based on this work, with the help of Cigital and Foundstone, we have a solid foundation in place and a high-level roadmap for continuous improvement. I am reviewing our next steps, looking to where we can get the most bang for our buck. To help with this I have been looking for a secure SDLC framework suitable for small companies – companies that cannot afford heavyweight process controls (with independent security teams and so on) and large expert consulting budgets, but still need to hold to a high standard of due diligence for secure software development and delivery.

I started working with CLASP (the Comprehensive, Lightweight Application Security Process), a framework from OWASP developed by Pravir Chandra while he was at Secure Software. Pravir then went on to Cigital (he was one of the consultants working with us) and is now independent. CLASP comes with a fair amount of resources, and is intended (as its name indicates) to be lightweight. However it has not been well maintained since it was contributed, it is not well aligned with other OWASP initiatives, and there is little information on how to apply it and how to scale it. Just as I was growing frustrated at the amount of work it was going to take just to make CLASP useful, Pravir released “CLASP on steroids”, contributing a completely new framework, OpenSAMM, to OWASP.

OpenSAMM, the Software Assurance Maturity Model, offers a roadmap and a well-defined maturity model for secure software development and deployment, with some good tools for self-assessment and planning. So far, from my review, OpenSAMM seems more pragmatic than the “Building Security In Maturity Model” (BSIMM), which developed out of the initial collaboration between Pravir and Gary McGraw at Cigital, and was sponsored by Fortify Software.

I am working with OpenSAMM to see how it scales to small companies, with small teams using agile development methods. So far it looks promising and actionable. I have completed an internal assessment using the self-assessment tool, and I’m comparing the gap analysis findings to our roadmap plans. I will keep track of my experience with applying OpenSAMM and record my findings as I go along.

Thursday, April 16, 2009

Making Things Happen

I am reading an excellent book on project management called “Making Things Happen” by Scott Berkun, who used to run major projects at Microsoft and then worked in their engineering excellence group. I quickly zeroed in on Chapter 13, titled “Making Things Happen”, which explores what I believe project management is really about – doing whatever it takes to help the team get the job done.

How does a project manager “make things happen”?

First, one of the critical questions that should be asked when hiring a project manager, after you have checked out the candidate’s technical background, is: “If things were not going well on an important project, would I feel confident sending this person into that room, into that debate, and believe that he’d find a way to make it better, whatever the problem was?”. The team has to be convinced that the candidate can make a difference in tough situations.

The project manager’s job is to find out the priorities and manage to them. Make this list of priorities clear to everyone involved – the team must be focused on doing only what is important to success. “What wastes time on projects is confusion about which things should come before which other things.”

Set clear goals, make sure everyone understands them, follow up and reinforce priorities. Everyone needs to understand what the “priority 1” list is: the list of things that must be done to succeed. Keep this list as small as possible.

Prevent miscommunications and missteps. Help people take secondary, minor things off of their plates. Resolve conflicts by driving back to the project’s priorities, the critical success factors.

Remove obstacles. Risk management is part of this, of course: set up the project to minimize obstacles upfront, watch for things that could go wrong and manage them. Handle people problems. Fix the environment – make sure people can get their work done.

Be relentless. Don’t give up, don’t stop looking for alternatives. Berkun talks about the example of Apollo 13, where the team kept driving to fix unfixable problems and save the mission.

Question people (even powerful ones) and challenge assumptions. Believe that there is a solution to a problem – even if it means changing the definition of the problem. If you can’t find an answer, it means that you haven’t looked hard enough.

Own the problem. Escalate, use your network, create options and alternatives. Be dead serious and fight to the end - there is always a way out.

All of this might sound overdone and over-dramatic, but I believe that this is what sets successful project managers apart – the sense of ownership, the ability, the discipline, the drive to execute.

Monday, March 23, 2009

What's Wrong with Sucking Less?

At the Agile 2008 conference in Toronto, David Douglas & Robin Dymond discussed their concerns that the majority of companies adopting agile (and by “agile”, effectively meaning Scrum) practices were falling short of complete adoption. The companies that they were working with were satisfied with the 1.5-2x performance improvements in quality and time-to-market gained by effectively cherry-picking from the key agile, incremental software development practices. The authors were concerned that most, if not all, companies adopting agile were content simply “to suck less” rather than transforming their businesses.

This was further explored in December of last year in a StickyMinds column, “Little Scrum Pigs and the Big Bad Wolf”, by Michele Sliger, who expressed her concern that

“Indeed, many companies are refusing to view agile as anything other than a set of engineering practices.”

That surprised me. I thought that this was, in fact, the point of agile software development: for people to adopt more effective software engineering practices.

What is especially confusing is that Scrum, in particular, is really a project management approach, and provides very little in the way of software engineering practices. It is intuitive and obvious and easy to implement – maybe too easy – which is why most agile projects today are based on Scrum. Its strength, and its weakness, is that it provides an effective framework for organizing and managing software projects by breaking them down into timeboxed increments, but does not force the team to adopt specific engineering practices and disciplines, unlike other methods, especially XP.

Martin Fowler of ThoughtWorks, one of the leading thinkers in the agile (in this case, XP, however, rather than Scrum) community, raises the concern that this lack of software engineering discipline can lead to Scrum teams building a lot of sloppy software, however quickly.

Back to the “Three Little Pigs” – the author goes on to express her dismay that

“They have not adopted the value system that is the underlying infrastructure of all agile approaches.”

and that companies who are simply interested in adopting good practices from Scrum and XP, but who don’t buy into the complete philosophy and value set, therefore lack vision and commitment, and are at risk of failure.

I find the argument to be both elitist and dogmatic, and awfully, awfully unclear. The author suggests that there is something hidden in the agile manifesto, and that by surrendering to this mystery you will find the one true agile path to success in software development – anything less is conceding defeat, or at the least condemning yourself and your organization to mediocrity.

But what is this mystery, this ineffable something or somethings that organizations refuse to, or are somehow unable to, accept? Perhaps it is delivering working software incrementally, following a timeboxed approach – no, this can’t be it, this is a well understood engineering practice, one of those mere disciplines that the author suggests is insufficient for success. Maybe it is “continuous attention to technical excellence and good design”. That can’t be it: I don’t see why any organization would not accept this as axiomatic when building software. Or is it the emphasis on simplicity? Or that the team should have a good working environment and the trust and support of management? Or that developers and customers should work together? Or maybe that we need to create self-organizing teams? Just what is it that these companies, who are “teetering at the edge between mediocrity and high performance”, are failing to do?

Douglas and Dymond concede that there are too few real agile success stories: they point of course to Nokia and BMC Software and PatientKeeper – a small number of companies (very small, after 10 or so years of evangelism) that have had noted success in adopting Scrum in a fundamental way. But I would argue that there are a lot of success stories – all of those companies who are “sucking less”, who have started on a path to building better and better software: every day, working hard to suck less and still less and less and so on.

While there is a negative connotation to the term, I don't see what is wrong with "sucking less". With being practical, goal-focused, and incremental in improving software development practices. With delivering good, working software in time boxed, iterative releases. With building a stronger development team, and a better development environment. With following good engineering practices and management methods, and stopping bad ones. With constantly reviewing your failures and successes and finding new ways to improve. I don’t see this as building a “house of straw” – I see this as what we all have to do to succeed: constantly, ruthlessly get better and better together.

Monday, January 19, 2009

Risk Management

Successfully building real software systems is a high-risk undertaking. To manage risk, you need to explicitly identify and face risks; and then implicitly manage these risks in the way that you build and develop the software. Let me explain.

Poor (or nonexistent) management of risk has led to continual, and in some cases spectacular, project failures and product failures. As a result, development managers and project managers are now expected to practice risk management in their work. Standardized risk management is an important part of the Project Management Institute’s professional standard for project managers. Steve McConnell introduced a simple and commonly referenced tool, the Top 10 Risk List, in his book Rapid Development – this tool, along with generic risk patterns and a set of plan templates and checklists, is available as part of the CXOne method, free from Construx.

Using the Top 10 Risk List or following other risk management practices, the team, and especially management, need first to recognize and confront risks – recognize what can go wrong, recognize the early warning signs or triggers of risks, and don’t hide from them. Continually review, and look for new sources of problems.

The next part is to manage risk by how you build and deliver software – build risk management into your SDLC. Let’s consider the common types of risk:

Architecture, tools, platform and technical risks. Build software incrementally. Build the hard parts first. Develop vertical, technical prototypes to evaluate your design. You’ll know if your idea and your technology are going to work, if it works. If you’re going to fail, fail early.

Requirements risk: changing, incomplete, or wrong requirements. Prototyping, incremental and iterative development. Work closely with your customers. Get feedback, adapt.

Stakeholder risk and other political risks. Deliver concrete value, deliver early, deliver often, and make sure your stakeholders are well-informed, that your work and your approach are transparent to them. You are unlikely to be stabbed in the back by someone if you are giving them (or other important people) what they want and making sure that they know it.

Staff and personnel risks, key people leaving. Share code, conduct design and code reviews, pair, switch tasks between team members. Invest upfront in automated unit testing and other developer testing – not only does it protect team members when they need to change code written by someone who has left; it also helps you understand what the code is supposed to do – good unit tests are the best form of documentation (see the test sketch after this list of risks). And code that is well unit tested is also generally simpler.

Schedule risk. Like requirements risks and technical risks, manage through incremental delivery: deliver the most important and hardest parts upfront, and timebox everything. If you can’t get to everything, you at least get to the important stuff – which is probably all that they need anyway. As always, hire the best people you can and make sure that they have good tools and a good working environment.

Quality risk. Incremental development, continuous integration, extensive automated developer testing, use static analysis to find the stupid mistakes, reviews or pair programming. Ensure that developers and testers work closely together, and fix bugs early. Review and learn from each delivery, constantly find ways to improve.

Design risk – overly complex or bad design. Again, incremental development: focus on the hard things first, and build a technical proof of concept upfront to prove the design approach.

Subcontractor risks. Manage your partners and contractors like you manage yourself. Demand regular, incremental deliveries and transparency into the construction and delivery process. Hold your outsourcing partners to the same standard as you would hold your own team; or in some cases, a higher standard.
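
To make the earlier point about unit tests as documentation concrete, here is a minimal sketch (the pricing rule and names are invented for illustration, not anyone’s production code). Each test name states a business rule, and unlike a wiki page, the tests cannot silently go stale, because they run on every build:

    import unittest

    def discounted_price(price, customer_years):
        # Loyalty discount: 5% per full year as a customer, capped at 25%.
        discount = min(customer_years * 0.05, 0.25)
        return round(price * (1 - discount), 2)

    class DiscountedPriceTest(unittest.TestCase):
        def test_new_customers_pay_full_price(self):
            self.assertEqual(discounted_price(100.0, 0), 100.0)

        def test_five_percent_off_per_year_of_loyalty(self):
            self.assertEqual(discounted_price(100.0, 2), 90.0)

        def test_discount_is_capped_at_25_percent(self):
            self.assertEqual(discounted_price(100.0, 10), 75.0)

    if __name__ == "__main__":
        unittest.main()

A new team member inheriting this code learns the discount rules from the tests alone, and can refactor the implementation with confidence.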

Following a basic set of good practices in incremental development will help you and your team avoid or deal with most of the classic development risks. What changes is the amount of care and discipline in your development approach.

Thursday, November 13, 2008

Construx SW Development Executive Summit

I have started working again with Construx Software, who are helping us improve our software engineering practices. I was of course familiar with Steve McConnell, Construx's CEO, and his books on software engineering, and I had worked with Construx at a previous company on training and mentoring our development team and development managers.

Earlier this year I attended their 10x Software Engineering course, an excellent software development management course which focuses on understanding and implementing good practices in software development, and on how to improve quality and team productivity. We enlisted their help on a project assessment with a partner, and Construx is scheduled to come in later this year to teach a Developer Testing Bootcamp course to the development team.
In October I also attended Construx's Software Development Executive Summit: an intimate, intense and highly focused series of roundtable sessions and high-quality keynotes with the heads of software development and software engineering at large and small companies across North America, Europe and Asia. Like other Construx offerings, the summit was pragmatic, selective, carefully organized and very professionally run. The keynote speakers included Martin Fowler of ThoughtWorks and Ken Schwaber, the creator of Scrum, as well as Construx's CEO Steve McConnell, author of Code Complete and Rapid Development, and Matt Peloquin of Construx; there were also interesting case studies presented by IT executives at MSNBC and RIM.

It was a unique forum: a chance to meet and share ideas in an open, constructive and respectful environment with serious people who all had extensive experience leading software development and who were all looking for ways to improve. There were so many different stories: companies who had great success (or disappointing failures) at outsourcing (onshore and offshore); companies who were successful delivering using small, agile, results-oriented, co-located teams; other companies who followed highly structured, metrics-driven SDLCs and were equally successful. The development organizations ranged in size from a handful of developers to hundreds or thousands of engineers in multiple countries. The roundtable sessions provided me the opportunity to explore problems and share ideas with experienced development managers, product managers and project managers, and thinkers like Martin Fowler. The social engagements provided excellent networking opportunities and were generally good fun, and there was no pressure from vendors or sponsors.

What key ideas did I take back from the summit?

  1. The first key to success is talent. Get the best people you can. Treat them well and support them, give them what they need. Be careful when hiring, and spend time with new hires, help them to be successful. Keep teams together as long as you can: continuity and cohesion of a team pays dividends.
  2. There is no “one way to skin a cat”: software development poses different problems to different organizations, and there are different answers to these problems. What is important is to execute, and to constantly review and improve.
  3. If you want to show value to your customers, deliver value often, deliver what is important first. It’s all about incremental delivery.
  4. In globally distributed teams, follow-the-sun works for operations, but doesn’t for development. Co-locate teams whenever possible.
  5. Develop technical leadership within your organization. Create a path for talented and ambitious technical people who do not fit or do not want to pursue the management track. Follow the lead of IBM, Microsoft and Google and offer a “distinguished engineer” career path where senior technical people are given respect, latitude and a voice in product direction.
  6. Don’t expect to save costs through outsourcing. Outsource for flexibility, to flex your organization’s delivery capability, and to gain access to talent. Outsourcing successfully takes a lot of discipline, management attention, and supporting costs.
  7. Constantly be aware of, and beware of, technical debt. Don’t bet on “throwing one away” when you build a system. Agile methods without discipline (comprehensive reviews or pair programming, developer testing, …) get fast results at first, but build up a lot of technical debt that has to be addressed eventually. If you start with disciplined, good practices from the beginning you won't dig yourself as deep a hole.

This is an event I look forward to attending again in the future and will definitely recommend to my colleagues.

Sunday, November 2, 2008

Software Quality and Software Security

Recently I came across a series of posts by Fortify Software arguing that “software quality is not software security”. This position doesn’t make sense to me: security, like reliability and performance and usability and maintainability and so on, must be taken into account when building a system; and it is the software quality program that makes sure that this all gets done correctly. Most software is not secure because it is not built in a high-quality way: good software is secure, as well as reliable and scalable and fast and maintainable.

Gary McGraw of Cigital, one of the thought leaders in software security, describes how to integrate software security into the SDLC. In the book Software Security: Building Security In he introduces a set of touchpoints: key areas in the SDLC where security must be considered when designing and building software. The touchpoints, listed in order of effectiveness, are:

  1. code reviews, including both manual code reviews and automated checking with static analysis tools, looking for errors in API use, inadequate validation and error handling, proper logging, and specific security-related bug patterns (a toy example follows this list)
  2. risk analysis in architecture and design
  3. penetration testing
  4. risk-based security testing
  5. abuse case scenario planning
  6. security included in requirements
  7. secure operations
  8. and external expert reviews, including third-party security walkthroughs of architecture, design and code.
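
To make touchpoint 1 concrete (the example promised above), here is a toy illustration of the kind of security bug pattern that both a human reviewer and a static analysis tool should flag – the table, function and variable names are invented:

    import sqlite3

    def find_user_unsafe(conn, username):
        # Flagged in review: user input concatenated into SQL. An input
        # like  x' OR '1'='1  returns every row: classic SQL injection.
        sql = "SELECT * FROM users WHERE name = '" + username + "'"
        return conn.execute(sql).fetchall()

    def find_user_safe(conn, username):
        # Parameterized query: input is bound as data, never parsed as SQL.
        return conn.execute(
            "SELECT * FROM users WHERE name = ?", (username,)).fetchall()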

In a Computerworld interview Gary McGraw goes on to say that

“Software security relates entirely and completely to quality. You must think about security, reliability, availability, dependability — at the beginning, in the design, architecture, test and coding phases, all through the software life cycle.”

Code reviews, static analysis, risk analysis in architecture and design, risk-based testing, … all of these are necessary in building high-quality software. What is required in addition is building the team’s knowledge of software security: most programmers don’t learn much, if anything, about software security in school, and need to be taught about the problems, best practices and tools needed to build secure software systems, just as they need to be taught about developer testing, high-availability systems design, handling concurrency in massively parallel systems, and the other complex problems that need to be solved in real systems. Secure software requires:

1. Security awareness training and technical training from companies like the SANS Institute, Foundstone and Cigital. First, to understand that IT security is not just about hardening infrastructure and securing operations – a secure system requires building software in a secure way in the first place. Then, training in threat modeling and defensive programming to understand the technical problems – the risks, exploits, attack patterns and common vulnerabilities such as the OWASP Top 10 – and the best practices, tools and other resources available to build secure software.

2. Developing and sustaining an “attack-based” approach to designing and building systems – making sure not only to check that software meets the specifications, but also to check that “bad things don’t happen”. Architectural risk analysis, abuse cases, exploratory testing, stress testing, war games, fuzzing, gorilla testing and other types of destructive testing all need to be done: not just to ensure security, but also the reliability and overall quality of the system. In other words, a negative, attack-based posture should already be followed in testing and reviews, not just for security.
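
As a minimal sketch of that posture (parse_record stands in for whatever input-handling code you ship; the harness is illustrative, not any particular fuzzing tool): feed the code hostile, random input and check that it fails cleanly instead of crashing or hanging.

    import random

    def parse_record(data: bytes):
        # Stand-in for real input-handling code under test.
        fields = data.split(b",")
        if len(fields) != 3:
            raise ValueError("expected 3 fields")
        return fields

    def fuzz(iterations=10000):
        rng = random.Random(42)   # fixed seed: failures are reproducible
        for _ in range(iterations):
            blob = bytes(rng.randrange(256)
                         for _ in range(rng.randrange(64)))
            try:
                parse_record(blob)
            except ValueError:
                pass   # a clean, expected rejection of bad input
            # Any other exception, crash or hang is a finding worth
            # writing up as a bug report.

    if __name__ == "__main__":
        fuzz()
        print("survived 10000 hostile inputs")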

From Gary McGraw again in a recent podcast "How to Start a Secure Software Development Program":

"I guess the biggest difference is thinking about an attacker and pondering the attacker’s perspective while you’re building something”

To build this perspective takes management focus, time, training, mentoring on what to look for in design and code reviews; and continual reinforcement.

Evangelism on the part of software security experts, and highly public IT security failures, have been effective in raising awareness of the importance of software security: that software needs to be designed and coded and tested with security in mind. Software security – making sure that security is considered at every step – is another important part of building good software. It's another part of the job, handled the same as reliability and performance and the other hard problems in building big, real systems.
