Friday, January 26, 2007

Avoiding the most common software development goofs

Finding defects in code has been the bane of developers' existence since the earliest days of computer programming. Maurice Wilkes, the British computer scientist best known for his work on the EDSAC, said in 1949:

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."

This keen observation from more than 50 years ago still resonates with anyone tasked with developing software. But why do we make mistakes? And how can we avoid making them, and so reduce the debugging that must be done after the software is written? In this paper, we draw on our years of experience developing and commercializing static source code analysis to help answer these questions.

During this decade, we have analyzed hundreds of millions of lines of code, seen programming errors ranging from the very simple to the extremely complicated, and heard firsthand accounts of the bugs that killed development organizations. While it is impossible to relate all of the relevant and interesting anecdotes in a discussion like this one, our aim is to convey a general impression of the mistakes that keep developers and managers awake at night.

As a means of communicating our experience, we first discuss the cost of mistakes in software development and hypothesize about why developers make them. Then, to help developers recognize their most common mistakes as they write code, we examine several categories of these mistakes, both from a pure source code perspective and from a higher-level programming methodology perspective. Finally, we make the case for automated technology that weeds out these mistakes earlier in the development process.

The cost of software defects

It is well known that software defects are a very costly problem. According to a study commissioned by the National Institute of Standards and Technology (NIST), software errors cost the U.S. economy an estimated $59.5 billion annually. The study also reports that more than one-third of these costs could be eliminated by an improved testing infrastructure that enables earlier and more effective identification and removal of software defects.

Drilling into the problem further, it has been shown that the cost of a defect increases drastically the later it is found in the development lifecycle. A defect found during the coding phase of a project is very inexpensive to fix. This makes sense intuitively: the developer responsible for the defect is working on the questionable code, has all of its context in his head at the time the defect is discovered, and so can make a reasonable fix in a small amount of time.

When that same defect slips into the QA or system integration phase of the development lifecycle, it can become an order of magnitude more expensive to address. Now the defect must be discovered while the program is running, and the person who discovers it must reproduce it and communicate the errant behavior to the development organization.

Then the development organization must determine which part of the code was likely to cause that particular fault, assign the appropriate developer or developers to investigate further to determine the root cause in the faulty code, then finally fix the defect without introducing other problems into the code.

Another order of magnitude in cost is added if a defect slips past the QA organization and reaches the field. Not only does the organization face all of the above issues in removing that defect, it must also deal with the additional cost of reproducing the issue through its support organization, not to mention the cost of bad public perception surrounding its "buggy product."

Software defects end up costing organizations millions of dollars every year. But the problem is not that the cost of discovering a defect in the field is high; it is that organizations are discovering defects in the field. The distribution of defects across the development lifecycle (from coding to testing to release) is what determines the actual cost of those defects to the organization.

If two organizations each have one thousand defects in their code and the first finds them all in the coding phase but the second discovers them all after the product has been released, the first organization is in much better shape financially. Therefore, we must focus on discovering more defects earlier in the process.

Why do developers make mistakes?

If it's clear to everyone that software defects are an expensive problem (and we assume that it is), why do developers make mistakes? Or rather, why do they make so many mistakes that NIST performs studies showing they cost businesses sixty billion dollars a year? Based on our experience developing software, interacting with thousands of software developers, and seeing the types of bugs that come out of the software development process, we view the following as the top reasons developers make mistakes.

Ignorance. The reader might think from this header that we are taking a shot at the educational system that trains our software developers, but that is not the thrust of this argument. Rather, developers are ignorant of large parts of the systems they develop. A single developer can keep thousands, maybe even tens of thousands, of lines of code in his or her head and understand perfectly how the different pieces of that code interact.

However, today's systems run to hundreds of thousands, if not millions or tens of millions, of lines of code. A developer working on such a system will be calling functions or methods he knows very little about. The code he is forced to interact with may have been written years ago by someone who is no longer available to explain its intent or nuances. So the developer does his best, quickly reading through the implementation or the (potentially incorrect!) comments whenever he needs to interact with another piece of the system. And this leads to errors.

Stress. We mentioned above that the developer does his best to "quickly" read through the implementation of a piece of code that he must interact with. If you are a developer, you probably didn't think twice about the phrasing of that sentence (nor did we when writing it), because that is the reality of any software development process. Managers pressure developers to produce code quickly; deadlines come fast, and this leads to hasty coding, which leads to mistakes. These mistakes are often not in the most common paths through the code (since those are well tested), but in edge cases. When time is of the essence and developers are stressed, the less-traveled parts of the code suffer. Yet these defects can be just as costly as mainstream bugs.

Boredom. Not all coding is rocket science. In fact, a good number of coding projects, once the design is complete, would be classified by most developers as "boring." Of course, if a developer is bored, he is much less likely to produce good code than if he is excited about his work.

Pounding out the last few cases in a switch statement, when the first few took dozens of minutes, can be mind-numbing enough to switch off the brain and produce the simplest of mistakes. Boredom also leads to shortcuts: if you are bored with a task, you are probably looking for ways to end your boredom as quickly as possible. And unfortunately, a shortcut in coding often translates into a defect in the code.

Human Frailties. Certainly the above points play into this last point about the very nature of human beings. Humans are creative and intelligent and able to solve difficult problems through reason. However, we are not robots. We are not so good at repeating the exact same operation thousands of times without some variance. If you doubt this, pull out a piece of paper and sign your name ten times.

Signing your name is probably something you've done thousands of times in your life, yet each time is a little different. This variance means that even if a developer understood every interface in a system perfectly, had all the time in the world, and were programming the most interesting project computer science has ever known, he would still make a mistake in the translation from the design in his head to the code that he writes. That is just a fact of life.

Common goofs
When discussing common programming defects, we have (at least) two choices for categorization. We can either categorize based on root cause in the code (e.g., null pointer dereference, failure to unlock after acquiring a lock, buffer overrun, etc.) or based on a higher level reason for the mistake (e.g., improper error handling, typo, copy and paste, etc.).

Having a hybrid of these two categorizations is difficult in this format, so we choose the latter because we feel it gives a better sense for why a particular defect is introduced. However, we acknowledge that this higher level categorization is very subjective. We're not here to forge new territory in defect classification, but rather want to shed light on why we believe these defects are made.

The examples below are admittedly toy fragments meant only to highlight the particular issue in the discussion. Bear in mind that these problems do manifest themselves over hundreds or thousands of lines of code within and across functions and methods in real systems.

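Ignorance. If you were to ask most developers, "Should you return a pointer into data on the stack?" they would answer with a resounding no. However, from time to time, we see programs with a function like get_name, which copies a name into a character array declared on its own stack and returns a pointer to that array.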

The function looks simple enough: it puts a name into a character array and then returns that array, presumably for the caller to use. However, the array lives in the function's stack frame, so once the stack is popped on return, the returned pointer no longer refers to reliable data. Once other functions are called, the memory holding that name will likely be overwritten. To make this function work correctly, we should allocate the memory dynamically so that it persists past the end of the function.
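The original listings are not reproduced here. The following minimal sketch shows both versions of get_name, assuming a hypothetical hard-coded name and a 32-byte buffer (NAME_LEN); the stack-based version appears only in a comment, because using the pointer it returns is undefined behavior.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 32  /* hypothetical buffer size */

/* Defective version, kept in a comment because using its result is
 * undefined behavior: name lives in get_name's stack frame, which is
 * popped when the function returns.
 *
 *   char *get_name(void) {
 *       char name[NAME_LEN];
 *       strncpy(name, "Jane Doe", NAME_LEN - 1);
 *       name[NAME_LEN - 1] = '\0';
 *       return name;           // dangling pointer
 *   }
 */

/* Fixed version: the buffer comes from the heap, so it outlives the call.
 * The caller now owns the memory and must free() it. Returns NULL if the
 * allocation fails. */
char *get_name(void) {
    char *name = malloc(NAME_LEN);
    if (name == NULL)
        return NULL;
    strncpy(name, "Jane Doe", NAME_LEN - 1);
    name[NAME_LEN - 1] = '\0';
    return name;
}
```

Note that the fix changes the function's contract: callers that used to treat the result as borrowed must now free it.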

Now the caller of the function can trust that the pointer points to valid data for as long as that memory is not freed. Imagine a potential caller, calls_get_name, that fetches the name and prints it.

This code will work just fine in printing the name. However, notice that with the change to get_name, we have now introduced a resource leak in calls_get_name: nothing ever frees the memory that get_name allocates! If the developer implementing calls_get_name does not realize that the implementation changed, there is a defect due to the developer's ignorance of that changed interface.
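As a hedged sketch, assuming the heap-allocating get_name described above and a hypothetical name, the original caller and a corrected one might look like this. The original is kept as calls_get_name_leaky for comparison; the corrected calls_get_name takes ownership of the returned buffer and frees it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Heap-allocating get_name: the caller owns (and must free) the result. */
char *get_name(void) {
    char *name = malloc(32);
    if (name != NULL)
        strcpy(name, "Jane Doe");
    return name;
}

/* Written when get_name returned a stack buffer: prints the name but never
 * frees it, so every call now leaks 32 bytes. */
void calls_get_name_leaky(void) {
    char *name = get_name();
    printf("%s\n", name);
}

/* Fixed: releases the buffer once it has been printed.
 * Returns 0 on success, -1 if get_name() could not allocate memory. */
int calls_get_name(void) {
    char *name = get_name();
    if (name == NULL)
        return -1;
    printf("%s\n", name);
    free(name);
    return 0;
}
```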

Copy and paste. Now suppose our developer is tasked with writing a function similar to get_name that instead duplicates a name passed in as a parameter. The developer would likely copy and paste the original code. Copying and pasting code is a common practice and often stems from boredom (since the task is not seen as interesting) or from time stress (not having enough time to write a function from scratch). So, the developer starts by copying get_name.

Next, he renames the copy and adds a parameter for the incoming name.

Then he changes the line that does the strncpy to call strdup instead, since he knows that's a good way to duplicate a string.

And now the function works as desired. However, the astute reader will notice that in the midst of the copying and pasting, the developer left the original call to malloc in the code, causing a resource leak on the very next line, where he reassigns the temp_name pointer.
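A hedged reconstruction of the final copied function follows. The name dup_name and the 32-byte malloc inherited from get_name are assumptions; temp_name and the switch to strdup come from the text. strdup is specified by POSIX and, since C23, by ISO C.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strdup() in strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Defective: the malloc() copied over from get_name is immediately
 * overwritten by strdup()'s result, so those 32 bytes can never be freed. */
char *dup_name_leaky(const char *name) {
    char *temp_name = malloc(32);
    temp_name = strdup(name);   /* previous allocation is lost here */
    return temp_name;
}

/* Fixed: strdup() allocates the copy itself, so the stray malloc() goes.
 * The caller must free() the result; returns NULL if allocation fails. */
char *dup_name(const char *name) {
    assert(name != NULL);
    char *temp_name = strdup(name);
    return temp_name;
}
```

Both versions return the same string, which is why the leak survives casual testing; only the defective one loses memory on every call.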

Error handling. One of the most common problems we see in code is in the handling of error conditions. Programmers tend to code for the common case, leaving the outliers, from a path execution standpoint, largely untested. However, these outliers are exactly the scenarios the end user is likely to hit as the load becomes high or the application runs for days or weeks at a time. Consider an example pulled directly from Linux.

In that function, a lock is acquired near the beginning with a call to spin_lock_irq, and in the common case the corresponding unlock function is called right before the end of the function. However, there is an error case in the middle of the function that depends on the return value of vortex_adb_allocroute. If that call fails, the function returns without unlocking the acquired lock! This can lead to deadlock, causing the kernel to hang. In this particular case, failing to handle the error case correctly led to a concurrency problem, but the same mistake can also lead to other defects, such as resource leaks.
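The kernel listing is not reproduced here. As a hedged user-space sketch of the same pattern, the code below uses a POSIX mutex in place of spin_lock_irq, and allocate_route is a hypothetical stand-in for vortex_adb_allocroute. The fix routes every exit through a single unlock with goto, a common C idiom; it is not necessarily how the kernel code was patched.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t route_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for vortex_adb_allocroute(): fails on request. */
static int allocate_route(bool fail) {
    return fail ? -1 : 0;
}

/* Defective: the error path returns while still holding route_lock. */
int setup_route_buggy(bool fail) {
    pthread_mutex_lock(&route_lock);
    if (allocate_route(fail) < 0)
        return -1;                  /* lock is never released */
    /* ... work that needs the lock ... */
    pthread_mutex_unlock(&route_lock);
    return 0;
}

/* Fixed: every path, including the error path, leaves through one unlock. */
int setup_route(bool fail) {
    int ret = 0;
    pthread_mutex_lock(&route_lock);
    if (allocate_route(fail) < 0) {
        ret = -1;
        goto out;
    }
    /* ... work that needs the lock ... */
out:
    pthread_mutex_unlock(&route_lock);
    return ret;
}
```

After setup_route_buggy(true) returns, route_lock is still held, so any later attempt to acquire it will block or fail; setup_route releases it on both paths.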

Off by ones. Similar to the case of returning pointers from the stack, if you were to ask a developer "How do you index arrays in C/C++ code?" most would correctly respond that arrays are 0-indexed and that the largest valid index is the size of the array minus one. However, we still see this type of mistake more often than we'd like: for example, a function that saves the result of something_very_important in a local pointer, ptr, and then zeroes a local array with a loop whose bound allows one write past the end of the array.

In this case, depending on how the stack is arranged, it is likely that ptr will be overwritten by the buffer overrun caused by the off-by-one error in indexing the array. What's worse, this pointer is now null, and as such, the caller of the function may inadvertently dereference a null pointer. If you were to catch this type of problem in testing, it might seem very strange that the pointer is null when you know that something_very_important can never return a null pointer!
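A hedged sketch of this pattern follows; something_very_important is stubbed out with hypothetical data, and BUF_LEN and the function names are assumptions. Because the write one element past the end is undefined behavior, the defective loop appears only in a comment, and the live code uses the corrected bound.

```c
#include <assert.h>

#define BUF_LEN 10  /* hypothetical array size */

/* Hypothetical stand-in for something_very_important(): never NULL. */
static char important_data[] = "payload";

char *something_very_important(void) {
    return important_data;
}

/* Defective version, kept in a comment because writing buf[BUF_LEN] is
 * undefined behavior; depending on stack layout it may zero out ptr.
 *
 *   char *get_important_buggy(void) {
 *       char *ptr = something_very_important();
 *       char buf[BUF_LEN];
 *       for (int i = 0; i <= BUF_LEN; i++)   // <= runs one element too far
 *           buf[i] = 0;
 *       return ptr;
 *   }
 */

/* Fixed: valid indices are 0 .. BUF_LEN - 1, so the loop bound uses <. */
char *get_important(void) {
    char *ptr = something_very_important();
    char buf[BUF_LEN];
    for (int i = 0; i < BUF_LEN; i++)
        buf[i] = 0;
    (void)buf;  /* this toy example never reads buf */
    return ptr;
}
```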

Typos. From time to time, a developer simply omits some punctuation. Unlike a reader of English, who can likely "figure out what you meant," a computer blindly executes code as written, so the program behaves incorrectly. For example, suppose a developer means to break out of a loop when the element found in an array is greater than 100, but forgets the { and } around the body of the if statement. The break then executes unconditionally, on the first iteration of the loop.
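A minimal sketch of such a loop, with hypothetical function names, shows the defect next to the fix. The indentation of the defective version is deliberately misleading, which is exactly what makes this typo easy to miss; GCC's -Wmisleading-indentation warning, enabled by -Wall, targets this pattern.

```c
#include <assert.h>

/* Defective: without braces only `found = i;` belongs to the if, so the
 * break runs unconditionally and the loop stops after the first element. */
int first_over_100_buggy(const int *a, int n) {
    int found = -1;
    for (int i = 0; i < n; i++) {
        if (a[i] > 100)
            found = i;
            break;
    }
    return found;
}

/* Fixed: braces make both statements part of the if. Returns the index of
 * the first element greater than 100, or -1 if there is none. */
int first_over_100(const int *a, int n) {
    int found = -1;
    for (int i = 0; i < n; i++) {
        if (a[i] > 100) {
            found = i;
            break;
        }
    }
    return found;
}
```

For the array {1, 200, 3}, first_over_100_buggy returns -1 because the loop exits after examining only the first element, while first_over_100 returns 1.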

And finally, a similar typo was discovered in the X.org code that controls root access in one part of the system.
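The X.org listing is not reproduced here. The hypothetical sketch below shows the same class of typo; the policy it encodes (a privileged option is allowed if the real user is root or the process is not running with root privileges) and all function names are assumptions for illustration, not the X.org code. Factoring the check into option_allowed_for also makes it testable without changing user IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy: allowed when the real user is root, or when the
 * process is not running with root privileges. */
bool option_allowed_for(uid_t ruid, uid_t euid) {
    return ruid == 0 || euid != 0;
}

/* Defective: `geteuid` without parentheses is the function's address,
 * which is never a null pointer, so `geteuid != 0` is always true and
 * the whole test always succeeds. */
bool option_allowed_buggy(void) {
    return getuid() == 0 || geteuid != 0;
}

/* Fixed: actually call geteuid(). */
bool option_allowed(void) {
    return option_allowed_for(getuid(), geteuid());
}
```

The one case the policy exists to reject, a normal user (real UID 1000, say) running a setuid-root program (effective UID 0), is rejected by option_allowed_for(1000, 0) but always accepted by the defective version.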

In that code, the second "call" to geteuid has no parentheses following the identifier. As such, the expression refers to the function's address rather than calling the function, and that address is compared against 0. Because a function's address is never null, this test always succeeds, allowing a normal user of the system to gain root access when this piece of code is triggered. Yes, this piece of code is in a real system that tens of thousands of users are probably still using.

Avoiding the goofs

Unfortunately, we do not have a silver bullet for guaranteeing that developers will not make some of the common mistakes that lead to very expensive defects.

We have no way to make code less complex or to give developers more time to write it. However, there is technology that helps alleviate the problem of human frailties in the software development process. Research in static source code analysis has made tremendous strides in the past decade: gone are the false-positive-ridden days of Lint and other lightweight code scanning tools.

All of the goofs listed in this paper are easily detected by state-of-the-art static source code analysis technology. Compared with runtime testing tools (e.g., Purify), static source code analysis has the benefit of analyzing all of the paths through a given code base and is not tied to the application's particular test suite. Compared with manual code audits or developer debugging, static source code analysis technology isn't hindered by the human frailties discussed previously.

There is no ignorance of the numerous interfaces in the code since it can analyze the whole program, keeping billions of contexts in memory simultaneously. Also, static source code analysis never suffers from stress or boredom or typos. Computers are very good at performing the same operation thousands of times in a row without variance. If you want to avoid the most common development goofs, augment your development process to include the latest technology to help find defects earlier in the lifecycle.