The Agile Executive

Making Agile Work

What 108M Lines of Code Tell Us

Results of the first annual report on application quality have just been released by CAST. The company analyzed 108M lines of code in 288 applications from 75 companies in various industries. In addition to the ‘usual suspects’ –  COBOL, C/C++, Java, .NET – CAST included Oracle 4GL and ABAP in the report.

The CAST report is quite important in shedding light on the code itself. As explained in various posts in this blog, this transition from the process to its output is of paramount importance. Proficiency in the software process is a bit elusive; the ‘proof of the pudding’ is in the output of the software process. The ability to measure code quality enables effective governance of the software process. Moreover, Statistical Process Control methods can be applied to samples of technical debt readings. Such application is most helpful in striking a good balance in ‘stopping the line’ – neither too frequently nor too rarely.
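As a minimal sketch of the kind of Statistical Process Control treatment alluded to above, one can establish control limits from a baseline of technical-debt-per-line readings and flag subsequent readings that fall outside them. The readings below are hypothetical, not CAST data:

```python
# A minimal SPC sketch: establish 3-sigma control limits from a baseline of
# technical-debt-per-line readings (dollars per line of code), then flag later
# readings that fall outside the limits ("stop the line" candidates).
# All figures are hypothetical, for illustration only.
from statistics import mean, stdev

baseline = [2.65, 2.71, 2.80, 2.78, 2.90, 2.84, 2.88, 2.79]  # $/LOC, illustrative
center = mean(baseline)
sigma = stdev(baseline)
ucl, lcl = center + 3 * sigma, center - 3 * sigma  # upper/lower control limits

new_readings = {"iteration 9": 2.83, "iteration 10": 3.55}
for label, value in new_readings.items():
    status = "stop the line" if not (lcl <= value <= ucl) else "in control"
    print(f"{label}: ${value:.2f}/LOC -> {status} (limits {lcl:.2f}-{ucl:.2f})")
```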

According to CAST’s report, the average technical debt per line of code across all applications is $2.82. This figure, depressing as it might be, is reasonably consistent with quick eyeballing of Nemo. The figure is somewhat lower than the average technical debt figure reported recently by Cutter for a sample of the Cassandra code. (The difference is probably attributable to the difference in sample sizes between the two studies.) What the data means is that the average business application in the CAST study is saddled with over $1M in technical debt!
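The arithmetic behind that last statement is straightforward: 108M lines of code spread over 288 applications gives an average application size of roughly 375,000 lines, and at $2.82 per line that works out to just over $1M per application. A quick sketch of the calculation:

```python
# Back-of-the-envelope check of the "over $1M per application" figure,
# using the averages reported in the CAST study.
total_loc = 108_000_000   # lines of code analyzed
applications = 288        # applications in the sample
debt_per_loc = 2.82       # average technical debt per line of code, in dollars

avg_loc_per_app = total_loc / applications          # ~375,000 LOC
avg_debt_per_app = avg_loc_per_app * debt_per_loc   # ~$1.06M

print(f"Average application size: {avg_loc_per_app:,.0f} LOC")
print(f"Average technical debt per application: ${avg_debt_per_app:,.0f}")
```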

An intriguing finding in the CAST report is the impact of size on the quality of COBOL applications.  This finding is demonstrated in Figure 1. It has been quite a while since I last saw such a dramatic demonstration of the correlation between size and quality (again, for COBOL applications in the CAST study).

[Figure 1: Quality as a function of size for the COBOL applications in the study. Source: First Annual CAST Worldwide Application Software Quality Study – 2010]

Another intriguing finding in the CAST study is that “application in government sector show poor changeability.” CAST hypothesizes that the poor changeability might be due to a higher level of outsourcing in the government sector compared to the private sector. As pointed out by Amy Thorne in a recent comment posted in The Agile Executive, it might also be attributable to the incentive system:

… since external developers often don’t maintain the code they write, they don’t have incentives to write code that is low in technical debt…

Congratulations to Vincent Delaroche, Dr. Bill Curtis, Lev Lesokhin and the rest of the CAST team. We as an industry need more studies like this!

16 Responses

  1. Terrible stats. Tech debt is a killer!

    agilescout

    September 28, 2010 at 6:51 am

    • Indeed.

      Having said that, I would suggest evaluating, in any specific case, the technical debt vis-à-vis the cost to produce the application and the value it is expected to generate. $1M in technical debt on an application that cost $10M and is expected to produce $100M in Net Present Value (NPV) is quite a different story from $1M in technical debt in an application that cost $1M and is expected to generate an NPV of $4M.
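      To make the comparison concrete, here is a minimal sketch that expresses the same $1M of technical debt relative to the cost and to the expected NPV of each application. The figures are the illustrative ones above, not data from the CAST study:

      ```python
      # Illustrative only: the same $1M of technical debt looks very different
      # relative to what the application cost and the value it is expected to create.
      cases = {
          "Application A": {"debt": 1_000_000, "cost": 10_000_000, "npv": 100_000_000},
          "Application B": {"debt": 1_000_000, "cost": 1_000_000, "npv": 4_000_000},
      }

      for name, c in cases.items():
          debt_to_cost = c["debt"] / c["cost"]
          debt_to_npv = c["debt"] / c["npv"]
          print(f"{name}: debt/cost = {debt_to_cost:.0%}, debt/NPV = {debt_to_npv:.0%}")
      ```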

      Jim Highsmith and I will discuss these considerations in our forthcoming seminar on technical debt at the Cutter Summit. Until then, a quick summary of our approach can be found in How to Combine Development Productivity Data with Software Quality Metrics.

      Best,

      Israel

      israelgat

      September 28, 2010 at 7:17 am

  2. Hello,

    I trust we all know that negativity sells (from revenue to user base). On one side, the picture and data presented seem a stark reality (even though I have questions about the data authenticity). On the other, may I request that you present a thorough analysis of the positive trends that came through the analysis as well? Or is it fair to assume this snippet proves the tester image by being negatively critical?

    Thanks
    Manav Ahuja

    Manav Ahuja

    September 28, 2010 at 10:04 am

    • Hi Manav:

      First and foremost, let me assure you my post has nothing whatsoever to do with “the tester image” you refer to. Knowing some of the folks at CAST, it is most unlikely they were thinking in terms of “the tester image.” Moreover, having known one of their folks for some twenty years, I would be greatly shocked if he had such thoughts/associations on his mind.

      The CAST team briefed me for about an hour on their study. Best I can tell from a one-hour briefing, they were meticulous about statistical considerations and data validity.

      My take on the CAST findings can be summarized in five words: it is what it is. I do not associate “positive” or “negative” with an amount of technical debt. To me, it is neutral data that gives us insights into how to improve the software we are inspecting.

      Best,

      Israel

      israelgat

      September 28, 2010 at 11:41 am

      • Hi Manav,

        I have been involved with the research team on the study. Thank you for your comments, and many thanks to Israel for the insightful post and analysis.

        I hope you will have a chance to read the actual summary of findings document that’s referenced in Israel’s original post. When you read it, you will find that there are some stark findings in there, but they are not all negative. For example, some of the other conclusions the team came to are: (1) applications can be implemented to be very secure, if you try hard enough, and (2) larger applications don’t have to be of poorer quality, as long as they are built in a modular fashion.

        Regarding the data collection, what’s interesting about this particular study is that it’s not at all survey based. All the structural data about the application software is collected and analyzed by the CAST platform, using the exact same parameters for each application. So, there is no room for subjectivity there at all. The team will be publishing the detailed report in about a month or so, where you’ll see the data cut by every dimension available in the repository. I would be happy to step you through the data collection and analysis process. Please feel free to give me a call.

        In all, I think our host here summed it up well: it is what it is. We shied away from drawing too many of our own conclusions about the data, because there are so many factors that could drive the results we see. And we had absolutely no intention of singling out any part of the SDLC as the culprit for any of the findings. The real point is that now we collectively have empirical data about the output of SDLC processes that we can analyze.

        Thank you again for your comments.

        Lev Lesokhin
        VP Marketing, CAST
        212-871-8330

        Lev Lesokhin

        September 28, 2010 at 9:37 pm

  3. Hi Israelgat:

    I have an objection to the CAST study. It uses the wrong metric, namely that application size correlates with the number of code lines. Low-quality code (code that does not follow best practices) produces lines that are not necessary, for example when it does not apply the simple, basic rule of using functions. I have seen a lot of code for simple applications (e.g. a feature to send an e-mail) that had more code lines than some more complex applications. So, the correlation of low-quality code with the number of code lines is rather straightforward. It is wrong to use the number of code lines as an application size metric. Even worse is measuring developer productivity using the unit KLOC (kilo lines of code).

    Karlo Smid

    September 29, 2010 at 2:32 am

    • Hi Karlo:

      I will let the CAST team (e.g. Lev Lesokhin, who answered another reader’s question on this thread) respond in detail. Until they do so, here are two quick reflections from me:

      First, the CAST folks did not define a metric. Rather, they looked for statistical correlation. Moreover, the finding about size vs. quality is restricted to COBOL. This is a significant finding in pointing out a phenomenon in a specific language.

      I would think CAST could easily have used function points instead of lines of code. Conversion ratios (from lines of code to function points) per language have been published by various authors; off the top of my head, Capers Jones and Michael Mah have published conversion tables.

      Best,

      Israel

      israelgat

      September 29, 2010 at 7:08 am

    • Karlo,

      Many people share your concerns about using Lines of Code instead of Function Points as a measure of size. Unfortunately, the cost of manually counting Function Points on the 108 million Lines of Code in our study was prohibitive. Even worse, there is often as much as a 10% difference in counting results among certified Function Point counters, a level of error variation that is not acceptable in our research. Although we could have used Backfired Function Points, that is essentially a Line of Code measure, since it applies a language weighting factor to the number of lines. In fact, Al Albrecht, the inventor of Function Points, published a study in the IEEE Transactions on Software Engineering in 1983 demonstrating that Function Points correlated over 0.9 with Lines of Code for software written within a single language. Some of the advantages of Function Points are that you can compare them across languages, estimate them earlier during development, and talk in terms understandable to users. None of these benefits were critical to our study.
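      For illustration, backfiring works roughly as sketched below. The LOC-per-Function-Point ratios shown are illustrative placeholders in the spirit of published backfiring tables, not figures from the study:

      ```python
      # Backfired Function Points: essentially a Lines-of-Code measure with a
      # language-specific weighting factor. The ratios below are illustrative
      # placeholders, not published or CAST figures.
      LOC_PER_FP = {"COBOL": 105, "Java": 55, "C": 125}  # hypothetical LOC per FP

      def backfired_fp(loc: int, language: str) -> float:
          """Estimate Function Points by dividing LOC by the language ratio."""
          return loc / LOC_PER_FP[language]

      print(f"{backfired_fp(375_000, 'COBOL'):,.0f} FP for a 375 KLOC COBOL application")
      print(f"{backfired_fp(375_000, 'Java'):,.0f} FP for a 375 KLOC Java application")
      ```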

      That said, CAST is working with the Consortium for IT Software Quality to produce a definition for automated Function Point counting that is as close to the International Function Point User Group (IFPUG) definition as possible. This will make the calculation of Function Points inexpensive and consistent. We anticipate being able to use Function Point counts in future annual reports in this series when the computational standard is completed by CISQ and approved by the OMG.

      – Bill

      Bill Curtis

      September 29, 2010 at 9:06 am

      • Hi Bill,

        Functions were just an example. My point is that system size does not correlate with the number of lines. For example, using GoF (or any other) patterns can significantly lower the line count while still implementing a complex system. Counting the number of unit test cases could reveal some of the system complexity (size).

        Karlo Smid

        September 30, 2010 at 12:58 am

      • Karlo, Rob Austin demonstrated that there is always a disconnect between the desired outcome and the measurement. People game the system over time – it is part of human nature.

        Best,

        Israel

        israelgat

        September 30, 2010 at 6:51 am

  4. Maybe I misread the CAST piece, but they seem to be estimating latent errors in developed code, which ought to be directly comparable with the increased cost of improving quality through (e.g.) TDD (a 15-30% uplift for a 2-4x improvement in defect rate).

    The more interesting technical debt for me is concerned with poor engineering, which compromises testability and migration to new components, causing increased downstream costs when COTS products fall out of support or hardware can no longer be found, or when opportunities to take advantage of Moore’s Law on tech refresh are missed. Are such costs included in the scope of technical debt?

    Tim Coote

    September 29, 2010 at 3:13 am

    • Hi Tim:

      Again, I will defer to the CAST experts. Until they have the opportunity to respond, here is my take on the two points you bring up.

      Lack of unit test coverage is one of the major components of technical debt I find in my practice. I am not talking about legacy code written some forty years ago but oftentimes about code that was written in the past few years… so, lack of coverage constitutes technical debt in my book.

      Poor engineering, which catches up with you down the road, is most definitely at the heart of technical debt. See, for example, my various posts on technical debt in this blog.

      Best,

      Israel

      israelgat

      September 29, 2010 at 7:24 am

    • Tim,

      Our estimate of technical debt actually focused on poor engineering in delivered software–the non-functional aspects of quality. Our technology looks for violations of good architectural and coding practices related to changeability, security, performance, robustness, and other quality characteristics in statically analyzed applications.

      The Changeability results would come closest to providing information on how difficult it will be to migrate applications, but certainly there will be concerns for security, robustness, etc. Since our quality characteristics are computed from violations of good software engineering practice, they represent the non-functional aspects of quality that are often not detected through TDD or other testing methods that develop test cases primarily from the functional requirements or functional aspects of stories. You should be able to use measures such as these to adjust your estimates of migration effort.

      – Bill

      Bill Curtis

      September 29, 2010 at 9:40 am

      • Hi Bill
        Maybe I’m reading too much into the thinking behind TDD and related activities, and broadening the approach too much. However, the way it seems to have been used in Humble and Farley’s ‘Continuous Delivery’, where one starts by building automated UATs, points, to my mind, to using TDD to keep control of non-functionals (assuming that one puts suitable NFRs into the UA governance process).

        I think that there’s a bit of a blind spot in most business cases for applications/systems: they fail to stress the longer-term costs of ownership, even though these dominate the TCO.

        I’d have thought that spotting changeability with static analysis was quite hard to do, unless one uses a rather simplistic proxy like afference/efference measures.
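        To make that last point concrete, here is a minimal sketch of such a simplistic proxy, namely an instability measure (efferent coupling over total coupling) computed from a purely hypothetical dependency graph. It is only meant to illustrate the kind of proxy I have in mind, not how CAST computes changeability:

        ```python
        # A simplistic changeability proxy: instability = Ce / (Ca + Ce), computed
        # from a hypothetical module dependency graph. Illustrative only.
        from collections import defaultdict

        depends_on = {            # module -> modules it depends on (hypothetical)
            "billing": {"core", "db"},
            "reporting": {"core", "db", "billing"},
            "core": {"db"},
            "db": set(),
        }

        efferent = {m: len(deps) for m, deps in depends_on.items()}  # outgoing (Ce)
        afferent = defaultdict(int)                                  # incoming (Ca)
        for deps in depends_on.values():
            for target in deps:
                afferent[target] += 1

        for module in depends_on:
            ce, ca = efferent[module], afferent[module]
            instability = ce / (ca + ce) if (ca + ce) else 0.0
            print(f"{module}: Ca={ca}, Ce={ce}, instability={instability:.2f}")
        ```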

        Tim Coote

        October 3, 2010 at 5:46 am

  5. […] on the heels of Gartner’s research note projecting $1 trillion in IT Debt by 2015, CAST’s study provided a more granular view of the debt, estimating an average of over $1 million in technical […]

  6. […] complexity in anything better? If your answer is “no”, you appear to be right. I found this post from The Agile Executive and was drawn to the graph, which is […]

