The Agile Executive

Making Agile Work

Archive for the ‘Testing’ Category

Technical Debt Meets Continuous Deployment

with 11 comments

As you would expect at a conference entitled Velocity, and at a follow-on DevOps Day, speeding things up was an overarching theme. In the context of DevOps, the theme primarily manifested itself in lively discussions about the number of deploys per day. Comments such as the following reply to my post Ops Driven Dev were typical:

Conceptually, I move the whole business application configuration into the source code…

The theme that was missing for me in many of the presentations and discussions on the subject was the striking of a balance between velocity and quality. The classical trade-off in process control is between production rate and product quality (and safety, but that aspect is beyond the scope of this post). IMHO this trade-off applies to software just as it applies to mechanical or chemical processes.

The heart of the “deploy early and often” strategy hailed by advocates of continuous deployment is moving from one known deployment state to another. You don’t let the deployment evolve from one state to the next before it has stabilized to a robust state. The power of this incremental deployment lies in dealing with single-piece flow (or as close to it as possible) rather than dealing with the effects of multiple-piece flow. When the deployment increments are small enough, rollback, root cause analysis and recovery are relatively straightforward if a deployment turns sour. The concept is similar to Agile development, extending continuous integration into continuous deployment.

While I am wholeheartedly behind this devops strategy, I believe it needs to be reinforced through rigorous quality criteria the code must satisfy prior to deployment. The most straightforward way to do so is to embed technical debt criteria in the release/deploy process. For example (a minimal gating sketch follows the list below):

  • The code will not be deployed unless the overall technical debt per line of code is lower than $2.
  • To qualify for deployment, code duplication levels must be kept under 8%.
  • Code whose Cyclomatic complexity per Java class is higher than 15 will not be accepted for deployment.
  • 50% unit test coverage is the minimal level required for deployment.
  • Many others…
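To make the gating idea concrete, here is a minimal sketch of such a deploy gate in Python. It is not tied to any particular tool; the metric names, and the assumption that your static-analysis and coverage tooling can produce them, are mine:

```python
# Thresholds mirror the example criteria above; all names are illustrative.
CEILINGS = {
    "debt_per_line_usd": 2.0,        # overall technical debt per line of code
    "duplication_pct": 8.0,          # code duplication level
    "max_cyclomatic_per_class": 15,  # cyclomatic complexity per Java class
}
FLOORS = {
    "unit_test_coverage_pct": 50.0,  # minimal unit test coverage
}

def may_deploy(metrics: dict) -> bool:
    """Return True only if every technical debt criterion is satisfied."""
    for name, ceiling in CEILINGS.items():
        if metrics[name] > ceiling:
            print(f"Deployment blocked: {name} = {metrics[name]} exceeds {ceiling}")
            return False
    for name, floor in FLOORS.items():
        if metrics[name] < floor:
            print(f"Deployment blocked: {name} = {metrics[name]} below {floor}")
            return False
    return True

# Made-up readings for illustration:
print(may_deploy({"debt_per_line_usd": 1.4, "duplication_pct": 6.2,
                  "max_cyclomatic_per_class": 12, "unit_test_coverage_pct": 63.0}))
```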

I have no doubt whatsoever that code which does not satisfy these criteria might be successfully deployed in the short term. The problem, however, is the cumulative effect over the long haul of successive deployments of code increments of inadequate quality. As Figure 1 demonstrates, a Java file with cyclomatic complexity of 38 has a 50% probability of being error-prone. If you do not stop it prior to deployment through technical debt criteria, it is likely to affect your customers and play havoc with your deployment quite a few times in the future. The fact that it did not do so during the first hour of deployment does not guarantee that such a file will be “well-behaved” in the future.


Figure 1: Error-proneness as a Function of Cyclomatic Complexity (Source: http://www.enerjy.com/blog/?p=198)

To attain satisfactory long-term quality and stability, you need both the right process and the right code. Continuous deployment is the “right process” if you have developed the deployment infrastructure to support it. The “right code” in this context is code whose technical debt levels are quantified and governed prior to deployment.


Using 3σ Control Limits in Software Engineering

with 2 comments



Source: Wikipedia; Control Chart

The July/August 2010 issue of IEEE Software features an article entitled “Monitoring Software Quality Evolution for Defects” by Hongyu Zhang and Sunghun Kim. The article is of interest to the software developer/tester/manager in quite a few ways. In particular, the authors report on their successful use of 3σ control limits in c-charts used to plot defects in software projects.

To put things in perspective, consider my recent assessment of the results accomplished by Quick Solutions (QSI) in two of their projects:

One to one-and-a-half standard deviations better than the mean might not seem like much to six-sigma black belts. However, in the context of typical results we see in the software industry, the QSI results are outstanding. I have not done the exact math to determine whether those results are superior to 95%, 97% or 98% of the software projects in Michael Mah‘s QSMA database, as the exact figure hardly matters when you achieve this level of excellence.

A complementary perspective is provided by Capers Jones in Estimating Software Costs: Bringing Realism to Estimating:

Another way of looking at six-sigma in a software context would be to achieve a defect-removal efficiency level of about 99.9999 percent. Since the average defect-removal efficiency level in the United States is only about 85 percent, and less than one project in 1000 has ever topped 98 percent, it can be seen that actual six-sigma results are beyond the current state of the art.

The setting of control limits is, of course, quite a different thing from the actual defect-removal efficiency numbers reported by Jones for the US and the very low number of defects reported by Mah for QSI. Having said that, driving a continuous improvement process using 3σ control limits is the best recipe for eventually reaching six-sigma results. For example, one could drive the development process by using cyclomatic complexity per Java class as the quality characteristic in the figure at the top of this post. In this figure, a cyclomatic complexity reading higher than 10.860 (the Upper Control Limit) indicates a need to “stop the line” and attend to reducing complexity before resuming work on functions and features.
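For readers who want to experiment, here is a small sketch of how the 3σ limits of a c-chart are derived. The defect counts are made up, and applying a c-chart assumes the plotted counts behave approximately like a Poisson process:

```python
import math

def c_chart_limits(counts):
    """3-sigma control limits for a c-chart of defect counts."""
    c_bar = sum(counts) / len(counts)  # center line
    sigma = math.sqrt(c_bar)           # Poisson: variance equals the mean
    ucl = c_bar + 3 * sigma            # Upper Control Limit
    lcl = max(0.0, c_bar - 3 * sigma)  # counts cannot be negative
    return lcl, c_bar, ucl

# Made-up defect counts from successive builds
counts = [7, 11, 9, 12, 8, 10, 9, 13]
lcl, center, ucl = c_chart_limits(counts)
stop_the_line = [c for c in counts if c > ucl]  # readings that demand attention
```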

Coming on the heels of the impressive results reported by David Joyce on the use of statistical process control (SPC) techniques by the BBC, the article by Zhang and Kim is another encouraging report on the successful application of manufacturing techniques to software (and to knowledge work in general). I am not at liberty to quote from this just-published IEEE article, but here is the abstract:

Quality control charts, especially c-charts, can help monitor software quality evolution for defects over time. c-charts of the Eclipse and Gnome systems showed that for systems experiencing active maintenance and updates, quality evolution is complicated and dynamic. The authors identify six quality evolution patterns and describe their implications. Quality assurance teams can use c-charts and patterns to monitor quality evolution and prioritize their efforts.

Can Technical Debt Constitute a Breach of Implied Warranties?

with 12 comments


Photo credit: Dancing Lemur (Flickr)

Cunningham’s quip “A little debt speeds development so long as it is paid back promptly with a rewrite” is intuitively very clear. We are talking about short-term debt which will be reduced, and hopefully eliminated in its entirety, at the earliest possible time.

The question this post addresses is what happens when the expected short-term technical debt becomes a significant long-term debt. Specifically, can technical debt under some conditions constitute a breach of implied warranties?

In his InformIT article Don’t “Enron” Your Software Project, Aaron Erickson coined the term “Technical Fraud” and connected it to lemon laws:

As a reaction to seeing this condition and its deleterious effects, I coined the term technical fraud to refer to the practice of incurring unmanaged and hidden technical debt. Many U.S. states have “lemon laws” that make it illegal to knowingly sell someone a car that has undisclosed maintenance problems. Selling a “lemon” is a fraudulent practice in the world of cars, and it should be considered as such in the world of software.

It is a little tricky (though not impossible – see Using Credit Limits to Constrain Development on Margin) to define the precise point at which technical debt becomes “unmanaged.” One needs to walk a fine line between technical/methodical incompetence and resource availability to determine technical fraud. For example, if your code has 35% coverage, is it or is it not unmanaged? Does the answer to this question change if your cyclomatic complexity per class exceeds 30? I would think the courts might be divided for a very long time on the question of when hidden technical debt represents a fraudulent misrepresentation.

One component of technical debt deserves special attention in the context of this post. I am referring to the conscious decision not to do unit testing at all.

As best I understand it, the rationale for not “bothering” with unit testing is a variant of the old ploy “we do not have time for testing here.” It is a resource allocation strategy that bets on the code being miraculously bug-free. Some amount of functional testing is done out of necessity – the code in customers’ hands needs to function as proclaimed. But the pieces of code from which functionality is constructed are not subject to direct, rigorous testing. The individual units of code are exercised indirectly in some manner through functional testing, but not in a systematic manner that verifies and validates the correctness of the units of code per se.

Such a conscious decision IMHO indicates no intention to pay back this category of technical debt – unit test coverage. It is therefore quite incompatible with the nature of an implied warranty:

An implied warranty is an unstated promise, assumed by the law in most sales transactions, that the product will be of at least average quality and will do what the average customer would expect it to do [The Reader’s Digest Legal Questions & Answers Book]

The #1 defense open to a software vendor who gets sued over lack of unit testing is that a fair average quality of software can be attained without any unit testing. As a programmer, I would think such a defense would fly in the face of the availability, since 1987, of the IEEE Standard for Software Unit Testing.

It is fascinating to note the duality between contracts and programming. For the programmer who follows the tenets of design by contract, “a unit test provides a strict, written contract that the piece of code must satisfy…”
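A minimal illustration of that duality, with a made-up function and contract; the tests spell out, in executable form, the promises the code makes:

```python
import unittest

def apply_payment(balance: float, payment: float) -> float:
    """Reduce an outstanding balance; payments must be positive."""
    if payment <= 0:
        raise ValueError("payment must be positive")
    return balance - payment

class ApplyPaymentContract(unittest.TestCase):
    """Each test is one clause of the contract apply_payment must satisfy."""

    def test_reduces_balance(self):
        self.assertEqual(apply_payment(100.0, 40.0), 60.0)

    def test_rejects_non_positive_payment(self):
        with self.assertRaises(ValueError):
            apply_payment(100.0, 0.0)

if __name__ == "__main__":
    unittest.main()
```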

Disclaimer: I am not an expert in the law. The opinion expressed in this post merely represents my layman’s understanding of principles of contract law that might be applicable to technical debt situations.

How to Use Observations From Outside the Agile Process

with 2 comments


Photo credit: tengtan (Flickr)

Most posts on technical debt in this blog emphasize the use of technical debt for strategic decision-making. In this post we will point out the use of technical debt in Agile teams at the tactical level. Specifically, technical debt can be examined:

  • Every two weeks; and/or,
  • With every build.

Taking a close look at the various components of technical debt during the bi-weekly iteration review meeting feeds plenty of useful information into the process. For example, you might look for insights to explain the following:

  • Why is the unit test coverage figure going down?
  • Any particular reason the cyclomatic complexity figure has gone up?
  • Why is the figure of merit for design lower than the figure indicated in the previous iteration review meeting?
  • Many others…

The emphasis in this mode of operation is on guiding the retrospection. Plenty of good and valid reasons might exist for any of the trends mentioned above. However, observing the trends helps you ask the right questions, focusing on what happened during the iteration just completed. In conjunction with technical debt data from previous iteration review meetings, trends that characterize your software development project become visible. You may or may not need to change anything you are doing, but you become very conscious of any “let’s not change” decision.
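As a sketch of how such trends might be surfaced mechanically (the metric names, and which direction counts as a regression, are assumptions on my part):

```python
def retrospective_flags(history):
    """Compare the last two iterations' metrics and flag apparent regressions.

    'history' is a list of per-iteration metric dicts, oldest first.
    """
    HIGHER_IS_BETTER = {"unit_test_coverage_pct", "design_figure_of_merit"}
    prev, curr = history[-2], history[-1]
    flags = []
    for metric, value in curr.items():
        worsened = (value < prev[metric] if metric in HIGHER_IS_BETTER
                    else value > prev[metric])
        if worsened:
            flags.append(f"{metric}: {prev[metric]} -> {value}")
    return flags  # candidate questions for the iteration review meeting

print(retrospective_flags([
    {"unit_test_coverage_pct": 58, "cyclomatic_per_class_avg": 9.1},
    {"unit_test_coverage_pct": 54, "cyclomatic_per_class_avg": 9.7},
]))
```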

An intriguing practice suggested by colleague and friend Erik Huddleston is to make technical debt a criterion for the build to pass. The build automatically fails if the technical debt figure has gone up. Or, if you are very focused on a specific aspect of technical debt such as complexity, you fail the build whenever the complexity figure of merit rises above a certain pre-determined threshold. For example, you might fail a build in which the cyclomatic complexity per method has exceeded 4.
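A sketch of what such a build criterion might look like; the metrics files and their field names are hypothetical stand-ins for whatever your build tooling actually emits:

```python
import json
import sys

COMPLEXITY_CEILING = 4  # cyclomatic complexity per method, per the example above

def check_build(previous: dict, current: dict) -> int:
    """Return a non-zero exit code to fail the build."""
    if current["tech_debt_total"] > previous["tech_debt_total"]:
        print("Build failed: technical debt rose since the last build")
        return 1
    if current["max_cyclomatic_per_method"] > COMPLEXITY_CEILING:
        print(f"Build failed: cyclomatic complexity per method exceeds {COMPLEXITY_CEILING}")
        return 1
    return 0

if __name__ == "__main__":
    with open("previous_build_metrics.json") as f:  # hypothetical file names
        previous = json.load(f)
    with open("current_build_metrics.json") as f:
        current = json.load(f)
    sys.exit(check_build(previous, current))
```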

The power of failing a build whenever the technical debt rises lies in utilizing the build as an exceptionally effective influence point. You instill the discipline of reducing technical debt one build at a time. If your team aggressively practices continuous integration, it will address technical debt issues multiple times a day. Instead of staring at a “mountain” of technical debt towards the release of a product, you chunk it into really small increments that get addressed in “real time.” For instance, a build that failed due to lack of comments can usually be fixed very quickly by the developer who “upset the apple cart” while the logic embedded in the code is fresh in his/her mind.

A good insight into the way the tactical use of technical debt techniques adds value is provided by the following observation: the technical debt data is observed from outside the Agile process. Hence, technical debt data is nicely suited to guiding the process. If you think of the software engineering fabric as a virtual stack, the technical debt “layer” could be considered a layer above the Agile process.

Should You Ship This Code Before Reducing Technical Debt?!

with 8 comments


Source: JulesH, Wikipedia, A control flow graph of a simple function

Technical debt is usually perceived as a measure of expediency. You borrow a little (time) with the intent of paying it back as soon as possible. To quote Ward Cunningham:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… I thought that rushing software out the door to get some experience with it was a good idea, but that of course, you would eventually go back and as you learned things about that software you would repay that loan by refactoring the program to reflect your experience as you acquired it.

As is often the case with financial debt, technical debt accrues with compound interest. Once it reaches a certain level (e.g. $1 per line of code) you stare at a difficult question:

Should I ship this code before reducing the accrued technical debt?!

The figure below, taken from An Objective Measure of Code Quality by Mark Dixon, answers the question with respect to one important component of technical debt – cyclomatic complexity. Once complexity per source code file exceeds 74, the file is for most practical purposes guaranteed to contain errors. Some of the errors in such a file might be trivial. However, a 2007 study by Capers Jones indicates that about a third of the errors found in released code are likely to be serious enough to stop an application from running or to create erroneous outputs.

Figure: Error-proneness as a Function of Cyclomatic Complexity

To answer the question cited above – Should You Ship This Software Before Reducing Technical Debt?! – examine both the cost and the risk associated with the number of error-prone files you are about to unleash:

  • The economics of defect removal clearly favor early defect removal over late defect removal. The cost of removal grows exponentially as a function of time.
  • Brand risk should be first and foremost on your mind. If complexity figures higher than 74 per file are the norm rather than the exception, you are quite likely to tarnish your image due to poor quality.

If you decide to postpone the release date until the technical debt has been reduced, you can apply yourself to technical debt reduction in a biggest-bang-for-the-buck manner. The analysis of complexity can identify the hot spots in your code, giving you a de facto roadmap you would be wise to follow.
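A sketch of such a roadmap, ranking files by complexity so that the likely error-prone ones surface first; the file names and readings are made up, and 74 is the threshold from the Dixon data cited above:

```python
DANGER_THRESHOLD = 74  # complexity beyond which a file is all but guaranteed to contain errors

def debt_reduction_roadmap(file_complexities):
    """Rank files by cyclomatic complexity, biggest bang for the buck first."""
    ranked = sorted(file_complexities.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score, score > DANGER_THRESHOLD) for name, score in ranked]

# Hypothetical readings from a static-analysis run
for name, score, danger in debt_reduction_roadmap(
        {"Parser.java": 91, "Ui.java": 23, "Io.java": 68}):
    print(f"{name}: complexity {score}" + ("  <-- fix first" if danger else ""))
```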

Conversely, if you opt to ship the code without reducing technical debt, you might lose this degree of freedom to prioritize your “fix it” work. Customer situations and pressures might force you to attend to fixing modules that do not necessarily provide as much bang for the buck.

Postscript: Please note that the discussion in this post is strictly limited to intrinsic quality; it does not address extrinsic quality at all. In other words, reducing/eliminating technical debt does not guarantee that the customer will find the code valuable. I would suggest reading Beyond Scope, Schedule and Cost: Measuring Agile Performance in the Cutter Blog for a more detailed analysis of the distinction between the two.

Erratum: The figure above is actually taken from a blog post on the Mark Dixon paper cited in my post. See McCabe Cyclomatic Complexity: the proof is in the pudding. My apology for the error.

Cloud Computing Meets the Iterative Requirements of Agile

leave a comment »

It so happened that a key sentence fell between my editing fingers while publishing Annie Shum‘s splendid post Cloud Computing: Agile Deployment for Agile QA Testing. Here is the corrected paragraph with the missing sentence highlighted:

By providing virtually unlimited computing resources on-demand and without up-front CapEx or long-term commitment, QA/load stress and scalability testing in the Cloud is a good starting point. In particular, the flexibility and on-demand elasticity of Cloud Computing meet the iterative requirements of Agile on an on-going basis. More than likely it will turn out to be one of the least risky but quick-ROI pilot Cloud projects for enterprise IT. Case in point: Franz Inc. opted for the Cloud solution when confronted with the dilemma of either abandoning their critical software product testing plan across dozens of machines and databases or procuring new hardware and software that would have been cost-prohibitive. Staging the stress testing study in Amazon’s S3, Franz completed its mission within a few days. Instead of the $100K capital expense for new hardware as well as additional soft costs (such as IT staff and other maintenance costs), the cost of Amazon’s Cloud services was under $200, without the penalty of delays in acquisition and configuration.

Reading the whole post with this sentence in mind makes a big difference… And, it is a little different from my partner Cote‘s perspective on the subject.

My apology for the inconvenience.

Israel

Cloud Computing: Agile Deployment for Agile QA Testing

with 9 comments

Annie Shum‘s original thinking has often been quoted in this blog. Her insights are always characterized by seeing the world through the prism of fractal principles. And she relentlessly pursues connecting the dots. In this guest post, she examines in an intriguing manner both the tactical and the strategic aspects of large-scale testing in the cloud.

Here is Annie:

Cloud Computing: Agile Deployment for Agile QA Testing
Annie Shum (Twitter: @insightspedia)

Invariably, the underlying questions at the heart of every technology or business initiative are less about technology but, as Clive Thompson of Wired Magazine observed, more about the people (generally referred to as the users and consumers in the IT industry). For example, “How does this technology/initiative impact the lives and productivity of people?” or “What happens to the users/consumers when they are offered new power or a new vehicle of empowerment?” Remarkably, very often the answers to these questions will directly as well as indirectly influence whether the technology/initiative will succeed or fail; whether its impact will be lasting or fleeting; and whether it will be a strategic game-changer (and transform society) or a tactical short-term opportunity.

One can approach some of the Cloud-friendly applications, e.g. large-scale QA and load stress testing in the Cloud, either from a tactical or from a strategic perspective. As aforementioned, the answer to the question “What happens to the users/consumers when they are offered new power or a new vehicle of empowerment?” can influence whether a new technology initiative will be a strategic game-changer or a tactical short-term opportunity. In other words, think about the bacon-and-eggs analogy where the chicken is involved but the pig is committed. Look for new business models and innovation opportunities by leveraging Cloud Computing that go beyond addressing tactical issues (in particular, trading CapEx for OpEx). One example would be to explore transformative business possibilities stemming from Cloud Computing’s flexible, service-based delivery and deployment options.

Approaching Large-scale QA and Load Stress Testing in the Cloud from a Tactical Perspective

Nowadays, enterprise organizations are constantly under pressure to demonstrate the ROI of IT projects. Moreover, they must be able to do this quickly and repeatedly. So as they plan for the transition to the Cloud, it is only prudent that they start small and focus on a target area that can readily showcase the Cloud’s potential. One of the oft-touted low-hanging fruits of Cloud Computing is large-scale QA (usability and functionality) testing and application load stress testing in the Cloud. Traditionally, one of the top barriers to conducting comprehensive, iterative and massively parallel QA test cases is the lack of adequate computing resources. The shortfall is due not only to budget constraints but also to staff scheduling conflicts and the long lead time to procure new hardware/software. This can cause significant product release delays, particularly problematic with new application development under Scrum. An iterative, incremental development/management framework commonly used with Agile software development, Scrum requires rapid successive releases in chunks, commonly referred to as sprints. Advanced Agile users leverage this chunking technique as an affordable experimentation vehicle that can lead to innovation. However, the downside is the rapid accumulation of new testing needs.

By providing virtually unlimited computing resources on-demand and without up-front CapEx or long-term commitment, QA/load stress and scalability testing in the Cloud is a good starting point. In particular, the flexibility and on-demand elasticity of Cloud Computing meet the iterative requirements of Agile on an on-going basis. More than likely it will turn out to be one of the least risky but quick-ROI pilot Cloud projects for enterprise IT. Case in point: Franz Inc. opted for the Cloud solution when confronted with the dilemma of either abandoning their critical software product testing plan across dozens of machines and databases or procuring new hardware and software that would have been cost-prohibitive. Staging the stress testing study in Amazon’s S3, Franz completed its mission within a few days. Instead of the $100K capital expense for new hardware as well as additional soft costs (such as IT staff and other maintenance costs), the cost of Amazon’s Cloud services was under $200, without the penalty of delays in acquisition and configuration.

Approaching Large-scale QA and Load Stress Testing in the Cloud from a Strategic Perspective

While Franz Inc. leveraged the granular utility payment model and the avoidance of upfront CapEx and long-term commitment for a one-off project, other entrepreneurs have decided to harness the power of on-demand QA testing in the Cloud as a new business model. Several companies, e.g. SOASTA, LoadStorm and Browsermob, are now offering “Testing as a Service,” also known as “Reliability as a Service,” to enable businesses to test the real-world performance of their Web applications based on a utility-based, on-demand Cloud deployment model. Compared to traditional on-premises enterprise testing tools such as LoadRunner, the Cloud offerings promise to reduce complexity without any software download or up-front licensing cost. In addition, unlike conventional outsourcing models, enterprise IT can retain control of their testing scenarios. This is important because comprehensive QA testing typically requires an iterative test-analyze-fix-test cycle that spans weeks if not months.

Notably, all three organizations built their service offerings on Amazon EC2 infrastructure. LoadStorm, launched in January 2009, and Browsermob (open source), currently in beta, each enable users to run iterative and parallel load tests directly from their websites. SOASTA, more established than the two startups, recently showcased the viability of the “Testing as a Service” business model by spawning 650 EC2 servers to simulate load from two different availability zones to stress test the music-sharing website QTRAX. As reported by Amazon, after a 3-month iterative test-analyze-fix-test cycle, QTRAX can now serve 10M hits/hour and handle 500K concurrent users.

The bottom line is that there are effectively two different perspectives, tactical (“involved”) versus strategic (“committed”), and both can be successful. Moreover, the choice of tactical versus strategic is not a discrete binary one but a granularity spectrum that accommodates amalgamations of short-term and long-term thinking. Every business must decide the best course to meet its goals.

P.S. A shout-out to Israel Gat for not only allowing me to post my piece today but for his always insightful comments in our daily email exchanges.