The Agile Executive

Making Agile Work

Archive for the ‘IT Operations’ Category

What 108M Lines of Code do not Tell Us


Source: Nemo

Coming on the heels of Gartner’s research note projecting $1 trillion in IT Debt by 2015, CAST’s study provided a more granular view of the debt, estimating an average of over $1 million in technical debt per application in a sample of 288 applications. Between these two studies, the situation examined at the micro-level seems to be quite consistent with the state of affairs estimated and projected at the macro-level.

My hunch is that the gravity of the situation from a software quality and maintenance perspective is actually masked by the efforts of IT staffs to compensate for programming problems through operational excellence. For example, carefully staged deployment and quick rollback often enable coping with defects that could or should have been handled through higher test coverage, lower complexity or a more acceptable level of code duplication.

Part of the reason that the masking effects of IT staffs are not always fully appreciated is that they are embedded in the business design of IT outsourcing companies. The company to which you outsourced your IT is ‘making a bet’ that it can run your IT better than you can. It often succeeds in doing so. The unresolved defects in your old code, plus those that evolved over time through software decay, have not necessarily been fixed. Rather, the manifestations of these defects are handled operationally in a more efficient manner.

Think again if your visceral reaction to the technical debt situation described in the Gartner research note and the CAST study is of the “This can’t possibly be true” variety. It is what it is – just take a quick look at Nemo to see representative technical debt data with your own eyes. And, as indicated in this post, it might even be worse than it looks. As Gartner puts it:

The results of such [IT Debt] an assessment will be, at best, unsettling and, at worst, truly shocking.

Technical Debt Assessment, Sterling Barton LLC and the Moussaka


A few months ago Chris Sterling and I were carrying out a Cutter Technical Debt Assessment and Valuation engagement for a venture capitalist who was considering a certain company. We discovered various things in the code of this company. More noteworthy, my deep domain expertise led Chris to discover the great Greek dish Moussaka.

I have eaten a lot of good Moussakas over the years. Even against this solid gastronomic background I can’t forget how Chris’s eyes lit up when he took the first bite. It took him only a moment to get on his iPhone and tweet about the culinary aspects of our engagement. I knew then it was going to be a very successful engagement…

The relationship with Chris has deepened since this episode. For example, in collaboration with Brent Barton, Chris contributed a great article to the forthcoming issue of the Cutter IT Journal on technical debt. In this article Chris and Brent demonstrate how technical debt techniques can be applied at the portfolio level. They make the reader step into the shoes of the project portfolio planner and walk him or her through their approach to enhancing the decision-making process by using the software debt dashboard.

Chris has just published an excellent post entitled “Using Sonar Metrics to Assess Promotion of Builds to Downstream Environments” in Getting Agile and was kind enough to suggest I cross-post it in The Agile Executive. Here it is (please note that the examples given below by Chris have nothing to do with the engagement described above):

“For those of you who don’t already know about Sonar, you are missing an important tool in your quality assessment arsenal. Sonar is an open source tool that serves as a foundational platform for managing your software’s quality. The image below shows one of the main dashboard views that teams can use to get insights into their software’s health.

The dashboard provides rollup metrics out of the box for:

  • Duplication (probably the biggest Design Debt in many software projects)
  • Code coverage (amount of code touched by automated unit tests)
  • Rules compliance (identifies potential issues in the code such as security concerns)
  • Code complexity (an indicator of how easily the software will adapt to meet new needs)
  • Size of codebase (lines of code [LOC])

Before going into how to use these metrics to assess whether to promote builds to downstream environments, I want to preface the conversation with the following note:

Code analysis metrics should NOT be used to assess teams and are most useful when considering how they trend over time

Now that we have this important note out of the way and, of course, nobody will ever use these metrics for “evil”, let’s discuss pulling data from Sonar to automate assessments of builds for promotion to downstream environments. For those who are unfamiliar with automated promotion, here is a simple, happy example:

A development team makes some changes to the automated tests and implementation code of an application and checks their changes into source control. A continuous integration server finds out that source control artifacts have changed since the last time it ran a build cycle and updates its local artifacts to incorporate the most recent changes. The continuous integration server then runs the build by compiling, executing automated tests, running Sonar code analysis, and deploying the successful deployment artifact to a waiting environment, usually called something like “DEV”. Once deployed, a set of automated acceptance tests is executed against the DEV environment to validate that basic aspects of the application are still working from a user perspective. Sometime after all of the acceptance tests pass successfully (this could be twice a day or some other timeline that works for those using downstream environments), the continuous integration server promotes the build from the DEV environment to a TEST environment. Once deployed, the application might be running alongside other dependent or sibling applications, and integration tests are run to ensure successful deployment. There could be more downstream environments such as PERF (performance), STAGING, and finally PROD (production).
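To make the flow above easier to follow, here is a minimal, purely illustrative sketch of that promotion sequence expressed in Java. The environment names mirror the example above; the individual steps are simple stand-ins for whatever a real continuous integration server would actually invoke, not working CI code.

// Illustrative sketch only: the promotion flow described above, with each step
// reduced to a stand-in for the real build/test/deploy action.
public class PromotionPipeline {

    enum Environment { DEV, TEST, PERF, STAGING, PROD }

    public static void main(String[] args) {
        if (!(compile() && runUnitTests() && runSonarAnalysis())) {
            System.out.println("Build failed; nothing is deployed.");
            return;
        }
        deploy(Environment.DEV);
        if (!runAcceptanceTests(Environment.DEV)) {
            System.out.println("Acceptance tests failed in DEV; promotion stops.");
            return;
        }
        // Promotion to TEST happens on whatever schedule suits downstream users.
        deploy(Environment.TEST);
        if (!runIntegrationTests(Environment.TEST)) {
            System.out.println("Integration tests failed in TEST; promotion stops.");
            return;
        }
        // PERF, STAGING and PROD would follow the same pattern.
    }

    // Stand-ins for the real continuous integration server actions.
    static boolean compile()            { return true; }
    static boolean runUnitTests()       { return true; }
    static boolean runSonarAnalysis()   { return true; }
    static void deploy(Environment env) { System.out.println("Deployed to " + env); }
    static boolean runAcceptanceTests(Environment env)  { return true; }
    static boolean runIntegrationTests(Environment env) { return true; }
}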

The tendency for many development teams and organizations is to assume that if the tests pass then the build is good enough to move into downstream environments. This is definitely an enormous improvement over extensive manual testing and stabilization periods on traditional projects. An issue that I have still seen is the slow introduction of software debt as an application is developed. Highly disciplined technical practices such as Test-Driven Design (TDD) and Pair Programming can help stave off extreme software debt, but these practices are still not commonplace amongst software development organizations. This is usually due not to lack of clarity about these practices, but rather to excessive schedule pressure, legacy code, and the initial hurdle of learning how to do these practices effectively. In the meantime, we need a way to assess the health of our software applications beyond just tests passing, and into the internals of the code and tests themselves. Sonar can be easily added to your infrastructure to provide insights into the health of your code, but we can go even beyond that.

The Sonar Web Services API is quite simple to work with. The easiest way to pull information from Sonar is to call a URL:

http://nemo.sonarsource.org/api/resources?resource=248390&metrics=technical_debt_ratio

This will return an XML response like the following:

<resources>
  <resource>
    <id>248390</id>
    <key>com.adobe:as3corelib</key>
    <name>AS3 Core Lib</name>
    <lname>AS3 Core Lib</lname>
    <scope>PRJ</scope>
    <qualifier>TRK</qualifier>
    <lang>flex</lang>
    <version>1.0</version>
    <date>2010-09-19T01:55:06+0000</date>
    <msr>
      <key>technical_debt_ratio</key>
      <val>12.4</val>
      <frmt_val>12.4%</frmt_val>
    </msr>
  </resource>
</resources>
Within this XML, there is a section called <msr> that includes the value of the metric we requested in the URL, “technical_debt_ratio”. The ratio of technical debt in this Flex codebase is 12.4%. Now with this information we can look for increases over time to identify technical debt earlier in the software development cycle. So, if the ratio were to increase beyond 13% after being at 12.4% one month earlier, this could tell us that technical issues are creeping into the application.

Another way that the Sonar API can be used is from a programming language such as Java. The following Java code will pull the same information through the Java API client:

// Uses the Sonar Java web service client (sonar-ws-client).
import org.sonar.wsclient.Sonar;
import org.sonar.wsclient.services.Resource;
import org.sonar.wsclient.services.ResourceQuery;

Sonar sonar = Sonar.create("http://nemo.sonarsource.org");
Resource commons = sonar.find(ResourceQuery.createForMetrics("248390",
        "technical_debt_ratio"));
System.out.println("Technical Debt Ratio: " +
        commons.getMeasure("technical_debt_ratio").getFormattedValue());

This will print “Technical Debt Ratio: 12.4%” to the console from a Java application. Once we are able to capture these metrics, we can save them as data to trend on in our automated promotion scripts that deploy builds to downstream environments. Some guidelines we have used in the past for these types of metrics are:

  • Small changes in a metric’s trend do not warrant immediate action
  • No more than 3 metrics should be trended (the typical 3 I watch for Java projects are duplication, class complexity, and technical debt)
  • The development team should decide what reasonable guidelines are for indicating problems in the trends (such as technical debt +/- 0.5%)

In the automated deployment scripts, these trends can be used to stop deployment of the next build that passed all of its tests and emails can be sent to the development team regarding the metric culprit. From there, teams are able to enter the Sonar dashboard and drill down into the metric to see where the software debt is creeping in. Also, a source control diff can be produced to go into the email showing what files were changed between the successful builds that made the trend go haywire. This might be a listing per build and the metric variations for each.
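To show how such a promotion gate might look, here is a minimal sketch that pulls the current technical debt ratio with the same web service client used above and compares it against the value recorded for the previously promoted build. The 0.5% tolerance follows the guideline above; the resource id and the idea of persisting the previous value in a plain text file are assumptions made purely for the sake of the example.

// Minimal sketch of a trend-based promotion gate. Assumes the previously
// promoted build's ratio was stored in a local file by the promotion script.
import java.nio.file.Files;
import java.nio.file.Path;
import org.sonar.wsclient.Sonar;
import org.sonar.wsclient.services.Resource;
import org.sonar.wsclient.services.ResourceQuery;

public class DebtTrendGate {
    public static void main(String[] args) throws Exception {
        Sonar sonar = Sonar.create("http://nemo.sonarsource.org");
        Resource project = sonar.find(ResourceQuery.createForMetrics("248390",
                "technical_debt_ratio"));
        // "12.4%" -> 12.4
        double current = Double.parseDouble(
                project.getMeasure("technical_debt_ratio")
                       .getFormattedValue().replace("%", ""));

        // Hypothetical file where the last promoted build's ratio was recorded.
        Path baseline = Path.of("last_promoted_debt_ratio.txt");
        double previous = Files.exists(baseline)
                ? Double.parseDouble(Files.readString(baseline).trim())
                : current;

        double tolerance = 0.5; // percentage points, per the guideline above
        if (current - previous > tolerance) {
            System.out.println("Technical debt ratio rose from " + previous
                    + "% to " + current + "%; halting promotion.");
            System.exit(1); // a non-zero exit stops the deployment script
        }
        Files.writeString(baseline, Double.toString(current));
        System.out.println("Debt trend OK (" + current + "%); build may be promoted.");
    }
}

The same pattern extends to the other trended metrics mentioned above by requesting their metric keys in the same query and applying the corresponding guideline to each.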

This is a deep topic that this post just barely introduces. If your organization has a separate configuration management or operations group that manages environment promotions beyond the development environment, Sonar and its web services API can help further automate early identification of software debt in your applications before it pollutes downstream environments.”

Thank you, Chris!

A Good Start Point for Devops – Guest Post by Peter McGarahan


Source: http://www.flickr.com/photos/stevepj2009/3461077400/

Many of the devops posts in this blog were written from a dev perspective. Today’s guest post by Peter McGarahan examines the topic from the ops perspective. It is inspired by the following eloquent quip about change:

Assume we’re starting from scratch. Assume that we actually are a startup that doesn’t have over a hundred years of experience and sub-optimized IT legacy.

A few biographical details for readers who might not know Peter or know of him. Peter J. McGarahan is the founder and president of McGarahan & Associates, an IT Service Management consulting and training organization.  Peter offers 27 years of IT and Business Service Management experience in optimizing and aligning the service and support organizations of the Fortune 1000 to deliver value against business objectives. His thought leadership has influenced the maturity and image of the service and support industry. His passion for customer service led the Taco Bell support organization to achieve the Help Desk Institute Team Excellence Award in 1995. IT Support News named him one of the “Top 25 Professionals in the Service and Support Industry” in 1999.  Support professionals voted McGarahan “The Legend of the Year” in 2002 and again in 2004 at the Service Desk Professionals conference for his endless energy, mentoring and leadership coaching. As a practitioner, product manager and support industry analyst and expert, McGarahan has left his service signature on the support industry / community.

Here is Peter:

As a former Director of Infrastructure & Operations (I&O), I found it beneficial to establish a respectful working relationship with my Development Colleagues. It was important for the accountable leaders to better understand the objectives, workings and success metrics of each team. It was also critical for the leader to establish the ‘rules of engagement’ for how each team would assist each other in achieving their stated objectives (success metrics). It certainly helped to have an IT / Business leader who established a cooperative / collaborative teamwork culture. She also supported it with shared IT / Business objectives and performance goals for all accountable IT leaders. The I&O team certainly benefited from a CIO who understood the importance of customer service, the value of support and the business impact (negative IT perception) caused by repetitive incidents, problems and service disruptions. It was a game changing day for I&O when the CIO announced that all accountable leaders would have half of their performance objectives (bonus compensation) based on the success metrics of the I&O team.

In working with Infrastructure & Operations organizations, it has become apparent that as we continue to implement, measure and continuously improve the IT Infrastructure Library v3 (ITIL) processes, we must simultaneously address how we focus on all things new! In a recent Cutter Executive Update entitled IT’s Change Imperative, I relate lessons learned from my conversations with Geir Ramleth, CIO of Bechtel, and Ron Griffin, Senior VP of Applications for Hewlett-Packard. Their leadership, vision and courage inspired me to think differently about how IT can better work together for the benefit of the business. In the end, the only success that matters is the continued growth and profitability of the business. A summary of their change success stories:

  • Hewlett-Packard CIO Randy Mott hired the right people to implement his IT strategy and change plan that included building, consolidating and automating its data centers; transferred work in-house from contractors; standardized on only a quarter of its apps; and built one central data warehouse — all while cutting spending in half.
  • Geir Ramleth, CIO of Bechtel, described how he used cloud computing principles to transform IT and make Bechtel’s computing environment more agile. He had a vision of allowing Bechtel’s global employees access to the right resources at any place, at any time, with any device – delivered securely and cost-effectively. He encouraged his IT people to step outside their comfort zones and do things in a different way. He resisted modifying the current state and went with transformational change, fearing they would otherwise only wind up incrementally better. In targeting a desired end state, he gave his team guiding instructions to “Assume we’re starting from scratch. Assume that we actually are a startup that doesn’t have over a hundred years of experience and sub-optimized IT legacy.”

In the spirit of change, we should challenge ourselves to develop shared ‘devops’ goals / objectives. In the end, these should help us identify, link and realize how to translate IT objectives / metrics into tangible business benefits / value.

I have listed some shared ‘devops’ goals / objectives that I believe are a good starting point. I encourage and invite your thoughts, opinions and ideas around these and any others that you feel would aid ‘devops’ in working to establish measurable business value credibility.

  1. Lower the total cost of ownership of all services (the best way to achieve this is to build serviceability, usability and maintainability into the design of all new applications, systems and services).
  2. Increase business value – achieve business benefits (lower operational costs, increased revenues, improved customer experience)
  • Simplified navigation
  • Productivity enhancing capabilities /functionality
  • Plug ‘n play integration
  • Personalization
  • Training / On-line Self-help features

3. Minimize business impact

  • Reduce change-related outages / incidents.
  • Reduce number of problems / incidents / calls.
  • Reduce the number of requests / training-related calls / inquiries.
    • Provide insights into, and tracking of, the number of known errors / workarounds / knowledge articles (solutions).
  • Speed to resolution based on business prioritization model
    • Operating Level Agreement / Commitment between Single Point of Contact (SPOC) Service Desk and internal IT Service Providers based on response / resolution times / commitments.
    • Bug-fix process: provide insights into the ‘bug/fix/enhancement’ list and process, with transparent visibility into business prioritization (needs / requirements / quantifiable benefits).

4. Improved and frequent Communication

  • A marketing / product launch / status update and awareness campaign.
    • Especially around rollout / enhancement time.

As If Another Proof Point Was Needed


Annie Shum’s interview earlier this week gave readers of this blog a multi-dimensional view of imminent changes in IT. If you needed independent validation, it came yesterday through EMC’s Chuck Hollis’ words at national solution provider GreenPages Technology Solutions’ 14th annual summit:

Vice President Global Marketing CTO Chuck Hollis Monday said the changes resulting from the storage giant’s own no-holds barred journey to the private cloud led to a decline in IT employee job satisfaction…

Hollis said the internal IT satisfaction drop came in the second phase of the EMC cloud revolution focused squarely on mission critical applications. That second phase — which EMC is in the midst of now — has sparked major changes in IT jobs as the company has replaced IT management, security staff and backend IT staff.

“During this phase, this is where org (organizational) chart issues started to come in,” Hollis said. “People’s jobs started to change. Younger people in the organization were being promoted over older people.”

As if another proof point to add to Annie‘s rigorous data was needed…

Written by israelgat

August 4, 2010 at 7:30 am

Boundary Objects in DevOps


Boundary Object by Cherice.

Source: Flickr; Cherice‘s Photostream

The following recommendation was given in the post How to Initiate a Devops Project:

For a DevOps project, start by establishing the technical debt of the software to be released to operations. By so doing you build the foundations for collaboration between development and operations through shared data. In the devops context, the technical debt data form the basis for the creation and grooming of a unified backlog which includes various user stories from operations.

I would like to augment this recommendation with a suggestion with respect to the mindset during the initiation phase. Chances are the IT folks feel outnumbered by the dev folks. It might or might not be a matter of optics, but recognizing and appreciating this mindset will help a lot in getting the devops project on track.

Here is a simple example I heard from a participant in the June 25 devops day in Mountain View, CA. The participant with whom I talked is an IT ops person who tries to get ops aligned with  fairly proficient Agile development teams. She is, however, constrained with respect to the IT ops resources available to her. She simply does not have the resources required to attend each and every Scrum meeting as 25 such meetings take place every day. She strongly feels “outnumbered.”

Various schemes could be devised to enable meaningful participation of ops in the Agile process. The more important thing, though, is to be fully sensitized to the “outnumbered” feeling. The extension of Agile principles to ops will not succeed in the face of such a feeling.

Discussing the subject with my friend Andrew Shafer, he mentioned the effectiveness of boundary objects in such cross-organizational situations:

Boundary objects are objects which are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use. They may be abstract or concrete. They have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable means of translation. The creation and management of boundary objects is key in developing and maintaining coherence across intersecting social worlds. [Source: Wikipedia].

As an example, the boundary object for the situation described in this post could be a set of technical debt criteria that make the code eligible for deployment from a product life cycle perspective. Such a boundary object shifts the dialog from the process to the outcome of the process. Instead of working on generating IT resources in an “outnumbered” mode, the energy shifts toward developing a working agreement on the intrinsic quality of the code to be deployed.

Some technical debt criteria that could form the core of a devops boundary object are mentioned in the post Technical Debt Meets Continuous Deployment. Corresponding criteria could be used in the boundary object to satisfy operational requirements which are critical to the proper functioning of the code. For example, a ceiling on configuration drift in IT could be established to ensure an adequate operating environment for the code. A boundary object that contains both technical debt criteria and configuration drift criteria satisfies different concerns – those of dev and those of ops – simultaneously.
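As a purely illustrative sketch, the boundary object could be as modest as a small shared artifact that both dev and ops read before a deployment decision. The field names and thresholds below are hypothetical examples chosen to echo the criteria discussed in this post and in the post referenced above; the point is that a single object carries both the technical debt criteria and the configuration drift ceiling.

// Hypothetical boundary object: one shared artifact read by both dev and ops.
// Field names and thresholds are illustrative examples only.
public class DevOpsBoundaryObject {

    // Dev-facing technical debt criteria for deployment eligibility
    double maxDuplicationPercent      = 8.0;
    int    maxClassComplexity         = 15;
    double minUnitTestCoveragePercent = 50.0;

    // Ops-facing operational criterion: assumed ceiling on configuration drift
    double maxConfigurationDriftPercent = 5.0;

    // Weakly structured in common use (a shared talking point), strongly
    // structured in individual use: each side acts on the fields it owns.
}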

Written by israelgat

July 6, 2010 at 6:44 am

Technical Debt Meets Continuous Deployment


As you would expect at a conference entitled Velocity, and at a follow-on devops day, speeding things up was an overarching theme. In the context of devops, the theme primarily manifested itself in lively discussions about the number of deploys per day. Comments such as the following reply to my post Ops Driven Dev were typical:

Conceptually, I move the whole business application configuration into the source code…

The theme that was missing for me in many of the presentations and discussions on the subject was the striking of a balance between velocity and quality. The classical trade-off in process control is between production rate and product quality (and safety, but that aspect is beyond the scope of this post). IMHO this trade-off applies to software just as it applies to mechanical or chemical processes.

The heart of the “deploy early and often” strategy hailed by advocates of continuous deployment is moving from known deployment state to known deployment state. You don’t let the deployment evolve from one state to another before it has stabilized to a robust state. The power of this incremental deployment is in dealing with single-piece flow (or as small a number of pieces as possible) rather than dealing with the effects of multiple-piece flow. When the deployment increments are small enough, rollback, root cause analysis and recovery are relatively straightforward if a deployment turns sour. It is a concept similar to Agile development, extending continuous integration into continuous deployment.

While I am wholeheartedly behind this devops strategy, I believe it needs to be reinforced through rigorous quality criteria that the code must satisfy prior to deployment. The most straightforward way of doing so is to embed technical debt criteria in the release/deploy process. For example (a minimal sketch of how such criteria might be enforced follows the list below):

  • The code will not be deployed unless the overall technical debt per line of code is lower than $2.
  • To qualify for deployment, code duplication levels must be kept under 8%.
  • Code whose Cyclomatic complexity per Java class is higher than 15 will not be accepted for deployment.
  • 50% unit test coverage is the minimal level required for deployment.
  • Many others…
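Here is the minimal sketch promised above: a hedged illustration of how criteria like these could be enforced as a go/no-go deployment gate. The method and parameter names are made up for the example; the measured values would come from a code analysis tool such as Sonar.

// Illustrative deployment gate for the technical debt criteria listed above.
public class TechnicalDebtGate {

    static boolean mayDeploy(double debtPerLineUsd, double duplicationPct,
                             int worstClassComplexity, double unitTestCoveragePct) {
        return debtPerLineUsd       < 2.00   // technical debt per line under $2
            && duplicationPct       < 8.0    // duplication kept under 8%
            && worstClassComplexity <= 15    // no Java class above complexity 15
            && unitTestCoveragePct  >= 50.0; // at least 50% unit test coverage
    }

    public static void main(String[] args) {
        // Example measurements for a candidate build (made-up numbers).
        boolean ok = mayDeploy(1.75, 6.2, 12, 63.0);
        System.out.println(ok ? "Build accepted for deployment"
                              : "Build rejected: technical debt criteria not met");
    }
}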

I have no doubt whatsoever that code which does not satisfy these criteria might be deployed successfully in the short term. The problem, however, is the cumulative effect over the long haul of successive deployments of code increments of inadequate quality. As Figure 1 demonstrates, a Java file with a Cyclomatic complexity of 38 has a 50% probability of being error-prone. If you do not stop it prior to deployment through technical debt criteria, it is likely to affect your customers and play havoc with your deployment quite a few times in the future. The fact that it did not do so during the first hour of deployment does not guarantee that such a file will be “well-behaved” in the future.


Figure 1: Error-proneness as a Function of Cyclomatic Complexity (Source: http://www.enerjy.com/blog/?p=198)

To attain satisfactory long-term quality and stability, you need both the right process and the right code. Continuous deployment is the “right process” if you have developed the deployment infrastructure to support it. The “right code” in this context is code whose technical debt levels are quantified and governed prior to deployment.

Devops: It is Not About ITIL, It is About Proficiency


As you would expect, the Information Technology Infrastructure Library (ITIL) topic was brought up in the devops day held last Friday in a LinkedIn facility in Mountain View, CA. We, of course, had the expected spectrum of opinions about ITIL in the context of devops – from “ITIL will never work for a true continuous development shop” to “well, you can make ITIL work under such circumstances.” Needless to say, a noticeable level of passion accompanied these two statements…

IMHO the heart of the issue is not ITIL per se but system management proficiency. If your system management proficiency is high, you are likely to be able to effectively respond to 10, 20 or 50 deploys per day. Conversely, if your system management proficiency is low, ops is not likely to be able to cope with high velocity in dev. The critical piece is alignment of velocities between dev and ops, not the method used to manage IT systems and services.  Whether you use ITIL, COBIT or your own home-grown set of best practices is irrelevant. Achieving alignment of velocities between dev and ops is a matter of proficiency in system management.

How to Initiate a DevOps Project



Source: 17th/21st Lancers c. 1922-1929 “THE FIGHTING SPIRIT!”

Agile consultants on a development project often start by helping the team construct a backlog. The task is sufficiently concrete to get all stakeholders (product management, project management, development, test, any others) on a collaborative track through the creation of a key artifact. The backlog establishes a base line for the tasks to be carried out in the project.

For a DevOps project, start by establishing the technical debt of the software to be released to operations. By so doing you build the foundations for collaboration between development and operations through shared data. In the DevOps context, the technical debt data form the basis for the creation and grooming of  a unified backlog which includes various user stories from operations.

Apply the same approach when you are fortunate enough to be able to include folks from operations in the Agile team from the very beginning. You start with zero technical debt, but you track it on an ongoing basis and include the corresponding “fix-it” stories in the backlog as you accrue debt. Running technical debt analytics on the source code every two weeks is a good practice to follow.

As the head of development, you might not be comfortable sharing technical debt data. This being the case, you are not ready for DevOps.

The Agile Flywheel


Readers of The Agile Executive have been exposed to the “All In!” strategy used by Erik Huddleston to transform the software engineering process at Inovis and make it uniquely streamlined. In this post we follow up on the original discussion of the subject to explore the effect of Agile on IT Operations. As the title implies, Agile at Inovis served as a flywheel which created the momentum required to transform IT Operations and blend the best of Agile with the best of ITIL.

This guest post was written by Ray Riescher – a Six Sigma Black Belt, Agile evangelist and a business process change agent. Ray is currently responsible for business process management and IT governance at Inovis, a leading provider of business-to-business (B2B) e-commerce services, in Alpharetta, GA.

Here is Ray:

When we converted to an Agile Scrum software methodology some 24 months ago, I never imagined the lessons I’d learn and the organizational change that would be driven by the adoption of Scrum.

I’ve lived by the philosophy that managing a business is managing its processes and that all of those processes, especially the operational processes, are interconnected. However, I don’t think I was fully prepared for the effect Agile Scrum would have on our company operations.

We dove head first into Agile Scrum and adapted to it very quickly. However, it wasn’t until we landed a very large and demanding customer that Scrum was really put to the test. New enhancements, new features, and new configurations were all needed ASAP.  Scrum delivered with rapid development and deployment in the form of releases that were moving into production with amazing velocity. Our release cadence hit warp drive and at one point we experienced several months where multiple teams’ production releases were deploying at the end of every two week sprint.

We’ve subscribed to the ITIL service support processes for Release, Change, Incident, Problem and Configuration Management. ITIL has served us well, giving us a common language and a clear understanding of process boundaries.

As the Scrum release cadence kicked in, the downstream ITIL processes had to keep up, adapt, and support the dynamics of rapid production changes.  What happened was enlightening and maybe a bit ground breaking.

The Release Management process had to reassess its reliance on artifacts for gate keeping. The levels of sign offs had to be streamlined, the heavyweight deployment documentation had to be lightened, yet the process still had to control the production release to ensure deployment success.  The rapidity of the release cycles meant that maintenance window downtime would be experienced too frequently by customers, so “rolling bounce” deployment strategies were devised and implemented.

Change requests could no longer wait for a weekly Change Management review board to approve and schedule the changes.  Change management risk models had to be relied on for accurate detection of risky changes.

Early on in this dynamic environment, we weren’t quite as good as we needed to be and our Incident Management process was put to the test.  Faster releases meant more opportunity for problems with service degradation and outages. This reality manifested itself more frequently than we’d ever experienced. Monitoring, detecting and repairing became paramount for environment stability and customer satisfaction.

What we found out was that we became very agile at this break/fix game. We developed a small team approach to managing incidents and leveraged the ITIL Problem Management process to rapidly perform root cause analysis. Once the true root cause was determined, a fix would be defined and deployed. Sometimes the fix was software related and went through the Scrum process, sometimes the fix was hardware related and went through the Configuration Management process, other times it was more operational and the fix took the form of training or corrections to procedural documentation.

The point is we’ve become agile across the entire IT spectrum. Whether it’s development via Scrum, the velocity with which we now operate our ITIL processes, or the integrated break/fix operational support processes, we are performing all of these with an agile mindset and discipline. We have small teams, working on priorities, and completing what needs to be completed now.

Scrum set the flywheel in motion and caused the rest of the IT process life cycle to respond.  ITIL’s processes still form the solid core of service support and we’ve improved the processes’ capability to handle intense work velocity. The organization adapted by developing unprecedented speed in the ability to deliver production fixes and to solve root cause problems with agility.

What I think we are witnessing is a manifestation of Agile Business Service Management: a holistic agile methodology running across the IT process spectrum that’s delivering eye-popping change and tremendous results.

Harnessing Economies of Scale in Cloud Computing to Realize a Greener Computing Option


Economies of Scale have been much discussed in The Agile Executive since the recent OpsCamp in Austin, TX. The significant savings on system administration costs  in very large data centers have been called out as a major advantage of Internet-scale Clouds. Unlike various short-lived advantages, the benefits to the Cloud operator, and to the Cloud user when the savings are passed on to him/her, are sustainable.

In this guest post, colleague and friend Annie Shum analyzes the various sources of waste in operations in traditional data centers. Like an Agilist with Lean inclinations who confronts an inefficient Waterfall process, Annie explains how economies of scale apply to the various kinds of waste that are prevalent in today’s small and medium data centers. Furthermore, she connects the dots that lead toward a Green IT option.

Here is Annie:

Harnessing Economies of Scale in Cloud Computing to Realize a Greener Computing Option

Scale Matters: “Over time, however, competitive advantage within categories shifts inexorably toward volume operations architecture.” – Geoffrey Moore, “Dealing with Darwin”

It is a truism that today’s datacenters are systemically inefficient. This is not intended as an indictment of all conventional datacenters. Nor does it imply that today’s datacenters cannot be made more efficient (incrementally) through right sizing and other initiatives, notably consolidation by deploying virtualization technologies and governance by enforcing energy conservation/recycling policies. There are a myriad of inefficiencies, however, that are prevalent in datacenters today.

Many industry observers lament the “staggering complexity” that permeates on-premises datacenters. Over time, most, if not all, enterprise IT datacenters have become amalgamations of disparate heterogeneous resources. Generally, they can be described as incohesive, perhaps even haphazard, accumulations. The datacenter components and configurations often reflect the intersections of organizational politics (LOB reporting structures leading to highly customized/organizational asset acquisitions and configurations), business needs of the moment (shifting corporate strategies and changing business imperatives to gain competitive edge or meet regulatory compliances) and technology limitations (commercial tools available in the marketplace). It should come as no surprise that human interactions and errors are considered a major contributor to the inefficiencies of datacenters: IBM reported that human errors account for seventy percent of the datacenter problems.

The challenge of maximizing energy efficiency begins fundamentally with the historical capital-intensive ownership model for computing assets, which requires each organization to operate its own datacenter and to provide “24×7 availability” to its own users. The enterprise IT staff has been required to support unpredictable future growth, accommodate situational demands and unscheduled but deadline-critical events, meet performance levels within SLAs and comply with regulatory and auditing requirements. Hence, datacenters generally are over-configured and over-provisioned. In addition to highly skewed under-utilization of distributed platform servers, ninety percent of corporate datacenters have excess cooling capacity. Worst of all, according to IBM, about seventy-two percent of cooling bypassed the computing equipment entirely. Further compounding these problems for a typical enterprise datacenter is the lack of transparency and the inability to control energy consumption properly, due to inadequate and often inaccurate instrumentation for quantifying energy consumption and waste due to energy loss.

The economics of Cloud Computing can offer a compelling option for more efficient IT: by lowering power consumption for individual organizations and by improving the efficiency of a large number of discrete datacenters. Although the electricity consumption of Cloud Computing is projected to be one to two percent of today’s global electricity use, Cloud service providers can still cultivate sustainable Green IT effectively at lower costs by leveraging state-of-the-art, super energy efficient massive datacenters, proximity to power generation (thereby reducing transmission costs) and, above all, harnessing enormous economies of scale. To better understand how Cloud Computing can offer greener computing in the Cloud, and how it will help moderate power consumption by datacenters and rein in runaway costs, a good starting place is James Hamilton’s September 2008 study on “Internet-Scale Service Efficiency”, as summarized in the table below.

Resource          Cost in Medium DC        Cost in Very Large DC     Ratio
Network           $95 / Mbps / month       $13 / Mbps / month        7.1x
Storage           $2.20 / GB / month       $0.40 / GB / month        5.7x
Administration    ≈140 servers/admin       >1000 servers/admin       7.1x
Table 1: Internet-Scale Service Efficiency [Source: James Hamilton]

This study concludes that hosted services by Cloud providers with super large datacenters (at least tens of thousands of servers) can achieve enormous economies of scale of five to seven times over smaller scale (thousands of servers) medium deployments. The significant cost savings are driven primarily by scale. Other key factors include location (low-cost real estate and electricity rates, abundant water supply and readily available fiber-optic connectivity), proximity to electricity and power generators, load diversity, and virtualization technologies.
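To see where ratios of this magnitude come from, consider the administration row of Table 1: going from roughly 140 servers per administrator in a medium datacenter to more than 1,000 servers per administrator in a very large one means each administrator covers about 1000 / 140 ≈ 7.1 times as many machines, so the administration cost per server falls by roughly the same factor, which is the 7.1x ratio shown in the table.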

Will this mark the beginning of the end for traditional on-premises datacenters? Can enterprise IT continue to justify new business cases for expanding today’s non-renewable energy powered datacenters? According to the McKinsey article, the costs to launch a large enterprise datacenter have risen sharply from $150M to over $500M over the past five years. The facility operating costs are also increasing at about twenty percent per year. How long will the status quo last for enterprise IT considering the recent trend of Cloud service providers? Major players such as Google, Microsoft as well as the U.S. government itself have invested in or are planning ultra energy-efficient mega-size datacenters (also known as “container hotels”) with massive commoditized containerization and proximity both to power source and less expensive power rates. Bottom line: will the tide turn if the economics (radical cost savings) due to enormous economies of scale become too significant to ignore?

Despite the potential for significant cost savings, it is premature to declare the demise of traditional IT or the end of enterprise datacenters. After all, the rationale for today’s enterprise IT extends well beyond simplistic bottom-line economics – at least for now. To most industry observers, enterprise datacenters are unlikely to disappear, although the traditional roles of enterprise IT will be changing. A likely scenario may involve redistributing IT personnel from low-level system operational tasks to higher-level functions involving governance, energy management, security and business processes. Such change will not only become more apparent but will likely be precipitated by the rise of hybrid Clouds and the growing interconnection linking SOA, BPM and social computing. Another likely scenario is the rise of the mega datacenters or “container hotels” for Cloud Utility Computing providers. Although the global economic outlook will undoubtedly play a key role in shaping the development plans and timelines of these mega datacenters, they are here to stay. Case in point: Intel estimates that by 2012 it will design and ship about a quarter of the server chips it sells to such mega datacenters.

