Archive for the ‘IT Operations’ Category
OpsCamp Through an Internet-scale Lens
Like Agile Roots in Salt Lake City in June 2009, OpsCamp in Austin last week demonstrated how powerful grass-roots conferences can be. We might not have had big names on the roster, but we sure had a productive dialog on the tricky issues lurking on the cusp between software development and IT operations in Cloud environments.
The conference has been amply covered by Michael Cote, John Willis, Mark Hinkle, and Damon Edwards (to name a few). This post restricts itself to commenting on one fundamental aspect of the cloud which IMHO does not get the attention it deserves. It might be implied in various discourses on the subject, but I believe it needs to be called out as a fundamental assumption for just about anything and everything one might consider doing with respect to the cloud. I am referring to economies of scale.
As pointed out in a forthcoming book on Cloud Computing by colleague and friend Annie Shum, the cloud phenomenon is fundamentally driven by substantial economies of scale in very large data centers. The operational costs of running such data centers are close to an order of magnitude lower than those prevailing in small and mid-sized data centers. User benefits are primarily derived from these compelling economies of scale.
I will be asking Annie to write a detailed guest post on the subject for readers of The Agile Executive. Until her post is published here, I would recommend we primarily consider the Cloud as a phenomenon that only becomes meaningful at scale. In particular, Private Clouds are not likely to yield Internet-scale efficiencies. Folks who regard their company’s conventional data center as a private cloud might be missing out on the ‘secret sauce’ of cloud computing.
The various agile system administration schemes discussed at the Austin OpsCamp are essential to attaining the requisite economies of scale in cloud services. Watch for follow-on OpsCamps in other cities for developments to come in this all-important space.
Agile Infrastructure
Ten years ago I probably would not have seen any connection between global warming and server design. Today, power considerations prevail in the packaging of servers, particularly those slated for use in large and very large data centers. The dots have been connected to characterize servers in terms of their ecological footprint.
In his Agile Austin presentation a couple of days ago, Cote delivered a strong case for connecting the dots of Agile software development with those of Cloud Computing. Software development and IT operations become largely inseparable in cloud environments. In many of these environments, customer feedback is given in “real time” and needs to be responded to in an ultra-fast manner. Companies that develop fast closed-loop feedback and response systems are likely to have a major competitive advantage: they can make development and investment decisions based on actual user analytics, feature analytics and aggregate analytics instead of speculating about what might prove valuable.
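As a minimal illustration of the analytics-driven approach, the sketch below ranks features by observed adoption. The event format and feature names are hypothetical, invented purely for illustration:

```python
# A minimal sketch of analytics-driven investment decisions. The event format
# and feature names are hypothetical, invented purely for illustration.
from collections import Counter

events = [
    {"user": "u1", "feature": "search"},
    {"user": "u2", "feature": "search"},
    {"user": "u1", "feature": "export"},
    {"user": "u3", "feature": "search"},
]

# Aggregate feature analytics: how often is each capability actually used?
usage = Counter(event["feature"] for event in events)

# Invest where users actually go, instead of speculating.
for feature, hits in usage.most_common():
    print(f"{feature}: {hits} uses")
```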
While the connection between Agile and Cloud might not be broadly recognized yet, the subject IMHO is of paramount importance. In recognition of this importance, Michael Cote, John Allspaw, Andrew Shafer and I plan to dig into it in a podcast next week. Stay tuned…
The Urgency of Now – Guest Post by Annie Shum
Failure to learn, failure to anticipate, and failure to adapt are the three generic causes of military disasters. Each of these three failures is bad enough on its own. In combination, they can be catastrophic. Germany swiftly defeated and conquered France in 1940 due to the utter failure of the French army to grasp the nature of future war, to conceive the probable action of the German forces, and to react adequately to the German initiative once it unfolded through the Ardennes. The patterns leading to the catastrophe suffered by the French are similar in some ways to the eco-meltdowns described by Jared Diamond in Collapse: How Societies Choose to Fail or Succeed.
In this guest post, colleague and friend Annie Shum poses disturbing questions with respect to our willingness and ability as IT professionals to learn, anticipate and adapt to the imperatives of Cloud Computing. Caught between shockingly low (15%) server capacity utilization on the one hand and dramatic changes in the needs of the business on the other, companies that continue to use industrial-era IT models are at peril. Annie weaves these and other related threads together, and makes a resounding call-to-action to re-think IT.
It is remarkable that Annie’s analysis herein of the root causes of a possible meltdown in IT identifies worrisome patterns similar to those the Agile movement has pointed out with respect to arcane methods of software development. The very same core problems that afflict software development manifest themselves in the IT paradigm as well as in the corresponding business design. Painful and wasteful as this repeated manifestation is, it actually creates the opportunity to manage software, IT and the business in unison. To do so, we need to embrace a data-driven version of the economics of IT, to grasp the true nature of Cloud Computing without the hype that currently surrounds it, and to adapt software development, IT operations and business design accordingly. As the title of this post states, we need to start carrying out these three tasks now.
Here is Annie:
The Urgency of Now: The Edge of Chaos and A “Strategic Inflection Point” for IT
“It was the worst of times. It may be the best of times.” – IBM
Consider the following table. It contains a list of statistics pertaining to the enterprise datacenter index compiled by Peter Mell and Tim Grance of NIST. Overall, the statistics are sobering, perhaps even alarming, and do not bode well for the long-term sustainability of traditional on-premises datacenters. Prudent IT organizations, whether big or small, stalwart or startup, should consider this a wake-up call. In particular, of the almost twelve million servers in US datacenters today, typical server capacity utilization is only around fifteen percent. Although not shown explicitly in the table, average utilization of mainframe z/OS servers typically exceeds eighty percent; however, because mainframes constitute only a small fraction of the server population, their high utilization barely moves the overall average (a back-of-the-envelope sketch follows the table).
Enterprise Datacenter Index:
- Servers in US datacenters: 11,800,000
- Typical server capacity utilization: 15%
- Purchasing and maintaining enterprise software: $800,000,000,000/year
- Software costs spent on maintenance (the “80-20” ratio): 80%
- Power consumption per sq ft compared to an office building: 100x
- Increase in server power consumption, 2001 to 2006: 4x
- Increase in number of servers, 2001 to 2006: 2x
- Construction cost of a 9,000 sq ft datacenter: $21,300,000
- Annual cost to power the datacenter: $1,000,000/year
- Portion of national power generation: 1.5%
- Potential power reduction from green technologies: 50%
- Portion of global carbon emissions: 2%
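To see why the mainframes barely move the overall average, consider the following back-of-the-envelope calculation. The mainframe share of the server population is an illustrative assumption, not a figure from the NIST index:

```python
# Back-of-the-envelope: population-weighted average server utilization.
# The 2% mainframe share is an illustrative assumption, not a NIST figure.
TOTAL_SERVERS = 11_800_000
MAINFRAME_SHARE = 0.02           # assumed fraction of the server population
MAINFRAME_UTILIZATION = 0.80     # "typically over eighty percent"
OTHER_UTILIZATION = 0.14         # assumed utilization of the remaining servers

mainframes = TOTAL_SERVERS * MAINFRAME_SHARE
others = TOTAL_SERVERS - mainframes

overall = (mainframes * MAINFRAME_UTILIZATION
           + others * OTHER_UTILIZATION) / TOTAL_SERVERS
print(f"Overall utilization: {overall:.1%}")  # ~15.3%: the 80% mainframes barely register
```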
Over the years, organizations have accepted such skewed levels of server inefficiency and the escalating maintenance costs of IT infrastructure as the norm. Even as organizations continue to express concerns, many seem tacitly resigned to the status quo, akin to what Bob Evans of InfoWeek described as “insurmountable laws of physics.” Looking ahead, however, the status quo may no longer be a viable option for most organizations. Soaring electricity/power costs, compounded by the recent global financial meltdown and the prolonged (for now, apparently indefinite) credit crunch it triggered, make these unparalleled strident and chaotic times for businesses. Pressured by business decision-makers who are under a heightened level of anxiety, enterprise IT now confronts a transformative dilemma: whether to preserve the status quo or to re-think IT.
On one hand, the current global recessionary down cycle is a particularly powerful (albeit rooted in fear) and instinctive deterrent to challenging the status quo. For risk-averse organizations, it is understandable why the status quo, fundamental flaws notwithstanding, may trump disruptive change during these challenging times. On the other hand, forward-thinking decision-makers may make the bold but disruptive (radical) choice to view the status quo as the fundamental problem: to acknowledge the growing “urgency of now” by resolving to overcome and correct the entrenched shortcomings of enterprise IT.
“You never want a serious crisis to go to waste.” That quote (or one of its many variations) has been attributed to economists and politicians alike. The same could be said of IT. Indeed, a growing number of IT industry observers believe the profound impact of the ongoing economic crisis could offer a rare window of opportunity for organizations to rethink traditional capital-intensive, command-control, on-premises IT operations and invest in new, more flexible self-service IT delivery/deployment models. Think of this defining moment as what Andy Grove, co-founder of Intel, described as a “strategic inflection point”: the point in the dynamics of a business when its fundamentals are about to change, and “that change can mean an opportunity to rise to new heights.” Nonetheless, these will be hard decisions because the options are stark: either counter-intuitively invest in a down cycle by focusing on a more sustainable but disruptive trajectory, or hunker down and risk an irreversibly shrinking business.
As one considers how to address the challenges of today’s enterprise IT, perhaps the following two observations should be taken into account. First, despite the quantum leap in technology advancements, the basic design and delivery models of existing IT applications/services are generally variations of traditionally insular, back-office automation business tools. Second, the organizational structure and business models of most companies are deeply rooted in models of yesteryear, in many instances dating back to the Industrial Revolution. In theory, adhering to the traditional organizational model of top-down command-control can maximize predictability, efficiency and order. Heretofore, this has been the modus operandi for most organizations, which Umair Haque succinctly characterized as “industrial-era companies that make industrial-era stuff — and play by industrial-era rules.” In today’s exponential times, however, the velocity of change and the rapidly growing need to interconnect with other organizations and automate value chains inevitably lead to increased uncertainty and disorder. Strategically, forward-thinking organizations should seek alternative models to address the interdependent and shifting new world order.
In their book, Presence: Human Purpose and the Field of the Future, authors Peter Senge, Otto Scharmer, Joseph Jaworski and Betty Sue Flowers observe that many practices of the Industrial Age appear largely unaffected by the changing reality of today’s society and continue to expand in today’s business organizations. They conclude with this advice: “As long as our thinking is governed by industrial ‘machine age’ metaphors such as control, predictability, and faster is better, we will continue to re-create organizations as we have had for the last 100 years, despite their increasing disharmony with the world and the science of the 21st century.” Likewise, the traditional top-down command-control modus operandi of enterprise IT does not adequately reflect, and hence is unlikely to accommodate fully, the transformational shift of business from silo organizations to “all things digital all the time,” hyper-interconnected and hyper-interdependent ecosystems.
And Now the Bottle-neck is in Operations
In his forthcoming Agile Austin presentation, colleague and friend Michael Cote will be discussing velocity in Agile development vis-a-vis velocity in IT operations. To quote Cote:
Technologies used by public web companies and now cloud computing are looking to offer a new way to deliver applications by addressing deployment and provisioning concerns. Agile software development has sped up the actual development of software, and now the bottle-neck is in operations who’re not always able to deploy software at the same velocity that Agile teams ship code. What do these technologies look like, are they realistic, and how might they affect your organization?
The topic is important from a few perspectives, such as the new business models it enables. With Agile infrastructure, a closed loop is formed between vendor and customer. This loop operates on close to real-time feedback: the new functionality deployed in the afternoon could be a response to a specific need brought up that very morning. Hence, the business focus and the business design shift from software that has been developed and tested (‘done done’) but not yet delivered, to software that has been developed, tested and deployed (‘done done done’) in an ultra-fast way.
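Here is a minimal sketch of such a ‘done done done’ loop. The commands and helper names below (pytest, deploy.sh, the feedback queue) are hypothetical stand-ins, not a reference to any specific toolchain:

```python
# A minimal sketch of the 'done done done' closed loop. The commands and
# helper names below are hypothetical stand-ins, not any vendor's toolchain.
import subprocess
from datetime import datetime

def run_tests() -> bool:
    """'Done done': developed and tested."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def deploy() -> None:
    """'Done done done': deployed, closing the loop with the customer."""
    subprocess.run(["./deploy.sh", "production"], check=True)

def closed_loop_cycle(feedback_queue):
    for item in feedback_queue:  # this morning's customer feedback
        print(f"{datetime.now():%H:%M} addressing: {item}")
        # ... the corresponding code change is made here ...
        if run_tests():
            deploy()             # in production by the afternoon
```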
It should also be pointed out that the line between developing content and developing software is getting really blurry nowadays. From a company perspective, both software and content are entities being made available for dissemination. If you accept the premise that the generation of content and the development of the corresponding software should be done under a unified Agile model, the desirability, power and benefits of managing development and delivery in unison become obvious. When applied to both content and software, an Agile infrastructure paradigm could easily transform the publishing industry, among others.
In short, the business benefits Agile Infrastructure begets trump the (very significant) operational benefits it enables.
Between Agile and ITIL – Part II
The July 2009 post Between Agile and ITIL introduced the application of Agile principles to system management with the following words:
You do not need to be an expert in Value Stream Mapping to appreciate the power of speeding up deployment to match the pace of Agile development. By aligning development with deployment, you streamline “production” with “consumption.” The rationale for so doing is aptly captured in the first bullet of the Declaration of Interdependence: “We increase return on investment by making continuous flow of value our focus.”
Yesterday’s press release about the acquisition of Phurnace by BMC validates the projection given in the aforementioned post. Colleague and friend Michael Cote puts his finger on the heart of the acquisition in his post on People Over Process:
The interesting part is also that this is automation – I’m assuming – at the application layer, whereas most automation talk in past and present is at the infrastructure layer. Of course, the thought leaders in this area – folks like Reductive Labs (Puppet), OpsCode (Chef), and in a more general sense cloud management outfits – are doing a helpful job of blurring the distinction between traditional IT layers like application and infrastructure with their dev/ops angled automation. Check out this white paper done by Reductive Labs and dto solutions on the topic for a nice toe-dip. And, I’d expect to see more application layer automation from VMWare/SpringSource. Older automation portfolios like BMC’s Blade Logic line need to keep a close eye on these developments, hopefully, taking in the proven parts of that work.
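For readers new to the style Cote alludes to, here is a minimal sketch of declarative, idempotent configuration management in the spirit of Puppet and Chef. The resource helpers below are invented for illustration and are emphatically not either tool’s API:

```python
# A minimal sketch of declarative, idempotent configuration management in the
# spirit of Puppet/Chef. These helpers are invented; they are NOT either tool's API.
import subprocess
from pathlib import Path

def ensure_package(name: str) -> None:
    """Converge to 'installed' regardless of current state (idempotent)."""
    if subprocess.run(["dpkg", "-s", name], capture_output=True).returncode != 0:
        subprocess.run(["apt-get", "install", "-y", name], check=True)

def ensure_file(path: str, content: str) -> None:
    """Write only when the actual state differs from the desired state."""
    p = Path(path)
    if not p.exists() or p.read_text() != content:
        p.write_text(content)

# Desired state is declared once; running the script twice changes nothing.
# Managing the app's config alongside its packages is what blurs the line
# between the infrastructure and application layers.
ensure_package("nginx")
ensure_file("/etc/myapp/app.conf", "workers = 4\n")
```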
One can, of course, automate IT tasks without embracing Agile. The fundamental question to be answered is whether one considers ITIL as an “empirical” process control model or as a “defined” process control model (or possibly a hybrid).
Extending a True Epiphany
In Agile Software Development with Scrum, Ken Schwaber describes a true epiphany he experienced as a result of his 1995 meetings with DuPont’s process control experts:
They [DuPont’s process control experts] inspected the system development processes that I brought them. I have rarely provided a group with so much laughter. They were amazed and appalled that my industry, system development, was trying to do its work using a completely inappropriate process control model. They said system development had so much complexity and unpredictability that it had to be managed by a process control model they referred to as “empirical.” They said it was nothing new, and all complex processes that were not completely understood required the empirical model…
… I realized why everyone in my industry had such problems building systems. I realized why the industry was in such trouble and had such poor reputation. We were wasting our time trying to control our work by thinking we had an assembly line when the only proper control was frequent and first-hand inspection, followed by immediate adjustments…
Based on this insight, I have since formulated with others the Scrum process for developing complex products, particularly software systems.
Fast forward to November 2009. During a lovely dinner in Boulder with Dean Leffingwell, we got into the subject of connecting Agile with ITIL. The conversation really registered with me: I recalled how, years ago, Ray Paquet characterized IT as a “continuous manufacturing” process. If you accept Ray’s premise, the chain {DuPont -> Scrum -> IT} is quite intriguing.
Re-reading Software Evolution recently, I was struck by the observation Tom Mens makes in the Introduction:
… due to the fact that the activity of software evolution is a continuous feedback process, the chosen process model itself is likely to be subject to evolution.
I can’t help wondering whether Tom’s observation applies to IT. If so, what are the implications with respect to IT operations and system management?!
Opinions please!
Your Investment in Enterprise Software – Guidelines to CIOs and CFOs
The overall investment associated with implementing and maintaining a suite of enterprise software products could be significant. A 1:4 ratio between product investment and the corresponding investment over time in related services is not uncommon. In other words, an initial $2M in licensing a suite of enterprise software products might easily balloon to $10M in total life-cycle costs (initial investment in perpetual license plus the ongoing investment in associated services).
I offer the following rule-of-thumb guidelines for assessing whether the terms quoted by a vendor for an enterprise software suite are right:
- Standard maintenance costs: Insist on a 1:1 ratio between license and standard maintenance costs over a 5-year period. If standard maintenance costs over this period exceed the corresponding license costs, chances are that either (a) the vendor is quite greedy, or (b) the vendor’s software has accrued a non-negligible amount of technical debt. Ask the vendor to quantify the technical debt in monetary terms. See Technical Debt on Your Balance Sheet for an example of how to conduct such quantification.
- Premium customer support costs: Certain premium customer support services could be quite appropriate for your business parameters. However, various “premium services” may actually address deficits or defects in the enterprise software products you license. If the technical debt figure is high, the vendor you are considering might not be able to afford the software it has developed. Under such circumstances, “premium services” could simply be a vehicle for the vendor to recoup its investment in software development.
- Professional services costs: Something is wrong if the costs of professional services exceed licensing cost. Either the suite of products you are considering is not a good fit for your business parameters or the initiative you are aspiring to implement through the software is overly ambitious.
To summarize, the grand total of license fees, customer support fees and professional services fees over a 5-year period should not exceed 3X license fees. Something is out of balance if you are staring at a 4X or 5X ratio for the software you are considering. The sketch below turns these rules of thumb into a quick sanity check.
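By way of illustration, here is a minimal sanity check that encodes the guidelines above. The quote figures used in the example are the hypothetical $2M license that balloons to $10M in life-cycle costs:

```python
# A quick sanity check of a vendor quote against the rule-of-thumb ratios
# above. The example figures are illustrative; plug in your own quote.
def check_quote(license_fee: float, maintenance_5yr: float, services: float) -> list:
    warnings = []
    if maintenance_5yr > license_fee:      # 1:1 license-to-maintenance rule
        warnings.append("Maintenance over 5 years exceeds license fees.")
    if services > license_fee:             # services should not exceed license fees
        warnings.append("Professional services exceed license fees.")
    total = license_fee + maintenance_5yr + services
    if total > 3 * license_fee:            # grand total should stay within 3X
        warnings.append(f"Total is {total / license_fee:.1f}X license fees (limit: 3X).")
    return warnings

# Example: the $2M license that balloons to $10M in life-cycle costs.
for w in check_quote(license_fee=2_000_000, maintenance_5yr=3_000_000, services=5_000_000):
    print(w)
```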
One final point: please do not forget to add End-of-Life costs to the economic calculus. Successful enterprise software initiatives can be very sticky.
An Update on Agile Business Service Management
A previous post in this blog defined the demarcation line between The Agile Executive and BSM Review as follows:
If software development is your primary interest, you might find my forthcoming posts in BSM Review go a little beyond the traditional scope of software methods. If, however, you are interested in software delivery in entirety, you are likely to find good synergy between the topics I will address in BSM Review and those I will continue to bring up in The Agile Executive. Either way, I trust my posts and Cote’s will be of on-going interest to you.
Since writing these words, I have realized how tricky it is to adhere to this differentiation. The difficulty lies in the “cord” between development and operations. Development needs to devise algorithms that take into account operational characteristics in IT. Operations needs to comprehend the limits of such algorithms in the context of the service level agreements (SLAs) and operational level agreements (OLAs) that have been negotiated with their customers (whether external or internal). The mutual need is particularly strong in the web application/web operations domain, where mutual understanding, collaborative work and joint commitment often need to transcend organizational lines. A deliberately simplified example of that mutual comprehension is sketched below.
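The sketch checks observed response times against a negotiated latency SLA. The percentile, threshold and sample data are illustrative assumptions, not terms from any actual agreement:

```python
# A deliberately simplified check of observed latencies against a negotiated
# SLA. The percentile, threshold and sample data are illustrative assumptions.
import math

def meets_sla(latencies_ms, percentile=90.0, threshold_ms=500.0):
    """True if the given percentile of observed latencies is within the threshold."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))  # nearest-rank method
    return ordered[rank - 1] <= threshold_ms

observed = [120, 180, 240, 310, 460, 470, 480, 490, 495, 900]
print(meets_sla(observed))  # True: the 90th-percentile latency (495 ms) is within budget
```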
Given the inherently close ties between development and operations, here are some BSM Review articles and posts that are likely to be of interest to readers of The Agile Executive:
- The Joys of Real Hardware – what it means to do Business Service Management on a very large scale
- The Voice of the CIO – a study on the attributes needed by today’s CIO (yes, you had better be agile and Agile…)
- The Quest for a Maturity Model in Business Service Management – while the focus is on BSM, some of the models might apply in a fairly straightforward manner to Agile
- Business Service Management, Six Sigma and your IT Compliance Program – lessons to the champion who has one foot in Agile, the other foot in Six Sigma
- A Measured Approach to Cloud Computing – what it really means to “… make muck so you don’t have to”
- The Case for Agile Business Service Management – the fusion of modern software development methods with the prevailing preference to run IT from the perspective of the business customer
It is a little premature to project how BSM Review will evolve. My hunch is that forthcoming articles in BSM Review on cloud computing, large-scale operations, leadership, risk mitigation and technology trends will be of particular interest to readers of this blog.