Posts Tagged ‘Agile Infrastructure’
Beyond Devops
Based on feedback from participants in my Agile 2010 workshop “How We Do Things Around Here In Order To Succeed,” I am planning to offer the workshop as a one-day seminar. A tentative agenda for the seminar is as follows:
- Introduction to Cultural Framework
- Exercise #1: Determine Your Culture
- Exercise #2: Strengths and Weaknesses of Your Culture
- Change Behavior, Not Culture
- When Cultures Meet
- Exercise #3: Conflicts in Devops
- The Agile Flywheel
- Exercise #4: Using Technical Debt as a Boundary Object
- Bringing Individuals and Organizations Together
- Exercise #5: It is About Sharing the Process, Not Just Sharing the Information
- Exercise #6: From success in devops to end-to-end success
Until I publish a full-fledged outline for the seminar, here is the central theme:
Beyond Devops
Inter-departmental flow in a corporate setting is often envisioned as the inner workings of a Swiss watch. Wheels turn other wheels in a precise manner. Not only is effectiveness maintained, it is maintained in an efficient manner.
Problem is, many individuals and most departments hold distorted views of the departments they interact with. Reasonable distortions can be mitigated as long as the operational balance between departments is maintained. Once the operational balance is broken, the “Swiss watch” stops functioning, as the inter-departmental distortions block any attempt to restore the balance.
The most effective way to get dev and ops on a path of collaboration is for the two departments to jointly construct a boundary object. As dev and ops are joined at the hip through the code, and even more so through its quality, technical debt is well suited to serve as the core of a boundary object around which the two departments share meaning while retaining operational autonomy.
Similar boundary objects can be constructed between dev and other departments – customer support, professional services, marketing, sales and finance. When conceived and implemented in a manner that links numerous boundary objects together, Agile success in dev can be extended to both upstream and downstream functions.
Schedule Constraints in the Devops Triangle
Last week’s post “The Devops Triangle” demonstrated the extension of Jim Highsmith’s Agile Triangle to devops. The extension relied on adding compliance to the three traditional constraints of software development: scope, schedule, cost. A graphical representation of this extension is given in Figure 1.
Figure 1: Compliance as the Fourth Constraint in Devops Projects
This blog post examines how time/schedule should be governed in the devops context. It does so by building on the concluding observation in the previous post:
The Devops Triangle and the corresponding Tradeoff Matrix demonstrate how governance a la Agile can be extended to devops projects as far as compliance goes. The proposed governance framework however is incomplete in the following sense: schedule in devops projects can be a much more granular and stringent constraint than schedule in “dev only” projects.
For the schedule constraint in devops, I propose a schedule set. It consists of four components:
- Lead Time or Engineering Time
- Time to change
- Time to deploy
- Time to roll back
Lead Time/Engineering Time: These are customary metrics used in Kanban software development, as demonstrated in Figure 3.
Figure 3: The Engineering Time Metric Used by the BBC (David Joyce in the LSSC10 Conference)
Time to change: The amount of time it takes for the various stakeholders (e.g., dev, test, ops, customer support) to review the code to be deployed, approve its deployment and assign a time window for the deployment.
Time to deploy: The amount of time from (metaphorically speaking) pushing the Deploy “button” to completion of deployment.
Time to roll back: The amount of time to undo a deployment. (Rigorous though the engineering practices and IT processes might be, the time to roll back a deployment can’t be ignored – it is a critical risk parameter.)
A graphical representation of these four schedule metrics together with the Devops Triangle is given in the figure below:
Figure 4: The Devops Triangle with a Schedule Set
Using hours as the common unit of measure, a typical schedule set could be {100, 48, 3, 2}. In this hypothetical example, it takes a little over 4 days to carry out the development of the code increment; 2 days to get approval for the change; 3 hours to deploy the code; and, 2 hours to roll back.
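To make the arithmetic in this example concrete, here is a minimal Python sketch of a schedule set. The class name, field names and the 24-hour day are my own assumptions for illustration; they are not part of the Devops Triangle itself.

```python
from dataclasses import dataclass, asdict

HOURS_PER_DAY = 24  # assumption: calendar days, not working days


@dataclass(frozen=True)
class ScheduleSet:
    """The four schedule components, all measured in hours."""
    lead_time: float
    time_to_change: float
    time_to_deploy: float
    time_to_roll_back: float

    def as_days(self) -> dict:
        """Return every component converted from hours to days."""
        return {name: hours / HOURS_PER_DAY for name, hours in asdict(self).items()}


# The hypothetical schedule set from the text: {100, 48, 3, 2}
example = ScheduleSet(lead_time=100, time_to_change=48,
                      time_to_deploy=3, time_to_roll_back=2)
days = example.as_days()
```

With these numbers, `days["lead_time"]` comes out a little over 4 and `days["time_to_change"]` is exactly 2, matching the figures quoted above.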
Whatever your specific schedule numbers might be, it is highly recommended that you apply value stream mapping (see Figure 5 below) to your schedule set. Based on the findings of the value stream mapping, apply statistical process control methods like those illustrated in Figure 3 to continuously improve both the mean and the variance of each of the four schedule components.
Figure 5: An Example of Value Stream Mapping (Source: Wikipedia entry on the subject)
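As a rough illustration of what statistical process control on one schedule component could look like, the sketch below derives simple control limits (mean plus or minus three standard deviations) from a baseline of observations and flags later observations that fall outside them. This is a simplified Shewhart-style check, not necessarily the method used in the BBC chart; the function names and the sample deploy times are hypothetical.

```python
from statistics import mean, stdev


def control_limits(samples, sigmas=3.0):
    """Return (lower, center, upper) control limits for a list of durations in hours."""
    center = mean(samples)
    spread = stdev(samples)  # sample standard deviation
    lower = max(0.0, center - sigmas * spread)  # a duration cannot be negative
    return lower, center, center + sigmas * spread


def out_of_control(baseline, new_samples, sigmas=3.0):
    """Return the new observations that fall outside the baseline's control limits."""
    lower, _, upper = control_limits(baseline, sigmas)
    return [x for x in new_samples if x < lower or x > upper]


# Hypothetical time-to-deploy observations, in hours
baseline_deploys = [3.0, 2.5, 3.5, 3.0, 2.8, 3.2]
recent_deploys = [3.1, 6.0, 2.9]
flagged = out_of_control(baseline_deploys, recent_deploys)
```

Here the 6-hour deployment is the only observation flagged. Narrowing the limits over time (reducing the variance) matters as much as lowering the center line (reducing the mean).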
The Devops Triangle
The Agile Triangle was introduced by Jim Highsmith as an antidote to the Iron Triangle. Instead of balancing development between cost, schedule and scope, the Agile Triangle strives to strike a balance between value, quality and constraints:
Figure 1 – The Agile Triangle (based on Figure 1-3 in Agile Project Management: Creating Innovative Products.)
Consider the Agile Triangle in the context of devops. Value, quality and constraints apply to IT operations as meaningfully as they apply to software development. IT can go beyond cost, schedule and scope to focus on value and quality just as the Agile software development team does. Between development and operations the specific tasks to be carried out change, but the principles embodied in the triangle remain invariant.
In addition to cost, schedule and scope, devops projects must cope with another constraint: compliance. For example, a bank that implements a ‘follow the sun’ strategy with respect to trading must finish reconciling transactions that took place in London before the start of trading on Wall Street. From the bank’s point of view, its IT department needs to be mindful of four constraints: compliance, cost, schedule and scope. This view is represented in Figure 2 below.
Figure 2 – The Devops Triangle
Balancing the four constraints – compliance, cost, schedule, and scope – is not a trivial task. However, just like the Agile Triangle, the Tradeoff Matrix used in Agile software development applies to IT. In its software development variant, the Tradeoff Matrix is an effective tool for deciding between conflicting constraints, as follows:
Table 1 – Tradeoff Matrix (based on Table 6-1 in Agile Project Management: Creating Innovative Products.)
For devops, the matrix is extended to include a compliance row and a Reluctantly Accept column as follows:
Table 2 – Tradeoff Matrix for Devops
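For readers who think in code, the extended matrix can be sketched as a simple mapping from constraint to priority. The column names Fixed, Flexible and Accept are my recollection of Highsmith's original matrix, and the rule that each column is used exactly once is an assumption carried over from it. The specific assignment below is purely hypothetical and does not reproduce Table 2.

```python
CONSTRAINTS = ("compliance", "cost", "schedule", "scope")
# Assumed column names; only "Reluctantly Accept" is named in the text above.
PRIORITIES = ("Fixed", "Flexible", "Accept", "Reluctantly Accept")


def is_valid_tradeoff_matrix(matrix: dict) -> bool:
    """Check that every constraint has a priority and every priority is used exactly once."""
    has_all_rows = set(matrix) == set(CONSTRAINTS)
    uses_each_column_once = sorted(matrix.values()) == sorted(PRIORITIES)
    return has_all_rows and uses_each_column_once


# Hypothetical example: a project where compliance cannot give at all
example_matrix = {
    "compliance": "Fixed",
    "schedule": "Flexible",
    "scope": "Accept",
    "cost": "Reluctantly Accept",
}
```

The point of the check is the discipline it enforces: a project cannot declare two constraints Fixed at once, so the conflict has to be resolved before work starts rather than in the middle of a release.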
The Devops Triangle and the corresponding Tradeoff Matrix demonstrate how governance a la Agile can be extended to devops projects as far as compliance goes. The proposed governance framework however is incomplete in the following sense: schedule in devops projects can be a much more granular and stringent constraint than schedule in “dev only” projects. The subject of schedule constraints in devops projects will be addressed in a forthcoming post.
The Agile Flywheel
Readers of The Agile Executive have been exposed to the “All In!” strategy used by Erik Huddleston to transform the software engineering process at Inovis and make it uniquely streamlined. In this post we follow up on the original discussion of the subject to explore the effect of Agile on IT Operations. As the title implies, Agile at Inovis served as a flywheel which created the momentum required to transform IT Operations and blend the best of Agile with the best of ITIL.
This guest post was written by Ray Riescher – a Six Sigma Black Belt, Agile evangelist and a business process change agent. Ray is currently responsible for business process management and IT governance at Inovis, a leading provider of business-to-business (B2B) e-commerce services, in Alpharetta, GA.
Here is Ray:
When we converted to an Agile Scrum software methodology some 24 months ago, I never imagined the lessons I’d learn and the organizational change that would be driven by the adoption of Scrum.
I’ve lived by the philosophy that managing a business is managing its processes and that all of those processes, especially the operational processes, are interconnected. However, I don’t think I was fully prepared for the effect Agile Scrum would have on our company operations.
We dove head first into Agile Scrum and adapted to it very quickly. However, it wasn’t until we landed a very large and demanding customer that Scrum was really put to the test. New enhancements, new features, and new configurations were all needed ASAP. Scrum delivered with rapid development and deployment in the form of releases that were moving into production with amazing velocity. Our release cadence hit warp drive and at one point we experienced several months where multiple teams’ production releases were deploying at the end of every two-week sprint.
We’ve subscribed to the ITIL service support processes for Release, Change, Incident, Problem and Configuration Management. ITIL has served us well, giving us a common language and a clear understanding of process boundaries.
As the Scrum release cadence kicked in, the downstream ITIL processes had to keep up, adapt, and support the dynamics of rapid production changes. What happened was enlightening and maybe a bit groundbreaking.
The Release Management process had to reassess its reliance on artifacts for gate keeping. The levels of sign-offs had to be streamlined, the heavyweight deployment documentation had to be lightened, yet the process still had to control the production release to ensure deployment success. The rapidity of the release cycles meant that maintenance window downtime would be experienced too frequently by customers, so “rolling bounce” deployment strategies were devised and implemented.
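(Editorial illustration: the general idea behind a “rolling bounce” is to restart servers a few at a time so that the rest keep serving customers. The Python sketch below shows that idea only; it is not a description of what Inovis actually built, and the function names, the `restart` and `is_healthy` callbacks and the batch size are all hypothetical.)

```python
def rolling_bounce_batches(nodes, max_unavailable=1):
    """Split the node list into consecutive batches of at most max_unavailable nodes."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [nodes[i:i + max_unavailable] for i in range(0, len(nodes), max_unavailable)]


def rolling_bounce(nodes, restart, is_healthy, max_unavailable=1):
    """Restart nodes batch by batch, stopping as soon as a restarted node is unhealthy.

    restart and is_healthy are caller-supplied callbacks (hypothetical here).
    Returns the list of nodes that were restarted successfully.
    """
    done = []
    for batch in rolling_bounce_batches(nodes, max_unavailable):
        for node in batch:
            restart(node)
        for node in batch:
            if not is_healthy(node):
                raise RuntimeError(f"{node} unhealthy after restart; halting rollout")
        done.extend(batch)
    return done
```

Keeping `max_unavailable` small relative to the size of the pool is what removes the need for a maintenance window: at any moment most of the pool is still up.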
Change requests could no longer wait for a weekly Change Management review board to approve and schedule the changes. Change management risk models had to be relied on for accurate detection of risky changes.
Early on in this dynamic environment, we weren’t quite as good as we needed to be and our Incident Management process was put to the test. Faster releases meant more opportunity for problems with service degradation and outages. This reality manifested itself more frequently than we’d ever experienced. Monitoring, detecting and repairing became paramount for environment stability and customer satisfaction.
What we found out was that we became very agile at this break/fix game. We developed a small team approach to managing incidents and leveraged the ITIL Problem Management process to rapidly perform root cause analysis. Once the true root cause was determined, a fix would be defined and deployed. Sometimes the fix was software related and went through the Scrum process, sometimes the fix was hardware related and went through the Configuration Management process, other times it was more operational and the fix took the form of training or corrections to procedural documentation.
The point is we’ve become agile across the entire IT spectrum. Whether it’s development via Scrum, the velocity with which we now operate our ITIL processes, or the integrated break/fix operational support processes, we are performing all of these with an agile mindset and discipline. We have small teams, working on priorities, and completing what needs to be completed now.
Scrum set the flywheel in motion and caused the rest of the IT process life cycle to respond. ITIL’s processes still form the solid core of service support and we’ve improved the processes’ capability to handle intense work velocity. The organization adapted by developing unprecedented speed in the ability to deliver production fixes and to solve root cause problems with agility.
What I think we are witnessing is a manifestation of Agile Business Service Management: a holistic agile methodology running across the IT process spectrum that’s delivering eye-popping change and tremendous results.
OpsCamp Through an Internet-scale Lens
Like Agile Roots in Salt Lake City in June 2009, OpsCamp in Austin last week demonstrated how powerful grassroots conferences can be. We might not have had big names on the roster, but we sure had a productive dialog on the tricky issues lurking at the cusp between software development and IT operations in Cloud environments.
The conference has been amply covered by Michael Cote, John Willis, Mark Hinkle, and Damon Edwards (to name a few). This post restricts itself to commenting on one fundamental aspect of the cloud which IMHO does not get the attention it deserves. It might be implied in various discourses on the subject, but I believe it needs to be called out as a fundamental assumption for just about anything and everything one might consider doing with respect to the cloud. I am referring to economies of scale.
As pointed out in a forthcoming book on Cloud Computing by colleague and friend Annie Shum, the cloud phenomenon is fundamentally driven by substantial economies of scale in very large data centers. The operational costs of running such data centers are close to an order of magnitude lower than those prevailing in small and mid-sized data centers. User benefits are primarily derived from these compelling economies of scale.
I will be asking Annie to write a detailed guest post on the subject for readers of The Agile Executive. Until her post is published here, I would recommend we primarily consider the Cloud as a phenomenon that only becomes meaningful at scale. In particular, Private Clouds are not likely to yield Internet-scale efficiencies. Folks who regard their company’s conventional data center as a private cloud might be missing out on the ‘secret sauce’ of cloud computing.
The various agile system administration schemes discussed at the Austin OpsCamp are essential to attaining the requisite economies of scale in cloud services. Watch out for follow-on OpsCamps in other cities for developments to come in this all-important space.
Agile Infrastructure
Ten years ago I probably would not have seen any connection between global warming and server design. Today, power considerations prevail in the packaging of servers, particularly those slated for use in large and very large data centers. The dots have been connected to characterize servers in terms of their eco footprint.
In his Agile Austin presentation a couple of days ago, Cote delivered a strong case for connecting the dots of Agile software development with those of Cloud Computing. Software development and IT operations become largely inseparable in cloud environments. In many of these environments, customer feedback is given “real time” and needs to be responded to in an ultra fast manner. Companies that develop fast closed-loop feedback and response systems are likely to have a major competitive advantage. They can make development and investment decisions based on actual user analytics, feature analytics and aggregate analytics instead of speculating what might prove valuable.
While the connection between Agile and Cloud might not be broadly recognized yet, the subject IMHO is of paramount importance. In recognition of this importance, Michael Cote, John Allspaw, Andrew Shafer and I plan to dig into it in a podcast next week. Stay tuned…
And Now the Bottle-neck is in Operations
In his forthcoming Agile Austin presentation, colleague and friend Michael Cote will be discussing velocity in Agile development vis-a-vis velocity in IT operations. To quote Cote:
Technologies used by public web companies and now cloud computing are looking to offer a new way to deliver applications by addressing deployment and provisioning concerns. Agile software development has sped up the actual development of software, and now the bottle-neck is in operations who’re not always able to deploy software at the same velocity that Agile teams ship code. What do these technologies look like, are they realistic, and how might they affect your organization?
The topic is important from a few perspectives, such as the new business models it enables. With Agile infrastructure, a closed loop is formed between vendor and customer. This loop operates on the basis of close to real-time feedback. The new functionality in the software deployed in the afternoon could be in response to a specific need that was brought up in the morning. Hence, the business focus and the business design change from software that has already been developed and tested (‘done done’) but not yet delivered, to software that has been developed, tested and deployed (‘done done done’) in an ultra-fast way.
It should also be pointed out that the line between developing content and developing software gets really blurry nowadays. From a company perspective both software and content are entities that are being made available for dissemination. If you accept the premise that the generation of content and development of the corresponding software should be done under a unified Agile model, the desirability, the power and the benefits of managing development and delivery in unison become obvious. When applied to both content and software, an agile infrastructure paradigm could easily transform the publishing industry, and others.
In short, the business benefits Agile Infrastructure begets trump the (very significant) operational benefits it enables.
Three Criteria for Qualifying as Agile
Agile methods have been gaining popularity to the extent that one sees the term Agile used beyond the domain of software methods. Agile Infrastructure and Agile Business Service Management were used in this blog and elsewhere. Recently I have seen the term used in the domain of Business Process Management (BPM). For example, a presentation entitled Best Practices for Agile BPM will be delivered at the forthcoming Gartner Group Business Process Management Summit 2010.
I have no doubt the term Agile will be adopted in various fields. Using BPM as an example, I propose the following three criteria to differentiate between agile (small A) and Agile (capital A):
- Beyond software: A software team carrying out a BPM initiative might use Agile methods. This fact by itself does not suffice to make the initiative Agile BPM.
- Methodical specificity: Roles, forums/ceremonies and artifacts for the BPM initiative must be specified. Folks might be already applying Lean, TOC or other approaches to BPM, but a definitive Agile BPM method has not crystallized yet.
- Values: Adherence in spirit to the four values of the Agile Manifesto. Replace the word “software” with “product” in the manifesto (just two occurrences!) and you get a universal value statement that is not restricted to “just” software. It applies to BPM as well as to any other field in which products are produced and used.
You might be impressively agile in what you do but it does not necessarily make you Agile. The pace at which you do things must be anchored in a broader perspective that incorporates customers and employees. A forthcoming post entitled Indivisibility of the Principles of Operation will explore the connection between the Agile values (plural) you hold and the business value (singular) you generate.
The Agile Infrastructure cultural change problem – Agile Executive #06
To listen to this podcast, download the podcast directly, subscribe to the blog/podcast feed in iTunes (or whatever), or click play below to hear it:
As part two in our (planned to be) three part or so series, Israel Gat, Andrew Shafer, and I get back together to discuss the idea of Agile Infrastructure. See part one for an overview of what “Agile Infrastructure” aims to be in the first place. This time, we talk about the difficulty of cultural change to make a more Agile IT process possible. We spend much time on “motherhood and apple-pie” topics as always happens when it comes to discussions of organizational change management, but drawing on our experiences in both small and very large companies, we start to pull apart the tactics for not only implementing a change to Agile IT, but coping with the friction that occurs during any change, esp. something as dramatic as changing to a more Agile way of delivering software and business services.
Here’s a feel for what’s in the episode:
- I start out by asking how organizations start the process of changing to more Agile IT operations.
- Andrew says that change is starting at a grassroots level, bottom-up, and is far from widespread.
- Israel speaks to one anecdote with 20% from the top and 80% from the bottom.
- What’s the process for change? What’s the context for trying to get people to change?
- Andrew says that people have to care – they must be interested in doing more than passing the time and getting paid for it, as I put it. Ideally, champions can start to “manage upwards” too, as needed. Also, having an expert on Agile available for internal assistance and answering questions is key.
- Israel adds in his “Agile Executive” view. He speaks to using people’s desire to do better, but figuring out a window (of time and environment) in which it’ll succeed.
- Using small successes to build the room to do large successes.
- This discussion leads Andrew to remember a talk by Lisa Crispin where the testers and others began to understand the “business,” or the larger context they were operating in.
- I ask Andrew and Israel why it’s important for IT employees to rise above being someone who “just works here.”
- We then discuss how changing to the more role-fluid nature of Agile conflicts with the static nature of jobs in organizations, where people are assigned a specific role and aren’t expected to go outside that role.
- When it comes to knowing how well you’re doing, Andrew introduces the Dunning-Kruger effect, wherein people are bad at evaluating themselves, esp. when their context is limited to their own group or organization.
- Reflect and adapt… making sure you do this in Agile. Israel connects this back to building confidence and morale in the organization as a way to enable change and Agile. This relates to one of those group psychology studies that’s always fascinating, namely, people preferring confidence over expertise.
Agile Infrastructure with Andrew Shafer – Agile Executive 004
To listen to this podcast, download the podcast directly, subscribe to the blog/podcast feed in iTunes (or whatever), or click play below to hear it:
As Israel alluded to last week, in this episode of the Agile Executive podcast, Israel and I talk with Reductive Labs’ Andrew Shafer. Put broadly, the topic is “Agile Infrastructure,” which kind of boils down to the connection between Agile development and the IT department, esp. in trying to get IT to be Agile itself. Here are some admittedly poor notes from the show:
After some brief introductory stuff, Andrew launches off: traditional operations has built up a resistance to change. “Change is what enables the business.” There’s an interesting discussion here of operations and infrastructure concerns being “non-functional requirements,” which are sort of second class citizens in some Agile practice.
Starting around this point, Andrew starts referring to talks from the recent Velocity conference. There’s special mention of the John Allspaw talk.
I ask Andrew and Israel where, beyond web companies, they see these practices happening or finding interest. Andrew admits that it’s only web companies that he sees applying this thinking, but analogizes it to early Agile, XP in particular, which had a small, narrow focus at first and then spread over 10-odd years to where it (Agile) is today. Israel weighs in with an example from a few years ago in the financial sector.
Having talked about these ideas in abstract, we talk about some of the practices themselves:
- as mentioned above, treating your infrastructure like source code – something you can rebuild on-demand.
- automate your infrastructure – from bare metal to running services.
- capacity planning, but better, management and acquisition – e.g., rebuilding 60 machines from metal to production in a few hours at Digg.com vs. two full days of work.
Israel asks, is ITIL for the data-center like water-fall for development? Both Andrew and I weigh in on how much water-fall you can buy into, making analogies to eat-the-whole-pie RUP. This also recalls a conversation I had on another podcast, the IT Management & Cloud podcast with Rob England, aka, The IT Skeptic, on the topic of CMDBs and ITIL.