The Agile Executive

Making Agile Work

Posts Tagged ‘Service Support

A Devops Case Study

leave a comment »

An outline of my forthcoming Agile 2010 workshop was given in the post “A Recipe for Handling Cultural Conflicts in Devops and Beyond” earlier this week. Here is the case study around which the workshop is structured:

NotHere, Inc. Case Study

NotHere, Inc. is a $500M company based in Jerusalem, Israel. The company developed an eCommerce platform for small to medium retailers. Through a combination of this platform and its hosting data center, NotHere provides online store fronts, shopping carts, order processing, inventory, billing and marketing services to tens of thousands of retailers in a broad spectrum of verticals. For these retailers, NotHere is a one-stop “shopping” for all their online needs. In particular, instead of partnering with multiple companies like Amazon, Ebay, PayPal and Shopzilla, a retailer merely needs to partner with NotHere (who partners with these four companies and many others).

The small to medium retailers that use the good services of NotHere are critically dependent on the availability of its data center. For all practical purposes retailers are (temporarily) dead when the NotHere data center is not available. In recognition of the criticality of this aspect of its IT operations, NotHere invested a lot of effort in maturing its ITIL[i] processes. Its IT department successfully implements the ITIL service support and service delivery functions depicted in the figure below. From an operational perspective, an overall availability level of four nines is consistently attained. The company advertises this availability level as a major market differentiator.

In response to the accelerating pace in its marketplace, NotHere has been quite aggressive and successful in transitioning to Agile in product management, dev and test. Code quality, productivity and time-to-producing-code have been much improved over the past couple of years. The company measures those three metrics (quality, productivity, time-to-producing-code) regularly. The metrics feed into whole-hearted continuous improvement programs in product management, dev and test. They also serve as major components in evaluating the performance of the CTO and of the EVP of marketing.

NotHere has recently been struggling to reconcile velocity in development with availability in IT operations. Numerous attempts to turn speedy code development into fast service delivery have not been successful on two accounts:

  • Technical:  Early attempts to turn Continuous Integration into Continuous Deployment created numerous “hiccups” in both availability and audit.
  • Cultural: Dev is a competence culture; ops is a control culture.

A lot of tension has arisen between dev and ops as a result of the cultural differences compounding the technical differences. The situation deteriorated big time when the “lagging behind” picture below leaked from dev circles to ops.

The CEO of the company is of the opinion NotHere must reach the stage of Delivery over Development. She is not too interested in departmental metrics like the time it takes to develop code or the time it takes to deploy it. From her perspective, overall time-to-delivery (of service to the retailers) is the only meaningful business metric.

To accomplish Delivery over Development, the CEO launched a “Making Cats Work with Dogs[ii]” project. She gave the picture above to the CTO and CIO, making it crystal clear that the picture represents the end-point with respect to the relationship she expects the two of them and their departments to reach. Specifically, the CEO asked the CTO and the CIO to convene their staffs so that each department will:

  • Document its Outmodel (in the sense explored in the “How We Do Things Around Here In Order to Succeed” workshop) of the other department.
  • Compile a list of requirements it would like to put on the other group “to get its act together.”

The CEO also indicated she will convene and chair a meeting between the two departments. In this meeting she would like each department to present its two deliverables (world view of the other department & and the requirements to be put on it) and listen carefully to reflections and reactions from the other department. She expects the meeting will be the first step toward a mutual agreement between the two departments how to speed up overall service delivery.


[i] “Information Technology Infrastructure library – a set of concepts and practices for Information Technology Services Management (ITSM), Information Technology (IT) development and IT operations” [Wikipedia].

[ii] I am indebted to Patrick DeBois for suggesting this title.

© Copyright 2010 Israel Gat

The Agile Flywheel

with 8 comments

Readers of The Agile Executive have been exposed to the “All In!” strategy used by Erik Huddleston to transform the software engineering process at Inovis and make it uniquely streamlined. In this post we follow up on the original discussion of the subject to explore the effect of Agile on IT Operations. As the title implies, Agile at Inovis served as a flywheel which created the momentum required to transform IT Operations and blend the best of Agile with the best of ITIL.

This guest post was written by Ray Riescher – a Six Sigma Black Belt, Agile evangelist and a business process change agent. Ray is currently responsible for business process management and IT governance at Inovis, a  leading provider of business-to-business (B2B) e-commerce services, in Alpharetta, GA

Here is Ray:

When we converted to an Agile Scrum software methodology some 24 months ago, I never imagined the lessons I’d learn and the organizational change that would be driven by the adoption of Scrum.

I’ve lived by the philosophy that managing a business is managing its processes and that all of those processes, especially the operational processes, are interconnected.  However, I don’t think I was fully prepared for effect Agile Scrum would have on our company operations.

We dove head first into Agile Scrum and adapted to it very quickly. However, it wasn’t until we landed a very large and demanding customer that Scrum was really put to the test. New enhancements, new features, and new configurations were all needed ASAP.  Scrum delivered with rapid development and deployment in the form of releases that were moving into production with amazing velocity. Our release cadence hit warp drive and at one point we experienced several months where multiple teams’ production releases were deploying at the end of every two week sprint.

We’ve subscribed to the ITIL service support processes for Release, Change, Incident, Problem and Configuration Management. ITIL has served us well, giving us a common language and a clear understanding of process boundaries.

As the Scrum release cadence kicked in, the downstream ITIL processes had to keep up, adapt, and support the dynamics of rapid production changes.  What happened was enlightening and maybe a bit ground breaking.

The Release Management process had to reassess its reliance on artifacts for gate keeping. The levels of sign offs had to be streamlined, the heavyweight deployment documentation had to be lightened, yet the process still had to control the production release to ensure deployment success.  The rapidity of the release cycles meant that maintenance window downtime would be experienced too frequently by customers, so “rolling bounce” deployment strategies were devised and implemented.

Change requests could no longer wait for a weekly Change Management review board to approve and schedule the changes.  Change management risk models had to be relied on for accurate detection of risky changes.

Early on in this dynamic environment, we weren’t quite as good as we needed to be and our Incident Management process was put to the test.  Faster releases meant more opportunity for problems with service degradation and outages. This reality manifested itself more frequently than we’d ever experienced. Monitoring, detecting and repairing became paramount for environment stability and customer satisfaction.

What we found out was that we became very agile at this break/fix game. We developed a small team approach to managing incidents and leveraged the ITIL Problem Management process to rapidly perform root cause analysis. Once the true root cause was determined, a fix would be defined and deployed. Sometimes the fix was software related and went through the Scrum process, sometimes the fix was hardware related and went through the Configuration Management process, other times it was more operational and the fix took the form of training or corrections to procedural documentation.

The point is we’ve become agile across the entire IT spectrum. Whether it’s development via Scrum, the velocity with which we now operate our ITIL processes, or the integrated break/fix operational support processes, we are performing all of these with an agile mindset and discipline. We have small teams, working on priorities, and completing what needs to be completed now.

Scrum set the flywheel in motion and caused the rest of the IT process life cycle to respond.  ITIL’s processes still form the solid core of service support and we’ve improved the processes’ capability to handle intense work velocity. The organization adapted by developing unprecedented speed in the ability to deliver production fixes and to solve root cause problems with agility.

What I think we are witnessing is a manifestation of Agile Business Service Management; a holistic agile methodology running across the IT process spectrum that’s delivering eye popping change and tremendous results.