A Devops Case Study

An outline of my forthcoming Agile 2010 workshop was given in the post “A Recipe for Handling Cultural Conflicts in Devops and Beyond” earlier this week. Here is the case study around which the workshop is structured:

NotHere, Inc. Case Study

NotHere, Inc. is a $500M company based in Jerusalem, Israel. The company developed an eCommerce platform for small to medium retailers. Through a combination of this platform and its hosting data center, NotHere provides online storefronts, shopping carts, order processing, inventory, billing and marketing services to tens of thousands of retailers in a broad spectrum of verticals. For these retailers, NotHere is a one-stop “shop” for all their online needs. In particular, instead of partnering with multiple companies like Amazon, eBay, PayPal and Shopzilla, a retailer merely needs to partner with NotHere (who partners with these four companies and many others).

The small to medium retailers that use the good services of NotHere are critically dependent on the availability of its data center. For all practical purposes retailers are (temporarily) dead when the NotHere data center is not available. In recognition of the criticality of this aspect of its IT operations, NotHere invested a lot of effort in maturing its ITIL[i] processes. Its IT department successfully implements the ITIL service support and service delivery functions depicted in the figure below. From an operational perspective, an overall availability level of four nines (99.99%, which allows roughly 53 minutes of downtime per year) is consistently attained. The company advertises this availability level as a major market differentiator.

In response to the accelerating pace in its marketplace, NotHere has been quite aggressive and successful in transitioning to Agile in product management, dev and test. Code quality, productivity and time-to-producing-code have been much improved over the past couple of years. The company measures those three metrics (quality, productivity, time-to-producing-code) regularly. The metrics feed into whole-hearted continuous improvement programs in product management, dev and test. They also serve as major components in evaluating the performance of the CTO and of the EVP of marketing.

NotHere has recently been struggling to reconcile velocity in development with availability in IT operations. Numerous attempts to turn speedy code development into fast service delivery have not been successful on two counts:

  • Technical: Early attempts to turn Continuous Integration into Continuous Deployment created numerous “hiccups” in both availability and audit.
  • Cultural: Dev is a competence culture; ops is a control culture.

A lot of tension has arisen between dev and ops as a result of the cultural differences compounding the technical differences. The situation deteriorated big time when the “lagging behind” picture below leaked from dev circles to ops.

The CEO of the company is of the opinion that NotHere must reach the stage of Delivery over Development. She is not too interested in departmental metrics like the time it takes to develop code or the time it takes to deploy it. From her perspective, overall time-to-delivery (of service to the retailers) is the only meaningful business metric.

To accomplish Delivery over Development, the CEO launched a “Making Cats Work with Dogs[ii]” project. She gave the picture above to the CTO and CIO, making it crystal clear that the picture represents the end-point with respect to the relationship she expects the two of them and their departments to reach. Specifically, the CEO asked the CTO and the CIO to convene their staffs so that each department will:

  • Document its Outmodel (in the sense explored in the “How We Do Things Around Here In Order to Succeed” workshop) of the other department.
  • Compile a list of requirements it would like to put on the other group “to get its act together.”

The CEO also indicated she will convene and chair a meeting between the two departments. In this meeting she would like each department to present its two deliverables (its world view of the other department and the requirements to be put on it) and listen carefully to reflections and reactions from the other department. She expects the meeting will be the first step toward a mutual agreement between the two departments on how to speed up overall service delivery.

[i] “Information Technology Infrastructure Library – a set of concepts and practices for Information Technology Services Management (ITSM), Information Technology (IT) development and IT operations” [Wikipedia].

[ii] I am indebted to Patrick Debois for suggesting this title.

Schedule Constraints in the Devops Triangle

Last week’s post “The Devops Triangle” demonstrated the extension of Jim Highsmith’s Agile Triangle to devops. The extension relied on adding compliance to the three traditional constraints of software development: scope, schedule, cost. A graphical representation of this extension is given in Figure 1.

Figure 1: Compliance as the Fourth Constraint in Devops Projects

This blog post examines how time/schedule should be governed in the devops context. It does so by building on the concluding observation in the previous post:

The Devops Triangle and the corresponding Tradeoff Matrix demonstrate how governance a la Agile can be extended to devops projects as far as compliance goes. The proposed governance framework however is incomplete in the following sense: schedule in devops projects can be a much more granular and stringent constraint than schedule in “dev only” projects.

For the schedule constraint in devops, I propose a schedule set. It consists of four components:

  • Lead Time or Engineering Time
  • Time to change
  • Time to deploy
  • Time to roll back

Lead Time/Engineering Time: These are customary metrics used in Kanban software development, as demonstrated in Figure 3.

Figure 3: The Engineering Time Metric Used by the BBC (David Joyce in the LSSC10 Conference)

Time to change: The amount of time it takes for the various stakeholders (e.g., dev, test, ops, customer support) to review the code to be deployed, approve its deployment and assign a time window for the deployment.

Time to deploy: The amount of time from (metaphorically speaking) pushing the Deploy “button” to completion of deployment.

Time to roll back: The amount of time to undo a deployment. (Rigorous though the engineering practices and IT processes might be, the time to roll back a deployment can’t be ignored – it is a critical risk parameter).

A graphical representation of these four schedule metrics together with the Devops Triangle is given in the figure below:

Figure 4: The Devops Triangle with a Schedule Set

Using hours as the common unit of measure, a typical schedule set could be {100, 48, 3, 2}. In this hypothetical example, it takes a little over 4 days to carry out the development of the code increment; 2 days to get approval for the change; 3 hours to deploy the code; and 2 hours to roll back.
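The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: the class name `ScheduleSet`, its field names and the `time_to_delivery` helper are hypothetical names introduced here, and summing the first three components assumes they run strictly one after the other. Time to roll back is kept out of the sum because it is treated as a risk parameter rather than a step on the way to delivery.

```python
from dataclasses import dataclass, asdict

# Calendar days, matching the reading of 100 hours as "a little over 4 days".
HOURS_PER_DAY = 24


@dataclass(frozen=True)
class ScheduleSet:
    """Hypothetical container for the four schedule components, in hours."""
    lead_time: float
    time_to_change: float
    time_to_deploy: float
    time_to_roll_back: float

    def time_to_delivery(self) -> float:
        # Assumption: development, approval and deployment happen sequentially.
        return self.lead_time + self.time_to_change + self.time_to_deploy

    def in_days(self) -> dict:
        # Convert every component from hours to calendar days.
        return {name: hours / HOURS_PER_DAY for name, hours in asdict(self).items()}


example = ScheduleSet(lead_time=100, time_to_change=48,
                      time_to_deploy=3, time_to_roll_back=2)

for name, days in example.in_days().items():
    print(f"{name}: {days:.2f} days")
print(f"time to delivery: {example.time_to_delivery()} hours")
```

Keeping the four numbers together in one structure makes it easy to track them as a set over successive releases, which is what the value stream mapping step needs as input.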

Whatever your specific schedule numbers might be, it is highly recommended you apply value stream mapping (see Figure 5 below) to your schedule set. Based on the findings of the value stream mapping, apply statistical process control methods like those illustrated in Figure 3 to continuously improve both the mean and the variance of each of the four schedule components.

Figure 5: An Example of Value Stream Mapping (Source: Wikipedia entry on the subject)
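As a rough illustration of the statistical process control step, the sketch below computes control limits for one schedule component and flags observations that fall outside them. It assumes the simplest textbook rule (mean plus or minus three sample standard deviations), which is not necessarily the method behind the chart in Figure 3. The function names and the deployment times are made up for the example.

```python
import statistics


def control_limits(samples, sigmas=3.0):
    """Return (mean, lower limit, upper limit) for a list of durations in hours."""
    mean = statistics.mean(samples)
    spread = sigmas * statistics.stdev(samples)
    # A duration cannot be negative, so the lower limit is clamped at zero.
    return mean, max(0.0, mean - spread), mean + spread


def out_of_control(observations, baseline):
    """Return the observations that fall outside the limits derived from the baseline."""
    _, lower, upper = control_limits(baseline)
    return [x for x in observations if x < lower or x > upper]


# Hypothetical time-to-deploy measurements, in hours, from past releases.
baseline_deploy_hours = [3.0, 2.5, 3.5, 3.0, 2.0, 4.0, 3.0]

mean, lower, upper = control_limits(baseline_deploy_hours)
print(f"mean={mean:.2f}h, limits=[{lower:.2f}h, {upper:.2f}h]")

# New deployments are checked against the baseline limits.
flagged = out_of_control([3.5, 6.0, 2.8], baseline_deploy_hours)
print("deployments needing a closer look:", flagged)
```

The same two functions can be applied to each of the four components in turn. Narrowing the limits over time corresponds to reducing the variance, and lowering the center line corresponds to improving the mean.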

Apropos has been Open Sourced

Erik Huddleston, Walter Bodwell, Stephen Chin and I unveiled Apropos – the Agile Project Portfolio Scheduler – a month ago at the LSSC10 conference in Atlanta, GA. The system is now available as open source; the software can be downloaded from the home page of the project. It will enable you to:

  • Synergize R&D with downstream organizations such as Operations, Professional Services, and Sales
  • Increase delivery value through organization-wide alignment of priorities
  • Achieve continuous improvement by whole process feedback loops
  • Gain realtime visibility into delivery status and potential blockages

The core concept of Apropos – multiple parallel feedback loops – is demonstrated by the following process control diagram:

Figure 1: Process Control View of Apropos

Enjoy Apropos, benefit from it and please give us feedback!