The Client
A $7 billion health insurance company in the Northeast whose new CIO needed to understand his organization and build trust with his fellow senior executives. Its widely varied technology environment, a combination of in-house and outsourced systems, processing, and support, was difficult to measure and distill into meaningful reporting on status and performance.
The Challenge
The technology environment at the client was complex and diverse, and systems availability was not up to standard. IT was measuring availability in the classic sense of uptime by system, but this did not reflect what end users actually experienced in day-to-day operations. Committed to transparency with his peers and to a business focus within his own organization, the CIO wanted a clear set of metrics to understand, measure, and communicate the business impact of systems performance.
How We Helped
IT shops tend to measure what is easy to measure, and frequently do not fully understand how they affect the business. High availability of systems is critical to productive operations. But with so many moving parts (web layers, application layers, database layers, messaging infrastructure, and so on) involved in delivering availability to the end user, the "uptime" of an individual component is nearly meaningless. Non-IT executives do not want to hear about 99% availability when a major system fails at a critical moment. Managers were accountable for their own components, but no one was accountable for end-to-end service as the end user experiences it.
The first step was to translate systems performance into business language that all executives could understand and appreciate. Through a rigorous business process analysis, we identified the major outputs of each business process (e.g., "Processed Claim") and how many users performed each process. This led to the definition of two metrics: "workhours disrupted", the number of user-hours in which staff could not work at full productivity because of system unavailability, and "transactions disrupted", the number of business outputs that could not be produced.
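For a single outage, the two metrics can be written as follows. This is a minimal formulation, not the client's actual model: it assumes that the $U_p$ users of process $p$ lose a fraction $f$ of their productivity (with $f = 1$ for a full outage) for $d$ hours, and that each of them normally produces $r_p$ outputs per hour. All of these symbols are illustrative.

$$
\text{WD}_p = f \, U_p \, d, \qquad \text{TD}_p = r_p \, \text{WD}_p
$$

As a purely hypothetical example, a full two-hour outage affecting 200 claims processors who each handle 5 claims per hour gives $\text{WD} = 200 \times 2 = 400$ workhours disrupted and $\text{TD} = 5 \times 400 = 2000$ claims disrupted.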
The second step was to build a model that translates system outages and slow response times into business impact. This involved identifying the systems required for each business transaction and linking those systems to each business process. The model was simple in concept but complex in its details: to be accurate, it had to fully account for the timing of outages and slow response times. When did the outage occur? Was it during business hours, or during peak business hours? How many people needed the system when the outage occurred? What other systems were affected (e.g., a network outage can take down other components)?
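The sketch below shows one way such a model could be structured. It is an illustration under stated assumptions, not the model built for the client. The dependency map (`PROCESS_SYSTEMS`), the cascade map (`CASCADES`), the staffing profiles (`PROFILES`), the system names, and the function `business_impact` are all hypothetical. The sketch treats weekends as non-business time and models full outages only. Slow response times could be handled by multiplying workhours by a productivity-loss fraction. The outage is walked in clock-hour slices so that an outage during peak hours weighs more than the same outage after hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical dependency map: business process -> systems it requires.
PROCESS_SYSTEMS = {
    "Processed Claim": {"claims_app", "member_db", "network"},
    "Enrolled Member": {"enrollment_app", "member_db", "network"},
}

# Hypothetical cascade map: an outage of the key component also disables these.
CASCADES = {"network": {"claims_app", "enrollment_app", "member_db"}}


@dataclass
class ProcessProfile:
    users_by_hour: dict[int, int]   # hour of day (0-23) -> users performing the process
    outputs_per_user_hour: float    # business outputs produced per user per hour


# Hypothetical staffing: business hours 08:00-18:00, claims peak at 10:00-12:00.
PROFILES = {
    "Processed Claim": ProcessProfile(
        {h: (200 if h in (10, 11) else 100) for h in range(8, 18)}, 5.0),
    "Enrolled Member": ProcessProfile({h: 20 for h in range(8, 18)}, 2.0),
}


def business_impact(system: str, start: datetime, end: datetime) -> dict:
    """Workhours and transactions disrupted, per business process, by one outage."""
    down = {system} | CASCADES.get(system, set())
    impact = {}
    for process, required in PROCESS_SYSTEMS.items():
        if not required & down:
            continue  # this process depends on nothing that failed
        profile = PROFILES[process]
        workhours = 0.0
        t = start
        # Walk the outage in clock-hour slices so each slice uses that hour's staffing.
        while t < end:
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            slice_end = min(next_hour, end)
            hours = (slice_end - t).total_seconds() / 3600
            # Weekends (weekday 5 and 6) are treated as having no users.
            users = 0 if t.weekday() >= 5 else profile.users_by_hour.get(t.hour, 0)
            workhours += users * hours
            t = slice_end
        impact[process] = {
            "workhours": workhours,
            "transactions": workhours * profile.outputs_per_user_hour,
        }
    return impact


if __name__ == "__main__":
    # A 90-minute claims outage on a Monday, spanning the peak hours.
    print(business_impact("claims_app", datetime(2024, 1, 8, 10, 30), datetime(2024, 1, 8, 12, 0)))
    # {'Processed Claim': {'workhours': 300.0, 'transactions': 1500.0}}
```

Because a network outage cascades to every component in `CASCADES["network"]`, logging a single network event is enough to attribute impact to every process that depends on it. This matches the "log the system and the time" style of data entry.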
The final step was to develop a graphical presentation of the results to track progress in stabilizing the systems environment. Although straightforward, the presentation was eye-opening and revealed causal relationships that had not previously been recognized.
Success
The client currently measures the business impact of over 60 business processes supported by over 30 systems and infrastructure components. Data entry is simple: staff log the affected system and the date and time of the outage, and the model does the rest. Not only can the CIO and his peers understand business impact at the enterprise level, but they can now pinpoint root causes and quantify the impact of each one. This has allowed the CIO to focus his resources on solving the real problems rather than reacting to whoever screams the loudest.