Recently there was a heated debate on an ITIL forum, a thread of approximately 150 posts, over whether a Major Incident should automatically be classified as a Problem. The voices advocating the creation of a Problem record for every Major Incident seemed the more assertive and enthusiastic, to put it nicely. I’m sure my RightStar colleagues often find me guilty of getting lost in the weeds of ITIL theory, but on this issue I’m more inclined to consider what happens in real life for our organization and for our customers. I’ll share some of my own thoughts and conclusions here.
First we’ll look at textbook ITIL definitions. Many organizations and IT managers are still sorting out the difference between Incidents and Problems. The example I typically give is that of a user calling about a network printer issue. The user is unable to print to the network printer. What should the Service Desk analyst do? Well, since the Service Desk analyst is supposed to be responsible for Incident Management, by definition the analyst should be focused on restoring the service (in this case, printing) as quickly as possible. The analyst should follow standard troubleshooting procedures within predefined timelines. But if he or she is unable to resolve the Incident, the user’s system may need to be routed to an alternate printer. When the Service Desk analyst confirms that the user can print to the alternate printer successfully, the Incident record can be closed. An Incident is then simply defined as a disruption to normal service.
But of course the printer is still broken. This is where Problem Management steps in. The Service Desk analyst should create a Problem record for the printer issue, link it to the Incident record and assign the Problem to the appropriate manager or group. The Problem Management process will be engaged to determine the root cause, develop or document workarounds (such as routing to an alternate printer) and ultimately provide a fix, whether that means clearing a printer jam or sending the printer in for warranty repair. A Problem is defined as the cause of one or more Incidents, and the Problem Management process assumes that the cause of the Problem is not known when the Problem record is created.
Getting back to the debate and how it relates to real life for most of the organizations we’ve worked with: after reading some of the posts in the forum, I thought of an example of a Major Incident that doesn’t fit the criteria or definition of a Problem. What about a power outage? For smaller organizations especially, it’s not practical to have a fully redundant co-located data center for all of their services. And even if the data center is not impacted, there may be remote sites that need assistance with computer-related issues because their communication with headquarters or their own computing capabilities are compromised.
In this scenario, the power outage is the cause of one or more Incidents or disruptions to service. But we cannot assume that the cause of the Problem is not known. There was a storm, the lights are out and we should know why the computer won’t turn on. No root cause analysis is necessary. And the organization doesn’t have the influence to negotiate with the power company or with the weather if that was the cause of the outage. I would therefore argue that this situation would warrant the creation of a Major Incident, which is defined as an Incident that results in significant disruption to the organization. But it does not require the creation of a Problem record or the engagement of the Problem Management process. As long as alternate manual procedures are standardized and followed, the relevant information for handling the outage should reside within the Incident Management process.
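To make the distinction concrete, here is a minimal sketch of the decision rule argued above, written in Python. It is only an illustration of my reasoning, not anything prescribed by ITIL or built into any tool; the class, field and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record; the field names are assumptions for illustration."""
    summary: str
    significant_disruption: bool  # does it significantly disrupt the organization?
    cause_known: bool             # is the cause already understood?


def is_major_incident(incident: Incident) -> bool:
    # A Major Incident is an Incident that results in significant disruption.
    return incident.significant_disruption


def needs_problem_record(incident: Incident) -> bool:
    # Problem Management assumes the cause is not yet known,
    # so a Problem record is only opened when the cause is unknown.
    return not incident.cause_known


# The two examples from this post.
printer = Incident("Cannot print to network printer",
                   significant_disruption=False, cause_known=False)
outage = Incident("Storm-related power outage",
                  significant_disruption=True, cause_known=True)

for inc in (printer, outage):
    print(inc.summary,
          "| major:", is_major_incident(inc),
          "| problem record:", needs_problem_record(inc))
```

The point of the sketch is that the two questions are independent: the printer issue is not a Major Incident but does call for a Problem record, while the power outage is a Major Incident that does not.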
It makes sense for the Major Incident record to be linked to the other Incident records so that IT can measure the impact of the outage. We would also like a mechanism that will allow us to share the information with the support staff and other customers as necessary. For users of BMC Service Desk Express, what should come to mind is the White Board notices module. This module, under the heading of “Crisis Management,” has many advantages. First of all, the sharing of information is facilitated by the scrolling White Board Ticker marquee. Individual records can be segmented to limit the display to a specific group or to include the display for the Self Service interface. Secondly, subsequent Incidents that are linked to a White Board notice can be automatically populated and updated with information from the White Board notice. Finally, when a White Board notice is closed, all related Incidents can automatically be closed with it, and a business rule will fire to send email notifications to affected users. I would even recommend renaming the “White Board” module the “Major Incident” module.
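For readers who think in code, the linking behavior described above can be sketched roughly as follows. This is a simplified Python model of the idea, not the BMC Service Desk Express data model or API; every class, field and method name here is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedIncident:
    """Hypothetical Incident record that can be tied to a notice."""
    incident_id: str
    status: str = "open"
    notes: str = ""


@dataclass
class MajorIncidentNotice:
    """Hypothetical stand-in for a notice that groups related Incidents."""
    summary: str
    status: str = "open"
    linked: List[LinkedIncident] = field(default_factory=list)

    def link(self, incident: LinkedIncident) -> None:
        # Populate the linked Incident with information from the notice.
        incident.notes = self.summary
        self.linked.append(incident)

    def close(self) -> List[str]:
        # Closing the notice closes every linked Incident and returns
        # their IDs so that a notification step could email affected users.
        self.status = "closed"
        for incident in self.linked:
            incident.status = "closed"
        return [incident.incident_id for incident in self.linked]


notice = MajorIncidentNotice("Power outage at headquarters")
for incident_id in ("INC-1", "INC-2"):
    notice.link(LinkedIncident(incident_id))

to_notify = notice.close()
print(len(notice.linked), "linked Incidents closed:", to_notify)
```

Counting the linked records is also what lets IT measure the impact of the outage.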
As always, ITIL is not a framework to be swallowed whole. It was developed to provide guidance for IT service management best practices and to be adapted to each organization’s needs. I expect that we’ll continue to debate the intentions of the ITIL authors. Ultimately, our goal is to provide better service and to deliver solutions that make sense.