Vivek Dubey

IT Consultant | Traveller | Interested in worldly affairs

The Process called Problem Management

The intricacies of the process called Problem Management

One of the aims of Problem Management is to identify and manage the root causes of Incidents. Once the causes have been identified, we can decide whether to remove them so that no further users are affected.

Problems differ in nature, so steering the process in one consistent direction is always a challenge.

From an organisational perspective, when talking about Problem Management it helps to have a clear definition of one’s system. Many components of that system can cause Incidents that affect your users, including:

  • Software components
  • Services – in-house and outsourced
  • Policies, procedures and governance
  • Security controls
  • Documentation and Training materials

Any of these components could cause Incidents for a user. Consider how incorrect or misleading documentation could cause an Incident. A user may rely on this documentation, make assumptions about how to use a service, discover that they can’t, and contact the service desk. The documentation component has caused an Incident and would be considered the root cause of the Problem.

The Problem Management process flow is the sequence of steps followed to handle a Problem. The steps are:

  1. Inputs to the process: The inputs to Problem Management can come from a number of sources. These include Incident Management, Event Management and the Service Desk. Additionally, proactive Problem Management may identify Problems. Suppliers and other processes such as Release Management, Capacity Management and Availability Management may also become aware of Problems.

  2. Problem detection: Problems can be detected in many ways. The Service Desk may believe that one or more Incidents are being caused by a particular Problem. Second-line support areas may identify a Problem while handling Incidents. Problems can also be detected automatically by the Service Management tools in use. Proactive Problem Management will often identify Problems before any Incidents occur. Likewise, other processes such as Release Management and Availability Management may become aware of Problems.

  3. Problem logging: It is crucial that the full details of the Problem are recorded. This will allow analysis to take place and will enable comparisons to be made between Problems. All Incidents caused by the Problem should be linked to the Problem record allowing the scope and scale of the impact to be ascertained easily. The date and the time that all Problems are logged must be recorded within the Problem record.

  4. Problem categorization: It is important to categorize Problems, and it is recommended to use the same categorization system as the organization's Incident Management process. Correct and meaningful categorization will allow helpful metrics to be produced and enable proactive Problem Management to identify areas to concentrate on.

  5. Problem prioritization: Problems should be prioritized in the same way as Incidents. For example, a high-urgency Problem raised by senior management will be priority 1. Most IT service companies use a matrix to arrive at the priority of a Problem, taking multiple factors into consideration. These factors typically include:

    • Impact of the Problem
    • The urgency of the Problem
    • Who raised it

    Target Problem resolution times will be assigned to each priority level. These will have been agreed with the business and recorded in the SLA. One factor that will feed into the impact and urgency of a Problem is its rate of recurrence.

  6. Problem investigation and diagnosis: The aim of the investigation and diagnosis phase is to ascertain the root cause of the problem. The priority allocated to the Problem should drive the number of resources working on the investigation and diagnosis. Priority should be reassessed during the lifetime of the Problem to ensure that it remains correct. Problem Records must remain open when a workaround has been identified and the workaround should be detailed in the Problem record. Permanent fixes should still be progressed. However, there may be reasons why workarounds remain in place for some time. These reasons include:

    • A permanent fix is too risky
    • A permanent fix is too costly
    • The business impact of the problem is not significant enough to justify further diagnosis at this time
    • The problem will be permanently fixed by a new Release that is currently being planned

  7. Raising a Known Error Record: The Known Error Database is an important source of information for the Service Desk and Support groups handling Incidents and Problems. A Known Error Record should be raised when the diagnosis has been completed and especially when a workaround has been identified.

  8. Problem resolution: Once a permanent fix has been identified, it should be implemented as soon as possible. However, there may be good reasons why this is not possible. The reasons are similar to the reasons why organizations live with workarounds and include cost and risk. Additionally, an immediate fix may require a service outage which is not justifiable in the short term. A Request For Change (RFC) should be raised and progressed for any required Change identified.

  9. Problem closure: Problem Records should be closed once a Change has successfully been applied. It is important that the Problem Record stays open until it is certain that the Problem has been resolved, and this should be checked via testing. It may take some time to confirm that a fix has been successful. For example, confirmation may have to wait until the next time a particular process runs, such as the end of the day, month, quarter, year or tax year.

  10. Major Problem Review: Whenever a Major Problem has occurred, a Major Problem Review should be undertaken. Each organization will have its own definition of a Major Problem based on impact and urgency. It is crucial that these reviews look at lessons learnt rather than becoming ‘allocation of blame’ sessions. The output from a Major Problem Review ought to include what went well, what went badly, what could be done better in the future, how the Problem could have been prevented and how its impact could have been reduced.
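
To make the logging, prioritization and closure steps above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Level`, `priority_for` and `ProblemRecord`, the three-level impact and urgency scale, the 1 to 5 priority range and the one-level bump for Problems raised by senior management. A real organization would use whatever matrix it has agreed with the business.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    """Hypothetical three-level scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority_for(impact: Level, urgency: Level,
                 raised_by_senior_management: bool = False) -> int:
    """Illustrative priority matrix: 1 is the highest priority, 5 the lowest."""
    base = impact + urgency - 1      # HIGH/HIGH gives 1, LOW/LOW gives 5
    if raised_by_senior_management:
        base -= 1                    # assumed bump for "who raised it"
    return max(1, base)


@dataclass
class ProblemRecord:
    summary: str
    category: str                    # ideally the same scheme as Incident Management
    impact: Level
    urgency: Level
    raised_by_senior_management: bool = False
    # The date and time of logging are captured when the record is created.
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    linked_incidents: List[str] = field(default_factory=list)
    workaround: Optional[str] = None
    change_applied: bool = False
    fix_verified: bool = False
    status: str = "open"

    @property
    def priority(self) -> int:
        # Computed on demand so it can be reassessed if impact or urgency change.
        return priority_for(self.impact, self.urgency,
                            self.raised_by_senior_management)

    def link_incident(self, incident_id: str) -> None:
        """Link an Incident so the scope of the Problem is easy to see."""
        if incident_id not in self.linked_incidents:
            self.linked_incidents.append(incident_id)

    def record_workaround(self, text: str) -> None:
        """Store the workaround; the record deliberately stays open."""
        self.workaround = text

    def close(self) -> None:
        """Close only after the Change is applied and the fix has been tested."""
        if not (self.change_applied and self.fix_verified):
            raise ValueError("Change not applied or fix not yet verified")
        self.status = "closed"
```

With this sketch, a record created as `ProblemRecord("Login fails", "Software", Level.MEDIUM, Level.HIGH)` stays open after `record_workaround(...)` is called, and `close()` raises an error until both `change_applied` and `fix_verified` have been set. That mirrors the rule that a workaround alone is not a reason to close a Problem.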
