Problem Process Guidelines

Problem Management

The Problem Management process is used to identify the cause of related incidents. ITIL defines a problem as the cause of one or more incidents.

 The primary objectives of problem management are:

  1. To prevent problems and resulting incidents from occurring
  2. To eliminate recurring incidents
  3. To minimize the impact of incidents that cannot be prevented.

 In line with these objectives, Problem Management is broken into two distinct sub-processes:

  1. Reactive Problem Management – the goal of reactive problem management is to identify the root cause of known incidents, or to provide suitable workarounds for them
  2. Proactive Problem Management – the goal of proactive problem management is to identify and eliminate the root cause of incidents, or to provide suitable workarounds, in order to prevent their recurrence.

Problem Management includes the activities required to diagnose the root cause of incidents identified through the Incident Management process, and to determine the resolution to those problems.  It is also responsible for ensuring that the resolution is implemented through appropriate control processes such as Change Management.

Problem Management will also maintain information about the appropriate workarounds and resolutions to problems, so that the number and impact of incidents can be reduced over time.  In this respect, Problem Management has a strong interface with Knowledge Management, and tools such as the Known Error Database (KEDB) will be used to document workarounds and root causes.

Although Incident Management and Problem Management are separate processes, they are closely related and will typically use the same tools, and may use similar categorization, impact and priority coding systems.  This will ensure effective and consistent communication when dealing with related incidents and problems.

Inputs -  When to Create a Problem Record:

Service Owners and Service Providers create a Problem whenever an issue impacting service requires investigation and resolution.  A problem should always be raised when a Major Incident occurs.

 Some typical examples include:

  1. A Major Incident has occurred
  2. A pattern of recurring Incidents indicates an underlying cause that should be addressed to improve the service
  3. An Event triggered by monitoring has occurred, and its underlying cause should be addressed
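
The typical triggers above can be sketched as a simple check. This is an illustration only and does not reflect any particular IT Service Management tool: the `Incident` fields, the `should_raise_problem` function, and the recurrence threshold of three incidents per category are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical fields; a real ITSM tool will have its own schema.
    category: str
    is_major: bool = False
    from_monitoring_event: bool = False


def should_raise_problem(incidents, recurrence_threshold=3):
    """Return the reasons (possibly none) for raising a Problem record."""
    reasons = []
    # Trigger 1: a Major Incident has occurred.
    if any(i.is_major for i in incidents):
        reasons.append("major incident")
    # Trigger 2: a pattern of recurring Incidents. The threshold is an
    # assumed value, not one taken from the guideline.
    counts = Counter(i.category for i in incidents)
    if any(n >= recurrence_threshold for n in counts.values()):
        reasons.append("recurring incidents")
    # Trigger 3: an Event raised by monitoring.
    if any(i.from_monitoring_event for i in incidents):
        reasons.append("monitoring event")
    return reasons
```

An empty result means none of the three example triggers applies. A Service Owner may still raise a Problem for other reasons, since the list above gives typical examples rather than an exhaustive set.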

Outputs

Root Cause

The underlying or original cause of an Incident or Problem.

Known Error (KE)

Known Errors are documented both in the Problem Record and as Known Error articles in the IT Service Management tool’s Knowledge Base.

Request for Change

Within problem management, the Request for Change (RFC) ticket is the output used to fix errors where the cause is known.

Resolution

The ITIL definition of resolution is: action taken to repair the root cause of an incident or problem, or to implement a workaround.
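
The four outputs can be pictured as optional fields on a Problem Record. The sketch below is hypothetical: the `ProblemRecord` class, its field names, and the `raise_rfc` and `outputs` methods are assumptions made for illustration, not the data model of any ITSM tool. The only rule it encodes from the text above is that an RFC is raised where the cause is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    # All field names are hypothetical.
    summary: str
    root_cause: Optional[str] = None
    known_error_article: Optional[str] = None  # e.g. a Knowledge Base article ID
    rfc_id: Optional[str] = None               # reference to a Change ticket
    resolution: Optional[str] = None           # repair action or workaround

    def raise_rfc(self, rfc_id):
        """Attach an RFC; permitted only once the root cause is known."""
        if self.root_cause is None:
            raise ValueError("root cause must be known before raising an RFC")
        self.rfc_id = rfc_id

    def outputs(self):
        """List the outputs this record has produced so far."""
        labels = {
            "root_cause": "Root Cause",
            "known_error_article": "Known Error",
            "rfc_id": "Request for Change",
            "resolution": "Resolution",
        }
        return [label for name, label in labels.items()
                if getattr(self, name) is not None]
```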

The scope of problem management

Problem management has a narrowly defined scope, which includes the following activities:

  • Problem detection
  • Problem logging
  • Problem categorization
  • Problem prioritization
  • Problem investigation and diagnosis
  • Creating a known error record
  • Problem resolution and closure
  • Major problem review