Major Incident Roles and Responsibilities
Service Desk Analyst (Tier 1)
Functions as the single point of contact (SPOC) between the Application/Service Owner and the user community.
Responsibilities
- Adhere to the Responsibilities detailed in the Incident Process
- Communicate appropriate updates to users contacting the Service Desk
- Establish a front end message to control call volume
- Answer user questions
- Receive report of Incident from Application/Service Owner, Crisis Manager or Users
- One of three roles with the authority to publish a notification relating to a Major Incident
- Initiate Crisis Protocol for any Incident where the Priority is 1-Critical and the impact involves a Critical Business Application or Core Infrastructure Service
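The Crisis Protocol trigger in the last responsibility is a two-part condition, and it can be sketched as a simple check. This is a minimal illustration only: the `Incident` class, its field names, and the category labels are assumptions made for the example, not fields of any particular ticketing tool.

```python
from dataclasses import dataclass

# Hypothetical labels; real values would come from the ticketing system.
CRITICAL_PRIORITY = "1-Critical"
CRISIS_IMPACT_CATEGORIES = {
    "Critical Business Application",
    "Core Infrastructure Service",
}


@dataclass(frozen=True)
class Incident:
    priority: str           # e.g. "1-Critical"
    impacted_category: str  # category of the affected application or service


def requires_crisis_protocol(incident: Incident) -> bool:
    """Return True only when both crisis conditions hold."""
    return (
        incident.priority == CRITICAL_PRIORITY
        and incident.impacted_category in CRISIS_IMPACT_CATEGORIES
    )
```

Both conditions must be true: a 1-Critical Incident on a non-critical service, or a lower-priority Incident on a critical one, would not trigger the protocol under this reading.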
Application/Service Owner or Crisis Coordinator
Manages the lifecycle of all Major Incidents for the applications and services for which they are accountable. If unavailable during a Major Incident, the responsibilities can be delegated.
Responsibilities
- Monitor assigned Applications or Services
- Notify Service Desk when Major Incidents occur
- Analyze and identify possible problem sources to resolve incident
- Facilitate technical troubleshooting efforts and engage additional technical support as needed, including vendor support
- Act as liaison between the Crisis Manager and the Technical Team during a Major Incident where the Priority is 1-Critical and the impact involves a Critical Business Application or Core Infrastructure Service
- Provide half-hour updates on available workarounds
- Provide half-hour updates on estimated time to restore (ETR)
- Consult with Crisis Manager if a leadership decision is needed
- Document troubleshooting activities and resolution details in the Incident Work Notes, to provide an accurate timeline
- Document After Action Report (AAR) within 48 hours of Major Incident resolution where the Priority is 1-Critical, and the impact involves a Critical Business Application or Core Infrastructure Service
- See KB05018* for more information
- Document After Action Report (AAR) for non-Critical Applications or Services as directed by management or the Problem Manager
* login required
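The two timing commitments in the list above (half-hour updates and an AAR within 48 hours of resolution) can be sketched as deadline calculations. This is a minimal illustration that assumes the 48 hours are calendar hours rather than business hours; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=30)  # "half-hour updates"
AAR_WINDOW = timedelta(hours=48)         # assumption: calendar hours, not business hours


def next_update_due(last_update: datetime) -> datetime:
    """Time by which the next workaround/restore-time update is expected."""
    return last_update + UPDATE_INTERVAL


def aar_due(resolved_at: datetime) -> datetime:
    """Latest time the After Action Report should be documented."""
    return resolved_at + AAR_WINDOW


resolved = datetime(2024, 1, 10, 14, 0)
print(aar_due(resolved))  # 2024-01-12 14:00:00
```

If the organization counts the 48 hours in business time, the `aar_due` calculation would need a working-calendar rule instead of a plain offset.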
Crisis Manager
Provides role clarity, communication, and facilitation during a Major Incident where the Priority is 1-Critical and the impact involves a Critical Business Application or Core Infrastructure Service (Crisis).
Responsibilities
- Participate in weekly on-call rotation
- Respond promptly to messages from the Service Desk to join the Crisis Bridge
- Facilitate the Crisis Bridge Line
- Work with the Application/Service Owner or Crisis Coordinator to gather current details and craft messages for updates
- Manage communication during the lifecycle of the Crisis (see Communication Guidelines)
- Post update messages to the IT Status Page
- Text update messages to pre-determined group lists
- Ensure timely communication to the community
- Filter distractions that would hinder or slow down the efforts of the troubleshooting team
- Consult with Executive Leadership (CIO/DCIOs) for decision guidance, as necessary
- Resolve Major Incident record
- Inform Service Desk and community when resolution is implemented and service is restored
Incident Technician (Tier 2 or Tier 3)
Reports service interruptions to the Service Desk and Application/Service Owner, and serves as the first point of contact for service restoration.
Responsibilities
- Adhere to the Responsibilities detailed in the Incident Process
- Monitor assigned Applications or Services
- Notify Service Desk when Major Incidents occur
- Investigate and diagnose Major Incident to restore failed Application or Service as quickly as possible
- Document troubleshooting steps and service restoration details for accurate timeline
- Perform the responsibilities of the Application/Service Owner, when assigned
Incident Manager
Escalation point, responsible for call and notification management by Tier 1.
Responsibilities
- Adhere to the Incident Manager Responsibilities detailed in the Incident Process
- Escalation point for Tier 1
Crisis Management Process Owner
Accountable for the Crisis Management process; designs, maintains, and improves the process as necessary to achieve the objectives of the business.
Responsibilities
- Accountable for the overall quality of the process and oversees the management of and compliance with the procedures, data models, policies, and technologies associated with the process
- Owns the process and supporting documentation for the process from a strategic and tactical perspective
- Approves all changes to the process and development of process improvement plans
- Defines policies for the organization regarding the process
- Ensures that the process is fit for purpose
- Process Design
- Process Improvement
- Accountable for the overall process efficiency and effectiveness
- Ensures alignment of Key Performance Indicators (KPIs) to Critical Success Factors (CSFs) and that these objectives are realized
- Ensures the design of the Crisis Management process aligns with the business and industry best practices
- Works with the Process Owner for Incident Management to ensure both processes work in conjunction with each other
- Promotes and reinforces adherence to the process and policies associated with both Incident Management and Crisis Management
- Works in conjunction with Continual Service Improvement (CSI)
RACI Matrix
A clear definition of accountability and responsibility is a critical success factor for any process. Without it, functional staff can be unclear about their roles and responsibilities within the process and revert to how the activities were accomplished before.
To assist with this task, the RACI model is used within this framework to indicate roles and responsibilities in relation to processes and activities.
- R - Responsible: Correct execution of process and activities
- A - Accountable: Ownership of quality and end result of process
- C - Consulted: Involvement through input of knowledge and information
- I - Informed: Receiving information about process execution and quality
Activity | Service Desk | Incident Manager | Technician | Application/Service Owner or Crisis Coordinator | Crisis Manager |
Create Incident Record | R2 | | R1 | A | |
Create Major Incident Record | R | | C | A | C |
Update Major Incident Record/Timeline | | | R2 | A/R1 | I |
Update Status Page | | | | C | A/R |
Update Front End Message | R | A | | C | C |
Coordinate Bridge | I | | | R | A |
Update Senior Management and Key Contacts | I | | | C | A/R |
Page Technical Resources | | | R | A/R | |
Monitor SLAs | | | R | A/R | |
Document AAR | | | R2/C | A/R1 | |
Maintain/Follow-up on Action Items | | | C | A/R | |
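The matrix above can also be held as a small data structure and checked for consistency. The sketch below is illustrative only: the cell placement reflects one reading of the table (five role columns per activity, blank where no letter is given), and the rule that each activity has exactly one Accountable role is a common RACI convention assumed here, not a requirement stated in this document. All names are hypothetical.

```python
ROLES = [
    "Service Desk",
    "Incident Manager",
    "Technician",
    "Application/Service Owner or Crisis Coordinator",
    "Crisis Manager",
]

# One list of cells per activity, in the same order as ROLES ("" = no assignment).
RACI = {
    "Create Incident Record": ["R2", "", "R1", "A", ""],
    "Create Major Incident Record": ["R", "", "C", "A", "C"],
    "Update Major Incident Record/Timeline": ["", "", "R2", "A/R1", "I"],
    "Update Status Page": ["", "", "", "C", "A/R"],
    "Update Front End Message": ["R", "A", "", "C", "C"],
    "Coordinate Bridge": ["I", "", "", "R", "A"],
    "Update Senior Management and Key Contacts": ["I", "", "", "C", "A/R"],
    "Page Technical Resources": ["", "", "R", "A/R", ""],
    "Monitor SLAs": ["", "", "R", "A/R", ""],
    "Document AAR": ["", "", "R2/C", "A/R1", ""],
    "Maintain/Follow-up on Action Items": ["", "", "C", "A/R", ""],
}


def accountable_roles(cells):
    """Roles whose cell contains the letter A (cells may combine codes, e.g. 'A/R1')."""
    return [role for role, cell in zip(ROLES, cells) if "A" in cell.split("/")]


def find_problems(matrix):
    """Return activities that do not have exactly one Accountable role."""
    problems = {}
    for activity, cells in matrix.items():
        owners = accountable_roles(cells)
        if len(cells) != len(ROLES) or len(owners) != 1:
            problems[activity] = owners
    return problems


print(find_problems(RACI))  # {}
```

A check like this is mainly useful when the matrix is edited, since it flags an activity that has lost its Accountable role or gained a second one.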