Crisis Management Protocol
Crisis Management was developed as an extension to the Major Incident process. The Crisis Manager provides role clarity, communication, and facilitation during a crisis. Within the Major Incident process, a Crisis is defined as a Critical (P1) service interruption impacting a Critical Business Application or Core Infrastructure Service (KB04806*). These Incidents significantly disrupt the business and require separate procedures with increased communication, shorter timelines, and greater urgency. This section outlines the activities to be performed during a Crisis.
* login required
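The definition of a Crisis can be read as a simple two-part check: the Incident is a P1, and the impacted service is a Critical Business Application or a Core Infrastructure Service. The Python sketch below is illustrative only. The `Incident` fields and the category labels are assumptions made for this example, not fields of the actual ticketing system, and the authoritative list of services remains the one in KB04806.

```python
from dataclasses import dataclass

# Hypothetical category labels, chosen for illustration only.
CRISIS_CATEGORIES = {
    "critical_business_application",
    "core_infrastructure_service",
}


@dataclass
class Incident:
    priority: str          # e.g. "P1"
    service_category: str  # hypothetical label for the impacted service


def is_crisis(incident: Incident) -> bool:
    """Return True when a P1 Incident impacts a critical application or core service."""
    return (
        incident.priority == "P1"
        and incident.service_category in CRISIS_CATEGORIES
    )
```

Both conditions must hold: a P1 against a non-critical service, or a lower-priority Incident against a critical one, would stay in the standard Major Incident process under this reading.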
Application/Service Owner or Crisis Coordinator
Regardless of the reported source, once the Critical Major Incident ticket has been created, the Application/Service Owner, or Crisis Coordinator, will perform the following steps:
- Open the Crisis Bridge line
- Ensure the Service Desk notifies the Crisis Manager
- Gather the appropriate resources needed to troubleshoot and identify a resolution
- Work with the Crisis Manager to create communication updates at the top and bottom of the hour
- Work with the Technical Team to:
  - Identify problem source
    - Review change calendar
    - Review logs
    - Create diagram
    - Engage vendor, if needed
    - Update incident work notes with troubleshooting activities
  - Organize & Implement Fix or Workaround
    - Organize into steps
    - Establish timeframe for each step
    - Consult Crisis Manager if leadership decision is needed
    - Update incident work notes with troubleshooting activities
  - Test & Validate Fix or Workaround
    - Test application/service yourself
    - Check availability dashboard
    - Ask Bridge participants to test and validate
    - Update incident work notes with troubleshooting activities
  - Resolve
    - Collect log information
    - Save configurations
    - Update the incident notes with resolve time
    - Identify problem source
- Confirm Resolution with Crisis Manager, TOC, University Service Desk and Healthcare Service Desk (if impacted)
- Document the After Action Report (AAR); see KB05018* for more information
* login required
Crisis Manager
Once a text is received from the Service Desk reporting the Critical Major Incident, the on-call Crisis Manager will perform the following steps:
- Join the Crisis Bridge line
- Gather basic preliminary information on the Incident
- Send the Initial Alert message via RAVE, using the template 'OIT Crisis: Initial Alert', to the pre-defined group lists (see Communication Guidelines)
- Update the IT Status Page, using the same wording as the Initial Alert message
- Identify the Application/Service Owner, or designated representative, on the Bridge line who will lead and manage the technical troubleshooting effort (Crisis Coordinator)
- Work with the identified Crisis Coordinator to organize the following tasks:
  - Define current state and create specific problem statement to focus the troubleshooting effort
  - Create list of possible fixes or workarounds
  - Select the most promising fix or workaround and establish a timeframe for work
  - Consult with Executive Leadership (CIO & DCIOs) for decision guidance, when necessary
- At the top and bottom of every hour, repeat the following tasks until a resolution has been implemented and the Major Incident is resolved:
  - Consult Crisis Coordinator to craft status update
  - Send the Update Alert message via RAVE, using the template 'OIT Crisis: Half Hour Updates', to the pre-defined group lists (see Communication Guidelines)
  - Update the IT Status Page, using the same wording as the Update Alert message
- Once a final resolution is received from the Crisis Coordinator, perform the following tasks:
  - Confirm Resolution with TOC, University Service Desk and Healthcare Service Desk (if impacted)
  - Consult Crisis Coordinator to craft resolution update
  - Send the Resolution Alert message via RAVE, using the template 'OIT Crisis: Resolution Notice', to the pre-defined group lists (see Communication Guidelines)
  - Update the IT Status Page and resolve the Major Incident record, using the same wording as the Resolution Alert message
  - Close the Crisis Bridge line
  - Contact the Service Desk to inform them that the Major Incident is resolved and the front-end message can be removed
NOTE: RAVE* is the tool used to send SMS text messages to the pre-defined group lists.
* login required
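The "top and bottom of every hour" cadence used for Update Alert messages can be worked out mechanically. As a minimal sketch, assuming updates are due exactly at :00 and :30, the hypothetical helper below returns the next due time from the current time. It is not part of RAVE or any other tool named in this protocol.

```python
from datetime import datetime, timedelta


def next_update_time(now: datetime) -> datetime:
    """Return the next :00 or :30 mark strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    if now.minute < 30:
        # Before the half hour: the next update is due at :30.
        return base.replace(minute=30)
    # At or past the half hour: the next update is due at the top of the next hour.
    return base.replace(minute=0) + timedelta(hours=1)
```

For example, a Crisis Manager who joins the Bridge at 10:12 would owe the first Update Alert at 10:30 and the next at 11:00, repeating until the Major Incident is resolved.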