Crisis Management Protocol

Crisis Management was developed as an extension to the Major Incident process. The Crisis Manager provides role clarity, communication and facilitation during a crisis. Within the Major Incident process, a Crisis is defined as a Critical (P1) service interruption impacting a Critical Business Application or Core Infrastructure Service (KB04806*). These Incidents significantly disrupt the business and require separate procedures with increased communication, shorter timelines and greater urgency. This section outlines the activities to be performed during a Crisis.

* login required

Application/Service Owner or Crisis Coordinator

Regardless of how the Incident was reported, once the Critical Major Incident ticket has been created, the Application/Service Owner, or Crisis Coordinator, will perform the following steps:

  1. Open the Crisis Bridge line
  2. Ensure the Service Desk notifies the Crisis Manager
  3. Gather the appropriate resources needed to troubleshoot and identify a resolution
  4. Work with the Crisis Manager to create communication updates at the top and bottom of every hour
  5. Work with the Technical Team to:
    1. Identify problem source
      1. Review change calendar
      2. Review logs
      3. Create diagram
      4. Engage vendor, if needed
      5. Update incident work notes with troubleshooting activities
    2. Organize & Implement Fix or Workaround
      1. Organize into steps
      2. Establish timeframe for each step
      3. Consult Crisis Manager if leadership decision is needed
      4. Update incident work notes with troubleshooting activities
    3. Test & Validate Fix or Workaround
      1. Test application/service yourself
      2. Check availability dashboard
      3. Ask Bridge participants to test and validate
      4. Update incident work notes with troubleshooting activities
    4. Resolve
      1. Collect log information
      2. Save configurations
      3. Update the incident work notes with the resolution time
  6. Confirm Resolution with Crisis Manager, TOC, University Service Desk and Healthcare Service Desk (if impacted)
  7. Document the After Action Report (AAR); see KB05018* for more information

* login required

Crisis Manager

On receiving a text from the Service Desk reporting the Critical Major Incident, the on-call Crisis Manager will perform the following steps:

  1. Join the Crisis Bridge line
  2. Gather basic preliminary information on the Incident
  3. Send the Initial Alert message, via RAVE, using the template 'OIT Crisis: Initial Alert', to the pre-defined group lists (see Communication Guidelines)
  4. Update the IT Status Page, using the same wording as the Initial Alert message
  5. Identify the Application/Service Owner, or designated representative, on the Bridge line who will lead the technical troubleshooting effort (the Crisis Coordinator)
  6. Work with the identified Crisis Coordinator to organize the following tasks:
    1. Define current state and create specific problem statement to focus the troubleshooting effort
    2. Create list of possible fixes or workarounds
    3. Select the most promising fix or workaround and establish a timeframe for work
    4. Consult with Executive Leadership (CIO & DCIOs) for decision guidance, when necessary
  7. At the top and bottom of every hour, repeat the following tasks until a resolution has been implemented and the Major Incident is resolved:
    1. Consult Crisis Coordinator to craft status update
    2. Send the Update Alert message, via RAVE, using the template 'OIT Crisis: Half Hour Updates', to the pre-defined group lists (see Communication Guidelines)
    3. Update the IT Status Page, using the same wording as the Update Alert message
  8. Once the Crisis Coordinator reports a final resolution, perform the following tasks:
    1. Confirm Resolution with TOC, University Service Desk and Healthcare Service Desk (if impacted)
    2. Consult Crisis Coordinator to craft resolution update
    3. Send the Resolution Alert message, via RAVE, using the template 'OIT Crisis: Resolution Notice', to the pre-defined group lists (see Communication Guidelines)
    4. Update the IT Status Page and resolve the Major Incident record, using the same wording as the Resolution Alert message
  9. Close the Crisis Bridge line
  10. Contact the Service Desk to inform them that the Major Incident is resolved and the front-end message can be removed
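
The top-and-bottom-of-the-hour update cadence in step 7 can be sketched as a small helper, for example to drive a reminder for the Crisis Manager. This is only an illustration, not part of the documented process or of RAVE: the function names `next_update_time` and `update_schedule` are hypothetical, and the sketch assumes updates fall exactly on the :00 and :30 marks of each hour.

```python
from datetime import datetime, timedelta


def next_update_time(now: datetime) -> datetime:
    """Return the next :00 or :30 mark strictly after `now` (hypothetical helper)."""
    # Truncate to the start of the current hour.
    base = now.replace(minute=0, second=0, microsecond=0)
    if now.minute < 30:
        return base + timedelta(minutes=30)
    return base + timedelta(hours=1)


def update_schedule(start: datetime, resolved: datetime) -> list[datetime]:
    """List every :00 and :30 mark after `start` and before `resolved`."""
    marks = []
    mark = next_update_time(start)
    while mark < resolved:
        marks.append(mark)
        mark += timedelta(minutes=30)
    return marks


if __name__ == "__main__":
    # Example: bridge joined at 09:12, Incident resolved at 10:40.
    joined = datetime(2024, 1, 1, 9, 12)
    resolved = datetime(2024, 1, 1, 10, 40)
    for mark in update_schedule(joined, resolved):
        print(mark.strftime("%H:%M"), "send Update Alert and refresh IT Status Page")
```

With the example times above, the schedule contains three update points (09:30, 10:00 and 10:30); the Resolution Alert in step 8 is sent separately once the fix is confirmed.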

NOTE: RAVE* is the tool used to send SMS text messages to the pre-defined group lists. 

* login required