Secret Server Cloud US Issues
Incident Report for Delinea
Postmortem

Title:
Secret Server Cloud (SSC) Issues in East US starting on April 3rd, 2024.

Impact:
Beginning on April 3rd, 2024, at approximately 7:52:40 PM ET, some US-based SSC customers began experiencing issues accessing their systems.

Issue:
On April 3rd, 2024, at approximately 7:52:40 PM ET, Azure SQL Databases in East US started experiencing issues. New connections to databases in this region may have resulted in an error or timeout. Existing connections remained available to accept new requests; however, if those connections were terminated and re-established, they may have failed. Microsoft suspects that a deployment on their frontend gateways caused the SQL database availability issues that impacted customer connectivity. Microsoft worked to mitigate the issue by stopping the active rollout of that deployment.

Resolution:
At approximately 7:54 PM ET, our monitoring systems started alerting on the issue. After initial triage, we estimated that roughly 25% of US-based SSC customers were experiencing issues. At approximately 8:26 PM ET, SSC failover was initiated. However, due to an issue with the automated failover tool, we had to perform the failover manually, which increased our response and resolution time. At approximately 10:36 PM ET, the failover had completed and all SSC systems were operational. Our engineers monitored the Azure incident and waited for it to be fully resolved before closing the incident on our side. At approximately 11:29 PM ET on April 3rd, 2024, the Azure team declared the incident mitigated. At approximately 3:56 AM ET on April 4th, 2024, a rollback to restore functionality in the primary East US region for SSC was completed.
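
For readers who want the intervals implied by the timeline above, the following is a minimal sketch that derives them from the stated timestamps. The dictionary keys and the `elapsed` helper are illustrative names, not part of any Delinea or Azure tooling, and every timestamp in the report is approximate, so the results are approximate as well.

```python
from datetime import datetime

# Timestamps as stated in this report, all in US Eastern Time.
# Naive datetimes are sufficient because every event is in the same zone
# and no daylight-saving change falls inside the window.
timeline = {
    "incident_start": datetime(2024, 4, 3, 19, 52, 40),
    "first_alert": datetime(2024, 4, 3, 19, 54),
    "failover_initiated": datetime(2024, 4, 3, 20, 26),
    "failover_completed": datetime(2024, 4, 3, 22, 36),
    "azure_mitigated": datetime(2024, 4, 3, 23, 29),
    "rollback_completed": datetime(2024, 4, 4, 3, 56),
}


def elapsed(start_key, end_key):
    """Return the time between two named events as a timedelta."""
    return timeline[end_key] - timeline[start_key]


# Interval labels are hypothetical; they simply name pairs of events above.
intervals = {
    "time to first alert": elapsed("incident_start", "first_alert"),
    "manual failover duration": elapsed("failover_initiated", "failover_completed"),
    "start to SSC operational": elapsed("incident_start", "failover_completed"),
    "start to Azure mitigation": elapsed("incident_start", "azure_mitigated"),
}

for label, delta in intervals.items():
    print(f"{label}: {delta}")
```

Under these timestamps, the first alert came about 1 minute 20 seconds after the incident began, the manual failover took about 2 hours 10 minutes, and the window from incident start to Azure's mitigation was about 3 hours 36 minutes.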

Action Items:
1. Investigate issues with tooling that facilitates automated failover.

To address this and prevent future occurrences:

IN PROGRESS – Improve failover automation tooling and capabilities.

Incident Start Time: April 3rd, 2024, 07:52:40 PM ET
Incident End Time: April 3rd, 2024, 11:29 PM ET

Posted Apr 05, 2024 - 14:49 EDT

Resolved
We’re pleased to inform you that the incident affecting the service(s) listed below has been resolved.
All systems are now operating normally.

We apologize for any inconvenience this incident may have caused, and we appreciate your understanding and support.

For any questions or concerns, please reach out to our support team at https://support.delinea.com.
Posted Apr 04, 2024 - 00:51 EDT
Monitoring
We wanted to provide you with an update on the incident affecting the service(s) listed below.

A subset of customers are coming back online. We are continuing to monitor the incident and will provide updates as warranted.

For any questions or concerns, please reach out to our support team at https://support.delinea.com.
Posted Apr 03, 2024 - 21:43 EDT
Identified
We wanted to provide you with an update on the incident affecting the service(s) listed below.

Our team has been working diligently to address the issue, and we have made significant progress.
We appreciate your continued patience and support as we work to fully restore normal service.

For any questions or concerns, please reach out to our support team at https://support.delinea.com.
Posted Apr 03, 2024 - 21:36 EDT
Investigating
We are currently investigating issues in Secret Server Cloud US.
Posted Apr 03, 2024 - 20:30 EDT
This incident affected: US (Secret Server Cloud).