Delinea System Status

Delinea agrees to use its commercially reasonable efforts to make the Cloud Service generally available 99.9% of the time.

Pod 29 Outage
Incident Report for Centrify Cloud Service
Postmortem

Pod 29 Service Interruptions
Impact:
Beginning on 7/30/22 at approximately 06:15 AM UTC, Pod 29 began experiencing intermittent errors in operations requiring new connections to the backend storage system. Those operations failed, resulting in overall service degradation.
Issue:
Due to an increase in application activity, periodic spikes occurred in the number of connections to the storage system requested by the front-end servers. These spikes caused the maximum connection limit to be reached, which in turn caused additional operations to fail. Monitoring on the number of storage connections paged the Operations staff. The initial investigation led Operations to believe there might be an underlying problem in the primary storage node, and a failover to the secondary storage node was initiated on 7/30 at 1:45 PM UTC. This appeared to resolve the problem, since the failover forced all front-end servers to reconnect to the storage system, clearing the connection alert. However, the alert triggered again the following day, on 7/31 at 7:49 PM UTC. Operations staff investigated again and determined the issue was caused by idle connections to the storage system not being cleaned up as expected.
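The connection-limit monitoring described above can be sketched as a simple threshold check. This is an illustrative sketch only: the report does not name the storage system, its connection limit, or the alerting stack, so max_connections and the 90% alert fraction here are assumptions.

```python
# Hypothetical sketch of the storage-connection monitor that paged
# Operations. The real connection limit and alert threshold are not
# stated in the report; the values below are illustrative.

def check_connection_headroom(current: int, max_connections: int,
                              alert_fraction: float = 0.9) -> bool:
    """Return True (page Operations) once connection usage crosses
    the alert fraction of the configured maximum."""
    return current >= alert_fraction * max_connections
```

In practice a check like this would run against the storage system's session-count view on an interval, paging on-call staff before the hard limit is hit rather than after operations start failing.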
Mitigation:
At 8:51 PM UTC on 7/31, Operations staff began manually executing a command to kill idle connections to the storage system whenever the alert fired. This either prevented customer impact or reduced the time customers were impacted. Over the next several days, the alert went from firing 10-13 times a day to 3-4 times a day. Additionally, Operations engaged the Development team to investigate the cause of the connection leak and to provide an automated script to detect the condition and kill idle connections. Since the automation script was deployed to Production, the alert frequency has decreased further, to 0-2 times a day.
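The automated detect-and-kill logic described above might look like the following. This is a sketch under stated assumptions: the storage system and its administrative API are not named in the report, so Connection, list-of-connections input, the kill callback, and the 300-second idle threshold are all stand-ins (on a PostgreSQL-style backend, for example, the kill step would call pg_terminate_backend).

```python
# Hypothetical sketch of the automated idle-connection reaper. The
# actual storage system, session fields, and kill mechanism are
# assumptions, not taken from the report.

from dataclasses import dataclass

@dataclass
class Connection:
    pid: int             # backend process / session id
    state: str           # e.g. "active", "idle"
    idle_seconds: float  # time since last activity on the session

def select_stale(connections, max_idle_seconds=300.0):
    """Pick connections that have sat idle past the threshold."""
    return [c for c in connections
            if c.state == "idle" and c.idle_seconds > max_idle_seconds]

def reap(connections, kill, max_idle_seconds=300.0):
    """Kill each stale connection via the supplied callback and
    return the pids that were reaped."""
    stale = select_stale(connections, max_idle_seconds)
    for conn in stale:
        kill(conn.pid)  # e.g. pg_terminate_backend(pid) on PostgreSQL
    return [c.pid for c in stale]
```

Run on a schedule (or triggered by the connection alert), a reaper like this reclaims leaked sessions before the connection limit is reached, which matches the drop in alert frequency described above.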
Resolution:
After investigation, the Development team determined that a bug in the third-party library the application uses for operations with the backend storage system was causing a connection leak. The bug had existed for some time but did not manifest in our application until the increase in activity on 7/30. The third-party vendor has fixed the bug; however, the fix requires a major version upgrade of the library. Because of the potential impact of a major version change, extensive application-wide testing is being conducted before the update is deployed to Production. The results of that testing will determine when the update can be deployed.

Posted Sep 01, 2022 - 12:18 PDT

Resolved
Pod 29 began experiencing intermittent errors in operations requiring new connections to the backend storage system, resulting in operation failures and service degradation.
Posted Jul 30, 2022 - 00:30 PDT