Duration: 45 mins
System connectivity errors resulted in long wait times to log in. The login issues also affected navigation within the account for part of the incident.
Database Connection Lost
A global database freeze caused system-wide failures. The system was fully rebooted to restore service.
Database locking resulted in blocked connections. This caused service impairment for multiple accounts.
Duration: 24 mins
During a failover, the databases stalled connections to the application servers. The application servers became stuck waiting for the connections to recover, which resulted in a system outage. Additional health monitoring was put in place to prevent this from recurring. We also worked with the AWS team to accommodate an updated version of their failover process.
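As a rough illustration of the kind of health monitoring involved, the sketch below shows a watchdog that probes the database with a hard deadline and reports it as down instead of letting request handling block indefinitely. The probe, thresholds, and recovery action are assumptions for illustration, not the monitoring that was actually deployed.

```python
# Illustrative sketch only: a periodic database probe with a hard deadline.
# probe_database, the thresholds, and the recovery action are assumptions,
# not the monitoring that was actually put in place.
import threading
import time


def probe_database() -> bool:
    """Stand-in for a lightweight round trip such as SELECT 1 against the primary."""
    time.sleep(0.1)  # simulate a fast, healthy query
    return True


def probe_with_deadline(probe, deadline_seconds: float = 2.0) -> bool:
    """Run the probe in a daemon thread; a missed deadline or error counts as unhealthy."""
    result = {"ok": False}

    def run() -> None:
        try:
            result["ok"] = bool(probe())
        except Exception:
            result["ok"] = False

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout=deadline_seconds)
    # If the worker is still alive, the probe has stalled (for example on a
    # frozen failover); report unhealthy rather than waiting for it to return.
    return result["ok"] if not worker.is_alive() else False


def watchdog(probe, interval_seconds: float = 10.0, max_failures: int = 3) -> None:
    """Raise the alarm after several consecutive failed probes."""
    failures = 0
    while True:
        failures = 0 if probe_with_deadline(probe) else failures + 1
        if failures >= max_failures:
            print("database unreachable; recycling connections and alerting on-call")
            failures = 0
        time.sleep(interval_seconds)
```

Running the probe off the request path means a frozen failover surfaces as a failed health check rather than as a pile-up of blocked application threads.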
Duration: 8 mins
A database failover issue, similar to the one described below, caused the application servers to become unresponsive. We escalated and released an update on Nov 4th, 2016 in an attempt to handle this process properly.
An unforeseen issue occurred during a database failover, when one instance failed and another was promoted. While the original database attempted to recover, it began accepting connections, and the application servers incorrectly treated the machine as available. This caused the servers to become unresponsive while waiting for timeout thresholds to expire. The servers did not recover and had to undergo a full host replacement.
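The distinction that mattered here is that a recovering instance can accept TCP connections without being able to serve queries. The sketch below contrasts a connect-only check with one that requires a trivial query to succeed before a node is marked available; the function names and the DB-API style connection factory are assumptions for illustration, not the actual application-server logic.

```python
# Illustrative sketch only: "accepts connections" is a weaker signal than
# "can serve queries". Names and the DB-API style factory are assumptions.
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Weak check: only proves that something is listening on the database port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_serve_queries(connect) -> bool:
    """Stronger check: require a trivial query to complete before marking the node available.

    `connect` is any DB-API style connection factory; in practice it should be
    configured with short connect and statement timeouts so a half-recovered
    instance fails fast instead of holding application threads.
    """
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            return cur.fetchone() is not None
        finally:
            conn.close()
    except Exception:
        return False
```

Gating traffic on a check along these lines is one way to avoid treating a half-recovered instance as available.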