N.Virginia2 replication outage

Incident Report for Mambu

Resolved

Upon further investigation in collaboration with AWS, we found that the MultiAZ failover was caused by an underlying host failure, which led to an automated failover to a healthy host.
This occurred because the original host was unresponsive to the communication system - while such incidents are rare, they can happen due to hardware-related issues.
We have been assured that AWS's automated systems promptly detect and resolve such issue to minimize impact.

Furthermore, we have re-created the ReadReplica at the same time we made contact with AWS, who in turn have reinforced that is best course of action in these situations.

Thank you for your for your patience and understanding.

Posted Jan 13, 2025 - 10:59 UTC

Update

We have determined the replication was interrupted due to a MultiAZ failover, we're working with AWS to gain further insights and will share further updates accordingly.

Posted Jan 09, 2025 - 10:52 UTC

Monitoring

Please note we have identified a RR replication outage on N.Virginia2 shared environment that took place between 08:35 UTC and 09:12 UTC on Jan-9 2025.
We have successfully mitigated impact and are currently monitoring to ensure environment health, while also investigating for the exact root cause in parallel.

Posted Jan 09, 2025 - 09:32 UTC

This incident affected: Mambu Production 2 (N. Virginia, USA).