Degraded performance for read operations on Dublin, Ireland shared environment
Incident Report for Mambu
Postmortem

Post Mortem - EuWest1 Production Read Replica high CPU utilization

Summary

The EuWest1 Read Replica experienced high CPU utilization on 18 October 2019, between 13:00 and 14:20 UTC, due to high load generated by complex queries. Only one tenant was affected by this incident. During our investigation, we identified the affected endpoints and the queries causing the high load, and traced them to requests generated by one of the tenants. We reconfigured our databases to support the increased load and contacted the tenant to check their systems. Once the tenant confirmed that they had reduced their API request rate, the incident was resolved.

What Are We Doing About This?

In order to avoid future incidents, we defined the following actions:

  • Create an alert, based on logs / CloudWatch metrics, that triggers when the number of slow queries in a given time frame exceeds a threshold, so that incidents of this kind are detected proactively in future.
  • Reach out to customers to explain that certain API calls, and/or certain parameters used with them, can affect the performance of the whole environment, so that the proper level of Mambu system performance is maintained.
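
As an illustration of the first action, the sketch below shows one way a slow-query threshold alert could work: it counts slow queries inside a sliding time window and signals when the count exceeds a limit. This is not Mambu's actual alerting setup. The class name SlowQueryAlert, its parameters, and the default values are hypothetical, and the sketch assumes query timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlowQueryAlert:
    """Signals when too many slow queries occur within a sliding time window.

    All names and default thresholds here are hypothetical, for illustration.
    """

    def __init__(self, slow_ms=1000.0, window_s=300.0, max_slow=50):
        self.slow_ms = slow_ms      # a query at or above this duration is "slow"
        self.window_s = window_s    # length of the sliding window, in seconds
        self.max_slow = max_slow    # alert when the slow count exceeds this
        self._slow_times = deque()  # timestamps of slow queries still in window

    def record(self, timestamp_s, duration_ms):
        """Record one query; return True if the alert should fire.

        Assumes timestamp_s never decreases between calls.
        """
        if duration_ms >= self.slow_ms:
            self._slow_times.append(timestamp_s)
        # Drop slow-query timestamps that have fallen out of the window.
        cutoff = timestamp_s - self.window_s
        while self._slow_times and self._slow_times[0] <= cutoff:
            self._slow_times.popleft()
        return len(self._slow_times) > self.max_slow


if __name__ == "__main__":
    alert = SlowQueryAlert(slow_ms=1000, window_s=60, max_slow=2)
    samples = [(0, 1500), (10, 2000), (20, 500), (30, 3000)]
    for ts, duration in samples:
        if alert.record(ts, duration):
            print(f"ALERT at t={ts}s: slow-query threshold exceeded")
```

In practice the same rule would more likely be expressed as a metric filter and alarm in the monitoring system rather than in application code; the sketch only makes the window and threshold logic explicit.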

At Mambu we take our commitment to delivering a high-quality service very seriously, and we sincerely apologise for the inconvenience this issue has caused. If you have any questions, please contact us via our usual support channels.

Posted Nov 05, 2019 - 16:27 UTC

Resolved
This incident has been resolved.
Posted Oct 18, 2019 - 17:45 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 18, 2019 - 16:50 UTC
Identified
The issue has been identified and we are currently working on a fix.
Posted Oct 18, 2019 - 14:27 UTC
Investigating
Mambu has become aware of a situation affecting GET operations for some tenants. Users may experience latency when querying information on loan accounts. We are currently investigating the root cause and will update you when we have identified it.
Posted Oct 18, 2019 - 13:56 UTC
This incident affected: Mambu Production (Dublin, Ireland).