High response times on CMA. Admin interface not loading. CDA partially impacted.

Incident report for DatoCMS

Resolved

Our upstream provider has published its post-mortem (Root Cause Analysis, RCA) for this downtime. You can read the full post here: https://status.heroku.com/incidents/2684

Summary: On October 8, 2024 from 16:27 Coordinated Universal Time (UTC) to 22:10 UTC, and again on October 9, 2024 from 13:06 UTC to 16:36 UTC, a Distributed Denial of Service (DDoS) caused routing failures in a single partition of Heroku’s Common Runtime infrastructure in the EU Region. This resulted in increased error rates and latencies for some clients connecting to some customer applications hosted in that partition. The Salesforce Technology team worked with our upstream infrastructure provider to mitigate the immediate impact of this event, and to put additional network-level protections in place to improve resilience.

Posted at Oct 23, 14:22 GMT+00:00

Resolved

Heroku resolved the issue: https://status.heroku.com/incidents/2685

We are still waiting for their post-mortem covering both incidents.

Posted at Oct 9, 16:48 GMT+00:00

Identified

The upstream provider is working on a fix.

Posted at Oct 9, 14:51 GMT+00:00

Investigating

The upstream provider is currently investigating the issue. Their new status page incident is here: https://status.heroku.com/incidents/2685

Posted at Oct 9, 13:47 GMT+00:00

Investigating

It looks like Heroku is having the same problem again. We have already contacted them.

Posted at Oct 9, 13:10 GMT+00:00

Resolved

We are waiting for Heroku's post-mortem to gather more information about the downtime.

Posted at Oct 9, 07:53 GMT+00:00

Resolved

The upstream issue has been resolved and services are fully restored. Their last update:

Starting at 4:27 PM UTC on October 8th, 2024, customers experienced increased error rates and latencies on customer applications hosted on the Common Runtime in the EU Region. Heroku engineers investigated and mitigated the impact at 9:52 PM UTC on October 8th, 2024. All applications had fully recovered by 10:10 PM UTC.

Posted at Oct 8, 22:59 GMT+00:00

Monitoring

Services (CMA, UI, CDA) appear to be coming back up and are accessible at normal performance again.

However, we have not yet received official confirmation of resolution from the upstream provider. We will continue to monitor until an official resolution is announced.

Posted at Oct 8, 22:12 GMT+00:00

Identified

The upstream provider is still investigating the issue. We will continue to post updates here, or you can see their status directly at https://status.heroku.com/incidents/2684

Informational note: Already-cached Content Delivery API (GraphQL) requests do not seem to be affected. Uncached CDA queries may or may not be impacted, depending on the specific query and region.

We are still awaiting a resolution and will continue to monitor the situation.

Posted at Oct 8, 19:00 GMT+00:00

Identified

The upstream provider identified an issue on their infrastructure: https://status.heroku.com/incidents/2684

Posted at Oct 8, 18:11 GMT+00:00

Investigating

We are still investigating the issue and have contacted our upstream provider. We will post another update once we have more information.

Posted at Oct 8, 18:08 GMT+00:00

Investigating

We are currently investigating the issue.

Posted at Oct 8, 17:06 GMT+00:00