Resolved
We have confirmed that all systems are stable and operating normally. The networking configuration has been fully restored across all regions, and we have implemented additional monitoring and safeguards to prevent recurrence.
We will continue to monitor systems closely over the coming days as part of our standard post-incident procedures and will share an RCA as soon as possible.
Monitoring
We have completed the reset of the networking stack across all regions and continue to monitor all systems closely.
Monitoring
We believe this is related to a known bug in the version of our networking stack currently in use and are mitigating the issue by resetting this layer across all AWS regions. The team continues to monitor all systems closely.
Monitoring
Additional configuration changes have been applied to prevent the recurrence of this issue. The team continues to monitor the systems.
Monitoring
The team mitigated the issue in affected regions and continues to investigate the root cause.
Investigating
We are investigating an elevated error rate in AWS eu-central-1 and ap-southeast-1.
Resolved
All components are operational after the restart of an internal component (Cilium) in the affected regions. The team is working on an RCA.
Investigating
We are investigating an elevated error rate in AWS us-east-1 and eu-west-2.
Resolved
This issue is now fixed.
Identified
Our engineering team has identified the root cause of the service disruption affecting some ClickHouse instances in us-east-1. We are currently deploying a fix across the region. ClickHouse instances should begin resuming normal operations within the next 10-30 minutes. We will continue to monitor closely and provide updates as services are restored.
Investigating
We are investigating a partial outage affecting services in the AWS region us-east-1.
Availability metrics are reported at an aggregate level across all tiers and error types.
Individual customer availability may vary depending on their workload, autoscaling settings and API features in use.