55 private links
Post-Mortem über einen Incident mit dem Ceph-Cluster bei Uberspace, verursacht durch eine Spannungsschwankung.
Incident Report on Fastly's service disruption this week.
The Epic Games Reliability Engineering team did a post-mortem on a certificate expiration issue they recently experienced.
In this post, Michael Prokop does an in-depth post-mortem on the outage of a Proxmox hyper-converged Ceph cluster.
Short post-mortem on a Grafana Cloud Prometrheus outage.
A postmortem by Slack on the outage from the beginning of January, caused by the overload of one of their AWS transit gateway.
This post mortem explains how a change in the quota system has lead to an outage of customer-facing services requiring Google OAuth.
Five days ago, the internet had a conniption. In broad patches around the globe, YouTube sputtered. Shopify stores shut down. Snapchat blinked out. And millions of people couldn’t access their Gmail accounts. The disruptions all stemmed from Google Cloud, which suffered a prolonged outage—which also prevented Google engineers from pushing a fix.
A compilation of startup failure post-mortems by founders and investors.