GitHub Copilot incident
Incident with multiple GitHub services
Started
April 23, 2026 at 04:12 PM UTC
Duration
1h 18m
Resolved
April 23, 2026 at 05:30 PM UTC
Updates timeline
- Investigating
We are investigating reports of degraded availability for Copilot and Webhooks.
- Investigating
We are investigating multiple unavailable services.
- Investigating
Actions is experiencing degraded performance. We are continuing to investigate.
- Investigating
We have identified the root cause and are working on a mitigation.
- Investigating
The degradation affecting Actions and Copilot has been mitigated. We are monitoring to ensure stability.
- Investigating
Many services have been mitigated, and we are validating the remaining services.
- Investigating
Webhooks is operating normally.
- Resolved
On April 23, 2026, between 16:03 UTC and 17:27 UTC, multiple GitHub services experienced elevated error rates and degraded performance due to DNS resolution failures originating from our DNS infrastructure in our VA3 datacenter. Approximately 5–7% of overall traffic was affected during the impact window:

- Webhooks: ~0.35% of API requests returned 5xx (peak ~0.39%). ~0.88% of requests exceeded 3s latency; at peak, responses slower than 3s represented ~10% of Webhooks API traffic.
- Copilot Metrics: ~9% of Copilot Insights dashboard requests returned 5xx.
- Copilot cloud agents: ~10% of cloud agent sessions failed.
- Octoshift: 0.88% of active repository migrations failed, and 79% saw elevated durations (avg. 5.2 min) during this period.
- Git Operations: errors averaged 1.25% over the duration of the incident, with a peak of 2.07%.
- Actions: workflow run status updates were delayed by up to ~8s over the incident window.

Our DNS infrastructure in VA3 entered a degraded state and began intermittently returning NXDOMAIN responses and timing out on lookups for both internal service discovery and external endpoints. This caused a cascading impact across the dependent services listed above.

We identified a specific load pattern under which our DNS resolvers began failing. The evidence points to a recently introduced traffic-balancing mechanism, rolled out progressively to support our growth, as the root cause. We have since reverted this change.

We are immediately prioritizing investments in a more controlled rollout and validation process, including a dedicated environment to safely shadow production DNS traffic and detect these failure modes before they can affect production.