GitHub Copilot incident

Incident with multiple GitHub services

Severity: Critical · Status: Resolved

Started

April 23, 2026 at 04:12 PM UTC

Duration

1h 18m

Resolved

April 23, 2026 at 05:30 PM UTC

Updates timeline

  1. Investigating

    We are investigating reports of degraded availability for Copilot and Webhooks.

  2. Investigating

    We are investigating multiple unavailable services.

  3. Investigating

    Actions is experiencing degraded performance. We are continuing to investigate.

  4. Update

    We have identified the root cause and are working on mitigation.

  5. Update

    The degradation affecting Actions and Copilot has been mitigated. We are monitoring to ensure stability.

  6. Update

    Many services have been mitigated, and we are validating the remaining services.

  7. Update

    Webhooks is operating normally.

  8. Resolved

    On April 23, 2026, between 16:03 UTC and 17:27 UTC, multiple GitHub services experienced elevated error rates and degraded performance due to DNS resolution failures originating from the DNS infrastructure in our VA3 datacenter. Approximately 5–7% of overall traffic was affected during the impact window:

    - Webhooks: ~0.35% of API requests returned 5xx (peak ~0.39%). ~0.88% of requests exceeded 3s latency; at peak, responses slower than 3s represented ~10% of Webhooks API traffic.
    - Copilot Metrics: ~9% of Copilot Insights dashboard requests returned 5xx.
    - Copilot cloud agents: ~10% of cloud agent sessions failed.
    - Octoshift: 0.88% of active repo migrations failed, and 79% saw elevated durations (avg. 5.2 min) during this period.
    - Git Operations: error rates averaged 1.25% over the incident, peaking at 2.07%.
    - Actions: workflow run status updates were delayed by up to ~8s during the incident window.

    Our DNS infrastructure in VA3 entered a degraded state and began intermittently returning NXDOMAIN responses and timing out on lookups for both internal service discovery and external endpoints, causing cascading impact across the dependent services listed above.

    We identified a specific load pattern under which our DNS resolvers began failing. The evidence points to a recently introduced traffic-balancing mechanism, rolled out progressively to support our growth, as the root cause. We have since reverted this change.

    We are prioritizing investments in a more controlled rollout and validation process, including a dedicated environment that safely shadows production DNS traffic, so that we can detect these failure modes before they affect production.
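To make the proposed shadow-validation step concrete, here is a minimal sketch of replaying a query against both the production resolver and a candidate resolver and flagging disagreements. The resolver addresses and query name are placeholders, and the sketch uses the dnspython library; it illustrates the idea only and is not GitHub's tooling.

```python
# Minimal shadow-comparison sketch (illustrative only, not GitHub's tooling).
# Requires dnspython: pip install dnspython
import dns.exception
import dns.resolver

PROD_RESOLVER = "10.0.0.53"       # placeholder: current production resolver
CANDIDATE_RESOLVER = "10.0.1.53"  # placeholder: resolver running the new config

def answers(nameserver: str, qname: str, rdtype: str = "A") -> list[str]:
    """Return the sorted answer set, or a sentinel for NXDOMAIN/timeouts."""
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    resolver.lifetime = 2.0  # fail fast: a hung lookup is itself a signal
    try:
        return sorted(rr.to_text() for rr in resolver.resolve(qname, rdtype))
    except dns.resolver.NXDOMAIN:
        return ["NXDOMAIN"]
    except dns.exception.Timeout:
        return ["TIMEOUT"]

def shadow_compare(qname: str) -> None:
    """Flag any divergence between the production and candidate resolvers."""
    prod = answers(PROD_RESOLVER, qname)
    cand = answers(CANDIDATE_RESOLVER, qname)
    if prod != cand:
        print(f"MISMATCH {qname}: prod={prod} candidate={cand}")

# Example (hypothetical name): shadow_compare("service.internal.example")
```

In a real shadow environment the comparison would run continuously over mirrored production query streams rather than single lookups, but the pass/fail signal is the same: any divergence in answers, NXDOMAIN behavior, or timeouts from the candidate is caught before promotion.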

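On the client side, the failure mode described in the report (intermittent NXDOMAIN answers and lookup timeouts) suggests a defensive pattern for services that depend on DNS: treat a single failed lookup as potentially transient. The sketch below is a generic illustration in Python, not code from any affected service; the hostname is hypothetical.

```python
import socket
import time

def resolve_with_retry(hostname: str, attempts: int = 3, base_delay: float = 0.2):
    """Resolve a hostname, retrying transient failures with backoff.

    Under resolver degradation like the incident above, a single lookup may
    spuriously time out or return NXDOMAIN even though the name exists, so
    one failure is not treated as authoritative. Persistent failures still
    surface as an exception after the final attempt.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            # socket.gaierror covers both NXDOMAIN and resolution timeouts
            return socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        except socket.gaierror as err:
            last_err = err
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_err

# Example (hypothetical name):
# addrs = resolve_with_retry("copilot-agents.internal.example")
```

Note that retrying on NXDOMAIN is only appropriate when resolver health is suspect, as it was here; under normal operation a negative answer should be trusted rather than retried.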