An outage affecting Cloudflare’s 1.1.1.1 DNS resolver service stemmed from a Border Gateway Protocol (BGP) hijack combined with a route leak. The incident disrupted 300 networks across 70 countries, although many users saw no impact.
BGP works much like the internet’s postal service. It is the set of rules that determines which routes data takes across the network. The internet is made up of many separate networks, called autonomous systems (ASes), run by different entities such as governments or internet service providers (ISPs). BGP lets these ASes exchange routing information and work out the most efficient way to get data packets from one place to another.
Incident Timeline and Technical Breakdown
Starting on June 27, 2024 at 18:51 UTC, Eletronet S.A. (AS267613) mistakenly announced the 1.1.1.1/32 IP address to its peers and upstream providers. Multiple networks, including a Tier 1 provider, accepted the incorrect announcement and treated it as a Remote Triggered Blackhole (RTBH) route. Because BGP routers prefer the most specific matching prefix, the /32 took precedence over the covering 1.1.1.0/24 route, so traffic intended for Cloudflare’s DNS resolver was misrouted and effectively blackholed.
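The effect of prefix specificity can be illustrated with a minimal longest-prefix-match sketch. The routing table contents and the `best_route` helper are hypothetical and only model the selection rule, not any real router’s implementation.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (prefix, next_hop) pair with the longest prefix matching destination.

    `routes` maps prefix strings to a description of the next hop.
    Returns None when no prefix covers the destination.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # The most specific route is the one with the largest prefix length.
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop


# Hypothetical table on a network that accepted both announcements.
routes = {
    "1.1.1.0/24": "legitimate Cloudflare path",
    "1.1.1.1/32": "blackhole (hijacked route)",
}

selected = best_route(routes, "1.1.1.1")   # matched by both; the /32 wins
neighbor = best_route(routes, "1.1.1.2")   # matched only by the /24
```

In this model only the single address 1.1.1.1 follows the hijacked route; other addresses in 1.1.1.0/24 still use the covering prefix.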
A second problem arose one minute later, at 18:52 UTC, when Nova Rede de Telecomunicações Ltda (AS262504) leaked the 1.1.1.0/24 route upstream to AS1031. The leak propagated further, compounding the initial hijack and diverting traffic in more networks. Cloudflare detected the issues around 20:00 UTC and resolved the hijack within roughly two hours. The route leak was fully addressed by 02:28 UTC the following day.
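A route leak is commonly described as a violation of the conventional "valley-free" export rule: routes learned from a customer may be passed to anyone, while routes learned from a peer or provider should only be passed to customers. The sketch below encodes that rule. The article does not state the business relationships between the networks involved, so the roles used here are hypothetical labels, not a reconstruction of the incident.

```python
# Relationship of a neighbor as seen from the local AS (hypothetical labels).
CUSTOMER = "customer"
PEER = "peer"
PROVIDER = "provider"


def export_allowed(learned_from, export_to):
    """Apply the valley-free export rule.

    Routes learned from a customer may be exported to any neighbor.
    Routes learned from a peer or provider may be exported only to customers.
    """
    if learned_from == CUSTOMER:
        return True
    return export_to == CUSTOMER


def is_route_leak(learned_from, export_to):
    """A leak is any export that the valley-free rule does not allow."""
    return not export_allowed(learned_from, export_to)
```

Under this rule, re-advertising a route learned from a peer to an upstream provider counts as a leak, whereas passing a customer’s route upstream does not.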
Immediate Mitigations
To limit the impact, Cloudflare coordinated with affected networks and disabled peering sessions with the networks propagating the incorrect routes. Because Cloudflare validates routes with Resource Public Key Infrastructure (RPKI), its own routers automatically rejected the invalid routes, which prevented routing disruption inside its network.
Cloudflare suggests improving route leak detection by incorporating more data sources and real-time information. It also urges wider adoption of RPKI for Route Origin Validation (ROV) and adherence to the Mutually Agreed Norms for Routing Security (MANRS), including rejecting invalid prefix lengths and applying rigorous filtering. Additionally, Cloudflare advises networks to reject IPv4 prefixes longer than /24 in the Default-Free Zone (DFZ).
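The two filtering recommendations can be sketched together: origin validation in the style of RFC 6811 plus a maximum-length check for IPv4. This is a simplified model, not a production validator. The `ROA` class, the function names, and the ROA entry for 1.1.1.0/24 (origin AS13335, maximum length 24) are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Simplified Route Origin Authorization (fields assumed for illustration)."""
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(prefix, origin_asn, roas):
    """Classify a route as 'valid', 'invalid', or 'not-found'.

    A route is valid if some covering ROA matches its origin AS and its
    prefix length does not exceed that ROA's max_length. It is invalid if
    ROAs cover it but none match, and not-found if no ROA covers it.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


def accept_route(prefix, origin_asn, roas, max_ipv4_len=24):
    """Reject IPv4 prefixes longer than max_ipv4_len and RPKI-invalid routes."""
    route = ipaddress.ip_network(prefix)
    if route.version == 4 and route.prefixlen > max_ipv4_len:
        return False
    return validate_origin(prefix, origin_asn, roas) != "invalid"


# Assumed ROA for illustration only.
roas = [ROA(prefix="1.1.1.0/24", max_length=24, origin_asn=13335)]

legit_state = validate_origin("1.1.1.0/24", 13335, roas)
hijack_state = validate_origin("1.1.1.1/32", 267613, roas)
hijack_accepted = accept_route("1.1.1.1/32", 267613, roas)
```

Under these assumptions the /32 announcement fails both checks. Origin validation looks only at the origin AS and prefix length, so a leaked route that keeps its legitimate origin would still pass, which is one reason leak detection is recommended alongside ROV.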
Profile of Cloudflare’s DNS Service
Since its launch in 2018, Cloudflare’s 1.1.1.1 public DNS resolver has grown in popularity. However, its well-known IP address has encountered operational challenges, such as being appropriated by networks for testing, leading to unexpected traffic or routing issues.
The technical analysis shows that AS267613 advertised 1.1.1.1/32 to its peers and providers, while AS262504 leaked 1.1.1.0/24 upstream, disrupting the normal BGP anycast paths to the resolver. Public route collectors and the monocle tool captured these errant BGP updates, confirming that numerous networks accepted the invalid 1.1.1.1/32 route.
Last Updated on November 7, 2024 3:40 pm CET