We sat down with Steve Wallace, AIS Data Centers’ Chief Technology Officer, to get a clear understanding of what happened with Level 3’s Southern California fiber cut yesterday.
At approximately 8:10 am PT yesterday (Monday), there was a fiber cut in El Centro, CA. Level 3 was using this fiber link to support their Internet service in San Diego.
Level 3’s network did not fail completely (BGP was still operational and links were up), but latency from Level 3 in San Diego to Level 3 in Los Angeles increased dramatically, reaching as high as 140 ms against a normal 4 ms.
Since BGP routing is not latency-aware, traffic continued to flow through the high-latency links with no automatic peer failover.
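To illustrate why the traffic stayed put, here is a minimal Python sketch of a simplified subset of the BGP decision process. It is not a router implementation and not AIS's tooling: the `Route` class, the peer names, and the 6 ms figure for the alternate path are assumptions made up for the example. Only the 140 ms figure comes from the incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    peer: str            # hypothetical peer label
    local_pref: int      # higher wins
    as_path_len: int     # shorter wins
    med: int             # lower wins
    latency_ms: float    # measured by operators; not a BGP attribute


def best_path(routes):
    # Simplified ordering: highest local preference, then shortest
    # AS path, then lowest MED. latency_ms is deliberately absent,
    # because BGP has no attribute that carries it.
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len, r.med))


routes = [
    Route("level3-san-diego", local_pref=100, as_path_len=1, med=0, latency_ms=140.0),
    Route("alternate-peer", local_pref=100, as_path_len=2, med=0, latency_ms=6.0),
]

chosen = best_path(routes)
print(chosen.peer, chosen.latency_ms)
```

In this sketch the slow path still wins on AS path length, so nothing changes until an operator steps in and adjusts an attribute that the selection does look at.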
AIS Data Centers’ Operations Control Center (OCC) team observed the unusual latency and verified that the issue was local and isolated to Level 3.
The AIS Engineering team then began to move customer traffic to alternate paths, including a 10Gbps direct connection to Level 3 in Anaheim, which was not affected by the added latency. The fine-tuning was accomplished using several tools, including a combination of BGP communities and local preference changes across several peers.
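As a rough illustration of how communities and local preference can steer traffic, the Python sketch below models a policy that maps community tags to local preference values. Everything in it is hypothetical: the community values (built on 64500, an AS number reserved for documentation), the preference numbers, and the peer labels are assumptions for the example and do not reflect AIS's actual configuration.

```python
DEFAULT_LOCAL_PREF = 100

# Hypothetical policy: community tag -> local preference to assign.
POLICY = {"64500:120": 120, "64500:80": 80}


def local_pref(communities, policy=POLICY):
    # Use the highest preference that any matching community maps to,
    # falling back to the default when no tag matches.
    matches = [policy[c] for c in communities if c in policy]
    return max(matches) if matches else DEFAULT_LOCAL_PREF


def preferred_peer(routes, policy=POLICY):
    # routes maps a peer label to the set of communities on its routes.
    return max(routes, key=lambda peer: local_pref(routes[peer], policy))


# Before the change: the San Diego path is tagged as preferred.
before = {
    "level3-san-diego": {"64500:120"},
    "level3-anaheim": set(),
    "other-transit": set(),
}

# After the change: San Diego is de-preferred and Anaheim is raised.
after = {
    "level3-san-diego": {"64500:80"},
    "level3-anaheim": {"64500:120"},
    "other-transit": set(),
}

print(preferred_peer(before))
print(preferred_peer(after))
```

Because local preference is evaluated early in path selection, changing it on one peer at a time lets traffic be shifted in controlled steps rather than all at once.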
There was a lot of traffic to move and significant potential for making matters worse, but the AIS Engineering team proceeded cautiously and carefully to minimize customer impact. In all, it took nearly an hour and a half to groom the traffic onto other connections, but the end result was happiness all around.
For additional details about this incident and how the AIS Engineering team regularly tackles a myriad of sophisticated challenges including large DDoS attacks, please contact us — we’d be happy to share our experiences.