In a status update on Monday, Alibaba Cloud said it was still gradually restoring access to servers and data affected by an explosion and fire last week. Specifically, servers were being “carefully dried” after sprinklers and firefighters drenched them to quench the fire. We hope that works well, as unintentional liquid cooling can harm electronics.
Alibaba Cloud monitoring first flagged “network access anomalies” at its Zone C Singapore facilities last Tuesday, September 10. It quickly became aware of a fire in progress and updated customers to reassure them that firefighters were on the scene to tackle the reported lithium battery explosion and fire. It seems to have taken just 30 minutes for Alibaba to switch its cloud network and security products to use data centers elsewhere. However, it asked customers who had control over such things to migrate their production workloads ASAP.
Several hours later, on the day of the explosion and subsequent fire, Alibaba told customers that some hardware was being affected by “abnormalities in high-temperature environment.” More than 12 hours after the first reports of network anomalies, Alibaba implemented an emergency power shutdown in the affected building zones. It said that water spray from firefighting was “posing a risk of electrical short circuits.”
The Register revealed further details about the data center disaster, gathered from local Singaporean media. It recounted reports that firefighting robots were deployed, keeping personnel safe from any potential further explosions and toxic fumes. It also pointed out that service providers like Lazada and ByteDance (TikTok) experienced significant disruptions while cloud resources were shuffled around.
The weekend saw the first engineering access to fire-affected machines since the day of the lithium-ion-fired catastrophe. An on-site team began preparations for drying the equipment, wiring, powering on, verification, debugging, etc.
According to Alibaba Cloud’s status page for the affected Zone C data center, all services are ‘normal’ at the time of writing. At the worst point, the status page indicated that 15 of the data center’s services were abnormal after the accident.