Update (2): T-Mobile has told us that everything is back to normal and this was not an attack.
This was not a cyberattack. This has been resolved – it was an internal technical issue that temporarily impacted some platforms.
Update (1): Everything seems to be getting back to normal. According to a Reddit post, a rogue employee ran a script that brought the system down. This claim hasn’t been verified, but if that’s what happened, it suggests a bad actor was able to access T-Mobile’s internal systems and make them dance to their tune. Alternatively, it may have been an honest mistake.
A script was executed that deleted every single namespace managed by the Conducktor platform. A namespace is essentially an abstraction of a cluster of EC2 instances that are leased from AWS. Conducktor manages the leasing, organization, networking configuration, and API orchestration to handle deployment and configuration of AWS stuff in general, but it’s primarily EC2 instances, K8S configuration, some Redis, ElastiCache, Routing/Load Balancers, etc. It’s a lot. Too much to list and it gets complicated in a hurry and I don’t know how to succinctly summarize it. Maybe “Giant magical AWS wrapper”?
But the overall point is that every team that owned an application or service that deployed to AWS via Conducktor had their stuff nuked. Conducktor is very widely used in Digital for APIs and applications. So most UI applications that are served from a webserver, APIs running on a Java server, etc., were impacted, as those servers themselves were deployed to EC2 instances managed by Conducktor. This is why this was such a widespread problem across channels (Retail, Care/Telesales, Web and App) as well as across lines of business (Prepaid, Postpaid, Business, T-Mobile Money, which nobody knows exists nor should they, etc.).
Quoting from a guy on the bridge, this was done by “A rogue admin ID” …so …I dunno, that smells really bad to me. Like, someone’s going to jail kinda bad.
###
T-Mobile is experiencing what appears to be a widespread outage, with affected users flocking to outage monitoring site Downdetector and social media platforms Reddit and X to report issues.
More worryingly, some customer accounts have been suspended because they were unable to make payments due to the outage.
“Oops, somebody unplugged the site. The site is currently unavailable. We’re working on it, but in the meantime please give us a call for anything you need.”
The disruption appears broad: one X user reported seeing an error for T-Mobile’s Apache Kafka event store, along with a warning that the service to route the host address account.t-mobile.com couldn’t be found.
P.S. Even assuming that this is caused by an innocent error (which is common), it could still be leveraged by attackers who were waiting for a disruption like this to strike. So regardless of the explanation, monitor and exercise extra care for now if you are a t-mobile user.
— zooko ⓩ (@zooko) January 11, 2024
Apparently, even T-Mobile’s frontline teams are seeing login and other errors in many applications, hindering their ability to carry out tasks like processing activations and account modifications.
If you reach out to retail and customer care representatives, they will likely tell you that they are experiencing “challenges” that have impacted their ability to process transactions.
“We’re experiencing system challenges impacting our ability to process nearly all transactions.”