UPS Failures Are Still Taking Down the Cloud—Here’s What We’ve Learned

April 17, 2025

On March 29, 2025, an uninterruptible power supply (UPS) failure triggered a regional outage for Google Cloud, once again spotlighting a persistent and critical weakness in data center infrastructure. Despite advances in redundancy and monitoring, UPS systems remain a leading source of downtime, particularly in highly available, hyperscale environments.

Google Cloud Platform us-east5 region UPS failure

The incident, as reported by Data Center Dynamics, involved a control system fault that prevented the UPS from transferring to battery during a utility failure, ultimately impacting customer workloads in the us-east5 region. Google’s postmortem noted that while alarms were generated, they were not escalated or acted upon in time, highlighting the dual challenge of mechanical and human reliability in power chain management.
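
To make the escalation gap concrete, here is a minimal Python sketch of a rule that flags critical alarms left unacknowledged for too long. Everything in it is hypothetical: the Alarm fields, the five-minute threshold, and the function names are assumptions for illustration, not a description of Google's or MCIM's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed threshold: how long a critical alarm may sit unacknowledged.
ESCALATE_AFTER = timedelta(minutes=5)


@dataclass
class Alarm:
    """Hypothetical alarm record; field names are illustrative only."""
    source: str
    severity: str  # "critical" or "warning"
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None


def needs_escalation(alarm: Alarm, now: datetime) -> bool:
    """True if a critical alarm has gone unacknowledged past the threshold."""
    if alarm.severity != "critical" or alarm.acknowledged_at is not None:
        return False
    return now - alarm.raised_at >= ESCALATE_AFTER


def alarms_to_escalate(alarms: List[Alarm], now: datetime) -> List[Alarm]:
    """Filter the alarms that should be pushed to the next on-call tier."""
    return [a for a in alarms if needs_escalation(a, now)]
```

The point of the sketch is that "nobody acknowledged this" is treated as a condition to detect in its own right, rather than relying on someone noticing the original alarm.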

MCIM’s Static UPS System Reliability Benchmarking Report—one of the most comprehensive in the industry—provides context for how widespread this issue really is. The report analyzed over 18,000 static UPS units across 2,000+ facilities and found:

18.1% of all UPS failures are due to control system faults, the same root cause as the Google incident.
A full 40.7% of UPS-related outages resulted in at least partial loss of power to critical loads.
Despite the mission-critical nature of UPS equipment, only 21% of organizations track component-level reliability metrics that could help predict failures before they occur.

These findings underscore a troubling gap in both visibility and proactive maintenance.

The MCIM platform helps operators close this gap. By digitizing UPS and power system monitoring, maintenance, and incident workflows, MCIM enables:

Early warning detection via structured incident tracking and real-time equipment performance analytics.
Benchmarking and trend analysis across a client’s entire asset base to identify higher-risk models, configurations, or maintenance practices.
Data-driven decision-making for maintenance scheduling, capital planning, and component replacement—reducing both the risk and impact of UPS failure.
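
As a rough illustration of the benchmarking idea above, the Python sketch below computes failures per 100 unit-years for each UPS model and flags models that run well above the fleet-wide rate. The record layout, the 1.5x cutoff, and the sample numbers are all assumptions made up for this example; they are not taken from the MCIM report or platform.

```python
from collections import defaultdict


def failure_rates(records):
    """Return failures per 100 unit-years, keyed by UPS model.

    Each record is (model, unit_years_in_service, failure_count).
    """
    years = defaultdict(float)
    failures = defaultdict(int)
    for model, unit_years, count in records:
        years[model] += unit_years
        failures[model] += count
    return {m: 100.0 * failures[m] / years[m] for m in years if years[m] > 0}


def higher_risk_models(records, factor=1.5):
    """Models whose failure rate exceeds `factor` times the fleet-wide rate."""
    total_years = sum(r[1] for r in records)
    total_failures = sum(r[2] for r in records)
    if total_years == 0:
        return []
    fleet_rate = 100.0 * total_failures / total_years
    rates = failure_rates(records)
    return sorted(m for m, rate in rates.items() if rate > factor * fleet_rate)


# Made-up sample data for illustration only.
sample = [
    ("model-a", 400.0, 4),
    ("model-b", 100.0, 6),
    ("model-a", 100.0, 1),
]
print(failure_rates(sample))
print(higher_risk_models(sample))
```

Normalizing by unit-years in service keeps a model with many installed units from looking worse than a rarely deployed one simply because it has more chances to fail.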

The cloud doesn’t fail often—but when it does, the cost is enormous. As Google Cloud’s outage shows, even the most sophisticated providers aren’t immune. The industry must move from reactive incident response to predictive infrastructure management.

With MCIM, the data center industry has the opportunity to learn from past failures—before they happen again.

To learn more or to schedule a demo, please visit: www.mcim24x7.com

