Project Details
Description
Data centers (DCs) are the backbone of the modern digital economy, critical to U.S. economic growth, national security, public health, and enhanced data security and management. DCs are energy-intensive infrastructure, accounting for over 4% of total electricity use worldwide. As demand for AI and cloud computing grows, efficient cooling systems are critical to ensuring reliable and resilient DC operations. A critical failure in DC cooling systems can have catastrophic consequences, including total system shutdown and the loss of data and IT equipment. To prevent such catastrophic events, novel Fault Detection and Diagnostics (FDD) and mitigation techniques are essential. Currently, most FDD methods rely on conventional statistical techniques, machine learning models, or ad-hoc estimations. However, these methods are often limited in scope and may fail to detect rare or complex failure scenarios, particularly those arising from complex cascading events or malicious cyber-attacks. To tackle this challenge, this project develops a new FDD method based on failure and cyber-attack detection in the supervisory control theory of discrete event systems. The intellectual merits of this project are: (1) new FDD methods for detecting and mitigating cascading faults and cyber-attacks, resulting in resilient DC cooling system operation, (2) an open-source virtual testbed for evaluating the performance of the proposed algorithms, and (3) a hardware-in-the-loop testbed to understand the challenges of FDD-enabled controls in real-world DC cooling equipment. The broader impacts of this project include new FDD methods to transform conventional DC cooling system design and management into future resilient DC cooling infrastructure and a field-validated computational framework for advanced FDD analysis of resilient cyber-physical infrastructure.
By integrating event-driven supervisory control and physics-based modeling, the goal of this project is to develop a field-validated, FDD-enabled, model-based control and computation framework for the robust design and reliable operation of next-generation resilient DC cooling systems. The proposed FDD method: (1) identifies, analyzes, and captures complex dynamics of benign and malicious faults, with rigorous detection guarantees, (2) characterizes critical attack vectors that pose a severe threat to cooling system management and DC operation, and (3) generates robust, real-time control responses that enable adaptive system adjustments to sustain normal cooling operation during major disruptions. The findings of this research are expected to have a broad range of real-world applications, particularly in the design and development of attack-resilient DC cooling infrastructure across the United States, and can be used by DC developers, technology companies, utilities, and HVAC manufacturers.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
| Status | Active |
|---|---|
| Effective start/end date | 10/1/25 → 9/30/28 |
Funding
- National Science Foundation: $340,000.00