TY - GEN
T1 - CloudPD
T2 - 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2013
AU - Sharma, Bikash
AU - Jayachandran, Praveen
AU - Verma, Akshat
AU - Das, Chita R.
PY - 2013
Y1 - 2013
N2 - In this work, we address problem determination in virtualized clouds. We show that high dynamism, resource sharing, frequent reconfiguration, high propensity to faults and automated management introduce significant new challenges for fault diagnosis in clouds. To this end, we propose CloudPD, a fault management framework for clouds. CloudPD leverages (i) a canonical representation of the operating environment to quantify the impact of sharing; (ii) an online learning process to tackle dynamism; (iii) correlation-based performance models for higher detection accuracy; and (iv) an integrated end-to-end feedback loop to synergize with a cloud management ecosystem. Using a prototype implementation with cloud-representative batch and transactional workloads like Hadoop, Olio and RUBiS, it is shown that CloudPD detects and diagnoses faults with low false positives (< 16%) and high accuracy of 88%, 83% and 83%, respectively. In an enterprise trace-based case study, CloudPD diagnosed anomalies within 30 seconds and with an accuracy of 77%, demonstrating its effectiveness in real-life operations.
AB - In this work, we address problem determination in virtualized clouds. We show that high dynamism, resource sharing, frequent reconfiguration, high propensity to faults and automated management introduce significant new challenges for fault diagnosis in clouds. To this end, we propose CloudPD, a fault management framework for clouds. CloudPD leverages (i) a canonical representation of the operating environment to quantify the impact of sharing; (ii) an online learning process to tackle dynamism; (iii) correlation-based performance models for higher detection accuracy; and (iv) an integrated end-to-end feedback loop to synergize with a cloud management ecosystem. Using a prototype implementation with cloud-representative batch and transactional workloads like Hadoop, Olio and RUBiS, it is shown that CloudPD detects and diagnoses faults with low false positives (< 16%) and high accuracy of 88%, 83% and 83%, respectively. In an enterprise trace-based case study, CloudPD diagnosed anomalies within 30 seconds and with an accuracy of 77%, demonstrating its effectiveness in real-life operations.
UR - http://www.scopus.com/inward/record.url?scp=84883417867&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883417867&partnerID=8YFLogxK
U2 - 10.1109/DSN.2013.6575298
DO - 10.1109/DSN.2013.6575298
M3 - Conference contribution
AN - SCOPUS:84883417867
SN - 9781467364713
T3 - Proceedings of the International Conference on Dependable Systems and Networks
BT - 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2013
Y2 - 24 June 2013 through 27 June 2013
ER -