Ph.D. Thesis Defense
S.R. Number : 06-18-02-10-12-17-1-14492
Venue : The Thesis Defense will be held in HYBRID mode
# 102 CDS Seminar Hall /MICROSOFT TEAMS
Please click on the following link to join the Thesis Defense:
MS Teams link
Existing work on storage resiliency focuses on maintaining sufficient user data redundancy to keep the service reliable. However, providing a global-scale storage solution requires various functional and management layers to ensure that the service is accessible and all stored items are durable. The first part of our work proves that resiliency at the stored data level does not guarantee service-level reliability. A generic cloud storage system model is designed to show analytically that the reliability achieved at the service level differs drastically from the reliability ensured by stored data redundancy. This motivates us to bring the entire system into purview to understand cloud storage resiliency.
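As a rough illustration of this gap (not the model used in the thesis), the sketch below assumes independent failures and a simple series composition of layers. The layer names and all probabilities are hypothetical values chosen only for the example.

```python
from math import prod


def data_survival(replica_fail_prob: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies survives."""
    return 1.0 - replica_fail_prob ** replicas


def service_reliability(layer_reliabilities: list[float]) -> float:
    """Series composition: a request succeeds only if every layer works."""
    return prod(layer_reliabilities)


# Hypothetical numbers: 3 replicas, each lost with probability 0.01.
data = data_survival(replica_fail_prob=0.01, replicas=3)

# Hypothetical layers a request must traverse before it reaches the data.
layers = {
    "proxy": 0.999,
    "metadata": 0.999,
    "network": 0.9995,
    "data": data,
}
service = service_reliability(list(layers.values()))

print(f"data-level reliability:    {data:.6f}")
print(f"service-level reliability: {service:.6f}")
```

Under these assumptions the service-level figure is lower than the reliability of any single layer, including the replicated data layer, because every layer on the request path must work.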
Because large-scale storage architectures are complex and varied, assessing end-to-end storage resiliency is a challenging task. To address this, the second part of the work proposes a generic resiliency evaluation method for cloud storage services. The method identifies the essential functional layers of a storage service and the components that constitute them. It then analyzes in depth how the service behaves under all possible failures of each component. The method is used to assess the resiliency of two diverse real-world cloud storage services, OpenStack Swift and CephFS. The analysis identifies various resiliency weak points in the service architectures and shows the effectiveness of the different resiliency methods used at various layers.
The third part of the work extends the resiliency evaluation method to understand how resiliency correlates with the service usage pattern. A storage service can serve different use cases, which vary in request interarrival time, read/write ratio, accessed data and metadata, etc. Hence, the components involved in access sequences may differ, and so can their failure impact. Using the improved resiliency evaluation method and access patterns identified from real traces, we show that resiliency can be applied selectively and adjusted dynamically based on workloads without affecting service reliability.
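The dependence of failure impact on the workload can be sketched as follows. This is a minimal illustration, not the thesis's evaluation method: the component names, the request paths in `PATHS`, and the two workload mixes are all hypothetical.

```python
# Hypothetical request paths: which components each operation type touches.
PATHS = {
    "read": {"proxy", "metadata", "object_store"},
    "write": {"proxy", "metadata", "object_store", "replicator"},
}


def failure_impact(component: str, mix: dict[str, float]) -> float:
    """Fraction of requests whose path includes `component`.

    `mix` maps each operation type to its share of the workload.
    """
    return sum(share for op, share in mix.items() if component in PATHS[op])


# Two hypothetical workload mixes (shares sum to 1).
read_heavy = {"read": 0.9, "write": 0.1}
write_heavy = {"read": 0.2, "write": 0.8}

for name, mix in [("read-heavy", read_heavy), ("write-heavy", write_heavy)]:
    impact = failure_impact("replicator", mix)
    print(f"{name}: replicator failure affects {impact:.0%} of requests")
```

In this toy setting, a failure of the hypothetical `replicator` component touches a small share of a read-heavy workload but most of a write-heavy one, while a component on every path, such as `proxy`, affects all requests regardless of the mix.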
Finally, the work defines an end-to-end resiliency analysis framework for cloud storage services that enables quantification, comparison, and optimization of cloud storage resiliency. The framework allows effective modeling of cloud storage resilience by combining the resiliency of each component that participates in maintaining service reliability for specific workloads. The framework successfully models the resiliency of OpenStack Swift and CephFS as Stochastic Petri Nets (SPNs). The models are used to quantify and compare the resiliency of these two service architectures and to demonstrate how to optimize resiliency while achieving the expected service reliability.
ALL ARE WELCOME