BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
TZID:Asia/Kolkata
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
UID:55@cds.iisc.ac.in
DTSTART;TZID=Asia/Kolkata:20240618T103000
DTEND;TZID=Asia/Kolkata:20240618T113000
DTSTAMP:20240610T175649Z
URL:https://cds.iisc.ac.in/events/ph-d-thesis-colloquium-cds-application-s
 ervice-resilience-framework-an-end-to-end-perspective/
SUMMARY:Ph.D. Thesis {Colloquium}: CDS: "Application Service Resilience Fra
 mework: An end-to-end perspective."
DESCRIPTION:DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES\nPh.D. Thesis Co
 lloquium\n\nSpeaker                 : Ms. Dhanya R Math
 ews\nS.R. Number        : 06-18-02-10-12-18-1-15855\nTitle        
                :  "Application Service Resilience Framework: An e
 nd-to-end perspective"\nResearch Supervisor:  Dr. J. Lakshmi\nDate & T
 ime         : June 18\, 2024 (Tuesday) at 10:30 AM\nVenue     
                  : The Thesis Colloquium will be held in HYBRID M
 ode\n                                  # 102 CDS Seminar
  Hall /MICROSOFT TEAMS.\nPlease click on the following link to join the T
 hesis Colloquium:\nMS Teams link\n\nABSTRACT\n\nThe id
 ea of computing as a utility was realized with the emergence of the cloud 
 computing paradigm. Cloud service providers offer a wide range of services
  that are delivered over the Internet to cloud service consumers. In their
  current manifestation\, cloud services are realized over multiple logic
 al\, virtualized\, and distributed resources\, typically using a multi-lay
 ered architecture. The providers document the non-functional service level
  guarantees such as availability\, performance\, and security in Service
  Level Agreements (SLAs) provided to the consumer as Service Level Objecti
 ves (SLOs). The wide adoption of cloud computing\, compounded with the
  emerg
 ence of microservice architecture\, has resulted in a considerable increas
 e in the number of components involved in service delivery. Manually addre
 ssing failures in real-time is inefficient and often impossible at the clo
 ud scale\, where failures are a norm rather than an exception. Ensuring th
 e quality of an application service\, as documented in the SLA\, therefore
  requires autonomous mechanisms to enhance cloud services' resilience.\n\n
 Though cloud setups rely on highly autonomous service layers for managing\
 , provisioning\, and monitoring applications\, most of them focus on a spe
 cific cloud service architecture layer or consider only a particular set o
 f faults. Any component across the cloud service stack involved in the ser
 vice delivery could disrupt the SLO. Further\, as cloud services use share
 d infrastructure\, monitoring and acting on the individual service layer m
 etrics is limiting. In such a scenario\, the visibility of failure anywher
 e in the stack can offer effective recovery/remediation strategies\; hence
 \, an application-oriented approach that takes an end-to-end view of failu
 res makes the case for any resiliency solution. Towards this\, we propose 
 an end-to-end service resilience framework that employs data-dependent int
 elligent autonomous mechanisms to deal with cloud service disruptions effi
 ciently. The intelligence to reduce the effect of disruptions is based on 
 understanding the complex interconnections and inter-dependencies of end-t
 o-end components in the cloud service stack.\n\nThe different cloud servic
 e abstraction layers and infrastructure sharing have resulted in increased
  occurrence of faults\, more specifically\, saturation faults. The initial
  phase of this work examines real-world disruption scenarios to understand
  the faults that could disrupt a cloud service. With ever-changing applica
 tions and environments on which they are hosted\, realizing a failure repo
 sitory for cloud service faults is infeasible. This makes conventional dat
 a-oriented approaches less practical and dynamic observability data-orient
 ed methods more desirable. Towards this\, the second phase of this work de
 veloped a Topology Aware Root Cause Detection Algorithm (TA-RCD) that cons
 iders the observability data from end-to-end service components and their 
 interconnectedness. Our results from the fault injection studies show that
  the proposed approach outperforms the state-of-the-art RCD algorithm by
  at least 2x for Top-5 recall and 4x for Top-3 recall\, on average.\n\nTo
  autonomously recover a service from its anomalous sta
 te\, the remediation should target the root cause of anomalous behavior. T
 he root-cause localizations\, though accurate\, are not restricted to a sp
 ecific component because of causal effects due to service interactions. In
  order to identify the anomalous component\, the third phase of this work 
 developed a Topology Aware end-to-end failure Recovery framework (TA-REC) 
 that identifies the appropriate remediation strategy for an anomaly. The a
 nomaly score assignment and component activity tracking in TA-REC facilit
 ate the identification of the component and the remediation that needs to
  be applied to the component. For the saturation fault scenarios injected 
 across the stack\, TA-REC can identify a more adequate
  remediation/recovery strategy than the state-of-the-art because of the
  better visibility of the o
 rigin of the failure. The end-to-end visibility hence enables TA-REC to be
  effective against an anomaly.\n\nIn conclusion\, this work demonstrated t
 he usefulness of the end-to-end topology of a cloud application service to
  efficiently remediate anomalies that challenge service quality. The o
 bservations prove that looking at the service as a black box restricts the
  development of intelligent autonomous approaches to guarantee SLOs. The p
 roof-of-concept evaluations demonstrated that the intelligence to maintain
  service resilience effectively is based on an accurate understanding of t
 he end-to-end state\, as it facilitates maintaining component serviceabili
 ty by targeting the cause of failure in the stack. Future work aims to eva
 luate both TA-RCD and TA-REC for a broader range of fault scenarios in rea
 l-life production deployments.\n\nALL ARE WELCOME\n
CATEGORIES:Events,Ph.D. Thesis Colloquium
END:VEVENT
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
X-LIC-LOCATION:Asia/Kolkata
BEGIN:STANDARD
DTSTART:20230619T103000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
END:VCALENDAR