Rescheduled: M.Tech Research Thesis {Colloquium}: CDS: “Scalable Distributed Frameworks for analysis on Large Evolving Graphs”

When

12 Dec 24    
10:00 AM - 11:00 AM

Event Type

DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES

M.Tech Research Thesis Colloquium


Speaker : Ms. Ruchi Bhoot

S.R. Number : 06-18-01-10-22-22-1-20916

Title : “Scalable Distributed Frameworks for analysis on Large Evolving Graphs”

Research Supervisor :Prof. Yogesh Simmhan

Date & Time : December 11, 2024 (Wednesday), 2:00pm

Venue : # 102 CDS Seminar Hall


ABSTRACT

The analysis of graph-structured data has become increasingly important as networks in various domains, including science, engineering, and business, grow in size, complexity, and dynamism. While static graph analysis has been well-explored, there is now a shift towards understanding how networks evolve over time due to changes in vertices and edges. These evolving networks are termed temporal graphs, and they require new methods to study their dynamic properties. However, there is limited research on distributed programming frameworks and scalable platforms for designing and executing incremental algorithms on such time-varying graphs, especially those updated at high rates in applications like social networks and financial systems.

In response to the challenges of processing dynamic temporal graphs, we introduce TARIS, a distributed platform designed to execute time-respecting algorithms on large-scale, evolving networks. TARIS extends the windowed Interval-centric Computing Model (ICM) by enabling it to perform incremental computation, making it suitable for continuous graph updates. We formalize the properties of temporal algorithms and prove that our model of incremental computing over streaming updates is equivalent to recomputing from scratch. This helps reduce runtime by multiple orders of magnitude. TARIS uses optimized data structures to reduce memory access and enhance locality during graph updates. We also propose scheduling strategies to pipeline updates and computations, and streaming strategies to adapt the execution window to variable input rates.

Related to this is the need to partitioning large evolving graphs whose edges stream in incrementally. Traditional partitioning methods prioritize balancing edge counts and minimizing vertex replication but overlook the community structure inherent in many graphs. We leverage a novel optimization function that preserves local triangle counts and the graph’s community structure and evaluate its scalable distributed runtime design. Our results report significantly improved triangle count metrics while maintaining edge balance and vertex replication efficiency and achieving scalability.


ALL ARE WELCOME