Ph.D. Thesis Colloquium: CDS: “A scalable asynchronous discontinuous-Galerkin method for massively parallel PDE solvers”


7 Feb 24    
10:00 AM - 11:00 AM

Event Type

Ph.D. Thesis Colloquium

Speaker: Mr. Shubham Kumar Goswami

S.R. Number: 06-18-00-10-12-19-1-17224

Title: “A scalable asynchronous discontinuous-Galerkin method for massively parallel PDE solvers”

Research Supervisor: Dr. Konduri Aditya
Date & Time: February 07, 2024 (Wednesday) at 10:00 AM
Venue: The Thesis Colloquium will be held in hybrid mode
#102, CDS Seminar Hall / Microsoft Teams.

Please click on the following link to join the Thesis Colloquium:

MS Teams link


Accurate simulations of turbulent flows in computational fluid dynamics (CFD) are crucial for understanding many complex phenomena in engineered systems and natural processes. These flows are governed by nonlinear partial differential equations (PDEs), which are discretized into algebraic equations and solved using PDE solvers. However, the complexity of turbulence makes these simulations computationally expensive, necessitating the use of massively parallel supercomputers. Advances such as hardware-aware computing, fault tolerance, and overlapping of computation and communication have improved solver scalability. Even so, achieving efficient performance at extreme scales remains a challenge because of communication and synchronization overhead. To address this issue, an asynchronous computing approach has been introduced that relaxes communication and synchronization at the mathematical level. This allows processing elements (PEs) to proceed with their computations regardless of the status of messages, which can reduce communication overhead and improve scalability. This approach has been developed specifically for finite difference schemes, which are widely used but are not well suited to complex geometries and unstructured meshes. The objective of this work is to develop an asynchronous discontinuous-Galerkin method that provides high-order accurate solutions for a range of flow problems on unstructured meshes, and to demonstrate its scalability.
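To make the idea of relaxed communication concrete, the following sketch solves a toy 1D periodic heat equation. The grid is split across several simulated PEs, and ghost values are exchanged only every `refresh` steps, in the spirit of a communication-avoiding schedule. Between exchanges, each PE keeps computing with stale ghost data. This is a hypothetical, serial illustration that uses standard (not asynchrony-tolerant) stencils and no MPI. The function names and parameters (`simulate_heat`, `refresh`, `n_pes`) are invented for the example.

```python
import math


def simulate_heat(n_points=64, n_pes=4, steps=200, refresh=1, alpha=1.0):
    """FTCS solution of u_t = alpha * u_xx on a periodic grid in [0, 2*pi).

    The grid is split into n_pes contiguous chunks (simulated PEs). Ghost
    values are exchanged only every `refresh` steps; between exchanges each
    PE keeps computing with its last received (stale) ghost values.
    """
    dx = 2.0 * math.pi / n_points
    dt = 0.25 * dx * dx / alpha          # below the FTCS limit dx^2 / (2 * alpha)
    r = alpha * dt / (dx * dx)
    chunk = n_points // n_pes            # assumes n_points is divisible by n_pes
    local = [[math.sin((p * chunk + i) * dx) for i in range(chunk)]
             for p in range(n_pes)]
    ghosts = None
    for step in range(steps):
        if step % refresh == 0:
            # "Communication": read neighbours' current boundary values (periodic).
            ghosts = [(local[(p - 1) % n_pes][-1], local[(p + 1) % n_pes][0])
                      for p in range(n_pes)]
        updated = []
        for p in range(n_pes):
            u = [ghosts[p][0]] + local[p] + [ghosts[p][1]]
            updated.append([u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                            for i in range(1, chunk + 1)])
        local = updated
    return [v for part in local for v in part], steps * dt


def max_error(n_points=64, **kwargs):
    """Maximum deviation from the exact solution exp(-alpha*t) * sin(x)."""
    values, t = simulate_heat(n_points=n_points, **kwargs)
    alpha = kwargs.get("alpha", 1.0)
    dx = 2.0 * math.pi / n_points
    return max(abs(v - math.exp(-alpha * t) * math.sin(j * dx))
               for j, v in enumerate(values))
```

With `refresh=1`, the split run reproduces the single-PE run exactly. With a larger `refresh` (for example `max_error(refresh=4)`), the stale ghost values add an error concentrated near PE boundaries. Asynchrony-tolerant schemes are designed to control this kind of error.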

Based on the asynchronous computing approach, several PDE solvers have been developed that use high-order asynchrony-tolerant finite difference schemes for spatial discretization to simulate reacting and non-reacting turbulent flows, with significant improvements in scalability. However, for time integration, most of these solvers use one of two options. The first is multi-step Adams-Bashforth schemes, which have poor stability properties. The second is multi-stage Runge-Kutta (RK) schemes with an over-decomposed domain, which requires larger messages and redundant computations. In this work, we propose a novel method that couples asynchrony-tolerant schemes with low-storage explicit RK (LSERK) schemes to solve time-dependent PDEs with reduced communication. To maintain the desired order of accuracy, we developed new asynchrony-tolerant schemes for updating ghost (buffer) points. The accuracy of this method is investigated both theoretically and numerically using simple one-dimensional linear model equations. We then demonstrate the scalability of the proposed method through three-dimensional simulations of decaying Burgers’ turbulence, performed with two asynchronous algorithms: a communication-avoiding algorithm and a synchronization-avoiding algorithm. Scalability studies on up to 27,000 cores yielded speed-ups of up to 6× over a baseline synchronous algorithm.
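Low-storage explicit RK schemes in the 2N-storage (Williamson) form keep only two registers per unknown: the solution `u` and an increment `du`. The sketch below uses Williamson's classical three-stage, third-order coefficients as a stand-in. The specific LSERK schemes used in the thesis, and their coupling with asynchrony-tolerant ghost-point updates, are not reproduced here. The right-hand side `rhs(t, u)` is a generic, hypothetical interface. A comment marks where a distributed solver would need ghost data at every stage, which is the communication the proposed coupling aims to reduce.

```python
import math

# Williamson (1980) three-stage, third-order 2N-storage coefficients.
A = (0.0, -5.0 / 9.0, -153.0 / 128.0)
B = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)
C = (0.0, 1.0 / 3.0, 3.0 / 4.0)


def lserk_step(rhs, u, t, dt):
    """One 2N-storage RK step: only the registers `u` and `du` persist across stages."""
    u = list(u)                  # work on a copy of the solution register
    du = [0.0] * len(u)          # second register, zeroed at the start of the step
    for a, b, c in zip(A, B, C):
        f = rhs(t + c * dt, u)   # in a PDE solver, each stage needs fresh ghost data here
        for i in range(len(u)):
            du[i] = a * du[i] + dt * f[i]
            u[i] += b * du[i]
    return u


def integrate(rhs, u0, t_end, n_steps):
    """Advance u0 from t = 0 to t_end with n_steps uniform LSERK steps."""
    u, dt = list(u0), t_end / n_steps
    for n in range(n_steps):
        u = lserk_step(rhs, u, n * dt, dt)
    return u
```

For example, `integrate(lambda t, u: [-x for x in u], [1.0], 1.0, 10)` approximates exp(-1). Halving the step size reduces the error by roughly a factor of eight, as expected for a third-order scheme.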

In recent years, the discontinuous Galerkin (DG) method has attracted broad interest for developing PDE solvers, particularly for nonlinear hyperbolic problems. This is owing to its ability to provide high-order accurate solutions in complex geometries, capture discontinuities, and achieve high arithmetic intensity. However, the scalability of DG-based solvers is hindered by communication bottlenecks at extreme scales. In this work, we introduce the asynchronous DG (ADG) method, which combines the benefits of the DG method with asynchronous computing. It relaxes data communication and synchronization at the mathematical level to overcome communication bottlenecks. The proposed ADG method ensures flux conservation and addresses the challenges arising from asynchrony. To assess its stability, we use Fourier-mode analysis to examine the dissipation and dispersion behavior of fully discrete DG and ADG schemes with Runge-Kutta (RK) time integration across the entire range of wavenumbers. Furthermore, we present an error analysis within a statistical framework, which shows that the ADG method with standard numerical fluxes achieves at most first-order accuracy. To recover accuracy, we derive asynchrony-tolerant (AT) fluxes that use data from multiple time levels. Finally, extensive numerical experiments validate the performance and accuracy of the ADG-AT scheme for both linear and nonlinear problems.
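The accuracy loss, and why data from multiple time levels helps, can be illustrated with a scalar model. Suppose the upwind neighbour's trace value reaches a PE with a fixed delay of k time levels, rather than under the statistical treatment described above. Using the stale value in an upwind flux gives an error proportional to k·Δt, which is first order. Linearly extrapolating in time from the two newest available levels, u* = (k+1)·u[n-k] - k·u[n-k-1], cancels the leading term and leaves an O(Δt²) error. This is a hypothetical illustration, not the AT flux derived in the thesis, and all function names are invented for the example.

```python
import math


def stale_trace(history, k):
    """Standard treatment: use the newest available level n-k as if it were level n."""
    return history[-1 - k]


def extrapolated_trace(history, k):
    """Linear extrapolation in time from levels n-k and n-k-1 to level n."""
    return (k + 1) * history[-1 - k] - k * history[-2 - k]


def upwind_flux(u_left, u_right, a):
    """Upwind numerical flux for linear advection f(u) = a * u."""
    return a * u_left if a > 0 else a * u_right


def interface_flux_errors(dt, k, a=1.0, t_now=1.0, levels=8):
    """Flux errors at an interface whose left (upwind) neighbour is delayed by k levels.

    history[-1] holds the true current level t_now, which the receiving PE
    does not have; older levels are spaced dt apart. The left trace follows
    the smooth signal sin(t); the local (right) trace is taken as current.
    """
    history = [math.sin(t_now - (levels - 1 - j) * dt) for j in range(levels)]
    u_right = 0.0
    exact = upwind_flux(history[-1], u_right, a)
    stale = upwind_flux(stale_trace(history, k), u_right, a)
    tolerant = upwind_flux(extrapolated_trace(history, k), u_right, a)
    return abs(stale - exact), abs(tolerant - exact)
```

Calling `interface_flux_errors(0.01, k=2)` and then `interface_flux_errors(0.005, k=2)` shows the expected behaviour. Halving Δt roughly halves the stale-flux error but quarters the extrapolated one, matching first- and second-order behaviour respectively.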

With the asynchronous discontinuous-Galerkin (ADG) method developed, we finally focus on implementing it and evaluating its performance in solving hyperbolic equations with shocks and discontinuities. To this end, we chose a highly scalable DG solver for the compressible Euler equations from deal.II, a widely used open-source finite element library. The solver uses low-storage explicit Runge-Kutta schemes for time integration. We implemented the ADG method in deal.II with the communication-avoiding algorithm (CAA) and performed validation and benchmarking. These tests show the accuracy limitations of standard ADG schemes and the effectiveness of the newly developed asynchrony-tolerant (AT) fluxes. Strong scaling results for both synchronous and asynchronous DG solvers demonstrate a speedup of up to 80%. Since these AT fluxes are also compatible with the finite volume (FV) method, the overall work highlights the potential benefits of the asynchronous approach for developing accurate and scalable DG- and FV-based PDE solvers. This paves the way for simulations of complex physical systems on massively parallel supercomputers.