Change in Date: Ph.D. Colloquium: Finite-element methods and exascale algorithms for fully relativistic noncollinear pseudopotential density functional theory: From mathematical formulations to efficient computational realization and magnetic materials applications

When

10 Jun 26    
3:30 PM - 4:30 PM

Event Type

DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES
Ph.D. Thesis Colloquium


Speaker: Mr. NIKHIL KODALI
S.R. Number: 06-18-01-10-12-21-1-19476
Title: “Finite-element methods and exascale algorithms for fully relativistic noncollinear pseudopotential density functional theory: From mathematical formulations to efficient computational realization and magnetic materials applications”
Research Supervisor: Dr. Phani Motamarri
Date & Time: June 10, 2026 (Wednesday), 03:30 PM
Venue: #102, CDS Seminar Hall


ABSTRACT
Next-generation technologies such as energy-efficient spintronic memory (MRAM), skyrmionic logic devices, and topological quantum computing platforms rely on quantum materials exhibiting noncollinear (NC) magnetism and spin-orbit coupling (SOC). Predictive first-principles simulations are indispensable for understanding and designing these systems, in which magnetic anisotropy, spin textures, and frustrated magnetic order play a central role. Realistic modeling of these phenomena in layered magnets and magnetic heterostructures often requires large simulation domains to capture moiré patterns, defects, or long-wavelength magnetic textures. However, performing NC-SOC density functional theory (DFT) calculations efficiently remains challenging: they introduce complex two-component spinor wavefunctions, local and nonlocal spin-dependent Hamiltonian terms, and significantly larger eigenvalue problems than their collinear counterparts. As a result, NC-SOC calculations are 12x–30x more computationally expensive than unpolarized calculations. To address these steep computational demands, this thesis develops exascale algorithms and mathematical formulations for NC-SOC DFT within a systematically convergent finite-element (FE) framework. Specifically, we propose a local reformulation of DFT electrostatics, devise a unified force/stress framework, develop a residual-based Chebyshev-filtered subspace iteration (R-ChFSI) eigensolver robust under reduced-precision operations, and design GPU-optimized data-movement schemes, integrating these advancements into the open-source DFT-FE code. These developments make fully relativistic calculations for medium-scale systems (~10,000–20,000 electrons) highly efficient, while bringing systematically convergent NC-SOC simulations of large-scale systems containing up to 100,000 electrons within reach for the first time.

We establish the local real-space formalism for NC-SOC DFT within an FE discretization utilizing optimized norm-conserving Vanderbilt (ONCV) pseudopotentials. We develop a highly efficient strategy for the local reformulation of DFT electrostatics to derive the FE-discretized governing equations involving two-component spinors. To handle exchange-correlation (XC) effects, we employ the locally collinear approximation and propose robust regularization strategies tailored for FE discretization to address numerical singularities in generalized-gradient approximation (GGA) functionals in regions of vanishing or small magnetization. Furthermore, we devise a unified generalized force and stress framework to compute accurate atomic forces and periodic unit-cell stresses for NC-SOC systems, enabling structural relaxation, unit-cell optimization, and the investigation of competing magnetic configurations.

To address the computational bottleneck of solving the resulting sparse generalized eigenvalue problem, we introduce the residual-based Chebyshev-filtered subspace iteration (R-ChFSI). While traditional ChFSI is well suited to the repeated eigensolves in self-consistent field (SCF) iterations, it is highly sensitive to inexact operations. It fails to converge when approximate inverses of the overlap matrix are used to construct the Chebyshev-filtered subspace rich in the desired eigenvectors for generalized eigenproblems, or when low-precision GPU arithmetic is used to reduce time-to-solution for the eigensolve. By recasting the Chebyshev polynomial recurrence in terms of residuals rather than direct eigenvector updates, we derive a scheme with provably more robust convergence characteristics under inexact operations. R-ChFSI therefore tolerates inexpensive approximate inverses for generalized eigenproblems, low-precision arithmetic (FP32/TF32), and reduced-precision (BF16) interprocess communication in distributed sparse matrix-vector products. We demonstrate that R-ChFSI reliably meets stringent electronic-structure tolerances (e.g., $10^{-8}$ residual tolerance) while providing a robust mathematical foundation for leveraging modern GPU hardware.
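
For readers unfamiliar with Chebyshev filtering, the following is a minimal sketch of the standard (not residual-based) three-term Chebyshev recurrence that traditional ChFSI builds on. It is not the R-ChFSI scheme of the thesis, whose formulation is not given in this abstract. The function names, the tiny diagonal test matrix, and the interval bounds `a` and `b` are illustrative assumptions.

```python
def matvec(H, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]


def chebyshev_filter(H, x, degree, a, b):
    """Return T_degree((H - c I) / e) applied to x, with c = (a + b) / 2, e = (b - a) / 2.

    Eigenvalues inside the unwanted interval [a, b] are mapped to [-1, 1],
    where |T_m| <= 1, while eigenvalues outside it are strongly amplified.
    """
    c = 0.5 * (a + b)
    e = 0.5 * (b - a)

    def scaled(v):
        # Apply the shifted and scaled operator (H - c I) / e to v.
        Hv = matvec(H, v)
        return [(hv - c * vi) / e for hv, vi in zip(Hv, v)]

    y_prev = list(x)            # T_0 term
    if degree == 0:
        return y_prev
    y = scaled(y_prev)          # T_1 term
    for _ in range(2, degree + 1):
        t = scaled(y)
        # Three-term recurrence: T_{k+1} = 2 * t * T_k - T_{k-1}
        y_next = [2.0 * ti - pi for ti, pi in zip(t, y_prev)]
        y_prev, y = y, y_next
    return y


# Illustrative 4x4 diagonal "Hamiltonian" with eigenvalues 0, 1, 2, 3.
H = [[0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 3.0]]
x = [1.0, 1.0, 1.0, 1.0]

# Damp the interval [1, 3]; the wanted eigenvalue 0 lies outside it.
y = chebyshev_filter(H, x, degree=8, a=1.0, b=3.0)
```

In this toy case the component along the wanted eigenvector (eigenvalue 0) grows by a factor of $T_8(2) = 18817$, while the components inside the damped interval keep magnitude 1. In the standard scheme every matrix-vector product acts directly on the growing filtered vectors, which is where inexact operations enter the recurrence.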

To translate these algorithmic advances into high-performance scalability on exascale architectures, we target key floating-point and data movement bottlenecks. Although modern GPU architectures offer dramatically higher throughput for low-precision arithmetic, eigensolvers in scientific simulations have struggled to exploit this capability without sacrificing accuracy. The proposed R-ChFSI algorithm resolves this challenge, enabling mixed-precision computations and block floating-point compressed MPI communication with over 4x compression ratios. Together, these optimizations dramatically reduce time-to-solution and communication overhead, enabling fully relativistic NC-SOC DFT simulations of systems with up to 100,000 electrons.
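
As a rough illustration of what block floating-point compression means, the sketch below stores one shared exponent per block of values plus a short signed integer mantissa per value. It is a generic textbook-style scheme under assumed parameters (`block_size`, `mantissa_bits`, and both function names are hypothetical), not the format or the MPI integration used in DFT-FE.

```python
import math


def bfp_compress(values, block_size=8, mantissa_bits=8):
    """Compress floats into (shared_exponent, integer_mantissas) blocks."""
    limit = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa magnitude
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Shared exponent e chosen so that max_abs < 2**e (0 for an all-zero block).
        exponent = math.frexp(max_abs)[1] if max_abs > 0.0 else 0
        scale = 2.0 ** (mantissa_bits - 1 - exponent)
        # Round to the nearest integer mantissa and clamp to the signed range.
        mantissas = [max(-limit, min(limit, round(v * scale))) for v in block]
        blocks.append((exponent, mantissas))
    return blocks


def bfp_decompress(blocks, mantissa_bits=8):
    """Reconstruct approximate floats from the compressed blocks."""
    out = []
    for exponent, mantissas in blocks:
        scale = 2.0 ** (exponent - (mantissa_bits - 1))
        out.extend(m * scale for m in mantissas)
    return out


# Illustrative data spanning a few orders of magnitude.
vals = [math.sin(0.37 * i) * 10 ** (i % 3) for i in range(32)]
packed = bfp_compress(vals)
restored = bfp_decompress(packed)
```

With these assumed settings, each reconstructed value differs from the original by at most $2^{-6}$ times the largest magnitude in its block. The parameters are illustrative only and are not the ones behind the compression ratio quoted above.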

We validate the accuracy, robustness, and scaling of our framework through systematic benchmarks and large-scale studies. Eigensolver benchmarks confirm that R-ChFSI maintains robust convergence under approximate inverses and reduced precision, yielding filtering speedups of up to 2.7x on GPU accelerators compared to standard implementations. By leveraging these eigensolver advances, the overall FE framework achieves up to 8x–11x speedups in minimum wall time for semi-periodic and non-periodic systems with thousands of electrons compared to widely used plane-wave implementations on CPUs, while maintaining excellent agreement in ground-state energetics, forces, and stresses. Large-scale performance tests demonstrate excellent strong and weak scalability on modern GPU-accelerated supercomputers.

To demonstrate that these developments enable physical investigations that were previously computationally inaccessible, we study the layered ferromagnet Fe$_3$GeTe$_2$, a system of key interest in 2D magnetism and spintronics. Specifically, we investigate the role of Fe vacancies, exploring their impact on the system’s energetics, localized magnetic order near defects, and defect-defect interactions using fully relativistic, noncollinear calculations.

Finally, we present formulations and results for extending this finite-element approach to curvilinear coordinates. This extension enables the efficient resolution of sharp variations in wavefunctions and densities using adaptive, non-uniform meshes (leveraging the unique flexibility of finite-element methods), thereby reducing the number of degrees of freedom (DoFs) required to achieve a target accuracy. We detail the coordinate transformations of the spinor-valued Kohn-Sham equations and electrostatic formulations, while leveraging the previously developed generalized force framework to compute forces and stresses. These developments extend systematically convergent, adaptive real-space DFT simulations to curvilinear coordinates.

In summary, this work advances both the theoretical formulation and computational realization of relativistic noncollinear DFT, dramatically accelerating medium-scale calculations while enabling systematically convergent simulations at an unprecedented scale on exascale supercomputers. By developing an FE discretization for Kohn-Sham DFT utilizing fully relativistic ONCV pseudopotentials, formulating a generalized force/stress framework, designing the R-ChFSI eigensolver, implementing GPU-centric optimizations, and extending these methods to curvilinear coordinates, this work provides a robust and highly efficient platform for predictive ab initio materials simulations. These developments effectively bridge the gap between the complex physics of relativistic noncollinear magnetic systems and the computational efficiency required to access experimentally relevant length and time scales.


ALL ARE WELCOME