
# Ph.D: Thesis Defense: ONLINE: CDS: 05 October 2021 : “Algorithms for estimating integrals in high dimensional spaces”

## 05 Oct @ 11:30 AM -- 12:30 PM

**DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES**

__Ph.D. Thesis Defense (Online)__

**Speaker** : Mr. Arun I

**S.R. Number** : 06-18-01-10-12-15-1-12867

**Title** : “Algorithms for estimating integrals in high dimensional spaces”

**Date & Time** : 5th October 2021 (Tuesday), 11:30 AM

**Venue** : Online

__ABSTRACT__

Sampling, estimation, and integration in high-dimensional continuous spaces are required in diverse areas ranging from modeling multi-particle physical systems and optimization to inference from data. As the number of independent parameters ‘n’ increases, analytical methods are not always tractable, and numerical methods require exponentially increasing computational effort (NP-hardness). In the first and major part of the thesis, a general Monte Carlo sampling method is proposed to estimate n-volumes and integrals; it is agnostic to the non-convexity and roughness of the boundaries of the domain. Deterministic sampling methods such as Quasi-Monte Carlo are very efficient when the integrand can be reduced to a function of a single effective variable. Similarly, naive Monte Carlo is very effective when the independent variables are sampled uniformly over an n-orthotope (a rectangle for n = 2, a cuboid for n = 3, etc.). Even in problems where the independent variables can be sampled from an implicit probability distribution accompanied by re-weighting, correctly sampling the n-volume of the domain is NP-hard in general when an arbitrary function defines the boundary of the domain or its membership.
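As an illustration of the naive Monte Carlo baseline mentioned above, the following Python sketch estimates an n-volume by sampling uniformly over a bounding n-orthotope and counting the points that fall inside the domain. The function names, the unit-ball test domain, and the sample count are assumptions made for this example and are not taken from the thesis.

```python
import math
import random


def naive_mc_volume(inside, lower, upper, num_samples, seed=0):
    """Hit-or-miss volume estimate inside the box [lower, upper].

    `inside` is a membership function (hypothetical name) that returns
    True when a point lies in the domain.
    """
    rng = random.Random(seed)
    box_volume = math.prod(u - l for l, u in zip(lower, upper))
    hits = 0
    for _ in range(num_samples):
        # Draw one point uniformly from the n-orthotope.
        x = [rng.uniform(l, u) for l, u in zip(lower, upper)]
        if inside(x):
            hits += 1
    return box_volume * hits / num_samples


def in_unit_ball(x):
    """Membership test for the unit n-ball centred at the origin."""
    return sum(c * c for c in x) <= 1.0


def unit_ball_volume(n):
    """Exact volume of the unit n-ball, used only as a reference value."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


if __name__ == "__main__":
    n = 3
    estimate = naive_mc_volume(in_unit_ball, [-1.0] * n, [1.0] * n, 20000)
    print(estimate, unit_ball_volume(n))
```

The fraction of samples that land inside the unit ball equals its volume divided by 2^n, which is roughly 0.25% for n = 10 and keeps shrinking as n grows. This is one way to see why uniform sampling over a bounding box becomes impractical for general domains in high dimensions.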

Markov Chain Monte Carlo (MCMC) methods are suited to convex domains and scale as O(n^4) in the number of samples required to estimate the volume. The proposed n-sphere Monte Carlo (NSMC) method preserves the independence of the random samples, making it well suited to parallel computing. It decomposes the estimated volume into volumes of weighted n-spheres, and these weights are trivially estimated by sampling the extents of the domain. While other methods typically scale well only for relatively smooth convex bodies, the performance of the proposed method depends only on the variance of the distribution of extents and is independent of the smoothness and convexity of the body. The number of samples required to estimate n-volumes scales linearly with the number of dimensions for a fixed distribution of extents. A straightforward adaptation of this method for estimating arbitrary integrals is shown. For convex shapes that are not defined by a fixed distribution of extents, numerical results show that the naive NSMC has significant advantages over MCMC for n < 100 dimensions. This approach has difficulty with highly eccentric volumes given by tailed distributions; the large front constants of the linear scaling in such cases can be reduced by an appropriate non-uniform importance sampling of the extents. The challenges of such non-uniform sampling in high-dimensional spaces are described along with the proposed solution. With this geometric importance sampling, it is shown that the O(n) scaling in samples can be achieved for spheroids even when the distribution of extents varies with n, and for higher dimensions and eccentricities where the tail of the distribution of extents contributes significant volume. Future work is aimed at maintaining this favorable scaling in samples over MCMC even for other challenging convex shapes and large values of n.
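One possible reading of the sphere decomposition described above is the polar-coordinate identity V = V_n · E[r(u)^n], where V_n is the volume of the unit n-ball, u is a uniformly random direction, and r(u) is the extent of the domain along u. The Python sketch below implements only that identity. It assumes a domain that is star-shaped about the origin, so each ray crosses the boundary once, and it assumes a user-supplied `extent` function (a hypothetical name). The thesis method may differ, for example in how extents are found and in the importance sampling it uses.

```python
import math
import random


def unit_ball_volume(n):
    """Exact volume of the unit n-ball."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def random_direction(n, rng):
    """Uniform random unit vector, obtained by normalising a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sphere_weighted_volume(extent, n, num_samples, seed=0):
    """Estimate the volume as V_n times the sample mean of r(u)^n.

    Each sampled extent r contributes the volume of an n-sphere of
    radius r; the samples are independent, so the loop could be split
    across workers. Assumes a star-shaped domain about the origin.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u = random_direction(n, rng)
        total += extent(u) ** n
    return unit_ball_volume(n) * total / num_samples


def cube_extent(u):
    """Distance from the origin to the boundary of [-1, 1]^n along u."""
    return 1.0 / max(abs(c) for c in u)


if __name__ == "__main__":
    # The cube has a non-smooth boundary; its exact volume is 2**n.
    print(sphere_weighted_volume(cube_extent, 3, 20000), 2.0 ** 3)
```

In this sketch the accuracy depends on the spread of r(u)^n over directions, not on whether the boundary is smooth, which matches the abstract's remark that performance depends on the variance of the distribution of extents.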

In the second and minor part of the thesis, a non-sampling method is proposed to compute functions of scalar random variables using their moments. This method, while restricted to simple functions, can be applied to augment the NSMC approach to integration, and it provides semi-analytical expressions for evaluating the moments of functions of random variables.
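The abstract does not describe the moment-based method in detail, so the following Python sketch shows only the simplest case where moments of a function follow from moments of the variable without sampling: a polynomial f(X), for which E[f(X)] is a linear combination of the raw moments E[X^k]. The function names and the uniform-distribution example are assumptions for illustration and should not be read as the method proposed in the thesis.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_expectation(coeffs, raw_moments):
    """E[sum_k c_k X^k] = sum_k c_k E[X^k].

    raw_moments[k] must hold E[X^k], with raw_moments[0] == 1.
    """
    if len(raw_moments) < len(coeffs):
        raise ValueError("need raw moments up to the polynomial degree")
    return sum(c * m for c, m in zip(coeffs, raw_moments))


if __name__ == "__main__":
    # X uniform on [0, 1] has raw moments E[X^k] = 1 / (k + 1).
    moments = [1.0 / (k + 1) for k in range(5)]
    f = [1.0, 2.0, 3.0]  # f(x) = 1 + 2x + 3x^2
    mean_f = poly_expectation(f, moments)
    second_f = poly_expectation(poly_mul(f, f), moments)
    print(mean_f, second_f - mean_f ** 2)  # mean and variance of f(X)
```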