Stochastic Optimization

Published • 2025

Abstract

At the heart of nearly every modern AI system is an optimization algorithm trying to find the best parameters in a vast, complex landscape full of ridges, valleys, and deceptive local minima. Stochastic Gradient Descent (SGD) and its variants have become the workhorses of deep learning, but they remain fundamentally limited: in high-dimensional, non-convex loss landscapes, these algorithms frequently converge to local minima that are far from the global optimum. Escaping these traps requires some form of perturbation — but not all perturbations are created equal.

This project, conducted in collaboration with Dr. Farzad Sabzikar of the Department of Statistics at Iowa State University, introduces a noise-injection optimization framework based on correlated, heavy-tailed, tempered fractional noise. The key insight is that the statistical properties of the injected noise, namely its tail heaviness, its temporal correlation structure, and its tempering, should be matched to the structure of the optimization landscape. Heavy tails permit occasional large jumps that can escape deep local minima. Temporal correlation gives the perturbations directional persistence rather than leaving them as independent random shocks. Tempering damps the far tails so that the noise process remains mathematically tractable without giving up the ability to make large moves.
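
To make the idea concrete, the sketch below shows one hypothetical way such structured perturbations could be wired into an ordinary gradient step: Student-t draws stand in for the heavy-tailed component, an AR(1) recursion supplies the temporal correlation, and a smooth clipping term plays the role of tempering. The function and parameter names are illustrative only; this is not the project's actual algorithm, nor a faithful simulation of tempered fractional noise.

```python
import numpy as np

def tempered_heavy_tailed_step(params, grad, noise_state, lr=1e-2,
                               noise_scale=1e-2, tail_df=1.5,
                               corr=0.9, temper=5.0, rng=None):
    """One gradient step perturbed by correlated, heavy-tailed, tempered noise.

    Illustrative stand-in only: Student-t draws provide heavy tails,
    an AR(1) recursion provides temporal correlation, and a smooth
    clip tempers the most extreme jumps.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Heavy-tailed innovation: Student-t with low degrees of freedom.
    innovation = rng.standard_t(tail_df, size=params.shape)
    # Tempering: softly damp extreme values instead of letting them blow up.
    innovation = temper * np.tanh(innovation / temper)
    # Temporal correlation: carry memory of earlier perturbations (AR(1)).
    noise_state = corr * noise_state + np.sqrt(1.0 - corr ** 2) * innovation
    # Standard gradient step plus the structured perturbation.
    new_params = params - lr * grad + noise_scale * noise_state
    return new_params, noise_state


# Toy usage: a rugged 1-D loss f(x) = x**2 + 0.2*sin(25*x), whose gradient
# is 2*x + 5*cos(25*x); the sinusoidal term creates many shallow local minima.
params = np.array([3.0])
noise_state = np.zeros_like(params)
for step in range(2000):
    grad = 2.0 * params + 5.0 * np.cos(25.0 * params)
    params, noise_state = tempered_heavy_tailed_step(params, grad, noise_state)
```

In this toy setup the correlation coefficient, tail index, and tempering strength are free knobs; the framework's premise is that such knobs should be tuned to the geometry of the loss landscape rather than fixed a priori.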

Preliminary experiments demonstrate that this framework effectively escapes local minima in cases where other advanced algorithms — including fractional perturbed gradient descent — become trapped. We have submitted a proposal to NSF’s Mathematical Foundations of Artificial Intelligence (MFAI) program to pursue this work, and the manuscript is in preparation for JMLR.

Any improvement in optimization convergence that yields better solutions with fewer computational resources has implications far beyond a single application: it benefits researchers and practitioners across deep learning, machine learning, generative AI, and any field that relies on efficient, scalable optimization.