<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Optimization | Mohammad Fili</title>
    <link>https://academic-demo.netlify.app/project/optimization/</link>
      <atom:link href="https://academic-demo.netlify.app/project/optimization/index.xml" rel="self" type="application/rss+xml" />
    <description>Optimization</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language>
    <image>
      <url>https://academic-demo.netlify.app/media/icon_hubee73d70c451290dfb42da9753fb47ff_12762_512x512_fill_lanczos_center_3.png</url>
      <title>Optimization</title>
      <link>https://academic-demo.netlify.app/project/optimization/</link>
    </image>
    
    <item>
      <title>Multi-Fractional Brownian Motion and SGD</title>
      <link>https://academic-demo.netlify.app/project/optimization/multi-fractional-brownian-motion-and-sgd/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://academic-demo.netlify.app/project/optimization/multi-fractional-brownian-motion-and-sgd/</guid>
      <description>&lt;p&gt;Understanding why deep learning optimization works, not just that it works, is one of the fundamental open questions in machine learning theory. A promising theoretical approach models Stochastic Gradient Descent (SGD) not as a deterministic algorithm with random perturbations, but as the discretization of a continuous stochastic process. Previous work proposed that SGD can be viewed as a stochastic differential equation driven by fractional Brownian motion (FBM), a process characterized by a single Hurst parameter that governs its memory structure: how strongly past behavior influences the future.&lt;/p&gt;
&lt;p&gt;Our investigation revealed that this model is incomplete. When we fit FBM to SGD trajectories, the Hurst parameter is not constant: it changes over the course of training. Early in optimization, the dynamics exhibit one type of memory structure; later, as the algorithm approaches a solution, the structure shifts. This means that FBM, which assumes a fixed Hurst parameter, is an inadequate model for the full training process.&lt;/p&gt;
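&lt;p&gt;As an illustration, this kind of shift can be checked empirically. The sketch below is our own toy construction, not the study&amp;rsquo;s code (function names and lag choices are illustrative): it estimates the Hurst exponent of a scalar trajectory from the scaling of increment variances, Var[x(t+lag) - x(t)] ~ lag^(2H), and applies the estimator over sliding windows so that a drifting estimate signals a time-dependent H.&lt;/p&gt;

```python
# Illustrative sketch: estimate a (possibly time-varying) Hurst exponent
# from a trajectory via variance-of-increments scaling. Not the paper's code.
import numpy as np

def hurst_variance_scaling(x, lags=(1, 2, 4, 8, 16)):
    """Estimate H from Var[x(t+lag) - x(t)] ~ lag^(2H)."""
    x = np.asarray(x, dtype=float)
    variances = [np.var(x[lag:] - x[:-lag]) for lag in lags]
    # The slope of log-variance against log-lag equals 2H.
    slope, _ = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0

def windowed_hurst(x, window=256, step=128):
    """Sliding-window estimates; a drift over windows suggests mFBM."""
    n = len(x)
    return [hurst_variance_scaling(x[s:s + window])
            for s in range(0, n - window + 1, step)]
```

&lt;p&gt;For ordinary Brownian motion the estimate sits near H = 0.5; an SGD trajectory whose windowed estimates drift away from a constant value is inconsistent with plain FBM.&lt;/p&gt;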
&lt;p&gt;The natural generalization is multi-fractional Brownian motion (mFBM), in which the Hurst parameter is itself a function of time. Our finding that the Hurst parameter is time-dependent in SGD suggests that mFBM may serve as a more suitable theoretical framework for understanding the dynamics of deep learning optimization. This has implications for algorithm design: if we understand how the memory structure of optimization evolves during training, we can potentially design algorithms that adapt their exploration strategy to match.&lt;/p&gt;
&lt;p&gt;We plan to prepare a proposal to investigate this line of research in depth.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Multi-Modal Delivery Planning</title>
      <link>https://academic-demo.netlify.app/project/optimization/multi-modal-delivery-planning/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://academic-demo.netlify.app/project/optimization/multi-modal-delivery-planning/</guid>
      <description>&lt;p&gt;Last-mile delivery, the final leg of a package&amp;rsquo;s journey from warehouse to doorstep, accounts for a disproportionate share of logistics costs and environmental impact. Emerging technologies are reshaping this landscape: autonomous delivery robots can handle short-range deliveries at low cost, crowdsourced delivery networks offer flexible capacity, and traditional trucks provide reliable backbone coverage. The optimization challenge is to combine these heterogeneous resources (each with different range, capacity, speed, cost, and energy constraints) into an integrated system that minimizes cost while meeting service requirements.&lt;/p&gt;
&lt;p&gt;This project develops mixed-integer linear programming (MILP) models for multi-modal delivery planning using combinations of trucks, robots, and crowdsourced delivery. A particular focus is the practical constraints that make these problems hard: battery limitations for robots (including the logistics of battery swapping), transshipment points where packages transfer between vehicle types, and the multi-objective nature of the problem (minimizing cost, time, and environmental impact simultaneously).&lt;/p&gt;
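&lt;p&gt;To make the decision structure concrete, here is a deliberately tiny toy version, our own illustration and far simpler than the actual MILP formulations: each package is assigned to either a truck or a robot so as to minimize cost, subject to a single robot battery budget, solved by brute-force enumeration rather than a solver. All numbers are made up.&lt;/p&gt;

```python
# Toy truck-vs-robot assignment (illustrative only; the real models are MILPs
# with transshipment and battery-swapping constraints, solved with a solver).
from itertools import product

packages = [1.0, 2.5, 0.8, 4.0, 1.5]  # delivery distances in km (made up)
truck_cost_per_km = 2.0
robot_cost_per_km = 0.5
robot_battery_km = 5.0  # total robot range before a battery swap (made up)

def best_assignment():
    best_cost, best_modes = float("inf"), None
    for modes in product(("truck", "robot"), repeat=len(packages)):
        robot_km = sum(d for d, m in zip(packages, modes) if m == "robot")
        if robot_km > robot_battery_km:
            continue  # infeasible: exceeds the robot battery budget
        cost = sum((robot_cost_per_km if m == "robot" else truck_cost_per_km) * d
                   for d, m in zip(packages, modes))
        if best_cost > cost:
            best_cost, best_modes = cost, modes
    return best_cost, best_modes
```

&lt;p&gt;Since robots are cheaper per kilometer here, the optimum packs as much distance as possible onto the robot without exceeding its battery; the MILP formulations in the papers encode this same trade-off with integer assignment variables instead of enumeration.&lt;/p&gt;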
&lt;p&gt;We developed a mathematical model for an efficient two-tiered truck-robot delivery system, presented at the IEEE Conference on Service Operations and Logistics and Informatics (SOLI, 2024). A follow-up study compared different MILP formulations for handling transshipments and battery swapping, presented at the IISE Annual Conference (2025). While the primary application domain is commercial logistics, the modeling framework has potential applications in healthcare delivery (medication distribution, sample transport, and equipment logistics), where similar multi-modal coordination challenges arise.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Stochastic Optimization</title>
      <link>https://academic-demo.netlify.app/project/optimization/stochastic-optimization/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://academic-demo.netlify.app/project/optimization/stochastic-optimization/</guid>
      <description>&lt;p&gt;At the heart of nearly every modern AI system is an optimization algorithm trying to find the best parameters in a vast, complex landscape full of ridges, valleys, and deceptive local minima. Stochastic Gradient Descent (SGD) and its variants have become the workhorses of deep learning, but they remain fundamentally limited: in high-dimensional, non-convex loss landscapes, these algorithms frequently converge to local minima that are far from the global optimum. Escaping these traps requires some form of perturbation, but not all perturbations are created equal.&lt;/p&gt;
&lt;p&gt;This project, conducted in collaboration with Dr. Farzad Sabzikar from the Statistics department at Iowa State University, introduces a noise-injection optimization framework based on correlated heavy-tailed tempered fractional noise. The key insight is that the statistical properties of the injected noise (the heaviness of its tails, its temporal correlation structure, and its tempering) should be matched to the structure of the optimization landscape. Heavy tails allow occasional large jumps that can escape deep local minima. Temporal correlation ensures that the perturbations are not simply random shocks but have directional persistence. Tempering controls the tail behavior to maintain mathematical tractability.&lt;/p&gt;
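&lt;p&gt;A minimal sketch of the idea follows. This is our own simplified construction, not the project&amp;rsquo;s implementation: Student-t draws stand in for heavy tails, exponential damping of extreme magnitudes is a crude stand-in for tempering, and AR(1) smoothing across iterations supplies the temporal correlation. All parameter values are illustrative.&lt;/p&gt;

```python
# Sketch of noise-injected gradient descent (simplified stand-in, see above).
import numpy as np

def perturbed_gd(grad, x0, steps=3000, lr=0.01,
                 scale=0.5, df=1.5, temper=0.2, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    noise = np.zeros_like(x)
    for _ in range(steps):
        jump = rng.standard_t(df, size=x.shape)        # heavy-tailed draw
        jump = jump * np.exp(-temper * np.abs(jump))   # temper extreme jumps
        noise = rho * noise + (1.0 - rho) * jump       # correlate across time
        x = x - lr * grad(x) + lr * scale * noise
    return x

def tilted_grad(x):
    # Gradient of the tilted double well f(x) = (x**2 - 1)**2 + 0.3*x,
    # whose global minimum lies near x = -1 and local minimum near x = 1.
    return 4.0 * x * (x * x - 1.0) + 0.3
```

&lt;p&gt;Started near the shallower well at x = 1, the injected noise gives the iterate a chance to cross the barrier toward the global minimum, whereas plain gradient descent from the same start stays trapped.&lt;/p&gt;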
&lt;p&gt;Preliminary experiments demonstrate that this framework effectively escapes local minima in cases where other advanced algorithms, including fractional perturbed gradient descent, become trapped. We have submitted a proposal to NSF&amp;rsquo;s Mathematical Foundations of Artificial Intelligence (MFAI) program to pursue this work, and the manuscript is in preparation for &lt;em&gt;JMLR&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Any improvement in optimization convergence that yields better solutions with fewer computational resources has implications far beyond the specific application: it benefits the entire community of researchers and practitioners working in deep learning, machine learning, generative AI, and any field that relies on efficient and scalable optimization.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
