Quantitative & AI Engineering Leadership

Figure from: Thesis Defense. Harvard & MIT. Philip Ndikum (2024).  © PhilipNdikum.com

DEEP REINFORCEMENT LEARNING FOR QUANTITATIVE INVESTING, HARVARD & MIT. PHILIP NDIKUM (2024)

Date Published: May 2024.

This research presented a robotics- and mathematics-inspired approach to Deep Reinforcement Learning (DRL) for options trading and investing, a domain closely related to optimal stopping theory in classical economics and to decision-making concepts in psychology. Adopting a multi-agent DRL perspective, the study explored Sim-to-Real (simulation-to-reality) techniques to construct realistic quantitative investment environments. The thesis, "Deep Reinforcement Learning for Quantitative Investing", spanned 150 pages and included over 230 references, synthesizing methodologies from classical mathematical finance, econometrics, robotics, and modern Artificial Intelligence.
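To make the agent-environment framing concrete, the sketch below shows a deliberately minimal, hypothetical Gym-style options-trading environment: the agent observes a normalized price and time to expiry and chooses to hold or exercise. The class name, state layout, and dynamics are illustrative assumptions, not the thesis's actual environment.

```python
import numpy as np

class OptionsTradingEnv:
    """Minimal, hypothetical Gym-style environment sketch for an American put.

    State: (price / strike, remaining time fraction); actions: 0 = hold, 1 = exercise.
    This only illustrates the reset/step interface a DRL agent interacts with.
    """

    def __init__(self, strike=100.0, n_steps=50, sigma=0.2, seed=0):
        self.strike, self.n_steps, self.sigma = strike, n_steps, sigma
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.price = self.strike  # start at-the-money
        return self._obs()

    def _obs(self):
        return np.array([self.price / self.strike, 1.0 - self.t / self.n_steps])

    def step(self, action):
        if action == 1 or self.t == self.n_steps - 1:
            # Exercising (or reaching expiry) ends the episode; reward = put payoff.
            reward = max(self.strike - self.price, 0.0)
            return self._obs(), reward, True, {}
        # Otherwise the price evolves one step of a geometric random walk.
        self.price *= np.exp(self.sigma * np.sqrt(1 / self.n_steps)
                             * self.rng.standard_normal())
        self.t += 1
        return self._obs(), 0.0, False, {}
```

A policy network would map the observation to an action; here the episode terminates either on exercise or at expiry, yielding a sparse terminal reward, which is part of what makes the learning problem hard.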

Deploying Deep Reinforcement Learning (DRL) models for options trading posed significant computational challenges, requiring substantial resources such as multi-GPU cloud computing for optimization, parallelization, and scalability. The demands of high-frequency trading necessitated careful architectural design to improve convergence times and computational efficiency while maintaining performance across large datasets. The study leveraged Sim-to-Real methodologies, commonly applied in robotics, to create simulated environments for training trading agents. While mathematically replicating realistic financial markets remains an open research challenge, the simulations demonstrated that multi-agent DRL systems outperformed traditional trading strategies within the options trading domain.
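A core idea that Sim-to-Real work borrows from robotics is domain randomization: rather than training against a single calibrated market model, each episode samples its dynamics from plausible ranges so the policy cannot overfit to one simulator configuration. The sketch below illustrates this under assumed geometric-Brownian dynamics; the parameter ranges and function names are illustrative, not the thesis's calibration.

```python
import numpy as np

def make_randomized_market(rng):
    """Sample market-dynamics parameters for one simulated episode.

    Illustrative ranges: drift, volatility, and per-trade cost are drawn
    fresh each episode so the agent sees a distribution of markets.
    """
    return {
        "drift": rng.uniform(-0.05, 0.10),      # annualized drift
        "volatility": rng.uniform(0.10, 0.60),  # annualized volatility
        "cost_bps": rng.uniform(0.5, 5.0),      # per-trade cost, basis points
    }

def simulate_prices(params, s0=100.0, n_steps=252, rng=None):
    """Generate one geometric-Brownian price path under the sampled params."""
    rng = rng or np.random.default_rng()
    dt = 1.0 / n_steps
    shocks = rng.standard_normal(n_steps)
    log_returns = ((params["drift"] - 0.5 * params["volatility"] ** 2) * dt
                   + params["volatility"] * np.sqrt(dt) * shocks)
    return s0 * np.exp(np.cumsum(log_returns))
```

Training loops would call `make_randomized_market` at each episode reset; policies that remain profitable across the sampled family of markets are more likely to transfer beyond the simulator.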

The nature of options trading closely aligns with optimal stopping problems in economics, where agents must decide the optimal time to act under conditions of uncertainty. This mirrors challenges in robotics and decision-making psychology, particularly in managing exploration vs. exploitation trade-offs. By applying DRL to address these sequential decision-making challenges, the research highlights the utility of learning-based methods in environments that demand adaptive strategies under uncertainty.
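The classical baseline for this stop-or-continue structure is pricing an American option by backward induction: at every node the holder compares immediate exercise against the discounted continuation value, which is exactly the decision a DRL agent must learn from experience. The sketch below uses the standard Cox-Ross-Rubinstein binomial tree with illustrative parameters.

```python
import numpy as np

def american_put_binomial(s0=100.0, strike=100.0, r=0.05, sigma=0.2, T=1.0, n=200):
    """Price an American put by backward induction on a CRR binomial tree.

    At each node: value = max(exercise payoff, discounted expected
    continuation value) under the risk-neutral measure.
    """
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))        # up factor
    d = 1.0 / u                            # down factor
    p = (np.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = np.exp(-r * dt)

    # Terminal node prices and put payoffs.
    prices = s0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
    values = np.maximum(strike - prices, 0.0)

    # Backward induction with the early-exercise (optimal stopping) check.
    for step in range(n - 1, -1, -1):
        prices = s0 * u ** np.arange(step, -1, -1) * d ** np.arange(0, step + 1)
        continuation = disc * (p * values[:-1] + (1 - p) * values[1:])
        values = np.maximum(strike - prices, continuation)
    return values[0]
```

Where the tree solves the stopping problem exactly given a known model, a DRL agent must approximate the same comparison from sampled trajectories, without access to the true dynamics.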

While the study achieved performance improvements over existing techniques, it also underscored ongoing challenges in explainable AI (XAI) and robustness. For real-world financial deployment—where transparency and trust are critical—further research into model interpretability and reliability remains essential. Lastly, the research demonstrated the value of interdisciplinary integration, combining classical insights from mathematical finance (e.g., options pricing and derivatives), econometrics, and modern AI techniques with methodologies from robotics and mathematical physics. This cross-domain approach emphasizes that solving complex financial decision-making problems under uncertainty benefits from the convergence of tools and perspectives from multiple scientific disciplines.

Conclusion

The research highlights the unique challenges of applying DRL to options trading, particularly through the lens of multi-agent systems and Sim-to-Real methodologies. While significant computational power is necessary to deploy these systems at scale, simulations demonstrated their ability to outperform traditional strategies. Moving forward, replicating realistic trading environments and improving explainability will likely require substantial research efforts. Nonetheless, this study underscores that an interdisciplinary approach—combining AI, robotics, mathematics, and economics—offers the most promising pathway for advancements in the field of quantitative investing.