Leonard Wexler and Trevor Ellington. “Advancing Mathematical Reasoning Excellence via Self Play Reinforcement Learning Frameworks for Recursive Logic Improvement in Large Language Models”. International Journal of Artificial Intelligence Research, vol. 1, no. 2, May 2026, doi:10.66280/ijair.v1i2.155.