Leonard Wexler and Trevor Ellington. 2026. “Advancing Mathematical Reasoning Excellence via Self Play Reinforcement Learning Frameworks for Recursive Logic Improvement in Large Language Models.” International Journal of Artificial Intelligence Research 1 (2). https://doi.org/10.66280/ijair.v1i2.155.