Leonard Wexler & Trevor Ellington. (2026). Advancing Mathematical Reasoning Excellence via Self Play Reinforcement Learning Frameworks for Recursive Logic Improvement in Large Language Models. International Journal of Artificial Intelligence Research, 1(2). https://doi.org/10.66280/ijair.v1i2.155