1. Leonard Wexler, Trevor Ellington. Advancing Mathematical Reasoning Excellence via Self Play Reinforcement Learning Frameworks for Recursive Logic Improvement in Large Language Models. IJAIR [Internet]. 2026 May 14 [cited 2026 May 17];1(2). Available from: https://www.isipress.org/index.php/IJAIR/article/view/155