LEONARD WEXLER; TREVOR ELLINGTON. Advancing Mathematical Reasoning Excellence via Self Play Reinforcement Learning Frameworks for Recursive Logic Improvement in Large Language Models. International Journal of Artificial Intelligence Research, [S. l.], v. 1, n. 2, 2026. DOI: 10.66280/ijair.v1i2.155. Available at: https://www.isipress.org/index.php/IJAIR/article/view/155. Accessed: 17 May 2026.