1. Frederick Prescott, Samuel Thornton. Optimizing Process Based Reward Models through Reinforcement Learning for Verifiable Multi Step Reasoning in Large Language Model Architectures. IJAIR [Internet]. 2026 May 14 [cited 2026 May 17];1(2). Available from: https://www.isipress.org/index.php/IJAIR/article/view/156