ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

ByteDance Seed
Date: April 15, 2025

Introducing ReTool

In this work, we embrace the RL paradigm and introduce ReTool, a Tool-augmented Reinforcement learning framework explicitly designed to guide LLMs towards optimal strategies for leveraging external computational tools during reasoning. Our comprehensive experiments on AIME2024 and AIME2025 demonstrate that ReTool not only achieves superior accuracy compared to conventional text-based RL approaches, but also converges with significantly fewer training steps.
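The core rollout in tool-augmented RL interleaves model generation with code execution: whenever the policy emits a code block, the block is run in a sandbox and its output is appended to the context before generation resumes. The following sketch illustrates that loop under assumed `<code>`/`<interpreter>` tag conventions; the `generate` callable, tag names, and the subprocess-based sandbox are illustrative stand-ins, not ReTool's actual implementation.

```python
import re
import subprocess
import sys

CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def run_sandbox(code: str, timeout: float = 5.0) -> str:
    """Execute model-written code in a subprocess and capture its output.
    (A real deployment would use a properly isolated sandbox.)"""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
        return proc.stdout if proc.returncode == 0 else proc.stderr
    except subprocess.TimeoutExpired:
        return "TimeoutError"

def rollout(generate, prompt: str, max_turns: int = 8) -> str:
    """Interleave generation with tool calls: each time the model emits a
    <code>...</code> block, run it and feed the result back wrapped in
    <interpreter>...</interpreter> tags, then continue generating."""
    trajectory = prompt
    for _ in range(max_turns):
        chunk = generate(trajectory)  # assumed to stop after </code> or at the final answer
        trajectory += chunk
        match = CODE_RE.search(chunk)
        if match is None:  # no tool call: the model produced its final answer
            break
        result = run_sandbox(match.group(1))
        trajectory += f"<interpreter>{result}</interpreter>"
    return trajectory
```

During RL training, the completed trajectory (with interpreter feedback spliced in) is scored by the reward function, so the policy learns when and how to invoke the tool rather than following a fixed calling schedule.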

Figure 1: AIME 2024 & 2025 scores of ReTool and text-based RL baseline on the Qwen2.5-32B-Instruct model. The x-axis represents the training steps.

Open-Source

To benefit the broader research community, we will soon open-source the ReTool recipe, including algorithm details, model weights, datasets, and code. We use verl for RL training.

Datasets

We provide training and validation datasets for ReTool training.

Verifier

We adopt the same rule-based verifier as DAPO, which checks answers via string normalization and exact matching.
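A rule-based verifier of this kind normalizes both the predicted and reference answers, then grants a binary reward on exact match. The sketch below shows the idea with a few illustrative normalization rules (unwrapping `\boxed{...}`, stripping whitespace and thousands separators); the actual DAPO verifier's rule set may differ.

```python
import re

def normalize(answer: str) -> str:
    """Canonicalize a final-answer string before matching.
    (Illustrative rules only; the real verifier may apply more.)"""
    s = answer.strip().lower()
    s = re.sub(r"\\boxed\{(.*)\}", r"\1", s)  # unwrap LaTeX \boxed{...}
    s = re.sub(r"[\s$,]", "", s)              # drop whitespace, $, thousands commas
    return s.rstrip(".")                      # drop a trailing period

def verify(prediction: str, ground_truth: str) -> float:
    """Binary rule-based reward: 1.0 on normalized exact match, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(ground_truth) else 0.0
```

Because the reward is computed purely from strings, it needs no model-based judge, which keeps RL training cheap and the reward signal deterministic.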

Models & Scripts

We will release the trained model checkpoints and code soon.

Citation

If you find our project helpful, please cite:

@misc{feng2025retoolreinforcementlearningstrategic,
title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs}, 
author={Jiazhan Feng and Shijue Huang and Xingwei Qu and Ge Zhang and Yujia Qin and Baoquan Zhong and Chengquan Jiang and Jinxin Chi and Wanjun Zhong},
year={2025},
eprint={2504.11536},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.11536}, 
}