ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

ByteDance Seed
Date: April 15, 2025

Introducing ReTool

In this work, we embrace the RL paradigm and introduce ReTool, a Tool-augmented Reinforcement learning framework explicitly designed to guide LLMs towards optimal strategies for leveraging external computational tools during reasoning. Our comprehensive experiments on AIME2024 and AIME2025 demonstrate that ReTool not only achieves superior accuracy compared to conventional text-based RL approaches, but also converges with significantly fewer training steps.
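The core rollout in tool-augmented RL interleaves model generation with code execution: whenever the policy emits a code block, the block is run in a sandbox and its output is appended to the context before generation resumes. The following sketch illustrates that loop under assumed `<code>`/`<interpreter>` tag conventions; the `generate` callable, tag names, and the subprocess-based sandbox are illustrative stand-ins, not ReTool's actual implementation.

```python
import re
import subprocess
import sys

CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def run_sandbox(code: str, timeout: float = 5.0) -> str:
    """Execute model-written code in a subprocess and capture its output.
    (A real deployment would use a properly isolated sandbox.)"""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
        return proc.stdout if proc.returncode == 0 else proc.stderr
    except subprocess.TimeoutExpired:
        return "TimeoutError"

def rollout(generate, prompt: str, max_turns: int = 8) -> str:
    """Interleave generation with tool calls: each time the model emits a
    <code>...</code> block, run it and feed the result back wrapped in
    <interpreter>...</interpreter> tags, then continue generating."""
    trajectory = prompt
    for _ in range(max_turns):
        chunk = generate(trajectory)  # assumed to stop after </code> or at the final answer
        trajectory += chunk
        match = CODE_RE.search(chunk)
        if match is None:  # no tool call: the model produced its final answer
            break
        result = run_sandbox(match.group(1))
        trajectory += f"<interpreter>{result}</interpreter>"
    return trajectory
```

During RL training, the completed trajectory (with interpreter feedback spliced in) is scored by the reward function, so the policy learns when and how to invoke the tool rather than following a fixed calling schedule.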

Figure 1: AIME 2024 & 2025 scores of ReTool and text-based RL baseline on the Qwen2.5-32B-Instruct model. The x-axis represents the training steps.

Open-Source

To benefit the broader research community, we will soon open-source the ReTool recipe, including algorithm details, model weights, datasets, and code. We use verl for RL training.

Datasets

We provide training and validation datasets for ReTool training.

Verifier

We adopt the same rule-based verifier as DAPO, which checks answers via string normalization and exact matching.
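A rule-based verifier of this kind normalizes both the predicted and reference answers, then grants a binary reward on exact match. The sketch below shows the idea with a few illustrative normalization rules (unwrapping `\boxed{...}`, stripping whitespace and thousands separators); the actual DAPO verifier's rule set may differ.

```python
import re

def normalize(answer: str) -> str:
    """Canonicalize a final-answer string before matching.
    (Illustrative rules only; the real verifier may apply more.)"""
    s = answer.strip().lower()
    s = re.sub(r"\\boxed\{(.*)\}", r"\1", s)  # unwrap LaTeX \boxed{...}
    s = re.sub(r"[\s$,]", "", s)              # drop whitespace, $, thousands commas
    return s.rstrip(".")                      # drop a trailing period

def verify(prediction: str, ground_truth: str) -> float:
    """Binary rule-based reward: 1.0 on normalized exact match, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(ground_truth) else 0.0
```

Because the reward is computed purely from strings, it needs no model-based judge, which keeps RL training cheap and the reward signal deterministic.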

Models & Scripts

We will release the trained model checkpoints and code soon.

Citation

If you find our project helpful, please cite:

@misc{feng2025retoolreinforcementlearningstrategic,
title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs}, 
author={Jiazhan Feng and Shijue Huang and Xingwei Qu and Ge Zhang and Yujia Qin and Baoquan Zhong and Chengquan Jiang and Jinxin Chi and Wanjun Zhong},
year={2025},
eprint={2504.11536},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.11536}, 
}