"We introduce the first publicly available pipeline CLI-Gym for scalable derivation of environment-intensive tasks in agentic coding." "A collection of 1,655 environment-intensive tasks is built from 29 open-source repositories, serving as a good data source for LLM fine-tuning." "With a pilot study on fine-tuning with only 291 successful trajectories, we demonstrate highly competitive performance on the Terminal-Bench." This paper reports on collecting CLI-related data and using it to improve models. CLI-based training is drawing attention as a way to strengthen LLMs/LRMs, in part because some have pointed out that it has an advantage over MCP in terms of context. (That said, benchmark results for this kind of approach often turn out to be surprisingly harsh...)