Empowering LLMs in Decision Games through Algorithmic Data Synthesis

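Decision games serve as an ideal sandbox for evaluating and enhancing the reasoning abilities of large language models. We design data synthesis strategies and curate extensive offline datasets from two classic games, Doudizhu and Go. We develop a suite of techniques for effectively incorporating this data into LLM training, resulting in two new agents: Mastermind-Dou and Mastermind-Go.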
Reference translation of the paper (metadata) (Tue, 18 Mar 2025 07:30:29 GMT)
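Efforts to turn models into LRMs (large reasoning models) usually target math and code generation, but this paper targets games: "Through a suite of our designed techniques in data collection and training, we have developed MasterMind agents, demonstrating commendable performance in both Doudizhu and Go." The claim that "Empirical experiments also serve to substantiate the potential of this approach in improving general reasoning capabilities of LLMs." is very interesting. Perhaps there are tasks for LLMs that play the role of what people call "good for the brain" activities. (That said, the paper also points out that performance drops on some tasks.)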
The dataset is publicly available: OpenDILabCommunity/MasterMind · Datasets at Hugging Face
