LLMs Are In-Context Reinforcement Learners – Introducing the Latest arXiv Papers

LLMs Are In-Context Reinforcement Learners [30.2]
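Large language models (LLMs) can learn new tasks through in-context supervised learning (ICL). This work studies whether that ability extends to in-context reinforcement learning (ICRL), where a naive approach turns out to suffer from a deficiency in exploration. The paper proposes an algorithm that addresses this deficiency by increasing test-time compute, as well as a compute-bound approximation of it.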
Reference translation of the paper (metadata) (Mon, 07 Oct 2024 17:45:00 GMT)
This paper proposes in-context reinforcement learning in the following style: "ICRL is a natural combination of ICL and reinforcement learning (RL). Instead of constructing the LLM context from supervised input-output pairs, the LLM context is constructed using triplets consisting of input, model output prediction, and the corresponding rewards." Interestingly, a naive implementation does not work well; according to the authors, "Its poor performance is due to its incapacity to explore the output space."
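The triplet-based context described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the prompt template, `Episode`, `naive_context`, `exploratory_context`, `p_keep`, `ToyEnv`, and `run_icrl` are all hypothetical names introduced here. `naive_context` puts every past (input, prediction, reward) triplet into the prompt. `exploratory_context` shows one simple, assumed way to add randomness to prompt construction (randomly dropping past episodes) to illustrate what "exploring the output space" might involve; it is not claimed to be the paper's algorithm.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Episode:
    """One ICRL step: the input, the model's prediction, and the reward it earned."""
    x: str
    y_hat: str
    reward: int


def format_episode(ep: Episode) -> str:
    # Hypothetical prompt template; the paper's actual format may differ.
    return f"Input: {ep.x}\nPrediction: {ep.y_hat}\nReward: {ep.reward}\n"


def naive_context(history: list[Episode], query: str) -> str:
    """Naive ICRL: every past (input, prediction, reward) triplet goes into the prompt."""
    body = "".join(format_episode(ep) for ep in history)
    return body + f"Input: {query}\nPrediction:"


def exploratory_context(history: list[Episode], query: str,
                        p_keep: float = 0.5,
                        rng: Optional[random.Random] = None) -> str:
    """Hypothetical exploration aid (an assumption, not the paper's method):
    keep each past episode independently with probability p_keep, so that
    successive prompts differ and may elicit more varied predictions."""
    if rng is None:
        rng = random.Random()
    kept = [ep for ep in history if rng.random() < p_keep]
    body = "".join(format_episode(ep) for ep in kept)
    return body + f"Input: {query}\nPrediction:"


class ToyEnv:
    """Toy classification task: reward 1 if the predicted label is correct, else 0."""

    def __init__(self, labels: dict[str, str], rng: random.Random):
        self.labels = labels  # input text -> correct label
        self.rng = rng

    def sample_input(self) -> str:
        return self.rng.choice(sorted(self.labels))

    def reward(self, x: str, y_hat: str) -> int:
        return int(y_hat.strip() == self.labels[x])


def run_icrl(model: Callable[[str], str], env: ToyEnv, steps: int,
             build_context: Callable[[list[Episode], str], str]) -> list[Episode]:
    """ICRL loop: build a prompt from past triplets, predict, observe the reward,
    append. No weights are updated; any learning happens through the context."""
    history: list[Episode] = []
    for _ in range(steps):
        x = env.sample_input()
        y_hat = model(build_context(history, x))
        history.append(Episode(x, y_hat, env.reward(x, y_hat)))
    return history
```

For example, with a stub `model` that always answers "positive", `run_icrl(model, ToyEnv({"great film": "positive", "dull plot": "negative"}, random.Random(0)), 5, naive_context)` returns five episodes whose reward is 1 only for "great film". To try the exploratory variant, pass something like `lambda h, q: exploratory_context(h, q, p_keep=0.5)` as `build_context`. In real use, `model` would wrap an LLM call, which is omitted here.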
The project site is LLMs Are In-Context Reinforcement Learners (lil-lab.github.io)
