Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena [126.7]
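We introduce Arena Learning, an innovative offline strategy designed to simulate arena battles using AI-driven annotations. Arena Learning ensures precise evaluation and consistency between offline simulation and online competition. We apply Arena Learning to train the target model, WizardLM-$\beta$, and demonstrate significant performance improvements.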
Reference translation of the paper (metadata) (Mon, 15 Jul 2024 11:26:07 GMT)
The paper uses AI to build an environment that reproduces Chatbot Arena's evaluation. In the authors' words: "This paper introduces Arena Learning, a simulated offline chatbot arena that utilizes AI LLMs to bypass the manual and time-intensive cost typically associated with preparing the arena battle data, while preserving the core advantages of the arena-based evaluation and training." and "Furthermore, the model trained iteratively on synthetic data generated by Arena Learning exhibits significant performance improvements using various training strategies."
It is also very interesting in the context of self-improvement and the use of synthetic data.
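
To make the idea of a simulated arena concrete, here is a minimal Python sketch. It is not the paper's implementation: the Elo-style rating update is a common choice for arena-style rankings and is assumed here, since the summary above does not say which rating scheme Arena Learning uses. The names `simulate_arena`, `update_elo`, and the `judge` callback are hypothetical, and the judge stands in for the AI annotator that decides each battle.

```python
from typing import Callable, Dict, List

# A "model" here is any function that maps a prompt to a response (assumption).
Model = Callable[[str], str]
# A "judge" returns model A's score: 1.0 win, 0.5 tie, 0.0 loss (assumption).
Judge = Callable[[str, str, str], float]


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings: Dict[str, float], a: str, b: str,
               score_a: float, k: float = 32.0) -> None:
    """Update both ratings in place after one battle."""
    delta = k * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: B loses exactly what A gains


def simulate_arena(prompts: List[str], models: Dict[str, Model],
                   judge: Judge, initial: float = 1000.0) -> Dict[str, float]:
    """Run every pair of models on every prompt and let the judge pick a winner."""
    ratings = {name: initial for name in models}
    names = list(models)
    for prompt in prompts:
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                a, b = names[i], names[j]
                score_a = judge(prompt, models[a](prompt), models[b](prompt))
                update_elo(ratings, a, b, score_a)
    return ratings


if __name__ == "__main__":
    # Toy stand-ins: real use would call LLMs and an LLM-based judge.
    toy_models = {
        "short": lambda p: "ok",
        "long": lambda p: p + " answered with more detail",
    }

    def toy_judge(prompt: str, resp_a: str, resp_b: str) -> float:
        # Hypothetical rule: prefer the longer response.
        if len(resp_a) == len(resp_b):
            return 0.5
        return 1.0 if len(resp_a) > len(resp_b) else 0.0

    print(simulate_arena(["q1", "q2", "q3"], toy_models, toy_judge))
```

In a data-flywheel setting, the battles the target model loses would be the natural candidates for new training data, but how that selection is done is left to the paper itself.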

AgentInstruct: Toward Generative Teaching with Agentic Flows [12.2]
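We focus on the use of synthetic data for post-training, and in particular on creating data with powerful models to teach new skills and behaviors to other models. This paper introduces AgentInstruct, an agentic framework that automatically generates large amounts of diverse, high-quality synthetic data. We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25 million pairs for teaching a variety of skills, including text editing, creative writing, tool use, coding, and reading comprehension.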
Reference translation of the paper (metadata) (Wed, 03 Jul 2024 21:01:12 GMT)
In contrast to the paper above, an agentic approach to data synthesis also looks promising.
