GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

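Reinforcement learning (RL) methods, notably Group Relative Policy Optimization (GRPO), are widely used to adapt large language models (LLMs). However, these methods require tens of thousands of rollouts, which makes them inefficient. The proposed GEPA (Genetic-Pareto) instead uses natural language to learn high-level rules from trial and error, achieving qualitative improvements with only a few rollouts, and outperforms GRPO and MIPROv2 by more than 10% on average.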
Paper reference translation (metadata) (Fri, 25 Jul 2025 17:42:32 GMT)
"We introduced GEPA, a novel prompt optimizer for arbitrary LLM agents and workflows. GEPA leverages reflective prompt evolution and Pareto-based selection, showing superior sample efficiency compared to reinforcement learning (GRPO) alongside robust generalization, while outperforming leading prompt optimizers (MIPROv2)." One may question whether a prompt-tuning approach can fairly be compared with GRPO, but GEPA reportedly outperforms the other tuning methods.
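To make "Pareto-based selection" a little more concrete, here is a minimal sketch under stated assumptions, not the paper's code. It assumes each candidate prompt has been scored on every training instance (the score matrix is hypothetical). It also assumes the selection keeps every candidate that achieves the best score on at least one instance, drops candidates strictly dominated by another one, and samples a parent to mutate with probability proportional to how many instances it wins. The function names `pareto_candidates` and `sample_parent` are hypothetical.

```python
import random
from collections import Counter


def pareto_candidates(scores):
    """Hypothetical helper. scores[c][i] is candidate c's score on training instance i.

    Returns {candidate_index: number_of_instances_it_is_best_on} for the
    non-dominated candidates on the per-instance Pareto front.
    """
    n_instances = len(scores[0])
    # Best score any candidate achieves on each instance.
    best = [max(row[i] for row in scores) for i in range(n_instances)]
    # Count how many instances each candidate is (tied-)best on.
    wins = Counter()
    for i in range(n_instances):
        for c, row in enumerate(scores):
            if row[i] == best[i]:
                wins[c] += 1
    front = list(wins)

    def dominates(a, b):
        # a is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(scores[a], scores[b])) and scores[a] != scores[b]

    kept = [c for c in front if not any(dominates(o, c) for o in front if o != c)]
    return {c: wins[c] for c in kept}


def sample_parent(scores, rng=random):
    """Pick one candidate to evolve next, weighted by its instance wins (assumed rule)."""
    weights = pareto_candidates(scores)
    cands = sorted(weights)
    return rng.choices(cands, weights=[weights[c] for c in cands], k=1)[0]
```

For example, with `scores = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 0.5], [1.0, 0.0, 0.0]]`, candidate 3 is dropped because candidate 0 dominates it, and `pareto_candidates(scores)` returns `{0: 2, 1: 2, 2: 1}`. Keeping every per-instance winner, rather than only the single best average scorer, is what preserves diverse prompts that each solve different cases.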
