「This paper introduces DISCIPL, a method for “self-steering” LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models.」というアプローチの紹介。
「By decomposing reasoning into planning and execution, our architecture preserves flexibility while enabling orchestration of highly efficient, parallel search patterns.」というのは経験的にも有効そうに思う。検証がしっかりされているのはありがたい。