Self-Improving Pretraining: using post-trained models to pretrain better models
This paper proposes a new pretraining method that streams documents and uses reinforcement learning (RL) to improve the next K generated tokens at each step. Experiments show relative improvements over standard pretraining of 36.2% in factuality and 18.5% in safety, and an improvement of up to 86.3% in overall generation quality. (Thu, 29 Jan 2026 07:09:30 GMT)
The paper proposes the following framing: "Our work re-envisions pretraining by using a strong post-trained model to provide superior supervision signals. This works in two ways: (i) by providing rewrites on the original streaming pretrain data; and (ii) by acting as a judge. (i) We showed that such a self-improving setup can improve the factuality, safety and overall generation quality of pretrained models." The approach seems likely to be effective, but I wonder how many research institutions could actually carry it out, especially considering the Discussion's remark: "Going further, there are other aspects of a powerful model one may wish for pretraining to also capture, i.e. other skills! – an obvious one being stronger reasoning ability." A rough sketch of the loop is below.
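As a concrete illustration, here is a minimal sketch of the setup as I read it: documents are streamed, a post-trained teacher rewrites them (role i), and at each step an RL update improves the next K generated tokens against a teacher-as-judge reward (role ii). All names (`rewrite`, `judge`, `rl_update`), the toy tokenizer, and the reward shape are my own hypothetical placeholders, not the paper's actual API.

```python
# Hypothetical sketch of self-improving pretraining: stream documents,
# let a post-trained teacher (i) rewrite them and (ii) judge candidate
# K-token continuations, and apply an RL update per step.
from dataclasses import dataclass
from typing import Iterable, Iterator, List

K = 32  # number of next tokens improved per RL step (assumed value)

@dataclass
class Step:
    context: List[int]  # tokens of the streamed document seen so far
    target: List[int]   # next K tokens from the (rewritten) stream

def rewrite(document: str) -> str:
    """(i) Teacher rewrites a raw pretraining document (stub)."""
    return document  # placeholder: a real teacher would improve the text

def judge(context: List[int], candidate: List[int]) -> float:
    """(ii) Teacher-as-judge scores a K-token continuation (stub)."""
    return 0.0  # placeholder reward, e.g. in [0, 1]

def tokenize(text: str) -> List[int]:
    """Toy character-level tokenizer, just to keep the sketch runnable."""
    return [ord(c) for c in text]

def stream_steps(corpus: Iterable[str]) -> Iterator[Step]:
    """Stream documents, rewrite them, and emit (context, next-K) steps."""
    for doc in corpus:
        tokens = tokenize(rewrite(doc))
        for i in range(0, max(len(tokens) - K, 0), K):
            yield Step(context=tokens[:i], target=tokens[i:i + K])

def rl_update(step: Step) -> float:
    """One RL step: sample a K-token continuation, score it with the
    judge, and (in a real system) apply a policy-gradient update."""
    candidate = step.target  # stub: pretend the policy sampled this
    reward = judge(step.context, candidate)
    # policy.update(step.context, candidate, reward)  # model-specific, omitted
    return reward

if __name__ == "__main__":
    corpus = ["An example streamed pretraining document."]
    for step in stream_steps(corpus):
        rl_update(step)
```

The key design point, under these assumptions, is that supervision comes from the post-trained model twice: once by cleaning the data stream itself, and once by rewarding generations, which is what makes the setup "self-improving" rather than ordinary next-token pretraining.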