"(i) verifiable tasks use programmatic equivalence on final answers; (ii) non-verifiable tasks use self-proposed rubrics—binary, auditable criteria scored by an independent LLM judge, with reward given by the fraction satisfied." It is interesting that the approach also works on tasks that are hard to verify. The authors also note that "CaT can be applied at test time for inference-time gains or inside RL (CaT-RL) to improve the policy." They confirm that it is effective in reinforcement learning as well.
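
The two reward rules in the first quote can be sketched roughly as follows. This is only an illustration of the quoted description, not the paper's implementation: the function names, the whitespace/case normalization used for "programmatic equivalence", and the `judge` callable (a stand-in for the independent LLM judge) are all assumptions made here.

```python
from typing import Callable, Sequence


def verifiable_reward(predicted: str, reference: str) -> float:
    """Return 1.0 if the final answers match after light normalization, else 0.0.

    Assumption: equivalence is approximated by comparing case-folded,
    whitespace-collapsed strings. A real checker could be stricter or smarter.
    """
    def normalize(text: str) -> str:
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


def rubric_reward(
    response: str,
    rubric: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Return the fraction of binary rubric criteria the judge marks as satisfied.

    `judge(response, criterion)` is a hypothetical hook standing in for an
    independent LLM judge that gives a yes/no verdict per criterion.
    """
    if not rubric:
        # No criteria means nothing to score; return 0.0 rather than divide by zero.
        return 0.0
    satisfied = sum(1 for criterion in rubric if judge(response, criterion))
    return satisfied / len(rubric)


def keyword_judge(response: str, criterion: str) -> bool:
    """Toy judge for demonstration only: a criterion passes if it appears verbatim."""
    return criterion.lower() in response.lower()


if __name__ == "__main__":
    print(verifiable_reward("  42 ", "42"))
    print(
        rubric_reward(
            "The answer covers dosage and side effects.",
            ["dosage", "side effects", "contraindications"],
            keyword_judge,
        )
    )
```

In practice `keyword_judge` would be replaced by a call to a separate judge model. The key property taken from the quote is that each criterion is binary, so the reward is simply the satisfied fraction.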