July 23, 2025 – Introducing the latest arXiv papers

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training [121.6]
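This study investigates the effects of prolonged reinforcement learning on small language models across a diverse range of reasoning domains. We introduce controlled KL regularization, clipping ratio, and periodic reference policy resets as critical components for unlocking long-term performance gains. Our model achieves significant improvements over strong baselines, including +14.7% on math, +13.9% on coding, and +54.8% on logic puzzles.
Paper reference translation (metadata) (Wed, 16 Jul 2025 17:59:24 GMT)
"Our work demonstrates that through careful algorithm design, including decoupled clipping, dynamic sampling, controlled KL regularization, and periodic reference policy resets, even small-scale models can achieve substantial reasoning improvements without the computational demands of larger architectures." A proposal for a reinforcement learning recipe that works even for small-scale models.
The repository is nvidia/Nemotron-Research-Reasoning-Qwen-1.5B · Hugging Face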
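
To make the ingredients named in the quote concrete, here is a minimal sketch of a clipped policy-gradient loss with separate lower and upper clip bounds, a KL penalty toward a reference policy, and a periodic reference reset. This is not the paper's implementation: the function names, the clip values `eps_low` and `eps_high`, the weight `beta`, and the interval `reset_every` are all illustrative assumptions, and the KL term uses a common non-negative per-sample estimator rather than anything confirmed by the abstract.

```python
import math

def clipped_objective(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token clipped surrogate with decoupled (asymmetric) clip bounds.

    eps_low / eps_high are hypothetical values chosen for illustration.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic choice between the raw and the clipped term.
    return min(ratio * advantage, clipped * advantage)

def kl_penalty(logp_new, logp_ref):
    """Non-negative per-sample estimate of KL(new || ref).

    Assumes the sample was drawn from the current policy.
    """
    log_r = logp_ref - logp_new
    return math.exp(log_r) - log_r - 1.0

def token_loss(logp_new, logp_old, logp_ref, advantage, beta=0.01):
    """Loss to minimize: negative surrogate plus a weighted KL term (beta is assumed)."""
    return -clipped_objective(logp_new, logp_old, advantage) + beta * kl_penalty(logp_new, logp_ref)

def maybe_reset_reference(step, policy_params, ref_params, reset_every=1000):
    """Every `reset_every` steps, replace the reference with a copy of the current policy."""
    if step > 0 and step % reset_every == 0:
        return dict(policy_params)
    return ref_params
```

The intuition behind the reset, as far as the summary suggests, is that a fixed reference eventually makes the KL term dominate during long runs; copying the current policy into the reference lets training keep moving while the penalty still limits drift between resets.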

Conformal Prediction for Privacy-Preserving Machine Learning [83.9]
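Using variants of the MNIST dataset encrypted with AES, we show that conformal prediction methods remain effective even when applied directly in the encrypted domain. Our work lays a foundation for principled uncertainty quantification in secure, privacy-aware learning systems.
Paper reference translation (metadata) (Sun, 13 Jul 2025 15:29:14 GMT)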
"We then assess the same model architecture under encryption. When trained on MNIST images encrypted with a fixed key and initialization vector (AES encryption; see Section 3), the model attains an average training accuracy of 39.48% and a test accuracy of 36.88%." Is that really true? Since the paper also states "In contrast, training the same model on the MNIST dataset with randomized encryption per sample (a unique key per image) results in a test accuracy of 9.56%, indistinguishable from random guessing.", it does not look like a data leak or something similar. Even with the key and IV fixed, this is quite surprising.
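
For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard split conformal procedure for classification. It only needs class probabilities from some model, so the calibration step is the same whether that model was trained on plaintext or on ciphertext. The nonconformity score (one minus the probability of the true label) and the function names are assumptions for illustration; the excerpt quoted above does not say which score the paper uses.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data.

    Score = 1 - p(true label). Returns the ceil((n + 1) * (1 - alpha))-th
    smallest calibration score, the usual finite-sample corrected quantile.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every class must be included.
        return float("inf")
    return scores[k - 1]

def prediction_set(probs, qhat):
    """All classes whose score does not exceed the calibrated threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]
```

Under exchangeability of calibration and test points, the resulting sets contain the true label with probability at least 1 - alpha regardless of how accurate the underlying model is. A weak model, such as one at roughly 37% test accuracy, would simply be expected to yield larger and less informative sets.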

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality [108.9]
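Video generation models can only produce videos lasting 5 to 16 seconds, yet these are often labeled "long-form videos." Videos exceeding 16 seconds struggle to maintain consistent character appearances and scene layouts throughout the narrative. Recent work has attempted to produce long-form videos featuring multiple characters, narrative coherence, and high-fidelity detail.
Paper reference translation (metadata) (Wed, 09 Jul 2025 18:20:33 GMT)
A survey of methods for generating consistent long videos.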

Probing for Arithmetic Errors in Language Models [86.8]
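The internal activations of language models can be used to detect arithmetic errors. We show that simple probes can accurately decode both the model's predicted output and the correct answer from hidden states. We train lightweight error detectors that predict model correctness with over 90% accuracy.
Paper reference translation (metadata) (Wed, 16 Jul 2025 16:27:50 GMT)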
"Starting with a controlled setting of 3-digit addition, we show that simple probes can accurately decode both the model's predicted output and the correct answer from hidden states, regardless of whether the model's output is correct." That part I would expect to work, but "We then extend this analysis to a more complex setting, where the model is asked to solve math word problems only requiring addition (Cobbe et al., 2021) using a structured chain-of-thought (CoT) format (Wei et al., 2022), in which intermediate steps are expressed as equations (e.g., <a+b=c>). Remarkably, we find that the same probes trained on simple arithmetic queries can be applied directly to this setting, maintaining over 80% accuracy in detecting whether the model is producing correct intermediate results." and the finding that the probes can help with self-correction are interesting results.
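
As a rough illustration of what a "simple probe" means here, the sketch below trains a logistic-regression probe to predict a binary "the model's answer is wrong" label from a vector. Everything in it is an assumption made for illustration: the vectors are synthetic Gaussian stand-ins for hidden states rather than real activations, and the dimension, class shift, and training settings are arbitrary. The paper's actual probe architecture and features are not described in the excerpt above.

```python
import math
import random

def make_states(n, direction, rng, shift=2.0):
    """Synthetic stand-ins for hidden states; label 1 means 'answer is wrong'."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        # Unit Gaussian noise plus a class-dependent shift along `direction`.
        h = [rng.gauss(0.0, 1.0) + sign * shift * d for d in direction]
        data.append((h, label))
    return data

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(data, dim, epochs=20, lr=0.05):
    """Logistic-regression probe trained with plain SGD."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for h, y in data:
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * hi for wi, hi in zip(w, h)]
            b -= lr * g
    return w, b

def accuracy(w, b, data):
    """Fraction of examples where the probe's sign matches the label."""
    hits = sum(
        1 for h, y in data
        if (sum(wi * hi for wi, hi in zip(w, h)) + b > 0) == (y == 1)
    )
    return hits / len(data)

rng = random.Random(0)
dim = 8
direction = [1.0 / math.sqrt(dim)] * dim  # hypothetical "error" direction (unit vector)
train = make_states(400, direction, rng)
test = make_states(200, direction, rng)
w, b = train_probe(train, dim)
print(f"held-out probe accuracy: {accuracy(w, b, test):.3f}")
```

With a real model, the input vectors would instead be activations read from a chosen layer at the token positions of interest. The transfer result quoted above would correspond to fitting the probe on plain addition queries and evaluating it unchanged on chain-of-thought steps.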