staka – Page 7 – Introducing the Latest arXiv Papers

Checklists Are Better Than Reward Models For Aligning Language Models

Checklists Are Better Than Reward Models For Aligning Language Models [99.2]
We propose Reinforcement Learning from Checklist Feedback (RLCF). Checklists are extracted from instructions, and each response is evaluated on how well it satisfies every item. These scores, produced by both AI judges and specialized verifier programs, are combined to compute the reward for RL.
Reference translation of the paper (metadata) (Thu, 24 Jul 2025 17:58:00 GMT)
In answer to the question "how can we grade responses to instructions in a manner that is automatic (requires no human annotation), flexible (considers all aspects of response quality), intuitive (aligned with perceptible differences in responses), and applicable to any instruction or response, to enable more effective use of RL in language model alignment?", the paper proposes checklist generation plus reinforcement learning driven by checklist-based feedback. It reports the effect as follows: "From instructions, we extract checklists and evaluate how well responses satisfy each item—using both AI judges and specialized verifier programs—then combine these scores to compute rewards for RL. We compare RLCF with other alignment methods applied to a strong instruction following model (Qwen2.5-7B-Instruct) on five widely-studied benchmarks – RLCF is the only method to improve performance on every benchmark, including a 4-point boost in hard satisfaction rate on FollowBench, a 6-point increase on InFoBench, and a 3-point rise in win rate on Arena-Hard."
A large model generates the checklists, which are then used for "Reinforcement Learning from Checklist Feedback" (RLCF). Much of the gain probably comes from what is effectively distillation from a larger model, but it is still interesting that performance improves. (As the Limitations section notes, the computational cost is said to be high.)
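To make the reward computation concrete, here is a minimal sketch of how per-item checklist scores might be combined into a single scalar reward. It is not the paper's implementation: the `ChecklistItem` class, the item weights, the equal averaging of judge and verifier scores, and the keyword-matching `toy_judge` (a stand-in for an LLM judge) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChecklistItem:
    """Hypothetical checklist entry; the field names are illustrative only."""
    requirement: str
    weight: float = 1.0
    # Optional programmatic check that returns True/False for a response.
    verifier: Optional[Callable[[str], bool]] = None


def checklist_reward(
    response: str,
    items: List[ChecklistItem],
    judge: Callable[[str, str], float],
) -> float:
    """Weighted average of per-item scores, each in [0, 1]."""
    total_weight = sum(item.weight for item in items)
    if total_weight <= 0:
        raise ValueError("checklist needs a positive total weight")
    score = 0.0
    for item in items:
        judge_score = judge(response, item.requirement)
        if item.verifier is not None:
            # Assumption: average the judge score with the exact check.
            verifier_score = 1.0 if item.verifier(response) else 0.0
            item_score = 0.5 * (judge_score + verifier_score)
        else:
            item_score = judge_score
        score += item.weight * item_score
    return score / total_weight


def toy_judge(response: str, requirement: str) -> float:
    """Stand-in for an LLM judge: checks for the requirement's last word."""
    keyword = requirement.split()[-1].lower()
    return 1.0 if keyword in response.lower() else 0.0


items = [
    ChecklistItem("mentions Python", weight=2.0),
    ChecklistItem("under 50 characters", verifier=lambda r: len(r) < 50),
]
reward = checklist_reward("Python is a good fit here.", items, toy_judge)
print(round(reward, 4))
```

In an RL loop the returned value would serve as the scalar reward for the sampled response; how the paper actually weights and aggregates items may differ from this sketch.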

AlphaGo Moment for Model Architecture Discovery

AlphaGo Moment for Model Architecture Discovery [26.3]
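We introduce ASI-Arch, the first demonstration of artificial superintelligence for AI research. ASI-Arch is a fully autonomous system that aims to break through existing constraints by enabling AI to carry out its own architectural innovation. We ran 1,773 autonomous experiments over 20,000 GPU hours and discovered 106 innovative, state-of-the-art (SOTA) linear attention architectures.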
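Reference translation of the paper (metadata) (Thu, 24 Jul 2025 03:57:27 GMT)
An interesting paper that puts ASI in its title. It claims that "ASI-ARCH conducted 1,773 autonomous experiments over 20,000 GPU hours, culminating in the discovery of 106 innovative, state-of-the-art (SOTA) linear attention architectures."
- I am curious how it differs from Language Modeling by Language Models (covered previously on this site), and what the results look like at more practical, larger scales of parameters, data, and compute.
- I wonder whether it will eventually be able to handle compound efficiency improvements like the recently published work below.
The repository is GAIR-NLP/ASI-Arch: AlphaGo Moment for Model Architecture Discovery; see also the Neural Network Research Data Gallery.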

Scaling Linear Attention with Sparse State Expansion [58.2]
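The Transformer architecture struggles in long-context scenarios because of quadratic computation and linear memory growth. This paper conceptualizes state updating as information classification and proposes a row-sparse update formulation for linear attention. It then presents Sparse State Expansion (SSE) within the sparse framework, which expands the contextual state into multiple partitions.
Reference translation of the paper (metadata) (Tue, 22 Jul 2025 13:27:31 GMT)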
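As a rough illustration of what a row-sparse state update could look like, the sketch below implements plain recurrent linear attention (a fixed-size state accumulating key-value outer products) and a variant that writes only to the rows picked by the largest key entries. This is one possible reading of "row-sparse update", not the paper's formulation; the top-n selection rule and all function names are assumptions, and state partitioning (SSE) is not modeled.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def top_rows(k: Vector, n: int) -> List[int]:
    """Indices of the n largest key entries (assumed selection rule)."""
    return sorted(range(len(k)), key=lambda i: k[i], reverse=True)[:n]


def update_state(state: List[List[float]], k: Vector, v: Vector,
                 rows: Optional[List[int]] = None) -> None:
    """Add outer(k, v) to the state in place, optionally on selected rows only."""
    selected = range(len(k)) if rows is None else rows
    for i in selected:
        for j in range(len(v)):
            state[i][j] += k[i] * v[j]


def read_state(state: List[List[float]], q: Vector) -> List[float]:
    """Output o = q^T S for the current state S."""
    return [sum(q[i] * state[i][j] for i in range(len(q)))
            for j in range(len(state[0]))]


def linear_attention(qs, ks, vs, sparse_rows: Optional[int] = None):
    """Causal linear attention with a fixed-size (d_k x d_v) state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        rows = None if sparse_rows is None else top_rows(k, sparse_rows)
        update_state(state, k, v, rows)
        outputs.append(read_state(state, q))
    return outputs


qs = [[1.0, 0.0], [1.0, 1.0]]
ks = [[1.0, 0.0], [0.2, 0.9]]
vs = [[1.0, 2.0], [3.0, 4.0]]
dense = linear_attention(qs, ks, vs)
sparse = linear_attention(qs, ks, vs, sparse_rows=1)
print(dense[-1], sparse[-1])
```

The state has the same size regardless of sequence length, which is the property that avoids the linearly growing memory of a standard attention cache; the sparse variant simply touches fewer rows per token.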

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance [39.6]
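Large language model (LLM) agents often struggle in environments where rules and required domain knowledge change frequently. We propose the Adaptive Reflective Interactive Agent (ARIA), which continuously learns updated domain knowledge at test time. ARIA is deployed within TikTok Pay, which serves over 150 million monthly active users.
Reference translation of the paper (metadata) (Wed, 23 Jul 2025 02:12:32 GMT)
A proposal for a framework that self-improves through memory: "ARIA addresses conventional model limitations in dynamic environments by assessing uncertainty via self-dialogue, soliciting expert corrections, and updating a timestamped, conflict-resolving knowledge base." It is impressive that it is actually deployed.
The repository is yf-he/aria.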
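The idea of a timestamped, conflict-resolving knowledge base can be sketched in a few lines. The code below is a toy under stated assumptions, not ARIA's implementation: the `Fact` and `KnowledgeBase` classes, the "newest timestamp wins" rule, and the numeric confidence threshold (standing in for the paper's self-dialogue uncertainty assessment) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Fact:
    value: str
    timestamp: float  # e.g. seconds since epoch
    source: str       # who supplied the correction


class KnowledgeBase:
    """Keeps one fact per key; conflicts are resolved by recency (assumption)."""

    def __init__(self) -> None:
        self._facts: Dict[str, Fact] = {}

    def update(self, key: str, fact: Fact) -> bool:
        """Store the fact unless an equally new or newer one already exists."""
        current = self._facts.get(key)
        if current is not None and current.timestamp >= fact.timestamp:
            return False
        self._facts[key] = fact
        return True

    def lookup(self, key: str) -> Optional[Fact]:
        return self._facts.get(key)


def answer_or_ask(kb: KnowledgeBase, key: str, confidence: float,
                  threshold: float = 0.7) -> str:
    """Answer from memory when confident, otherwise escalate to an expert."""
    fact = kb.lookup(key)
    if fact is None or confidence < threshold:
        return "ASK_EXPERT"
    return fact.value


kb = KnowledgeBase()
kb.update("refund_window", Fact("30 days", timestamp=100.0, source="policy_v1"))
kb.update("refund_window", Fact("14 days", timestamp=200.0, source="expert"))
stale = kb.update("refund_window", Fact("60 days", timestamp=50.0, source="old_doc"))
print(answer_or_ask(kb, "refund_window", confidence=0.9))
print(answer_or_ask(kb, "refund_window", confidence=0.3))
```

Rejecting stale writes keeps expert corrections from being overwritten by older documents, which is the kind of conflict the quoted description appears to have in mind.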

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra [29.6]
This paper proposes a new framework that uses agent-based modeling to design and evaluate economic policy. At the lower level, boundedly rational worker agents choose their labor supply to maximize text-based utility functions learned in context. At the upper level, a planner agent uses in-context reinforcement learning to propose piecewise-linear marginal tax schedules anchored to the current federal tax brackets.
Reference translation of the paper (metadata) (Mon, 21 Jul 2025 17:21:14 GMT)
The paper states: "Our results show that a Llama-3 model can (i) recover the Mirrleesian trade-off between equity and efficiency, (ii) approach Saez-optimal schedules in heterogeneous settings where analytical formulas are unavailable, and (iii) reproduce political phenomena—such as majority exploitation and welfare-enhancing leader turnover—without any hand-crafted rules. Taken together, the experiments suggest that large language models can serve as tractable test beds for policy design long before real-world deployment, providing a bridge between modern generative AI and classical economic theory." These are interesting results for an LLM-based multi-agent simulation, and it is somewhat surprising that Llama-3.1-8B-Instruct is sufficient, even though the approach looks elaborate.
The repository is sethkarten/LLM-Economist: Official repository of the 2025 paper, LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra.
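For readers unfamiliar with bracket-based tax schedules, the following sketch shows how tax owed is computed from marginal rates applied bracket by bracket, which is the kind of object the planner agent proposes. The thresholds and rates are made up for illustration and are not the federal brackets or anything taken from the paper; `tax_owed` is a hypothetical helper.

```python
from typing import List, Tuple

# (lower threshold, marginal rate), sorted by threshold. Hypothetical values.
Brackets = List[Tuple[float, float]]


def tax_owed(income: float, brackets: Brackets) -> float:
    """Apply each marginal rate only to the income that falls in its bracket."""
    tax = 0.0
    for idx, (lower, rate) in enumerate(brackets):
        if income <= lower:
            break
        # The bracket ends where the next one starts; the last is unbounded.
        upper = brackets[idx + 1][0] if idx + 1 < len(brackets) else float("inf")
        tax += (min(income, upper) - lower) * rate
    return tax


example_brackets: Brackets = [(0.0, 0.10), (10_000.0, 0.20), (50_000.0, 0.30)]

for income in (5_000.0, 30_000.0, 60_000.0):
    owed = tax_owed(income, example_brackets)
    print(f"income={income:>9.0f} tax={owed:>9.2f} after_tax={income - owed:>9.2f}")
```

Because each rate applies only within its own bracket, after-tax income always rises with pre-tax income as long as every rate is below 100%, so a planner can vary the rates without creating cliffs in the workers' incentives.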

FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale [91.8]
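FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting. FourCastNet 3 delivers forecasting accuracy that surpasses conventional ensemble models and rivals the best diffusion-based methods. Its computational efficiency, medium-range probabilistic skill, spectral fidelity, and rollout stability at subseasonal timescales make it a strong candidate for improving meteorological forecasting and early warning systems through large ensemble predictions.
Reference translation of the paper (metadata) (Wed, 16 Jul 2025 11:22:18 GMT)
Machine-learning-based weather forecasting.
- The repository is GitHub – NVIDIA/makani: Massively parallel training of machine-learning based weather and climate models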

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes [42.3]
EXAONE 4.0 integrates a non-reasoning mode and a reasoning mode to achieve both the excellent usability of EXAONE 3.5 and the advanced reasoning abilities of EXAONE Deep. The EXAONE 4.0 series consists of a mid-size 32B model optimized for high performance and a small 1.2B model designed for on-device applications.
Reference translation of the paper (metadata) (Tue, 15 Jul 2025 15:24:51 GMT)
A hybrid LLM/LRM model from LG. The paper says: "Unified Mode Training In the combined dataset, the NON-REASONING data primarily consists of diverse tasks, while the REASONING data is centered on Math and Code domains. Rather than fine-tuning the two modes sequentially, we combine both modes and train them together." One interesting detail of the construction process is: "After unified NON-REASONING/REASONING mode fine-tuning, to address domain imbalance, we perform a second round of training using high-quality REASONING data from the Code and Tool Use domains, reusing these samples to further enhance the performance."
The repository is LGAI-EXAONE (LG AI Research).

A Survey on Latent Reasoning

A Survey on Latent Reasoning [100.5]
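Large language models (LLMs) have demonstrated impressive reasoning capabilities. Chain-of-thought (CoT) reasoning, which verbalizes intermediate steps, limits the model's expressive bandwidth. Latent reasoning addresses this bottleneck by performing multi-step reasoning entirely in the model's continuous hidden states.
Reference translation of the paper (metadata) (Tue, 08 Jul 2025 17:29:07 GMT)
A survey of methods that carry out reasoning in latent space without surfacing the reasoning process, such as latent CoT ("Unlike traditional CoT reasoning that generates explicit textual intermediate steps, latent CoT methods perform reasoning through continuous representations and hidden states within the model’s computational graph.").
The project site is GitHub – multimodal-art-projection/LatentCoT-Horizon: 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.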

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation [50.0]
This work introduces Mixture-of-Recursions (MoR). MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking. The authors also propose a KV sharing variant that reuses KV pairs from the first recursion, aimed specifically at reducing prefill latency and memory footprint.
Reference translation of the paper (metadata) (Mon, 14 Jul 2025 17:49:00 GMT)
The proposed structure is described as follows: "We propose Mixture-of-Recursions (MoR)—a framework that dynamically adjusts recursion step for each token during pretraining and inference. The core of MoR lies in two components: a routing mechanism that assigns token-specific recursion steps to adaptively concentrate computation on more challenging tokens, and a KV caching strategy that defines how KV pairs are stored and selectively utilized for attention at each recursive step." The paper reports that "MoR consistently outperforms recursive baselines and matches or exceeds the standard Transformers at larger scales, despite using significantly fewer parameters (approximately one-third due to layer tying with 𝑁𝑅= 3)."
The repository is GitHub – raymin0223/mixture_of_recursions: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking
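The core control flow, one shared block applied a token-specific number of times, can be sketched with scalars standing in for token hidden states. Everything here is a toy assumption: the real router is a learned module and the shared block is a stack of Transformer layers, whereas `router` below is a hand-written magnitude rule and `shared_block` is a fixed affine map. KV caching is not modeled.

```python
from typing import List, Tuple


def shared_block(x: float) -> float:
    """Stand-in for the shared layer stack; the same weights at every step."""
    return 0.5 * x + 1.0


def router(x: float, max_steps: int) -> int:
    """Hypothetical routing rule: larger-magnitude tokens get more recursions."""
    return min(max_steps, 1 + int(abs(x)))


def mixture_of_recursions(tokens: List[float],
                          max_steps: int = 3) -> Tuple[List[float], List[int]]:
    """Apply the shared block to each token for its own number of steps."""
    outputs, depths = [], []
    for x in tokens:
        steps = router(x, max_steps)
        for _ in range(steps):
            x = shared_block(x)  # parameters are reused, not duplicated
        outputs.append(x)
        depths.append(steps)
    return outputs, depths


outputs, depths = mixture_of_recursions([0.0, 1.5, 4.0])
print(depths)
print(outputs)
```

Parameter count stays that of a single block no matter how deep any token recurses, which is where the "approximately one-third" figure in the quoted passage comes from when three recursion steps share one set of layers.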

Lizard: An Efficient Linearization Framework for Large Language Models

Lizard: An Efficient Linearization Framework for Large Language Models [100.6]
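We propose Lizard, a linearization framework that converts pretrained Transformer-based large language models (LLMs) into flexible subquadratic architectures for infinite-context generation. Lizard addresses the limitations of softmax attention by introducing a subquadratic attention mechanism that closely approximates it while preserving output quality. We show that Lizard achieves near-lossless recovery of the teacher model's performance on standard language modeling tasks while significantly outperforming previous linearization methods.
Reference translation of the paper (metadata) (Fri, 11 Jul 2025 21:19:18 GMT)
A proposal for "Lizard (Linearizing Softmax Attention with Recurrent Gate Dynamics), an efficient framework for linearizing LLMs".
The approach builds on an existing model: "We train our model in two stages: (1) an attention approximation stage where the subquadratic modules are trained to mimic softmax attention outputs, and (2) a fine-tuning stage where the linearized model is adapted to downstream language modeling objectives." As a result, the conversion is said to need little training data and to keep performance degradation small.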

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety [85.8]
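CoT monitoring is imperfect and can let some misbehavior go unnoticed. We recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.
Reference translation of the paper (metadata) (Tue, 15 Jul 2025 16:43:41 GMT)
A discussion of CoT monitorability. It seems feasible, but looking at actual CoT examples, it also looks quite difficult.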
