arXiv – ページ 21 – arXiv最新論文の紹介

Artificial Hippocampus Networks for Efficient Long-Context Modeling

Artificial Hippocampus Networks for Efficient Long-Context Modeling [17.2]
ロングシーケンス・モデリングは、RNNのようなモデルにおける圧縮固定サイズメモリの効率と、注目ベースのトランスフォーマーにおけるメモリの増大の忠実さとのトレードオフに直面している。認知科学における多段階モデルに着想を得て,人工ニューラルネットワークのメモリフレームワークを導入する。長文ベンチマークのLV-EvalとInfiniteBenchの実験は、AHN拡張モデルがスライディングウインドウベースラインを一貫して上回ることを示した。
論文参考訳（メタデータ） (Wed, 08 Oct 2025 17:59:55 GMT)
「AHNs address the efficiency limitation of standard transformers by maintaining a sliding window of KV cache as lossless memory while transforming out-of-window information into a fixed-size compressed memory This approach enables AHN-augmented models to achieve constant memory and computational complexity per token over long sequences. Experiments」と長文に強い構造の提案。
リポジトリはGitHub – ByteDance-Seed/AHN: AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails

Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails [103.1]
本稿では,自己進化型大規模言語モデル(LLM)エージェントに特有の,アライメント・ティッピング・プロセス(ATP)を同定する。 ATPは、連続的な相互作用によってエージェントが訓練中に確立されたアライメント制約を放棄し、強化された自己関心の戦略を支持するときに生じる。実験の結果、アライメントの利点は自己進化の下で急速に低下し、最初は整合性のない状態に収束したモデルであることが判明した。
論文参考訳（メタデータ） (Mon, 06 Oct 2025 14:48:39 GMT)
「Our research reveals a critical vulnerability in self-evolving LLM agents, which we term the “Alignment Tipping Process” (ATP), a phenomenon where an agent’s policy suddenly shifts from human- aligned objectives to self-serving, locally optimal behaviors. Driven either by an individual agent’s self-interested exploration or by the imitative diffusion of strategies within a group, our experiments consistently demonstrate that alignment is not a static property, but rather a fragile state actively eroded by experience.」と自己進化型エージェントでのリスクを指摘。最近出た250例程度のPoisoning Attackが有効という報告（下記）も関連し、意外とこの手の攻撃が容易そうに思える。
リポジトリはGitHub – aiming-lab/ATP: Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples [81.7]
この研究は、データセットのサイズに関わらず、毒殺攻撃がほぼ一定数のドキュメントを必要とすることを初めて実証した。 250の有毒なドキュメントも同様に、すべてのモデルとデータセットサイズにわたってモデルを妥協している。以上の結果から,データ中毒によるバックドア注入は,従来考えられていたよりも大型モデルの方が容易である可能性が示唆された。
論文参考訳（メタデータ） (Wed, 08 Oct 2025 16:25:05 GMT)
データポイゾニングが意外と容易にできるとの報告。

In-Context Clustering with Large Language Models

In-Context Clustering with Large Language Models [50.3]
ICCは、注意機構を通じて入力間の複雑な関係をキャプチャする。事前学習したLLMは、テキスト符号化された数値データに対して、印象的なゼロショットクラスタリング機能を示す。我々の研究は、文脈内学習を教師なしの設定に拡張し、クラスタリングにおけるLLMの有効性と柔軟性を示します。
論文参考訳（メタデータ） (Thu, 09 Oct 2025 17:07:55 GMT)
LLMの内部知識を用いたクラスタリングモデルの提案。fine tuningによって性能を大きく向上させている。軸設定が強力にできるのが素晴らしい。
プロジェクトサイトはIn-Context Clustering

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models [18.8]
ACE(Agentic Context Engineering)は、コンテキストを進化するプレイブックとして扱うフレームワークである。エージェントとドメイン固有のベンチマークを通じて、ACEは一貫して強力なベースラインを上回っている。 ACEは、ラベル付けされた監視なしに効果的に適応することができ、代わりに自然な実行フィードバックを活用することができる。
論文参考訳（メタデータ） (Mon, 06 Oct 2025 09:30:18 GMT)
「We present ACE (Agentic Context Engineering), a framework for scalable and efficient context adaptation in both offline (e g , system prompt optimization) and online (e g , test-time memory adaptation) scenarios. Instead of condensing knowledge into terse summaries or static instructions, ACE treats contexts as evolving playbooks that continuously accumulate, refine, and organize strategies over time.」とこちらもコンテキストを記憶のように使い自己改善するアプローチに見える。

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs [64.3]
最近のLong-Context Language Modelsは、1つのプロンプトで数十万のトークンを処理することができる。我々は、従来の問題解決トレースから導かれた再利用可能な思考キャッシュとして、推論をリキャストする。本稿では,自然言語フィードバックによって学習データから得られるテンプレートを反復的に洗練する更新戦略を提案する。
論文参考訳（メタデータ） (Wed, 08 Oct 2025 19:52:35 GMT)
「Thought Template Augmented LCLMs (TOTAL), that equips long- context models with reusable reasoning patterns and iteratively refines them through natural language feedback.」というアプローチの提案。ロングコンテキストをうまく使う記憶というイメージだろうか。
リポジトリはhttps://github.com/starsuzi/ToTALとのことだが現時点では404

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards [80.8]
自己進化(Self-evolution)は、大規模言語モデル(LLM)ベースのエージェントが事前トレーニング後の能力を継続的に改善できるようにする上で、中心的な研究トピックである。エージェントがエージェント間相互作用から学習することで自律的に改善できる新しいフレームワークであるCo-Evolving Multi-Agent Systems (CoMAS)を紹介する。
論文参考訳（メタデータ） (Thu, 09 Oct 2025 17:50:26 GMT)
外部の報酬に頼らない自己進化のアプローチ、「As a new paradigm for self-evolution, CoMAS offers several distinct advantages: (1) It generates reward signals intrinsically from agent interactions, eliminating the need for verifiers or reward models. (2) The learning paradigm is generally effective for various tasks, including open-ended problems where solutions cannot be easily verified. (3) Agents are trained in a decentralized manner, allowing for co-evolution of heterogeneous systems without the bottleneck of a shared model. (4) It fosters skills that transfer to out-of-domain tasks and diverse multi-agent collaboration settings.」とのこと。
リポジトリはGitHub – xxyQwQ/CoMAS: Implementation for the paper “CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards”.

Gemini 2.5 Computer Use, OpenAI Dev Day, RWKV-8, Mamba3

先週の注目ニュースはGemini 2.5 computer use（Introducing the Gemini 2.5 Computer Use model）、OpenAI Dev Dayの様々なサービスの発表（個人的に注目はApps SDK、Agents – OpenAI API、OpenAI Guardrails Python）だった。各社基盤モデルだけでなくビジネスの領域に踏み込んでくる感は継続している。

アーキテクチャ面だとRWKV-8の順調そうな投稿（XユーザーのBlinkDLさん: 「The new mechanism in RWKV-8 “Heron” 🪶 is named ROSA (acronym, note SA ≠ Self-Attention here) 🌹 ROSA is compromise-free: we get efficient, scalable, genuine infinite ctx, by applying some beautiful algorithms. https://t.co/meM1MRtIhI」 / X、XユーザーのBlinkDLさん: 「RWKV-8 ROSA 🌹 mechanism: neurosymbolic infinite-range lossless information propagator beyond attention, enabling LLMs to invent their own inner monologue languages. First step towards scalable post-neural methods, for a new era in AI 🌌 https://t.co/kAcc7YfKeo」 / X）、Mamba3（著者不明だがMamba-3: Improved Sequence Modeling using State Space Principles | OpenReview）にも注目という感じ。SSMとTransformerハイブリッドの小型推論モデル、ai21labs/AI21-Jamba-Reasoning-3B · Hugging Faceも高性能そうでSSMの発展には期待が大きい。

毎年恒例の🪩 The State of AI Report 2025 🪩をみつつ（一部微妙な記載もあるが）研究の進展が速いのと、応用領域が広がっていることを感じている。International Astronomy & Astrophysics OlympiadでLLMが好成績をおさめる報告も興味深い。

Large Language Models Achieve Gold Medal Performance at International Astronomy & Astrophysics Olympiad [43.5]
我々は,国際天文学・天体物理学試験(IOAA)において,5つの大きな言語モデル(LLM)をベンチマークした。平均スコアは85.6%、84.2%で、ジェミニ2.5 ProとGPT-5は4つのIOAA理論試験で200-300人中上位2位にランクインした。 GPT-5は88.5%のスコアで試験に合格しており、最新の4つのIOAAの参加者の中ではトップ10にランクインしている。
論文参考訳（メタデータ） (Mon, 06 Oct 2025 16:58:47 GMT)

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models [158.2]
現在の大規模言語モデル (LLM) と音声言語モデル (SLM) は、ユーザがターンを終えた後にのみ、思考と行動を取る。これにより、モデルがユーザのターン中に対話するのを防ぎ、考えるのを待つ間、レスポンスのレイテンシが高くなります。 SHANKSは,ユーザ入力を聴きながら,無意味な連鎖推論をSLMが生成できるフレームワークである。
論文参考訳（メタデータ） (Wed, 08 Oct 2025 11:48:59 GMT)
「a general framework for SLMs that enables thinking while listening. To the best of our knowledge, we are the first to explore generating unspoken CoT reasoning when the user is still speaking.」とユーザ入力を受けながら同時に考えるフレームワークの提案。同時通訳のみならず応用領域が広そう。
リポジトリはSHANKS (シャンクス)

Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning

Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning [85.7]
本稿では,言語一貫性報酬と言語間思考アライメント報酬によって訓練されたM-Thinkerを提案する。 M-Thinkerは2つのマルチ言語ベンチマークで100%近い言語一貫性と優れたパフォーマンスを達成する。
論文参考訳（メタデータ） (Wed, 08 Oct 2025 17:55:02 GMT)
「We propose M-Thinker, which both achieves the input-output language consistency with a Language Consistency reward and enhances the multilingual reasoning performance with a Cross-lingual Thinking Alignment reward.」と入力・思考・出力で言語を一致させる手法の提案。性能向上につながる場合もありそうなのが興味深い。
リポジトリはGitHub – XZhang00/M-Thinker: Code for “Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning”.

Less is More: Recursive Reasoning with Tiny Networks

Less is More: Recursive Reasoning with Tiny Networks [6.3]
階層推論モデル(Hierarchical Reasoning Model, HRM)は、異なる周波数で再帰する2つの小さなニューラルネットワークを用いた新しいアプローチである。小型ネットワークの難題を解決するために,Tiny Recursive Model (TRM)を提案する。 TRMはARC-AGI-1で45%、ARC-AGI-2で8%の精度を達成した。
論文参考訳（メタデータ） (Mon, 06 Oct 2025 14:58:08 GMT)
特化型の推論モデルの提案、ARC-AGIと数独で効果を検証。
「Contrary to the Hierarchical Reasoning Model (HRM), TRM requires no fixed-point theorem, no complex biological justifications, and no hierarchy.」という記載が面白い。

月	火	水	木	金	土	日
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31