November 3, 2025 – Introducing the Latest arXiv Papers

MiniMax M2, Kimi-Linear, Ling-V2, Ouro, Emu3.5, gpt-oss-safeguard

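Last week brought many open-model releases, among which MiniMax-M2 and Kimi-Linear deserve particular attention; the latter is also notably efficient. It is easy to confuse with last week's Ring, but Ling-V2 is also a strong model (the report states: "This report focuses on three reflex-grade non-thinking (instruct) models in the Ling 2.0 family—Ling-mini-2.0, Ling-flash-2.0, and Ling-1T. These models emphasize general reasoning and instruction-following capability, while the Ring series (Ling-Team, 2025), built upon the same Ling 2.0 base, extends toward deep thinking models."). The small models Ouro-2.6B and Ouro-2.6B-Thinking were also interesting.

Although different in kind from the above, it is good to see strong models being released, such as the multimodal Emu3.5 and gpt-oss-safeguard for safety classification tasks. (The intended use of the last one does seem quite different from the others, though.)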

Kimi Linear: An Expressive, Efficient Attention Architecture [75.9]
Kimi Linear is a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons. At its core is Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet. We show that Kimi Linear can serve as a drop-in replacement for full attention architectures with superior performance and efficiency.
Reference translation of the paper (metadata) (Thu, 30 Oct 2025 16:59:43 GMT)
"At its core lies Kimi Delta Attention (KDA), a hardware-efficient linear attention module that extends Gated DeltaNet [111] with a finer-grained gating mechanism. While GDN, similar to Mamba2 [16], employs a coarse head-wise forget gate, KDA introduces a channel-wise variant in which each feature dimension maintains an independent forgetting rate, akin to Gated Linear Attention (GLA) [114]. This fine-grained design enables more precise regulation of the finite-state RNN memory, unlocking the potential of RNN-style models within hybrid architectures." This module is used within a hybrid configuration.
GitHub – MoonshotAI/Kimi-Linear
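
To make the difference between a head-wise and a channel-wise forget gate concrete, here is a minimal recurrent sketch in pure Python. It assumes a gated delta-rule update of the general form S ← (I − β k kᵀ) Diag(α) S + β k vᵀ on a d_k × d_v memory; this is an illustrative assumption, not the paper's hardware-efficient chunkwise implementation, and the names `gated_delta_step`, `read`, and `head_wise` are hypothetical.

```python
def gated_delta_step(S, k, v, beta, decay):
    """One recurrent update of a d_k x d_v memory S (list of rows).

    decay is a list of d_k forget rates in (0, 1], one per key channel.
    """
    d_k, d_v = len(S), len(S[0])
    # 1) Forget: scale each key-channel row by its own rate.
    S = [[decay[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) Delta rule: read the value currently stored for k, then correct it toward v.
    pred = [sum(k[i] * S[i][j] for i in range(d_k)) for j in range(d_v)]
    return [[S[i][j] + beta * k[i] * (v[j] - pred[j]) for j in range(d_v)]
            for i in range(d_k)]


def read(S, q):
    """Query the memory: returns S^T q as a list of d_v values."""
    return [sum(q[i] * S[i][j] for i in range(len(S))) for j in range(len(S[0]))]


def head_wise(rate, d_k):
    """A head-wise gate is the special case where all channels share one rate."""
    return [rate] * d_k


S = [[0.0, 0.0], [0.0, 0.0]]
# Store v=[3, 4] under key channel 0 with no forgetting.
S = gated_delta_step(S, k=[1.0, 0.0], v=[3.0, 4.0], beta=1.0, decay=[1.0, 1.0])
# Channel-wise gate: channel 0 forgets at rate 0.5, channel 1 keeps everything.
S = gated_delta_step(S, k=[0.0, 1.0], v=[5.0, 6.0], beta=1.0, decay=[0.5, 1.0])
print(read(S, [1.0, 0.0]))  # the older association, attenuated by its channel's rate
print(read(S, [0.0, 1.0]))  # the newly written association
```

Passing `head_wise(0.5, 2)` as `decay` would apply the same rate to every channel, which is the coarser gate the quote attributes to GDN.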

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation [149.0]
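Ling 2.0 is a series of reasoning-oriented language foundations built on the principle that every activation boosts reasoning capability. Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking models: Ling-mini-2.0, Ling-flash-2.0, and Ling-1T.
Reference translation of the paper (metadata) (Sat, 25 Oct 2025 01:51:37 GMT)
Unlike Ring-1T, which focuses on long reasoning, this series focuses on general reasoning and instruction-following ability.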
GitHub – inclusionAI/Ling-V2: Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.

Scaling Latent Reasoning via Looped Language Models [109.6]
We present and open-source Ouro, a family of pre-trained Looped Language Models (LoopLM). Ouro builds reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens.
Reference translation of the paper (metadata) (Wed, 29 Oct 2025 17:45:42 GMT)
A report on building models with the Looped Language Model (LoopLM) architecture. "we introduced Ouro, a family of Looped Language Models that demonstrate exceptional parameter efficiency by integrating iterative computation and adaptive depth directly into pre-training on 7.7T tokens. Our 1.4B and 2.6B models consistently match or exceed the performance of 4B and 8B standard transformers, showcasing a 2-3× efficiency gain." The reported parameter efficiency is very high.
Ouro: Looped Language Models
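
The two ideas named above, reusing one block across iterations and allocating depth adaptively, can be sketched as a toy loop. This is a minimal illustration under assumptions, not Ouro's actual architecture or exit rule: `looped_forward`, `block`, `exit_prob`, and the cumulative-probability threshold are all hypothetical stand-ins.

```python
def looped_forward(h, block, exit_prob, max_loops=4, threshold=0.9):
    """Apply one shared block repeatedly and stop early when confident.

    block:     function mapping a hidden state to a new hidden state
               (the same weights are reused at every step).
    exit_prob: function (h, step) -> probability of exiting at this step.
    Returns the final hidden state and the number of loops actually used.
    """
    remaining = 1.0   # probability mass that has not exited yet
    cumulative = 0.0  # probability of having exited by the current step
    for step in range(1, max_loops + 1):
        h = block(h)
        p = exit_prob(h, step)
        cumulative += remaining * p
        remaining *= 1.0 - p
        if cumulative >= threshold:
            break
    return h, step


# Toy shared block and a constant exit gate, for illustration only.
shared_block = lambda h: [0.5 * x + 1.0 for x in h]
constant_gate = lambda h, step: 0.5

print(looped_forward([0.0], shared_block, constant_gate, threshold=0.7))
print(looped_forward([0.0], shared_block, constant_gate, threshold=0.9))
```

A lower threshold spends fewer loops on the input, while a higher one runs the shared block more times; in a trained model the gate would be learned rather than constant.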

Parallel Loop Transformer for Efficient Test-Time Computation Scaling [34.8]
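Large language models (LLMs) are powerful but often too slow and costly for practical real-world use at inference time. Looped transformers save parameters by reusing the same weights across multiple computational steps. However, the loops run one after another, so inference latency and memory requirements grow with each added loop.
Reference translation of the paper (metadata) (Tue, 28 Oct 2025 15:35:50 GMT)
This one is a parallel variant, the Parallel Loop Transformer (PLT).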

Emu3.5: Native Multimodal Models are World Learners [65.9]
Emu3.5 is a large-scale multimodal world model that natively predicts the next state across vision and language. Emu3.5 is pre-trained end-to-end with a unified next-token prediction objective on a corpus of interleaved vision-language data. It exhibits generalizable world-modeling abilities that enable consistent world exploration and open-world embodied manipulation.
Reference translation of the paper (metadata) (Thu, 30 Oct 2025 15:11:16 GMT)
The latest in the Emu series (Emu3: Next-Token Prediction is All You Need – Introducing the Latest arXiv Papers). According to the paper: "Emu3.5 further exhibits generalizable world-modeling abilities encompassing world exploration and embodied manipulation, enabling controllable interaction, free-form navigation, and dynamic scene simulation across both real and imagined environments. We carefully evaluate these new capabilities and demonstrate clear superiority of Emu3.5, a single 32B unified model, over the closed-source Gemini 2.5 Flash Image [91]."
emu.world/pages/web/landingPage, GitHub – baaivision/Emu3.5: Native Multimodal Models are World Learners

The Era of Agentic Organization: Learning to Organize with Language Models

The Era of Agentic Organization: Learning to Organize with Language Models [107.4]
We introduce asynchronous thinking (AsyncThink) as a new paradigm for reasoning with large language models. In experiments, AsyncThink achieves 28% lower inference latency than parallel thinking. AsyncThink also generalizes its learned asynchronous thinking capabilities, effectively handling unseen tasks without additional training.
Reference translation of the paper (metadata) (Thu, 30 Oct 2025 16:25:10 GMT)
A framework that can perform asynchronous processing in the manner of a multi-agent system. "In this work, we introduce asynchronous thinking (AsyncThink) as a new paradigm for reasoning with large language models, with the goal of learning to organize the internal thinking into concurrently executable structures. Specifically, we propose a thinking protocol where an LLM plays both roles: an organizer that dynamically structures the process through Fork and Join actions, and workers that execute sub-queries and return intermediate knowledge or results."
The project site is Advancing AI for Humanity
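
The organizer/worker protocol with Fork and Join actions described in the quote can be illustrated with a small `asyncio` sketch. This is only an analogy under assumptions: in the paper a single LLM plays both roles and learns when to fork and join, whereas here `worker`, `organizer`, and `run` are hypothetical stub functions with fixed behavior.

```python
import asyncio


async def worker(sub_query: str) -> str:
    # Stand-in for a worker executing a sub-query (a real system would call an LLM).
    await asyncio.sleep(0)
    return f"result({sub_query})"


async def organizer(query: str, sub_queries: list[str]) -> str:
    # Fork: launch every sub-query as a concurrently running task.
    tasks = {sq: asyncio.create_task(worker(sq)) for sq in sub_queries}
    # The organizer could continue its own thinking here while workers run.
    # Join: wait for each forked task and collect its intermediate result.
    joined = [await tasks[sq] for sq in sub_queries]
    return f"{query}: " + "; ".join(joined)


def run(query: str, sub_queries: list[str]) -> str:
    return asyncio.run(organizer(query, sub_queries))


print(run("compare A and B", ["describe A", "describe B"]))
```

Because the forked tasks overlap in time, total latency is bounded by the slowest worker rather than the sum of all workers, which is the intuition behind the latency reduction reported above.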

A Survey of AI Scientists: Surveying the automatic Scientists and Research

A Survey of AI Scientists: Surveying the automatic Scientists and Research [34.9]
Artificial intelligence is undergoing a major shift from a computational instrument to an autonomous originator of scientific knowledge. This survey introduces a unified six-stage methodological framework that decomposes the end-to-end scientific process into literature review, idea generation, experimental preparation, experimental execution, scientific writing, and paper generation.
Reference translation of the paper (metadata) (Mon, 27 Oct 2025 06:13:21 GMT)
A survey on the use of AI in science. "This survey provides a systematic and comprehensive synthesis of this emerging domain by introducing a unified, six-stage methodological framework that deconstructs the scientific process into: Literature Review, Idea Generation, Experimental Preparation, Experimental Execution, Scientific Writing, and Paper Generation. Through this analytical lens, we systematically map and analyze dozens of seminal works from 2022 to late 2025, revealing a clear three-phase evolutionary trajectory."
The repository is GitHub – Mr-Tieguigui/Survey-for-AI-Scientist: A comprehensive survey for AI Scientist.

Tongyi DeepResearch Technical Report

Tongyi DeepResearch Technical Report [109.8]
Tongyi DeepResearch is developed through an end-to-end training framework designed to incentivize autonomous deep research agency. Tongyi DeepResearch has 30.5 billion parameters in total. We open-source the model, framework, and complete solutions to empower the community.
Reference translation of the paper (metadata) (Tue, 28 Oct 2025 17:53:02 GMT)
"Tongyi DeepResearch establishes a new state-of-the-art with substantially fewer parameters, comprising a total of 30.5 billion parameters while activating only 3.3 billion per token, building upon the Qwen3-30B-A3B-Base model (Yang et al., 2025). Empirical evaluations on deep research benchmarks demonstrate the effectiveness of our agent." A DeepResearch agent built on a highly efficient model; the report claims performance exceeding commercial systems.
The project site is Tongyi DeepResearch: A New Era of Open-Source AI Researchers | Tongyi DeepResearch
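
As a quick check on what "highly efficient" means here, the share of parameters active per token follows directly from the two figures in the quote (30.5 billion total, 3.3 billion activated). The helper name `active_fraction` is hypothetical; only the two numbers come from the report.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Fraction of parameters activated per token in a sparse (MoE) model."""
    return active_params_b / total_params_b


share = active_fraction(3.3, 30.5)
print(f"{share:.1%} of parameters are active per token")
```

In other words, roughly one ninth of the model's weights participate in each token's forward pass.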
