staka – ページ 46 – arXiv最新論文の紹介

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs [111.7]
ミキチャー・オブ・エキスパート(MoE)と1兆近いパラメータを持つ疎大言語モデル(LLM)が、最も有能な言語モデルの領域を支配している。本稿では,Ascend NPU上でそのようなスケールを利用するレシピを明らかにすることを目的としている。主な目的は、動的スパースモデル構造下でのコンピューティングリソースのより良い使用と、実際のハードウェアで期待されるパフォーマンス向上の実現である。
論文参考訳（メタデータ） (Wed, 07 May 2025 15:46:36 GMT)
Llama 4, Nemotron-H, Pangu Ultra, Kimi-VL, Kimi-VL-Thinking, Deep Coder – arXiv最新論文の紹介にも関連するPangu Ultraの主に実装に関する論文。
「Our system optimizations focus on Expert Parallelism and memory management, significantly lowering communication and activation overhead across 6K NPUs. These innovations enable a 30.0% MFU, demonstrating Ascend NPUs’ capability to support full-scale training of large-scale sparse LLMs, e g , Pangu Ultra MoE, with comparable performance as DeepSeek R1.」とのことでNVIDIAのGPUに頼らずとも最先端モデルを構築可能と主張しているように見える。

Teaching Models to Understand (but not Generate) High-risk Data

Teaching Models to Understand (but not Generate) High-risk Data [38.3]
SLUNG(Selective Loss to Understand but not Generate)を紹介する。 SLUNGは、モデルが高リスクデータを生成せずに理解することを学ぶための事前学習パラダイムである。 SLUNGは、生成を増大させることなく、モデルによる高リスクデータの理解を一貫して改善することを示す。
論文参考訳（メタデータ） (Mon, 05 May 2025 22:24:06 GMT)
「This work introduces SLUNG, a pre-training paradigm that enables language models to learn from high-risk data without being trained to generate it. By selectively adjusting the training objective at the token level based on risk, SLUNG decouples a model’s ability to understand from its ability to generate, allowing models to condition on high-risk inputs while learning from adjacent low-risk tokens.」という手法の提案。口外することはできないが学ぶ必要があるもの、というのは現実的に多いわけでこのような手法は非常に面白い。

On Path to Multimodal Generalist: General-Level and General-Bench

On Path to Multimodal Generalist: General-Level and General-Bench [154.0]
本稿では,MLLMの性能と汎用性を5段階に定義した評価フレームワークであるGeneral-Levelを紹介する。フレームワークの中核はSynergyの概念であり、モデルが理解と生成をまたいだ一貫性のある機能を維持するかどうかを測定する。既存の100以上のMLLMを含む評価結果は、ジェネラリストの能力ランキングを明らかにする。
論文参考訳（メタデータ） (Wed, 07 May 2025 17:59:32 GMT)
「This leads to a critical question: Can we simply assume that higher performance across tasks indicates a stronger MLLM capability, bringing us closer to human-level AI?」に対する評価フレームワーク。自動運転のような大きく5段階のレベル設定を行っている。現時点では「Our evaluation of over 100 existing top-performing LLM/MLLM systems has uncovered critical insights into their capabilities and rankings as multimodal generalists. The most notable finding is that most MLLMs lack the cross-task or cross-modal synergy ability required for higher-level classifications, with even advanced models like GPT-4V and GPT-4o not achieving top ranks.」とのことだが…
プロジェクトサイトはPath to Multimodal Generalist、リーダーボードはPath to Multimodal Generalist

下記サーベイも注目

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models [79.5]
推論は知性の中心にあり、決定し、結論を導き、ドメインをまたいで一般化する能力を形成する。人工知能において、システムがオープンで不確実でマルチモーダルな環境でますます機能するにつれて、推論は堅牢で適応的な行動を可能にするために不可欠となる。大規模マルチモーダル推論モデル(LMRM)は、テキスト、画像、オーディオ、ビデオなどのモダリティを統合し、複雑な推論機能をサポートする、有望なパラダイムとして登場した。
論文参考訳（メタデータ） (Thu, 08 May 2025 03:35:23 GMT)
リポジトリはGitHub – HITsz-TMG/Awesome-Large-Multimodal-Reasoning-Models: The development and future prospects of multimodal reasoning models.

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption [52.0]
そこで本研究では,大規模な言語モデルにおいて,チェーン・オブ・ディフェンシブ・思想と呼ばれる単純な手法を用いて,参照破損に対するロバスト性を大幅に向上したことを示す。特に、メソッドの単純さと適用性を考えると、この改善は驚くべきものです。
論文参考訳（メタデータ） (Tue, 29 Apr 2025 13:50:05 GMT)
「Number the references (if they are not already). 2.Include additional task instructions to firstly identify relevant and reliable contexts. 3. Before responses, insert structured reasoning steps that enunciates the indices of the relevant contexts (Irelevant) and the indices of reliable contexts (Ireliable).」というChain of defensive thoughtの提案
「In particular, we show how a wide range of large language models exhibit significantly improved robustness against reference corruption using a simple method called chain-of-defensive-thought, where only a few exemplars with structured and defensive reasoning are provided as demonstrations.」とのこと。

A Survey of AI Agent Protocols

A Survey of AI Agent Protocols [35.4]
大きな言語モデル(LLM)エージェントが外部ツールやデータソースと通信する標準的な方法はありません。この標準化されたプロトコルの欠如は、エージェントが協力したり、効果的にスケールするのを難しくする。 LLMエージェントの統一通信プロトコルは、これを変更できる。
論文参考訳（メタデータ） (Wed, 23 Apr 2025 14:07:26 GMT)
「In this paper, we provide a systematic overview of existing communication protocols for LLM agents.」とAgent間の通信プロトコルのサーベイ。
様々なモチベーションで設計も様々。

The Rise of Small Language Models in Healthcare: A Comprehensive Survey

The Rise of Small Language Models in Healthcare: A Comprehensive Survey [8.6]
小型言語モデル(SLM)は、次世代医療情報学にスケーラブルで臨床的に実行可能なソリューションを提供する。包括的調査では、医療従事者に対して分類・分類するための分類学的枠組みを提示する。本研究は,医療におけるSLMの変容可能性を明らかにするために,広く研究されているNLPタスクを対象とした実験結果のまとめである。
論文参考訳（メタデータ） (Wed, 23 Apr 2025 22:02:25 GMT)
ヘルスケアにおけるSLMのサーベイ。
リポジトリはGitHub – drmuskangarg/SLMs-in-healthcare: Unlike vanilla contextual pre-trained fundamentally \textit{small} language models (e.g., ClinicalBERT), our interest lies in compressed and optimized approaches for language models in healthcare, developed as a resource-efficient and domain-specialized solution to LLMs.で、かなり多くのモデルが構築されていることが分かる。

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory [0.6]
大規模言語モデル(LLM)は、文脈的に一貫性のある応答を生成する際、顕著な進歩を示した。しかし、それらの固定されたコンテキストウィンドウは、長時間のマルチセッション対話に対する一貫性を維持するための根本的な課題を生じさせる。私たちはMem0というスケーラブルなメモリ中心アーキテクチャを導入し、進行中の会話から健全な情報を動的に抽出し、統合し、取得することでこの問題に対処します。
論文参考訳（メタデータ） (Mon, 28 Apr 2025 01:46:35 GMT)
「(1) Mem0 implements a novel paradigm that extracts, evaluates, and manages salient information from conversations through dedicated modules for memory extraction and updation. The system processes a pair of messages between either two user participants or a user and an assistant. (2) Mem0 extends this foundation by incorporating graph-based memory representations, where memories are stored as directed labeled graphs with entities as nodes and relationships as edges.」というグラフ構造を活用しつつ”記憶”を管理するLLM用メモリアーキテクチャの提案。
プロジェクトサイトはScalable Long-Term Memory for Production AI Agents | Mem0

Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning

Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning [122.7]
Plasticineは、深層強化学習における塑性最適化をベンチマークするためのオープンソースのフレームワークである。 Plasticineは13以上の緩和メソッド、評価メトリクス10、学習シナリオの単一ファイル実装を提供する。
論文参考訳（メタデータ） (Thu, 24 Apr 2025 12:32:13 GMT)
「We introduce Plasticine, the first open-source framework for benchmarking plasticity optimization in deep RL.」というベンチマーク。
- 「plasticity loss, a phenomenon in which neural networks in RL agents gradually lose their ability to adapt and incorporate new information as training progresses (Dohare et al , 2024; Klein et al , 2024), thus significantly impeding the development of truly lifelong learning agents (Lyle and Pascanu, 2024).」
リポジトリはGitHub – RLE-Foundation/Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning.

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [121.0]
VisuLogicは、6つのカテゴリにまたがる1,000の人間認証された問題のベンチマークです。これらの質問は、複数の視点からMLLMの視覚的推論能力を評価するために評価することができる。ほとんどのモデルは精度が30%以下で、25%のランダムベースラインよりわずかに高く、人間によって達成された51.4%よりはるかに低い。
論文参考訳（メタデータ） (Mon, 21 Apr 2025 17:59:53 GMT)
「We propose a challenging visual reasoning benchmark that is inherently difficult to articulate using language, providing a more rigorous evaluation of the visual reasoning capabilities of MLLMs.」というベンチマークの提案。商用APIのスコアも良くなく、非常に難しいベンチマークになっている。
リポジトリはVisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities

UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities [53.8]
UniversalRAGは異種情報源からの知識を多様さと粒度で検索・統合するための新しいRAGフレームワークである。本稿では,最も適切なモダリティ固有コーパスを動的に識別し,その内部でターゲット検索を行うモダリティ対応ルーティング機構を提案する。複数のモダリティにまたがる8つのベンチマークでUniversalRAGを検証する。
論文参考訳（メタデータ） (Tue, 29 Apr 2025 13:18:58 GMT)
マルチモーダルなRAGに対応するため「UniversalRAG dynamically determines the most suitable knowledge source to retrieve from, based on the modality requirement of the given query, then routes the retrieval process to the corresponding modality-specific corpus.」というアプローチ。ルーターは「Training-free Router（実験ではGPT-4o）」と「Trained Router （実験ではDistilBERT 、T5-Large）」が試されていて平均的にはTrained Routerが優勢に見える。
プロジェクトサイトはUniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities

2025年11月
月	火	水	木	金	土	日
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30