Memory – arXiv最新論文の紹介 (Introducing the Latest arXiv Papers)

MLP Memory: Language Modeling with Retriever-pretrained External Memory

MLP Memory: Language Modeling with Retriever-pretrained External Memory [26.0]
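This work proposes decoupling memorization from the decoder by using a pretrainable external memory. The architecture shows strong perplexity and downstream-task performance, and it performs well on three hallucination benchmarks and nine memory-intensive tasks.
Reference translation of the paper (metadata) (Sun, 03 Aug 2025 16:40:53 GMT)
A proposal for an architecture that separates out the memory (knowledge) component: "In this work, we propose an external memory for LLM that is pretrained to mimic a retriever on the entire pretraining dataset. Specifically, following the RAG setting in kNN-LM [27], this memory learns to map the LLM hidden state at a certain step to a vocabulary distribution matching the output of the kNN retriever. During inference, the LLM’s native output is interpolated with the retriever-pretrained output from the external memory."
It would be interesting if this works well, but I am somewhat doubtful that knowledge and reasoning can really be separated, and I wonder how the reasoning and generation side is affected.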
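The interpolation step described in the quote can be sketched as follows. This is a minimal illustration, assuming a kNN-LM-style linear mix; the function name `interpolate`, the weight `lam`, and the toy distributions are hypothetical and not taken from the paper.

```python
def interpolate(p_lm, p_mem, lam=0.3):
    """Mix the LLM's native next-token distribution with the memory's.

    p_lm  : probabilities from the language model (assumed input)
    p_mem : probabilities from the external memory (assumed input)
    lam   : hypothetical interpolation weight given to the memory
    """
    if len(p_lm) != len(p_mem):
        raise ValueError("both distributions must cover the same vocabulary")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * m + (1.0 - lam) * l for l, m in zip(p_lm, p_mem)]


# Toy three-token vocabulary.
p_lm = [0.7, 0.2, 0.1]
p_mem = [0.1, 0.8, 0.1]
mixed = interpolate(p_lm, p_mem, lam=0.5)
```

Because both inputs sum to one and the weights sum to one, the mixed output is again a valid distribution.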

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems [30.5]
This paper introduces RoboMemory, a brain-inspired multi-memory framework. It addresses challenges in real-world environments: continual learning, multi-module memory latency, capturing task correlations, and mitigating infinite loops in closed-loop planning.
Reference translation of the paper (metadata) (Sat, 02 Aug 2025 15:39:42 GMT)
An agentic approach to memory: "Inspired by the brain’s unified memory mechanisms, we design a lifelong embodied memory system with four parallel modules (Spatial, Temporal, Episodic, Semantic) under a unified framework. This framework supports parallelized update and retrieval across modules, mitigating latency accumulation in complex systems while facilitating coherent knowledge integration for lifelong learning."
At present the agentic approach seems the realistic one, but I wonder at what stage one should start changing the model architecture itself.
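The idea of retrieving from several memory modules in parallel, so that latency does not accumulate module by module, can be sketched roughly as below. The four module names come from the quote; the toy stores, the substring matching, and the function names are assumptions for illustration only and do not reflect RoboMemory's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy stores; only the module names follow the quoted passage.
MODULES = {
    "spatial": ["mug is on the kitchen counter"],
    "temporal": ["mug was moved at 09:00"],
    "episodic": ["user asked for coffee yesterday"],
    "semantic": ["a mug holds hot drinks"],
}


def retrieve(name, entries, query):
    """Return the entries of one module that mention the query term."""
    return name, [e for e in entries if query in e]


def retrieve_all(query):
    """Query every module concurrently and merge the results by module name."""
    with ThreadPoolExecutor(max_workers=len(MODULES)) as pool:
        futures = [
            pool.submit(retrieve, name, entries, query)
            for name, entries in MODULES.items()
        ]
        return dict(f.result() for f in futures)


results = retrieve_all("mug")
```

With real modules that call out to databases or models, the wall-clock cost of one retrieval round would be closer to the slowest module than to the sum of all four.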

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance [39.6]
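Large language model (LLM) agents often struggle in environments where rules and required domain knowledge change frequently. The paper proposes the Adaptive Reflective Interactive Agent (ARIA), which continually learns updated domain knowledge at test time. ARIA is deployed within TikTok Pay, which has over 150 million monthly active users.
Reference translation of the paper (metadata) (Wed, 23 Jul 2025 02:12:32 GMT)
A proposed framework for self-improvement through memory: "ARIA addresses conventional model limitations in dynamic environments by assessing uncertainty via self-dialogue, soliciting expert corrections, and updating a timestamped, conflict-resolving knowledge base." It is impressive that it is actually deployed.
The repository is yf-he/aria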
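To make "a timestamped, conflict-resolving knowledge base" concrete, here is a minimal sketch. It assumes the simplest possible policy, where the newest timestamp wins; ARIA's actual conflict-resolution logic may well differ, and the `Fact` and `KnowledgeBase` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    key: str          # what the fact is about (hypothetical schema)
    value: str        # the current rule or piece of domain knowledge
    timestamp: float  # when the correction was received


class KnowledgeBase:
    """Toy store that resolves conflicts by keeping the newest fact per key."""

    def __init__(self):
        self._facts = {}

    def update(self, fact):
        """Store the fact unless a newer one exists; return True if stored."""
        current = self._facts.get(fact.key)
        if current is None or fact.timestamp >= current.timestamp:
            self._facts[fact.key] = fact
            return True
        return False

    def lookup(self, key):
        fact = self._facts.get(key)
        return None if fact is None else fact.value


kb = KnowledgeBase()
kb.update(Fact("transfer_limit", "100", timestamp=1.0))
kb.update(Fact("transfer_limit", "200", timestamp=2.0))
stale_accepted = kb.update(Fact("transfer_limit", "50", timestamp=1.5))
```

The late-arriving but older correction is rejected, so the lookup keeps returning the value with the most recent timestamp.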

MemOS: A Memory OS for AI System, MIRIX: Multi-Agent Memory System for LLM-Based Agents

Memory research aimed at problems that are hard to handle with RAG is very active.

MemOS: A Memory OS for AI System [115.3]
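Large language models (LLMs) have become an essential foundation for artificial general intelligence (AGI). Existing models rely mainly on static parameters and short-lived contextual state, which limits their ability to track user preferences or update knowledge over long periods. MemOS is a memory operating system that treats memory as a manageable system resource.
Reference translation of the paper (metadata) (Fri, 04 Jul 2025 17:21:46 GMT)
An update to MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models, covered earlier on this blog: an agentic memory system for LLMs. It shows large gains in areas that are not easy for ordinary RAG, such as handling temporal order. (That said, given "To ensure architectural parity, all methods are implemented over the same LLM backbone (GPT-4o-mini)", it is somewhat unclear whether GPT-4o mini is an appropriate base model.)
The repository is GitHub – MemTensor/MemOS: MemOS (Preview) | Intelligence Begins with Memory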

MIRIX: Multi-Agent Memory System for LLM-Based Agents [7.1]
MIRIX is a modular multi-agent memory system for language models. MIRIX goes beyond text to embrace rich visual and multimodal experiences. MIRIX sets a new performance standard for memory-augmented LLM agents.
Reference translation of the paper (metadata) (Thu, 10 Jul 2025 17:40:11 GMT)
This is also an agentic memory-management framework. Direct comparison with MemOS is difficult because the base models differ, but the authors claim higher performance than other systems.
The repository is GitHub – Mirix-AI/MIRIX: Mirix is a multi-agent personal assistant designed to track on-screen activities and answer user questions intelligently. By capturing real-time visual data and consolidating it into structured memories, Mirix transforms raw inputs into a rich knowledge base that adapts to your digital experiences.

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions [19.5]
Agents with memory mechanisms are referred to as memory agents. The paper identifies four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and conflict resolution. Existing datasets either rely on limited context lengths or are tailored to static long-context settings such as book-based QA. Because no existing benchmark covers all four competencies, the paper introduces MemoryAgentBench, a new benchmark designed specifically for memory agents.
Reference translation of the paper (metadata) (Mon, 07 Jul 2025 17:59:54 GMT)
This one proposes a benchmark for agents with memory.
The authors state: "we identify four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and conflict resolution."
The following observation in the results is interesting: "While Mem0 has demonstrated relatively strong performance on conversational tasks such as LOCOMO—where information density is comparatively low—it tends to perform poorly on benchmarks containing dense informational content, including RULER and ∞-Bench. For tasks emphasizing Time-to-Live (TTL) and Least Recently Used (LRU) retrieval, these limitations are often even more pronounced." My impression is that building a general-purpose, domain-agnostic structure looks difficult.
The repositories are ai-hyz/MemoryAgentBench · Datasets at Hugging Face and GitHub – HUST-AI-HYZ/MemoryAgentBench: Open source code for Paper: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent [53.8]
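The paper introduces MemAgent, a new agent workflow that reads text in segments and updates its memory with an overwrite strategy. MemAgent can extrapolate from an 8K context trained on 32K text to a 3.5M QA task with under 5% performance loss, and it achieves over 95% on the 512K RULER test.
Reference translation of the paper (metadata) (Thu, 03 Jul 2025 03:11:50 GMT)
A proposal for an agentic framework for handling long texts. Its features are as follows (quoted from the project site):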
- 1 Novel memory mechanism: The agent reads text in segments and efficiently updates memory through an overwriting strategy. This design enables the model to process arbitrarily long inputs within a fixed context window, fundamentally overcoming the window length limitations of traditional Transformer architectures.
- 2 O(n) complexity: By decoupling computation from text length, the complexity of processing long texts is transformed from quadratic growth to linear growth.
- 3 RL-driven extrapolation: We enhance the DAPO algorithm to support multi-turn training over context-independent conversations. Based on this, the trained model exhibits unprecedented extrapolation performance.
The project site is MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent
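The segment-by-segment reading with an overwritten, fixed-size memory described in the list above can be sketched as follows. This is only an illustration of the control flow: in MemAgent the memory update is produced by an RL-trained LLM, whereas here `update_memory` is a hypothetical keyword-matching stand-in, and `segment_size` and `budget` are assumed parameters.

```python
import re


def words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def update_memory(memory, segment, keywords, budget):
    """Stand-in for the model call: keep question-relevant text within budget."""
    candidates = ([memory] if memory else []) + segment
    relevant = [s for s in candidates if keywords & words(s)]
    # Overwrite strategy: the new memory replaces the old one and is capped
    # at a fixed size, so the working context never grows with input length.
    return " ".join(relevant)[-budget:]


def read_with_memory(sentences, question, segment_size=2, budget=80):
    """Process the input one segment at a time, carrying only the memory."""
    keywords = words(question)
    memory = ""
    for start in range(0, len(sentences), segment_size):
        segment = sentences[start:start + segment_size]
        memory = update_memory(memory, segment, keywords, budget)
    return memory


sentences = [
    "The sky is blue.",
    "Alice owns a red bike.",
    "Bob likes tea.",
    "The bike is parked outside.",
]
memory = read_with_memory(sentences, "bike")
```

Each step sees only one segment plus the bounded memory, so the total work grows linearly with the number of segments, which is the O(n) property the list mentions.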

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems [44.8]
Multi-agent systems (MAS) powered by large language models (LLMs) have shown far greater cognitive and execution capabilities than single LLM agents. The paper introduces G-Memory, a hierarchical agentic memory system for MAS inspired by organizational memory theory. G-Memory improves embodied-action success rates and knowledge-QA accuracy by 20.89% and 10.12%, respectively.
Reference translation of the paper (metadata) (Mon, 09 Jun 2025 03:43:46 GMT)
A report on LLM memory, currently a very hot topic. It takes an agentic approach: "we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both high-level, generalizable insights that enable the system to leverage cross-trial knowledge, and fine-grained, condensed interaction trajectories that compactly encode prior collaboration experiences."
The repository is GitHub – bingreeky/GMemory

How much do language models memorize?

How much do language models memorize? [104.2]
The authors separate memorization into two components: unintended memorization and generalization. By completely eliminating generalization, they can compute the total memorization, which gives an estimate of model capacity. They train language models on datasets of increasing size and observe that models memorize until their capacity fills, at which point "grokking" begins and unintended memorization decreases as the models start to generalize.
Reference translation of the paper (metadata) (Fri, 30 May 2025 17:34:03 GMT)
A report on memorization, which is very important on the road to AGI: "We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contains about the true data-generation process. When we completely eliminate generalization, we can compute the total memorization, which provides an estimate of model capacity: our measurements estimate that GPT-style models have a capacity of approximately 3.6 bits per parameter."
Research of this kind, including the cited Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws (covered earlier on this blog), is really interesting.
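As a back-of-the-envelope reading of the quoted figure of approximately 3.6 bits per parameter, the arithmetic below converts a parameter count into an estimated capacity. The function names and the 1-billion-parameter example are illustrative assumptions; the paper's estimate is approximate and specific to the GPT-style models it measured.

```python
BITS_PER_PARAM = 3.6  # approximate estimate quoted above


def capacity_bits(num_params, bits_per_param=BITS_PER_PARAM):
    """Estimated memorization capacity in bits."""
    return num_params * bits_per_param


def capacity_megabytes(num_params, bits_per_param=BITS_PER_PARAM):
    """Same estimate expressed in megabytes (10**6 bytes)."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e6


# Hypothetical example: a model with one billion parameters.
mb = capacity_megabytes(1_000_000_000)
```

Under this estimate, a one-billion-parameter model would hold roughly 3.6 billion bits, or about 450 MB, of memorized information.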

MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models [31.9]
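The paper introduces MemOS, a memory operating system designed for large language models (LLMs). Its core, MemCube, is a standardized memory abstraction that enables tracking, fusion, and migration of heterogeneous memory. MemOS establishes a memory-centric execution framework with strong controllability, adaptability, and evolvability.
Reference translation of the paper (metadata) (Wed, 28 May 2025 08:27:12 GMT)
A proposal for a memory-management framework for LLMs. The statement "Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory." is exactly right, and implementing memory is very important for advancing the use of LLMs.
Judging from "MemOS provides a unified abstraction and integrated management framework for heterogeneous memory types, including parametric memory, activation memory, and explicit plaintext memory. We propose a standardized memory unit, MemCube, and implement key modules for scheduling, lifecycle management, structured storage, and transparent augmentation.", it looks like a well-designed and well-implemented system. Still, I wonder which is more promising: this kind of approach, or the (recently less discussed) approach of building memory into the model itself in "just throw deep learning at it" fashion.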

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection [48.2]
The paper introduces a new, realistic, large-scale knowledge-injection benchmark that evolves continuously over time without requiring human intervention. WikiDYK draws on recently added, human-written facts from Wikipedia's "Did You Know…" entries. WikiDYK contains 12,290 facts and 77,180 questions.
Reference translation of the paper (metadata) (Sun, 18 May 2025 08:39:05 GMT)
The authors report: "Our extensive experiments reveal a critical limitation: under continued pre-training, Causal Language Models (CLMs) exhibit significantly weaker knowledge memorization compared to Bidirectional Language Models (BiLMs). To address this gap, we proposed a modular collaborative framework that integrates BiLMs as dynamic external knowledge repositories with LLMs." Causal LMs dominate at the moment, but could BiLMs find a use? Perhaps it depends on speed...?
The repository is GitHub – zhang-yu-wei/WikiDYK

Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions [55.2]
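Memory is a fundamental component of AI systems, underpinning large language model (LLM)-based agents. The survey introduces six fundamental memory operations: consolidation, updating, indexing, forgetting, retrieval, and compression. It provides a structured and dynamic perspective on research, benchmark datasets, and tools related to memory in AI.
Reference translation of the paper (metadata) (Thu, 01 May 2025 17:31:33 GMT)
A survey of memory, which is important for LLMs and agents.
The framing is: "In this survey, we first categorize memory representations into parametric, contextual structured, and contextual unstructured and then introduce six fundamental memory operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression."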
