August 13, 2025 – Introducing the latest arXiv papers

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems [30.5]
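This paper introduces RoboMemory, a brain-inspired multi-memory framework. It addresses challenges in real-world environments: continual learning, multi-module memory latency, capturing task correlations, and mitigating infinite loops in closed-loop planning.
Reference translation of the paper (metadata) (Sat, 02 Aug 2025 15:39:42 GMT)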
An agentic approach to memory: "Inspired by the brain’s unified memory mechanisms, we design a lifelong embodied memory system with four parallel modules (Spatial, Temporal, Episodic, Semantic) under a unified framework. This framework supports parallelized update and retrieval across modules, mitigating latency accumulation in complex systems while facilitating coherent knowledge integration for lifelong learning."
For now, I think an agentic approach is the realistic choice, but I do wonder at what stage one should start changing the model architecture itself.
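
The "parallelized update and retrieval across modules" in the quote above could look roughly like the following. This is only a minimal sketch, not the paper's implementation: `ListMemory`, `MultiMemory`, the keyword-match retrieval, and the example observations are all hypothetical stand-ins, and only the four module names come from the quote.

```python
from concurrent.futures import ThreadPoolExecutor

# The four module names come from the quoted abstract; everything else is assumed.
MODULE_NAMES = ("spatial", "temporal", "episodic", "semantic")


class ListMemory:
    """Hypothetical stand-in for one memory module: keeps raw text observations."""

    def __init__(self, name):
        self.name = name
        self.items = []

    def update(self, observation):
        self.items.append(observation)

    def retrieve(self, query):
        # Naive case-insensitive substring match; a real module would use its own index.
        return [item for item in self.items if query.lower() in item.lower()]


class MultiMemory:
    """Fans each update and each query out to all modules concurrently."""

    def __init__(self):
        self.modules = {name: ListMemory(name) for name in MODULE_NAMES}

    def update(self, observation):
        with ThreadPoolExecutor(max_workers=len(self.modules)) as pool:
            futures = [pool.submit(m.update, observation) for m in self.modules.values()]
            for future in futures:
                future.result()  # re-raise any error from a module

    def retrieve(self, query):
        with ThreadPoolExecutor(max_workers=len(self.modules)) as pool:
            futures = {
                name: pool.submit(m.retrieve, query)
                for name, m in self.modules.items()
            }
            # Results stay keyed by module so a planner can merge them afterwards.
            return {name: future.result() for name, future in futures.items()}


memory = MultiMemory()
memory.update("cup is on the kitchen table")
memory.update("robot opened the fridge")
results = memory.retrieve("cup")
```

Because the modules run side by side, the total latency is bounded by the slowest module rather than the sum of all four, which is the latency-accumulation point the quote makes.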

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents [88.4]
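MMBench-GUI is a hierarchical benchmark for evaluating GUI automation agents across the Windows, macOS, Linux, iOS, Android, and Web platforms. It consists of four levels (GUI content understanding, element grounding, task automation, and task collaboration) that together cover the skills a GUI agent needs.
Reference translation of the paper (metadata) (Fri, 25 Jul 2025 17:59:26 GMT)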
A benchmark for evaluating GUI agents, with four levels: "(1) GUI Content Understanding, (2) GUI Element Grounding, (3) GUI Task Automation, and (4) GUI Task Collaboration." The findings "Finding 1: General-purpose language models excel at task decomposition, planning, and self-reflection but struggle with fine-grained visual interactions." and "Finding 2: Accurate visual grounding significantly determines the success rate of GUI task execution." match the current direction of GUI agent development.
The repository is open-compass/MMBench-GUI: Official repo of “MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents”. It can be used to evaluate a GUI agent with a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android and Web.

The Missing Parts: Augmenting Fact Verification with Half-Truth Detection [8.1]
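Many real-world claims are half-truths: factually correct, yet misleading because critical context is omitted. We introduce the task of half-truth detection and propose PolitiFact-Hidden, a new benchmark of 15k political claims annotated with sentence-level evidence alignment and inferred claim intent. The proposed TRACER is a modular re-assessment framework that identifies omission-based misinformation by aligning evidence, inferring implied intent, and estimating the causal impact of hidden content.
Reference translation of the paper (metadata) (Fri, 01 Aug 2025 10:06:38 GMT)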
The paper proposes a new task, "half-truth detection as a new task in fact verification, targeting claims that omit critical context while remaining factually correct.", and builds a benchmark for it.
In addition, it proposes "TRACER (Truth ReAssessment with Critical hidden Evidence reasoning)", a three-stage pipeline: "(1) evidence alignment, to classify retrieved evidence as presented or hidden; (2) intent generation, to recover the claim’s implicit message; and (3) causality analysis, to determine whether the hidden evidence undermines the inferred intent."
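
The control flow of the three stages quoted above can be sketched as follows. This is a toy illustration, not the authors' code: the function `classify_half_truth`, the output labels `"half-true"` and `"true"`, the rule-based stand-ins for each stage, and the example claim are all assumptions made for this sketch. In practice each stage would presumably be backed by a model rather than string matching.

```python
def classify_half_truth(claim, evidence_texts, align, infer_intent, undermines):
    """Run the three stages on a claim that is assumed to be factually correct."""
    # Stage 1: evidence alignment, label each evidence item "presented" or "hidden".
    hidden = [e for e in evidence_texts if align(claim, e) == "hidden"]
    # Stage 2: intent generation, recover the claim's implicit message.
    intent = infer_intent(claim)
    # Stage 3: causality analysis, does any hidden evidence undermine that intent?
    if any(undermines(e, intent) for e in hidden):
        return "half-true"
    return "true"


# Toy stand-ins for the three stages (hypothetical rules, not from the paper).
def toy_align(claim, evidence):
    return "presented" if evidence.lower() in claim.lower() else "hidden"


def toy_infer_intent(claim):
    return "the economy improved"


def toy_undermines(evidence, intent):
    return "stopped looking for work" in evidence


# Invented example claim and evidence, for illustration only.
claim = "Unemployment fell by 2% last year."
evidence = [
    "Unemployment fell by 2% last year",
    "The labor force shrank because many people stopped looking for work",
]

verdict = classify_half_truth(claim, evidence, toy_align, toy_infer_intent, toy_undermines)
verdict_without_hidden = classify_half_truth(
    claim, evidence[:1], toy_align, toy_infer_intent, toy_undermines
)
```

Passing the stages in as functions keeps the pipeline modular, so any one stage can be swapped out without touching the other two.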