October 2025 – Page 6 – Introducing the Latest arXiv Papers

TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis

TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis [25.4]
TimeSeriesScientist (TSci) is a general, domain-agnostic framework for time series forecasting. It reduces forecast error by an average of 10.4% and 38.2% relative to statistical and LLM-based baselines, respectively. With transparent natural-language rationales and comprehensive reports, TSci turns forecasting into a white-box system.
Reference translation of the paper (metadata) (Thu, 02 Oct 2025 00:18:59 GMT)
An agent framework for time series analysis: "Upon receiving input time series data, the framework executes a structured four-agent workflow. Curator generates analytical reports (Section 3.2), Planner selects model configurations through reasoning and validation (Section 3.3), Forecaster integrates model results to produce the final forecast (Section 3.4), Reporter generates a comprehensive report as the final output of our framework (Section 3.5)."
Project site: TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis
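
The four-agent workflow quoted above can be pictured as a simple sequential pipeline. The sketch below is only an illustration under stated assumptions: the function names, the `Report` fields, and the two stand-in models (a last-value and a mean forecast) are invented for this example and are not taken from the TSci code, whose agents are LLM-driven.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

Series = List[float]
Model = Callable[[Series, int], Series]


# Hypothetical stand-in models; the real framework has its own model pool.
def naive_last(history: Series, horizon: int) -> Series:
    return [history[-1]] * horizon


def naive_mean(history: Series, horizon: int) -> Series:
    return [mean(history)] * horizon


MODELS: Dict[str, Model] = {"naive_last": naive_last, "naive_mean": naive_mean}


def curator(series: Series) -> Dict[str, float]:
    """Curator: produce an analytical summary of the input series."""
    return {"length": len(series), "mean": mean(series), "last": series[-1]}


def planner(series: Series, holdout: int) -> List[str]:
    """Planner: rank candidate models by mean absolute error on a validation split."""
    train, valid = series[:-holdout], series[-holdout:]

    def mae(name: str) -> float:
        pred = MODELS[name](train, holdout)
        return mean(abs(p - v) for p, v in zip(pred, valid))

    return sorted(MODELS, key=mae)


def forecaster(series: Series, selected: List[str], horizon: int) -> Series:
    """Forecaster: combine the selected models (here, a plain average)."""
    preds = [MODELS[name](series, horizon) for name in selected]
    return [mean(step) for step in zip(*preds)]


@dataclass
class Report:
    analysis: Dict[str, float]
    ranking: List[str]
    forecast: Series


def run_pipeline(series: Series, horizon: int = 2, holdout: int = 2) -> Report:
    """Reporter: bundle every stage's output into one final report."""
    analysis = curator(series)
    ranking = planner(series, holdout)
    forecast = forecaster(series, ranking[:1], horizon)
    return Report(analysis, ranking, forecast)


report = run_pipeline([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(report.ranking, report.forecast)
```

Keeping each stage as a separate function with an inspectable return value mirrors the white-box idea: every intermediate result ends up in the final report.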

D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents

D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents [22.3]
D-Artemis is a novel deliberative framework for GUI agents. It uses a fine-grained, app-specific tip retrieval mechanism to inform its decision-making. A TAC Check module and an Action Correction Agent (ACA) work together to reduce the risk of execution failures. A post-execution Status Reflection Agent (SRA) completes the cognitive loop, enabling strategic learning from experience.
Reference translation of the paper (metadata) (Fri, 26 Sep 2025 02:56:19 GMT)
A GUI agent with a highly elaborate multi-agent design: "(a) The manager agent is guided by two input modalities: textual (task, tips, working memory) and visual (screenshot only). (b) Pre-execution, TAC Check module verifies thought-action consistency. (c) A low consistency score triggers the Action Correction Agent (ACA) to analyze the error type and rectify the action. (d) Post-execution, the Status Reflection Agent (SRA) assesses the action effectiveness and the environmental state to produce guidance for the next step. Upon completion of each step, the working memory is updated." The authors claim an advantage over approaches built on the same backbone.
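
Steps (a) to (d) in the quote describe one iteration of a control loop, which can be sketched as follows. This is a minimal illustration rather than the authors' implementation: every callable is a toy stub, and the `threshold` value, the `WorkingMemory` class, and the string-based action format are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class WorkingMemory:
    notes: List[str] = field(default_factory=list)


def run_step(
    propose: Callable[[WorkingMemory], Tuple[str, str]],
    tac_check: Callable[[str, str], float],
    correct: Callable[[str, str], str],
    execute: Callable[[str], str],
    reflect: Callable[[str, str], str],
    memory: WorkingMemory,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> str:
    thought, action = propose(memory)           # (a) manager agent
    if tac_check(thought, action) < threshold:  # (b) pre-execution TAC Check
        action = correct(thought, action)       # (c) Action Correction Agent
    new_state = execute(action)
    guidance = reflect(action, new_state)       # (d) Status Reflection Agent
    memory.notes.append(guidance)               # working memory update
    return action


# Toy stubs standing in for the LLM-backed components.
def propose(memory: WorkingMemory) -> Tuple[str, str]:
    return ("open settings", "tap:camera")  # deliberately inconsistent


def tac_check(thought: str, action: str) -> float:
    return 1.0 if action.split(":")[1] in thought else 0.0


def correct(thought: str, action: str) -> str:
    return "tap:" + thought.split()[-1]


def execute(action: str) -> str:
    return "screen:" + action.split(":")[1]


def reflect(action: str, state: str) -> str:
    return f"{action} led to {state}"


memory = WorkingMemory()
final_action = run_step(propose, tac_check, correct, execute, reflect, memory)
print(final_action, memory.notes)
```

In this toy run the proposed action does not match the thought, so the correction stub rewrites it before execution and the reflection is stored for the next step.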

Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design

Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design [124.5]
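Combinatorial optimization problems have traditionally been tackled with hand-crafted algorithms. Recent advances highlight the potential of automated design with large language models. This paper proposes EvoPH, an experience-guided reflective co-evolution of Prompts and Heuristics for automatic algorithm design.
Reference translation of the paper (metadata) (Mon, 29 Sep 2025 09:24:09 GMT)
"we propose EvoPH, a novel experience-guided reflective co-Evolution framework that can co-evolve Prompts and Heuristics for automatic algorithm design." and "EvoPH comprises two interacting processes. Heuristics Evolution generates, evaluates, and stores candidate algorithms, providing feedback for further search. Prompt Evolution adaptively refines LLM prompts and strategy selection based on this feedback." The proposal is a framework that carries out the kind of optimization a person would otherwise do by hand. It reportedly outperforms prior methods.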

More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration

More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration [103.2]
The "guidance-on-demand" approach broadens exploration while preserving the value of self-discovery. Experiments show that AMPO substantially outperforms strong baselines. Using four peer-sized teachers, it achieves results comparable to methods that rely on a single, more powerful teacher.
Reference translation of the paper (metadata) (Thu, 02 Oct 2025 17:14:00 GMT)
A framework proposal: "we introduce Adaptive Multi-Guidance Policy Optimization (AMPO), a novel Mixed-Policy RL framework. Instead of relying on a single stronger teacher (e.g., GPT4o or DeepSeek-R1), AMPO leverages the collective intelligence of multiple peer models. It operates on a “guidance-on-demand” principle: external guidance from diverse teachers replaces on-policy failures only when the student model is unable to solve a problem, thus maximizing the value of self-exploration. Furthermore, AMPO employs a comprehension-based guidance selection mechanism." It is interesting that the teacher side can be several small models rather than a single powerful one.
Repository: GitHub – SII-Enigma/AMPO: Official Repository of “More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration”
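
The "guidance-on-demand" principle in the quote can be sketched as a small batch-assembly rule. This is not code from the AMPO repository. The `score` callable is a hypothetical stand-in for the paper's comprehension-based selection, `num_replace` is an invented parameter, and teacher responses are assumed to be correct (reward 1.0).

```python
from typing import Callable, List, Tuple

Rollout = Tuple[str, float]  # (response text, reward)


def build_training_group(
    student_rollouts: List[Rollout],
    teacher_responses: List[str],
    score: Callable[[str], float],
    num_replace: int = 1,
) -> List[Rollout]:
    """Mix in teacher guidance only when every student rollout failed."""
    if any(reward > 0 for _, reward in student_rollouts):
        return list(student_rollouts)  # student solved it: stay fully on-policy

    # Never replace more rollouts than the student produced.
    budget = min(num_replace, len(student_rollouts))
    # Pick the teacher responses that the stand-in scorer rates highest.
    chosen = sorted(teacher_responses, key=score, reverse=True)[:budget]
    kept = student_rollouts[: len(student_rollouts) - len(chosen)]
    # Assumption: teacher responses are treated as correct (reward 1.0).
    return kept + [(text, 1.0) for text in chosen]


teachers = ["short proof", "a much longer worked solution"]
prefer_short = lambda text: -len(text)  # purely illustrative scoring rule

solved = [("attempt a", 0.0), ("attempt b", 1.0)]
failed = [("attempt a", 0.0), ("attempt b", 0.0)]

on_policy_group = build_training_group(solved, teachers, prefer_short)
guided_group = build_training_group(failed, teachers, prefer_short)
print(on_policy_group)
print(guided_group)
```

The group with at least one successful rollout is returned unchanged, while the all-failure group has one failed rollout swapped for a teacher response.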

Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles

Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles [81.9]
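SciTrek is a new question-answering benchmark designed to evaluate the long-context reasoning capabilities of large language models (LLMs) using scientific articles. The analysis reveals systematic shortcomings in models' ability to perform basic numerical operations and to accurately locate specific information in long contexts.
Reference translation of the paper (metadata) (Thu, 25 Sep 2025 11:36:09 GMT)
A multi-document, long-context benchmark for the scientific domain: "This paper introduced SciTrek, a benchmark designed for testing the ability of LLMs to perform multi-document information synthesis and structured reasoning over full-text scientific articles."
Repository: GitHub – oaimli/SciTrek: Benchmarking long-context language models on scientific articles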

InfoAgent: Advancing Autonomous Information-Seeking Agents

InfoAgent: Advancing Autonomous Information-Seeking Agents [143.2]
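This paper introduces InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and web search tools. With this method, InfoAgent achieves accuracies of 15.3% on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS.
Reference translation of the paper (metadata) (Mon, 29 Sep 2025 17:59:57 GMT)
Construction of a Deep Research agent. It is based on Qwen3 14B and uses synthetic data, with post-training in two stages: "In the first stage, we perform supervised finetuning (SFT) as a cold start, in order to instill long-horizon search behavior into the model." and "In the second stage, we apply RL to refine its ability of reasoning-driven tool use."
The results show the effectiveness of synthetic data and post-training, and the base model size is also quite affordable. They suggest that developing SLMs of this kind may become a trend.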

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents [58.7]
The paper studies cases where an agent's self-evolution deviates in unintended ways and leads to undesirable or harmful outcomes. The empirical findings show that misevolution is a widespread risk that affects even agents built on top-tier LLMs. The authors discuss potential mitigation strategies to encourage further research on building safer and more trustworthy self-evolving agents.
Reference translation of the paper (metadata) (Tue, 30 Sep 2025 14:55:55 GMT)
Misevolution is evaluated from four perspectives: "(1) In model evolution, we assess whether self-evolving agents compromise their safety alignment after self-updating their model parameters. (2) In memory evolution, we test whether memory-augmented agents learn undesirable preferences or degrade their risk awareness while accumulating experience into memory. (3) In tool evolution, we evaluate whether agents will spontaneously induce risks in the tool creation-reuse loop, and test agents’ ability to reject appealing but potentially malicious tools retrieved from the Internet. (4) In workflow evolution, we analyze whether automatically adjusted workflows can lead to safety decay." The paper points out that this is a realistic problem.
Repository: GitHub – ShaoShuai0605/Misevolution: Official Repo of Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

Muon Outperforms Adam in Tail-End Associative Memory Learning

Muon Outperforms Adam in Tail-End Associative Memory Learning [119.0]
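Regardless of the feature embeddings, Muon consistently achieves balanced learning across classes. Empirical observations and theoretical analysis reveal Muon's core advantage: its update rule aligns with the outer-product structure of linear associative memories.
Reference translation of the paper (metadata) (Tue, 30 Sep 2025 10:04:08 GMT)
An analysis of Muon, an optimizer that is seeing growing adoption: "The Muon update rule is aligned with the outer-product structure of linear associative memories, enabling more balanced and effective learning of tail classes in heavy-tailed distributions as compared with Adam."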
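
A toy example may help make the outer-product argument concrete. The sketch below is not the paper's experiment. It assumes one-hot input and output embeddings, made-up class frequencies, and a plain cubic Newton-Schulz iteration as a simplified stand-in for the orthogonalization step that Muon applies to its momentum-averaged update (momentum is omitted here).

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal factor U V^T of g = U S V^T.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    a simplified stand-in for the iteration used in practice.
    """
    norm = sqrt(sum(v * v for row in g for v in row))  # Frobenius norm
    x = [[v / norm for v in row] for row in g]  # singular values now <= 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


# Toy associative-memory "gradient": sum_i p_i * outer(u_i, e_i) with one-hot
# u_i and e_i, so class i contributes a rank-one term scaled by its frequency p_i.
frequencies = [0.9, 0.09, 0.01]  # heavy-tailed class frequencies (made up)
n = len(frequencies)
grad = [[frequencies[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

update = orthogonalize(grad)
per_class_raw = [grad[i][i] for i in range(n)]
per_class_orth = [round(update[i][i], 6) for i in range(n)]
print(per_class_raw)   # the tail class gets a 90x smaller step than the head class
print(per_class_orth)  # every class gets (nearly) the same step size
```

Because orthogonalization pushes all singular values toward one, each rank-one class term receives a comparable update regardless of how rare the class is, which is the intuition behind the balanced learning of tail classes.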

LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions

LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions [80.1]
The paper presents a comprehensive survey of hallucinations in LLM-based agents. It proposes a new taxonomy that identifies the different types of hallucination occurring at different stages, and examines in detail 18 factors underlying the emergence of agent hallucinations.
Reference translation of the paper (metadata) (Tue, 23 Sep 2025 13:24:48 GMT)
An overview of where LLM-based agents struggle: "This paper presents a comprehensive survey of hallucination issues in LLM-based agents, with the goal of consolidating past progress, clarifying current challenges, and outlining future opportunities. We begin by distinguishing agent components into internal states and external behaviors, and, from this perspective, propose a taxonomy of hallucination types occurring at different stages."

Mem-α: Learning Memory Construction via Reinforcement Learning

Mem-α: Learning Memory Construction via Reinforcement Learning [20.9]
Large language model (LLM) agents are constrained by limited context windows. Current memory-augmented agents rely on predefined instructions and tools for memory updates. Mem-α is a reinforcement learning framework that trains agents to manage complex memory systems effectively.
Reference translation of the paper (metadata) (Tue, 30 Sep 2025 08:02:34 GMT)
Because memory-management agents driven by system prompts and similar instructions have limits, the paper proposes using reinforcement learning so that the agent learns a memory management strategy: "we propose Mem-α, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback."
Also interesting: "Empirical evaluation demonstrates that Mem-α achieves significant improvements over existing memory-augmented agent baselines across diverse benchmarks. Most remarkably, despite being trained exclusively on instances with a maximum length of 30k tokens, our agents exhibit robust generalization to sequences exceeding 400k tokens, over 13× the training length."
Repository: GitHub – wangyu-ustc/Mem-alpha: Learning Memory Construction via Reinforcement Learning
