October 9, 2025 – Introducing the latest arXiv papers

Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design

Experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design [124.5]
Combinatorial optimization problems have traditionally been tackled with hand-crafted algorithms. Recent advances highlight the potential of automatic algorithm design with large language models. This paper proposes EvoPH, an experience-guided reflective co-evolution of prompts and heuristics for automatic algorithm design.
Reference translation of the paper (metadata) (Mon, 29 Sep 2025 09:24:09 GMT)
"we propose EvoPH, a novel experience-guided reflective co-Evolution framework that can co-evolve Prompts and Heuristics for automatic algorithm design." and "EvoPH comprises two interacting processes. Heuristics Evolution generates, evaluates, and stores candidate algorithms, providing feedback for further search. Prompt Evolution adaptively refines LLM prompts and strategy selection based on this feedback." A proposal for a framework that optimizes much as a person would by hand. The authors report that it outperforms conventional methods.
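The two interacting processes in the quote can be illustrated with a toy loop. This is only a sketch under heavy assumptions, not the paper's implementation: the LLM call is replaced by a random perturbation, a "heuristic" is reduced to a single number, and the "prompt" is reduced to a weight per strategy. The names `STRATEGIES`, `evaluate`, `propose`, and `co_evolve` are all hypothetical.

```python
import random

# Hypothetical strategy names mapped to perturbation sizes (not from the paper).
STRATEGIES = {"explore": 2.0, "refine": 0.2}


def evaluate(candidate: float) -> float:
    """Toy objective (lower is better), standing in for a benchmark run."""
    return (candidate - 3.0) ** 2


def propose(parent: float, strategy: str, rng: random.Random) -> float:
    """Stand-in for an LLM call: perturb the best stored candidate."""
    radius = STRATEGIES[strategy]
    return parent + rng.uniform(-radius, radius)


def co_evolve(iterations: int = 300, seed: int = 0):
    rng = random.Random(seed)
    # Stored experience: (score, candidate) pairs.
    archive = [(evaluate(0.0), 0.0)]
    # Stand-in for the prompt state: one selection weight per strategy.
    weights = {name: 1.0 for name in STRATEGIES}
    for _ in range(iterations):
        # Heuristics evolution: generate, evaluate, and store a candidate.
        best_score, best = min(archive)
        names = list(weights)
        strategy = rng.choices(names, weights=[weights[n] for n in names])[0]
        candidate = propose(best, strategy, rng)
        score = evaluate(candidate)
        archive.append((score, candidate))
        # Prompt evolution: adapt strategy selection from the feedback.
        if score < best_score:
            weights[strategy] *= 1.2
        else:
            weights[strategy] = max(0.1, weights[strategy] * 0.95)
    return min(archive), weights


if __name__ == "__main__":
    (best_score, best_candidate), weights = co_evolve()
    print(f"best candidate {best_candidate:.3f}, score {best_score:.5f}")
    print("strategy weights:", weights)
```

The point of the sketch is the feedback path: the archive feeds the next generation step, and the outcome of each evaluation changes how the next proposal is made.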

More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration [103.2]
The "guidance-on-demand" approach broadens exploration while preserving the value of self-discovery. Experiments show that AMPO substantially outperforms strong baselines. Using four peer-sized teachers, it achieves results comparable to methods that leverage a single, more powerful teacher.
Reference translation of the paper (metadata) (Thu, 02 Oct 2025 17:14:00 GMT)
"we introduce Adaptive Multi-Guidance Policy Optimization (AMPO), a novel Mixed-Policy RL framework. Instead of relying on a single stronger teacher (e.g., GPT4o or DeepSeek-R1), AMPO leverages the collective intelligence of multiple peer models. It operates on a “guidance-on-demand” principle: external guidance from diverse teachers replaces on-policy failures only when the student model is unable to solve a problem, thus maximizing the value of self-exploration. Furthermore, AMPO employs a comprehension-based guidance selection mechanism." A proposal for this framework. It is interesting that the teacher side can be several small models rather than a single powerful one.
The repository is GitHub – SII-Enigma/AMPO: Official Repository of “More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration”
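The "guidance-on-demand" rule quoted above can be sketched as a small function that builds the rollout group for one problem. This is a minimal illustration, not the code in the AMPO repository: `Rollout`, `build_group`, the parameter `k`, and the `comprehension` scoring callable are hypothetical, and the length-based score in the usage example is only a placeholder for whatever comprehension measure the paper uses.

```python
from typing import Callable, List, NamedTuple


class Rollout(NamedTuple):
    text: str
    correct: bool
    source: str  # "student" or a teacher name


def build_group(
    student: List[Rollout],
    teachers: List[Rollout],
    comprehension: Callable[[Rollout], float],
    k: int = 1,
) -> List[Rollout]:
    """Return the rollouts to train on for a single problem."""
    # If the student solved the problem at least once, stay fully on-policy.
    if any(r.correct for r in student):
        return list(student)
    # Otherwise rank the correct teacher rollouts by the (assumed) score.
    usable = [r for r in teachers if r.correct]
    chosen = sorted(usable, key=comprehension, reverse=True)[:k]
    # Replace that many failed student rollouts, keeping the group size fixed.
    kept = student[: len(student) - len(chosen)]
    return list(kept) + chosen


if __name__ == "__main__":
    failures = [Rollout(f"attempt {i}", False, "student") for i in range(4)]
    guidance = [
        Rollout("a long and detailed derivation", True, "teacher_a"),
        Rollout("short proof", True, "teacher_b"),
        Rollout("wrong answer", False, "teacher_c"),
    ]
    # Placeholder score: prefer shorter guidance (an assumption for the demo).
    group = build_group(failures, guidance, lambda r: -len(r.text))
    for rollout in group:
        print(rollout.source, rollout.correct)
```

Keeping the group size fixed is a design choice of this sketch. It means teacher guidance only displaces failed student attempts and never adds extra samples.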

Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles [81.9]
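SciTrek is a new question-answering benchmark designed to evaluate the long-context reasoning capabilities of large language models (LLMs) using scientific articles. The analysis reveals systematic shortcomings in the models' ability to perform basic numerical operations and to accurately locate specific information in long contexts.
Reference translation of the paper (metadata) (Thu, 25 Sep 2025 11:36:09 GMT)
"This paper introduced SciTrek, a benchmark designed for testing the ability of LLMs to perform multi-document information synthesis and structured reasoning over full-text scientific articles." A multi-document, long-context benchmark for the scientific domain.
The repository is GitHub – oaimli/SciTrek: Benchmarking long-context language models on scientific articles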

InfoAgent: Advancing Autonomous Information-Seeking Agents [143.2]
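This paper introduces InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and web search tools. With this method, InfoAgent achieves accuracies of 15.3% on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS.
Reference translation of the paper (metadata) (Mon, 29 Sep 2025 17:59:57 GMT)
Building a Deep Research agent. It is based on Qwen3 14B and uses synthetic data, with post-training in two stages: "In the first stage, we perform supervised finetuning (SFT) as a cold start, in order to instill long-horizon search behavior into the model." and "In the second stage, we apply RL to refine its ability of reasoning-driven tool use."
The results show the effectiveness of synthetic data and post-training, and the base model size is also reasonably affordable. They suggest that developing SLMs of this kind may become popular.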