October 21, 2025 – Introducing the latest arXiv papers

LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation

LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation [110.6]
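Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Existing studies often treat utility as a generic attribute, ignoring the fact that different LLMs may benefit differently from the same passage.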
Reference translation of the paper (metadata) (Mon, 13 Oct 2025 12:57:45 GMT)
The authors state: "(1) We highlight the new perspective of utility for RAG, i.e., LLM-specific utility. (2) We introduce the LLM-specific utility judgment task, propose a benchmarking procedure, and provide a comprehensive empirical analysis of various LLMs and methods. (3) We identify the key direction in achieving more effective LLM-specific utility judgment: known queries should reject all passages, while unknown ones must identify useful ones, which need to be analyzed further." This matches intuition, and it is also a useful reference for organizing the characteristics of RAG.
The repository is Anonymized Repository – Anonymous GitHub
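As a rough illustration of point (3) in the quote above, the following minimal Python sketch shows one possible reading of "known queries should reject all passages, while unknown ones must identify useful ones". It is not the paper's benchmarking procedure. The function select_useful_passages, the answers_correctly callback, and the toy_check stand-in are all hypothetical names introduced here; in practice the callback would wrap a call to a specific LLM plus an answer-correctness check.

```python
from typing import Callable, List

# Hypothetical callback type: returns True if a given LLM answers `query`
# correctly when shown `passages` (an empty list means no retrieval).
AnswerCheck = Callable[[str, List[str]], bool]


def select_useful_passages(
    query: str, passages: List[str], answers_correctly: AnswerCheck
) -> List[str]:
    """Return the passages judged useful for one specific LLM (assumed reading)."""
    # "Known" query: the model is already correct without context,
    # so every passage is rejected.
    if answers_correctly(query, []):
        return []
    # "Unknown" query: keep each passage that, on its own, lets the
    # model reach a correct answer.
    return [p for p in passages if answers_correctly(query, [p])]


def toy_check(query: str, passages: List[str]) -> bool:
    """Toy stand-in for an LLM: it 'knows' one query and otherwise
    needs a passage that mentions Canberra."""
    if query == "capital of France?":
        return True
    return any("Canberra" in p for p in passages)


if __name__ == "__main__":
    docs = ["Sydney is the largest city.", "Canberra is the capital of Australia."]
    print(select_useful_passages("capital of France?", docs, toy_check))
    print(select_useful_passages("capital of Australia?", docs, toy_check))
```

Because the callback is tied to one model, swapping in a different LLM can change which passages are kept for the same query, which is the sense in which utility is LLM-specific in this sketch.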

Self-Improvement in Multimodal Large Language Models: A Survey [34.4]
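Self-improvement in large language models (LLMs) efficiently enhances model capabilities without significantly increasing costs. This survey is the first to provide a comprehensive overview of self-improvement in multimodal LLMs.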
Reference translation of the paper (metadata) (Fri, 03 Oct 2025 01:48:26 GMT)
A survey on self-improvement. "We provide a structured overview of the current literature and discuss methods from three perspectives: 1) data collection, 2) data organization, and 3) model optimization, to facilitate the further development of self-improvement in MLLMs. We also include commonly used evaluations and downstream applications."

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning [124.2]
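The Unified Audio Language Model (UALM) aims to unify audio understanding, text-to-audio generation, and multimodal reasoning in a single model. The paper first presents UALM-Gen, a language model that directly predicts audio tokens and is comparable to state-of-the-art diffusion models. UALM-Reason is a multimodal reasoning model that uses both text and audio in its intermediate thinking steps to facilitate complex generation tasks.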
Reference translation of the paper (metadata) (Mon, 13 Oct 2025 22:55:01 GMT)
NVIDIA proposes UALM (Unified Audio Language Model), a single model capable of audio understanding, text-to-audio generation, and multimodal reasoning. A demo is available at UALM: Unified Audio Language Model for Understanding, Generation, and Reasoning – NVIDIA ADLR.
The repository is audio-intelligence/UALM at main · NVIDIA/audio-intelligence · GitHub

LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training [55.7]
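The paper introduces a scalable paradigm that generates structured UI states and transitions in order to synthesize training trajectories at scale. The paradigm integrates a digital world simulator for diverse UI states, a guided rollout process for coherent exploration, and a trajectory wrapper. In experiments on WebArena and AndroidWorld, UI-Simulator rivals or surpasses open-source agents trained on real UIs.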
Reference translation of the paper (metadata) (Thu, 16 Oct 2025 17:59:38 GMT)
A proposal for a data synthesis framework that can be used to build GUI agents: "We introduced UI-Simulator, a scalable trajectory synthesis paradigm that uses LLM-based digital world simulators to synthesize diverse UI trajectories at scale through multi-step simulation, guided rollouts, and final trajectory wrapping."
The repository is GitHub – WadeYin9712/UI-Simulator: Code for 🌍 UI-Simulator: LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training