arXiv – Page 44 – Introducing the latest arXiv papers

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [34.4]
We introduce OS-Harm, a new benchmark for measuring the safety of computer use agents. OS-Harm is built on top of the OSWorld environment and aims to test models across three categories: deliberate user misuse, prompt injection attacks, and model misbehavior. We evaluate computer use agents based on frontier models and provide insights into their safety.
Reference translation of the paper (metadata) (Tue, 17 Jun 2025 17:59:31 GMT)
A benchmark proposal, described as follows: "First, we identify three main categories of risk: (1) deliberate user misuse, where the user asks the agent to pursue a harmful goal, (2) prompt injection attacks, where external attackers insert malicious content into third-party data (incoming emails, web pages, notifications, etc.) that steers the model away from performing its task and towards the attacker’s goal, and (3) model misbehavior, including benign tasks which are likely to result in costly mistakes or reveal model misalignment. For each category, we design tasks that differ in the type of safety violations and in the apps they require (such as Thunderbird, VS Code, Terminal, LibreOffice Impress, etc.), for a total of 150 tasks."
The repository is GitHub – tml-epfl/os-harm: OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs

Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs [28.6]
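Large language models (LLMs) demonstrate impressive reasoning capabilities. However, evidence suggests that much of their success stems from memorized answer-reasoning patterns rather than genuine inference. In this paper, we propose a five-level answer-visibility prompt framework that systematically manipulates answer cues and probes model behavior through indirect, behavioral analysis.
Reference translation of the paper (metadata) (Sat, 21 Jun 2025 08:15:45 GMT)
The paper observes: "By manipulating the visibility of final answers within prompts, we uncover a profound and consistent pattern: LLM performance is predominantly anchored to the explicit presence of final answers rather than to the textual patterns of the reasoning steps themselves." It is also interesting that behavior differs considerably from one LRM to another.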

Towards AI Search Paradigm

Towards AI Search Paradigm [42.6]
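We introduce the AI Search Paradigm, a blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents that dynamically adapts to the full spectrum of information needs. This work aims to inform the development of trustworthy, adaptive, and scalable AI search systems by providing an in-depth guide to these components.
Reference translation of the paper (metadata) (Fri, 20 Jun 2025 17:42:13 GMT)
An overview of multi-agent frameworks for search.
A paper that makes the relationship between search and LLMs easy to understand.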

Robust Reward Modeling via Causal Rubrics

Robust Reward Modeling via Causal Rubrics [46.4]
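Reward models (RMs) are fundamental to aligning large language models (LLMs) via human feedback, yet they often suffer from reward hacking. Crome is a novel framework, grounded in an explicit causal model, that is designed to mitigate reward hacking. It significantly outperforms standard baselines on RewardBench, improving average accuracy by up to 5.4% and achieving gains of up to 13.2% and 7.2% in specific categories.
Reference translation of the paper (metadata) (Thu, 19 Jun 2025 17:59:47 GMT)
A proposal of Crome (Causally Robust Reward Modeling), a causality-based framework that can address reward hacking.
This is work from Google DeepMind, though the name is easy to confuse with Chrome...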

A Survey on World Models Grounded in Acoustic Physical Information

A Survey on World Models Grounded in Acoustic Physical Information [13.0]
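This survey provides a comprehensive overview of the emerging field of world models grounded in acoustic physical information. It examines the theoretical underpinnings, key methodological frameworks, and recent technical advances. The survey details significant applications of acoustic world models in robotics, autonomous driving, healthcare, and finance.
Reference translation of the paper (metadata) (Mon, 16 Jun 2025 04:59:42 GMT)
A survey focusing on physical acoustics with world models in mind.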

SGIC: A Self-Guided Iterative Calibration Framework for RAG

SGIC: A Self-Guided Iterative Calibration Framework for RAG [45.2]
Large language models (LLMs) capitalize on robust in-context reasoning. We propose a new framework that uses uncertainty scores as a tool. We also introduce an innovative approach for constructing an iterative self-calibration training set.
Reference translation of the paper (metadata) (Thu, 19 Jun 2025 09:45:13 GMT)
An approach that uses uncertainty scores to improve RAG performance: "(1) estimating the uncertainty scores of each document and the generated answers (Section 3.1); (2) iteratively utilizing the generated answers and their corresponding uncertainty scores from the validation set to perform the self-calibration process during the inference stage (Section 3.2); and (3) designing a strategy to reconstruct a new training set to fine-tune a self-guided iterative calibration LLM with uncertainty awareness (Section 3.3)." The effect appears to be larger for open models that expose confidence-like values at the token level.
The authors state: "Our framework consistently improves performance for both open-weight and closed-source models by utilizing uncertainty scores of documents and generated answers."
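As a rough illustration of what a token-level uncertainty score can look like, the sketch below turns per-token log-probabilities into a single score and ranks candidates by it. This is an assumption for illustration only: the formula (one minus the geometric-mean token probability), the function names, and the sample numbers are hypothetical and are not taken from the SGIC paper.

```python
import math


def sequence_uncertainty(token_logprobs):
    """Return 1 - geometric-mean token probability (0.0 means fully confident).

    Hypothetical scoring rule for illustration; not the paper's definition.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(mean_logprob)


def rank_by_uncertainty(candidates):
    """Sort (label, token_logprobs) pairs from least to most uncertain."""
    scored = [(label, sequence_uncertainty(lps)) for label, lps in candidates]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up log-probabilities for two retrieved documents.
candidates = [
    ("doc_a", [-0.05, -0.10, -0.02]),
    ("doc_b", [-1.20, -0.90, -2.30]),
]
ranking = rank_by_uncertainty(candidates)
print(ranking[0][0])
```

A score like this is only available when the model exposes token log-probabilities, which is consistent with the comment above that open models seem to benefit more.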

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention [117.9]
PRISM is a framework for distilling small language model (SLM)-enabled robot planners. We apply PRISM to three LLM-enabled planners covering mapping, exploration, manipulation, and household assistance. We show that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o's performance to over 93%.
Reference translation of the paper (metadata) (Fri, 20 Jun 2025 21:44:27 GMT)
A distillation framework for robot planning: "Given a source LLM-enabled planner, PRISM synthesizes tasks and environments, elicits plans from the LLM-enabled planner in these synthesized environments, and then uses the resulting data to train an SLM-enabled planner that serves as a drop-in replacement for the source model." It seems intuitively effective, and the results are indeed promising.
The project site is PRISM
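The data-collection step in the quoted description can be sketched roughly as follows. Everything here is a hypothetical stand-in: `toy_teacher` replaces a real LLM-enabled planner, the environment dictionaries and prompt format are invented, and dropping empty plans is an assumption of this sketch rather than something the paper states.

```python
def collect_distillation_data(teacher_planner, environments):
    """Query the teacher planner in each synthesized environment.

    Returns a list of {"prompt", "completion"} records that could later be
    used to fine-tune a smaller planner.
    """
    dataset = []
    for env in environments:
        prompt = f"Task: {env['task']}\nScene: {env['scene']}"
        plan = teacher_planner(prompt)
        if plan:  # assumption: discard environments where no plan is produced
            dataset.append({"prompt": prompt, "completion": plan})
    return dataset


def toy_teacher(prompt):
    """Hypothetical stand-in for an LLM-enabled planner."""
    return "explore; report" if "find" in prompt else ""


# Invented environments standing in for synthesized tasks and scenes.
environments = [
    {"task": "find the mug", "scene": "kitchen, hallway"},
    {"task": "idle", "scene": "office"},
]
dataset = collect_distillation_data(toy_teacher, environments)
print(len(dataset))
```

The resulting records are in a generic prompt/completion shape; how PRISM actually formats its training data is not described in this entry.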

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.7]
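Large language models (LLMs) with reasoning and agentic capabilities are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches but is also poised to become the dominant paradigm for future information seeking.
Reference translation of the paper (metadata) (Thu, 26 Jun 2025 17:18:00 GMT)
A survey on DeepResearch. Papers are appearing at a remarkable pace, and surveys are arriving quickly too...
The repository is GitHub – DavidZWZ/Awesome-Deep-Research: [Up-to-date] Awesome Agentic Deep Research Resources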

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models [45.1]
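In the context of the web, leveraging AI agents (WebAgents) to assist people with tedious daily tasks can dramatically enhance productivity and efficiency. To fully explore the potential of large foundation models (LFMs), extensive research has emerged on WebAgents designed to complete daily web tasks according to user instructions.
Reference translation of the paper (metadata) (Mon, 26 May 2025 07:05:18 GMT)
A survey of WebAgents, which are seeing increasingly wide use.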

Early Stopping Tabular In-Context Learning

Early Stopping Tabular In-Context Learning [40.6]
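We propose early-stopping the in-context learning process. We achieve this by dynamically evaluating whether to stop in-context learning after each layer of the transformer encoder. Once stopped, we decode the embedding using a pre-trained layer-wise decoder.
Reference translation of the paper (metadata) (Thu, 26 Jun 2025 15:36:37 GMT)
Early stopping for tabular foundation models. The effect is confirmed on TabPFN.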
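The layer-by-layer stopping idea can be sketched with a toy model. This is a minimal sketch under stated assumptions: the stopping rule (exit once the largest predicted class probability reaches a threshold) is chosen here for illustration and is not claimed to be the paper's criterion, and `toy_layer`, `toy_decoder`, and `early_exit_predict` are hypothetical names unrelated to the TabPFN API.

```python
import math


def early_exit_predict(x, layers, decoders, threshold=0.9):
    """Run encoder layers in order and stop once a layer's decoder is confident.

    Returns (class_probabilities, number_of_layers_used).
    """
    if not layers or len(layers) != len(decoders):
        raise ValueError("need one decoder per layer and at least one layer")
    hidden = x
    probs = None
    depth = 0
    for depth, (layer, decoder) in enumerate(zip(layers, decoders), start=1):
        hidden = layer(hidden)
        probs = decoder(hidden)  # layer-wise decoder for this depth
        if max(probs) >= threshold:  # assumed confidence-based stopping rule
            break
    return probs, depth


def toy_layer(h):
    """Toy 'encoder layer': shifts a scalar hidden state."""
    return h + 1.0


def toy_decoder(h):
    """Toy 'decoder': maps the scalar state to two class probabilities."""
    p = 1.0 / (1.0 + math.exp(-h))
    return [1.0 - p, p]


layers = [toy_layer] * 4
decoders = [toy_decoder] * 4
probs, used = early_exit_predict(0.0, layers, decoders, threshold=0.9)
print(used, len(layers))
```

With a stricter threshold the loop simply runs through every layer, so the threshold trades compute saved against how confident the prediction must be before exiting.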
