2025年12月23日 – arXiv最新論文の紹介

Large Video Planner Enables Generalizable Robot Control

Large Video Planner Enables Generalizable Robot Control [117.5]
汎用ロボットは、様々なタスクや環境にまたがって一般化する意思決定モデルを必要とする。最近の研究は、マルチモーダル大言語モデル(LM)をアクション出力で拡張し、視覚-アクション(VLA)システムを構築することで、ロボット基盤モデルを構築している。本稿では,ロボット基礎モデル構築における主要なモダリティとして,大規模ビデオ事前学習を用いるための代替パラダイムについて検討する。
論文参考訳（メタデータ） (Wed, 17 Dec 2025 18:35:54 GMT)
「We present Large Video Planner (LVP), a 14-billion parameter video foundation model for embodiment planning. LVP generates videos as motion plans conditioned on one or a few scene frames and a text description of the task. We demonstrate that these generated motion plans can be successfully retargeted to dexterous robotic hands using open-source reconstruction and retargeting tools. Evaluations on third-party proposed tasks show evidence of task-level generalization, a capability limited in existing VLA models.」と動画をカギとするロボット用の行動計画モデルの提案。
関連手法の進化を見るに、有力なアプローチに思えなくもない。

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World [100.7]
エージェントは現実的な4D駆動環境を合成し、説得力があるように見えるが、物理的または行動的に失敗することが多い。モデルがどのように構築され、理解され、その生成された世界の中でどのように振る舞うかを評価するフルスペクトルベンチマークであるWorldLensを紹介します。さらに、数値的なスコアとテキストの合理性を備えた人間の注釈付きビデオの大規模データセット WorldLens-26K を構築し、WorldLens-Agent を開発した。
論文参考訳（メタデータ） (Thu, 11 Dec 2025 18:59:58 GMT)
「We introduce WorldLens, a full-spectrum benchmark evaluating how well a model builds, understands, and behaves within its generated world. It spans five aspects – Generation, Reconstruction, Action-Following, Downstream Task, and Human Preference – jointly covering visual realism, geometric consistency, physical plausibility, and functional reliability.」というベンチマーク。
リポジトリはGitHub – worldbench/WorldLens: 🌐 WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World、プロジェクトサイトはWorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Adaptation of Agentic AI

Adaptation of Agentic AI [162.6]
我々は、急速に拡大する研究環境を、エージェント適応とツール適応の両方にまたがる体系的な枠組みに統一する。エージェントAIにおける適応戦略の設計空間を明らかにする上で,本フレームワークが有効であることを示す。次に、各カテゴリの代表的アプローチをレビューし、その強みと限界を分析し、主要なオープン課題と今後の機会を強調します。
論文参考訳（メタデータ） (Thu, 18 Dec 2025 08:38:51 GMT)
AIエージェントに関するサーベイ。「The transition from static foundation models to autonomous agentic systems marks a fundamental shift in artificial intelligence, moving from passive response generation to active and multi-step problem solving. As these systems are deployed in increasingly complex and open-ended environments, the ability to adapt to refine behavior, master new tools, and align with specific tasks has become the primary driver of reliability and performance.」を「(A1) Agent Adaptation with Tool Execution Signal, (A2) Agent Adaptation with Agent Output Signal, (T1) Agent-Agnostic Tool Adaptation, and (T2) Agent-Supervised Tool Adaptation.」軸で整理。メリデメがあるので「Looking forward, the advancement of agentic AI depends on the strategic integration of these paradigms rather than their isolation.」というのはそうだろうと思う。
リポジトリはGitHub – pat-jj/Awesome-Adaptation-of-Agentic-AI: Repo for “Adaptation of Agentic AI”

月	火	水	木	金	土	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31