2025年7月17日 – arXiv最新論文の紹介

VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots

VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots [45.0]
本研究では,シミュレータや実環境で実行する前に,タスクプランを自動的に検証するアーキテクチャを提案する。このモジュールは、Large Language Modelsの推論機能を使用して、論理的一貫性を評価し、計画の潜在的なギャップを特定する。我々は,タスク計画の信頼性と効率の向上に寄与し,自律システムにおける堅牢な事前実行検証の必要性に対処する。
論文参考訳（メタデータ） (Mon, 07 Jul 2025 15:31:36 GMT)
タスク計画の検証のため「In this paper, we propose an architecture for automatically verifying high-level task plans before their execution in simulator or real-world environments. Leveraging Large Language Models (LLMs), our approach consists of two key steps: first, the conversion of natural language instructions into Linear Temporal Logic (LTL), followed by a comprehensive analysis of action sequences.」と形式言語を併用するアプローチの提案。
リポジトリはVerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots

Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset [112.5]
4000時間以上の対面インタラクション映像の大規模な収集であるSeamless Interactionデータセットを紹介した。このデータセットは、ダイドの具体的ダイナミクスを理解するAIテクノロジの開発を可能にする。そこで我々は,このデータセットを用いて,人間の発話に適応した動作ジェスチャーと表情を生成するモデル群を開発した。
論文参考訳（メタデータ） (Fri, 27 Jun 2025 18:09:49 GMT)
「we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours of face-to-face interaction footage from over 4,000 participants in diverse contexts. This dataset enables the development of AI technologies that understand dyadic embodied dynamics, unlocking breakthroughs in virtual agents, telepresence experiences, and multimodal content analysis tools.」というデータセット。
リポジトリはGitHub – facebookresearch/seamless_interaction: Foundation Models and Data for Human-Human and Human-AI interactions.

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents [105.4]
VLM2Vec-V2は、様々な視覚形態にまたがる埋め込みを学習するための統一的なフレームワークである。まず、MMEBを5つの新しいタスクタイプで拡張する包括的なベンチマークであるMMEB-V2を紹介する。次に、テキスト、画像、ビデオ、ビジュアルドキュメント入力をサポートする汎用埋め込みモデルであるVLM2Vec-V2を訓練する。
論文参考訳（メタデータ） (Mon, 07 Jul 2025 00:51:57 GMT)
「MMEB-V2, an advanced multimodal embedding dataset designed to train and evaluate embedding models across three key visual modalities: images, videos, and visual documents.」と、それを活用した埋め込みモデルVLM2Vec-V2の提案。かなり汎用的な2vec
プロジェクトサイトはVLM2Vec