May 27, 2025 – Introducing the latest arXiv papers

DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories

DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories [120.3]
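DreamGen is a pipeline for training robot policies that generalize across behaviors and environments through neural trajectories. The work establishes a promising new axis for scaling robot learning beyond manual data collection.
Reference translation of the paper (metadata) (Mon, 19 May 2025 04:55:39 GMT)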
The paper proposes a data synthesis method built on video generation models: "This pipeline is designed to be general-purpose across different robots, environments, and tasks. (1) We fine-tune video world models on a target robot to capture the dynamics and kinematics of the specific embodiment; (2) we prompt the model with pairs of initial frames and language instructions to generate large volumes of robot videos, capturing both familiar behaviors from fine-tuning and novel ones in unseen settings; (3) we then extract pseudo-actions using either a latent action model [13] or an inverse dynamics model (IDM)[14]; (4) finally, we use the resulting video-action sequence pairs, dubbed neural trajectories, for training downstream visuomotor policies." It is interesting because it resembles mental rehearsal (image training).
The project site is DreamGen.
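Step (3) of the quoted pipeline, labeling generated video with pseudo-actions, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: each "frame" is reduced to a single 1-D position, and `toy_inverse_dynamics` is a hypothetical stand-in that simply takes the difference between consecutive frames. The paper's actual IDM and latent action model are learned networks and are not reproduced here.

```python
from typing import List, Tuple

Frame = float  # toy "frame": a 1-D end-effector position (illustrative assumption)


def toy_inverse_dynamics(prev_frame: Frame, next_frame: Frame) -> float:
    """Hypothetical IDM: infer the action that moved prev_frame to next_frame."""
    return next_frame - prev_frame


def label_video(frames: List[Frame]) -> List[Tuple[Frame, float]]:
    """Pair every frame except the last with the pseudo-action inferred from its successor."""
    return [
        (frames[i], toy_inverse_dynamics(frames[i], frames[i + 1]))
        for i in range(len(frames) - 1)
    ]


# Stand-in for a video produced by the fine-tuned world model.
generated_video = [0.0, 0.5, 1.5, 1.0]

# The resulting (frame, pseudo-action) pairs play the role of a "neural trajectory".
neural_trajectory = label_video(generated_video)
print(neural_trajectory)
```

The point of the sketch is the data flow: the video model supplies only frames, and the action labels needed for policy training are recovered afterwards from pairs of consecutive frames.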

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent [66.3]
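MLLMs suffer from two major problems: misinterpretation of UI components and outdated knowledge. This paper proposes GUI-explorer, a training-free GUI agent that incorporates two fundamental mechanisms. With task success rates of 53.7% on SPA-Bench and 47.4% on AndroidWorld, GUI-explorer shows significant improvements over SOTA agents.
Reference translation of the paper (metadata) (Thu, 22 May 2025 16:01:06 GMT)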
The paper proposes the following mechanisms: "(a) Automatically constructing function-aware exploration goals by analyzing structural information from the GUI environment, followed by systematic exploration to collect diverse function-aware trajectories. (b) Extracting effective screen-operation logic through unsupervised analysis of structured interaction triples (observation, action, outcome), enabling unsupervised knowledge extraction. (c) Performing visual-semantic retrieval between screen visuals and the knowledge vector store to construct Dynamic Guidance achieves dual objectives: preventing UI misinterpretation and ensuring action proposals align with actual UI states." The approach improves scores on SPA-Bench and AndroidWorld.
The repository is GitHub – JiuTian-VL/GUI-explorer: [ACL 2025] GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
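Mechanisms (b) and (c) in the quote, storing (observation, action, outcome) triples and retrieving the most relevant ones for the current screen, can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the `Knowledge` type, the `retrieve` function, and the two-dimensional embeddings are hypothetical, and plain cosine similarity stands in for whatever visual-semantic retrieval the authors actually use.

```python
import math
from typing import List, NamedTuple


class Knowledge(NamedTuple):
    """One mined interaction triple plus a toy embedding (hypothetical structure)."""
    observation: str
    action: str
    outcome: str
    embedding: List[float]  # stand-in for a visual-semantic embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store: List[Knowledge], screen_embedding: List[float], top_k: int = 1) -> List[Knowledge]:
    """Return the top_k knowledge entries most similar to the current screen."""
    ranked = sorted(store, key=lambda k: cosine(k.embedding, screen_embedding), reverse=True)
    return ranked[:top_k]


store = [
    Knowledge("settings screen", "tap gear icon", "opens settings menu", [1.0, 0.0]),
    Knowledge("search screen", "tap magnifier icon", "opens search box", [0.0, 1.0]),
]

# The current screen resembles the settings screen, so that entry is retrieved.
hits = retrieve(store, [0.9, 0.1])
guidance = [f"{k.action} -> {k.outcome}" for k in hits]
print(guidance)
```

In an agent loop, the retrieved entries would be added to the prompt as guidance before the model proposes its next action.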

BAT: Benchmark for Auto-bidding Task [67.6]
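This paper proposes an auction benchmark covering the two most prevalent auction formats. The authors implement a series of robust baselines on a novel dataset. The benchmark provides a user-friendly, intuitive framework for researchers and practitioners to develop and refine innovative auto-bidding algorithms.
Reference translation of the paper (metadata) (Tue, 13 May 2025 12:12:34 GMT)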
An unusual kind of benchmark: "To address this deficiency, we present an auction benchmark encompassing the two most prevalent auction formats. We implement a series of robust baselines on a novel dataset, addressing the most salient Real-Time Bidding (RTB) problem domains: budget pacing uniformity and Cost Per Click (CPC) constraint optimization."
The repository is GitHub – avito-tech/bat-autobidding-benchmark
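The two problem domains named in the quote can be made concrete with a small sketch. Cost per click is simply total spend divided by clicks. The pacing measure below is an illustrative assumption (the largest gap between cumulative spend and a perfectly uniform spend line, as a fraction of the budget) and is not claimed to be the metric the benchmark uses. The function names are hypothetical.

```python
from typing import List


def cost_per_click(total_spend: float, clicks: int) -> float:
    """CPC = total spend / number of clicks (infinite if there were no clicks)."""
    if clicks == 0:
        return float("inf")
    return total_spend / clicks


def pacing_deviation(spend_per_interval: List[float], budget: float) -> float:
    """Illustrative uniformity measure (an assumption, not the benchmark's metric):
    the largest gap between cumulative spend and a uniform spend line,
    expressed as a fraction of the budget. 0.0 means perfectly even pacing."""
    n = len(spend_per_interval)
    cumulative = 0.0
    worst = 0.0
    for i, spend in enumerate(spend_per_interval, start=1):
        cumulative += spend
        ideal = budget * i / n  # where a uniform pacer would be after interval i
        worst = max(worst, abs(cumulative - ideal))
    return worst / budget


even = pacing_deviation([25.0, 25.0, 25.0, 25.0], budget=100.0)
front_loaded = pacing_deviation([100.0, 0.0, 0.0, 0.0], budget=100.0)
cpc = cost_per_click(total_spend=100.0, clicks=40)
meets_cpc_constraint = cpc <= 3.0  # hypothetical CPC target of 3.0
print(even, front_loaded, cpc, meets_cpc_constraint)
```

An auto-bidding algorithm in this setting has to trade the two off: bidding aggressively early can win more clicks, but it spends the budget unevenly and can push CPC above the target.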

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models [37.5]
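The authors conduct a case study using a synthetic dataset that can be solved only through visual reasoning. They then introduce ChartMuseum, a new chart question answering (QA) benchmark containing 1,162 expert-annotated questions. Humans achieve 93% accuracy, whereas the best-performing model, Gemini-2.5-Pro, attains only 63.0%, and the leading open-source LVLM, Qwen2.5-VL-72B-Instruct, achieves only 38.5%.
Reference translation of the paper (metadata) (Mon, 19 May 2025 17:59:27 GMT)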
A chart QA benchmark. It is a difficult task on which even Gemini-2.5-Pro, o4, o3, Claude 3.7, and GPT-4.1 score low.
The project site is ChartMuseum.

Understanding Gen Alpha Digital Language: Evaluation of LLM Safety Systems for Content Moderation [8.9]
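This study offers a unique evaluation of how AI systems interpret the digital language of Generation Alpha (Gen Alpha, born 2010-2024). Gen Alpha faces new forms of online risk due to immersive digital engagement and a growing mismatch between its evolving communication and existing safety tools. Using a dataset of 100 recent expressions from gaming platforms, social media, and video content, the study reveals critical comprehension failures that directly affect online safety.
Reference translation of the paper (metadata) (Wed, 14 May 2025 16:46:11 GMT)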
A study on the gap between safety systems and the digital-native generation. It reports that "Most critically, protection systems consistently lagged behind the rapid evolution of expressions, creating windows of vulnerability where concerning interactions went undetected" and that "The resulting trust gap led many Gen Alpha users to avoid reporting concerning interactions, believing adults would misunderstand or minimize their experiences."
I wonder whether the gap will widen even further in the generative AI era...
The repository is GitHub – SystemTwoAI/GenAlphaSlang