November 26, 2024 – Introducing the latest arXiv papers

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory [23.5]
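This paper introduces SAMURAI, an enhanced adaptation of SAM 2 designed specifically for visual object tracking. By incorporating temporal motion cues into the proposed motion-aware memory selection mechanism, it effectively predicts object motion and refines mask selection, achieving robust and accurate tracking without any retraining or fine-tuning. In evaluations, it significantly improves success rate and precision over existing trackers, with gains of 7.1% on LaSOT$_{ext}$ and 3.5% on GOT-10k.
Reference translation of the paper (metadata) (Mon, 18 Nov 2024 05:59:03 GMT)
SAMURAI (SAM-based Unified and Robust zero-shot visual tracker with motion-Aware Instance-level memory) is an adaptation of SAM specialized for object tracking.
Repository: GitHub – yangchris11/samurai: Official repository of “SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory”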
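
To make the idea of motion-aware mask selection concrete, here is a minimal sketch in Python. It is not the authors' implementation: the constant-velocity `predict_box` stands in for whatever motion model the tracker actually uses, and `select_mask`, `motion_weight`, and the candidate format are hypothetical names chosen for illustration. The sketch only shows how a motion prediction can re-rank candidate masks that a segmentation model scored on appearance alone.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_box(prev: Box, velocity: Tuple[float, float]) -> Box:
    """Assumption: a constant-velocity shift as a simple motion model."""
    dx, dy = velocity
    return (prev[0] + dx, prev[1] + dy, prev[2] + dx, prev[3] + dy)


def select_mask(
    candidates: List[Tuple[Box, float]],  # (bounding box of mask, appearance score)
    predicted: Box,
    motion_weight: float = 0.5,  # hypothetical weighting, not a value from the paper
) -> int:
    """Return the index of the candidate with the best combined score."""
    scores = [
        motion_weight * iou(box, predicted) + (1.0 - motion_weight) * appearance
        for box, appearance in candidates
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# The object was at (0, 0, 10, 10) and is moving 5 px to the right per frame.
predicted = predict_box((0.0, 0.0, 10.0, 10.0), (5.0, 0.0))
candidates = [
    ((0.0, 0.0, 10.0, 10.0), 0.80),  # higher appearance score, but stale position
    ((5.0, 0.0, 15.0, 10.0), 0.75),  # slightly lower score, matches the motion
]
chosen = select_mask(candidates, predicted)
```

With `motion_weight=0.0` the selection falls back to the appearance score alone, which makes it easy to compare behavior with and without the motion cue.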

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search [95.1]
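Reasoning approaches like o1 are challenging, and researchers have made various attempts to advance this open research area. This paper presents a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
Reference translation of the paper (metadata) (Mon, 18 Nov 2024 16:15:17 GMT)
An investigation toward building o1-like reasoning systems: “In this paper, we present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.” Together with the Marco-o1 report and the claims around DeepSeek-R1 (A Chinese lab has released a ‘reasoning’ AI model to rival OpenAI’s o1 | TechCrunch), this shows a fiercely competitive environment in which similar proposals appear almost immediately. Effectiveness in multimodal settings has also been reported (see below), so there is a lot to look forward to.
I expected something close to a survey, but it also includes experimental results and is a useful reference.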
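
As a rough illustration of what "reward-guided tree search" means in general, the following Python sketch runs a best-first search in which a reward function decides which partial solution to expand next. It is a generic toy, not the algorithm from the report: `reward_guided_search`, `expand`, `reward`, and `is_terminal` are hypothetical names, and the string-building task stands in for an LLM generating reasoning steps scored by a reward model.

```python
import heapq
from typing import Callable, List, Optional, Tuple


def reward_guided_search(
    root: str,
    expand: Callable[[str], List[str]],   # proposes next steps (an LLM in practice)
    reward: Callable[[str], float],       # scores a partial solution (a reward model)
    is_terminal: Callable[[str], bool],
    max_expansions: int = 100,
) -> Optional[str]:
    """Best-first search: always expand the highest-reward node on the frontier."""
    counter = 0  # tie-breaker so the heap never has to compare states directly
    frontier: List[Tuple[float, int, str]] = [(-reward(root), counter, root)]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)  # heapq is a min-heap, so rewards are negated
        if is_terminal(state):
            return state
        for child in expand(state):
            counter += 1
            heapq.heappush(frontier, (-reward(child), counter, child))
    return None  # budget exhausted without reaching a terminal state


# Toy task: build the string "314" one character at a time.
TARGET = "314"


def toy_expand(state: str) -> List[str]:
    return [state + ch for ch in "134"]


def toy_reward(state: str) -> float:
    # Reward grows with the length of a correct prefix; wrong prefixes are penalized.
    return float(len(state)) if TARGET.startswith(state) else -1.0


def toy_is_terminal(state: str) -> bool:
    return len(state) == len(TARGET)


result = reward_guided_search("", toy_expand, toy_reward, toy_is_terminal)
```

In a real system the expensive parts are `expand` and `reward`, so the expansion budget (`max_expansions` here) is the main knob trading compute for answer quality.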

LLaVA-o1: Let Vision Language Models Reason Step-by-Step [33.7]
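LLaVA-o1 is a novel VLM designed to perform autonomous multistage reasoning. Unlike chain-of-thought prompting, LLaVA-o1 independently engages in sequential stages of summarization, visual interpretation, logical reasoning, and conclusion generation. With 100k training samples and a simple inference-time scaling method, LLaVA-o1 outperforms its base model by 8.9%.
Reference translation of the paper (metadata) (Fri, 15 Nov 2024 18:58:31 GMT)
Repository: GitHub – PKU-YuanGroup/LLaVA-CoT: LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning. The repository notes: “Based on recent feedback from social media platforms like X, we have decided to rename LLaVA-o1 to LLaVA-CoT.”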
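
The four sequential stages can be pictured as a structured output that downstream code splits apart. The Python sketch below assumes, purely for illustration, that each stage is wrapped in a tag named `SUMMARY`, `CAPTION`, `REASONING`, or `CONCLUSION`; the tag names, the `split_stages` helper, and the sample text are assumptions, not taken from the post or verified against the repository.

```python
import re
from typing import Dict

# Assumed stage tags, in the order the stages are generated.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def split_stages(text: str) -> Dict[str, str]:
    """Extract the content of each stage tag that is present in the text."""
    stages: Dict[str, str] = {}
    for name in STAGES:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        if match:
            stages[name] = match.group(1).strip()
    return stages


# Hypothetical model output for a question about an image of fruit.
sample = """
<SUMMARY>Count the apples left after removing the red ones.</SUMMARY>
<CAPTION>The image shows three red apples and two green apples.</CAPTION>
<REASONING>Five apples in total minus three red apples leaves two.</REASONING>
<CONCLUSION>2</CONCLUSION>
"""
parsed = split_stages(sample)
```

Keeping the stages separable like this is what makes stage-by-stage candidate selection possible at inference time, since each stage can be scored before the next one is generated.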

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions [40.2]
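Marco-o1 focuses not only on disciplines with standard answers, such as mathematics, physics, and coding, but also on open-ended solutions. It asks: can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are hard to quantify? Marco-o1 is powered by Chain-of-Thought fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies.
Reference translation of the paper (metadata) (Thu, 21 Nov 2024 18:37:33 GMT)
A report on building an o1-like model: “Our Marco-o1 enhances the reasoning ability by integrating Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and novel reasoning action strategies.”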
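
For readers unfamiliar with MCTS, the Python sketch below shows the textbook UCT selection and backpropagation steps, which balance exploiting branches with high average reward against exploring rarely visited ones. This is the generic algorithm, not Marco-o1's specific variant: how Marco-o1 defines actions and rewards is not covered here, and `Node`, `uct_score`, `select_child`, and `backpropagate` are illustrative names.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                     # e.g. a candidate reasoning step
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Node:
    """Selection step: pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: List[Node], reward: float) -> None:
    """Backpropagation step: update statistics along the visited path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


# Toy tree: step A has a higher mean reward, step B has been explored less.
step_a = Node("step A", visits=8, total_reward=6.0)
step_b = Node("step B", visits=2, total_reward=1.0)
root = Node("question", visits=10, children=[step_a, step_b])

picked = select_child(root)            # the exploration bonus favors step B
greedy = select_child(root, c=0.0)     # pure exploitation favors step A
backpropagate([root, picked], reward=1.0)
```

The exploration constant `c` controls how strongly the search prefers under-visited branches; setting it to zero reduces selection to a greedy choice on mean reward.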

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective [77.9]
OpenAI has claimed that the main technique behind o1 is reinforcement learning. This paper analyzes a roadmap to achieving o1 from the perspective of reinforcement learning.
Reference translation of the paper (metadata) (Wed, 18 Dec 2024 18:24:47 GMT)
There is also this paper: “In this paper, we present a roadmap for reproducing o1 from the perspective of reinforcement learning, emphasizing key components such as policy initialization, reward design, search, and learning.”