MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.7]
ロングチェーンのリフレクティブ推論は、複雑な現実世界の問題を解決するための前提条件である。我々は42の難解な合成タスクの1,260のサンプルからなるベンチマークを構築した。トレーニング後のデータを生成し、そのようなデータを活用するための学習パラダイムを探索する。
論文参考訳（メタデータ） (Thu, 09 Oct 2025 17:53:58 GMT)
「MM-HELIX contains 42 meticulously curated challeng- ing tasks from diverse online sources, categorized into four domains: Algorithm, Graph, Puzzle, and Game. Each task requires the model to perform careful visual observation, develop a deep understanding of complex rules, and generate an extended chain-of-thought that necessitates reflec- tion and backtracking.」という試行、失敗、修正のような長い思考を必要とするベンチマークの提案。GPT-5の性能が高くOSSモデルとの性能差が大きい。
プロジェクトサイトはMM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

コメントを残す

コメントを残す コメントをキャンセル