OmniGAIA: Towards Native Omni-Modal AI Agents [103.8]
We introduce OmniGAIA, a benchmark designed to evaluate omni-modal agents on tasks that require deep reasoning and multi-turn tool execution. We also propose OmniAtlas, an omni-modal foundation agent.
Paper reference translation (metadata) (Thu, 26 Feb 2026 11:35:04 GMT)
"OmniGAIA, a challenging benchmark for native omni-modal agents. OmniGAIA comprises 360 tasks across 9 real-world domains, covering both video-with-audio and image+audio settings, and explicitly requires multi-turn tool use (e.g., web search/browsing and code) to produce verifiable open-form answers." A multimodal benchmark. The demo makes the setup easy to understand.
The repository is GitHub – RUC-NLPIR/OmniGAIA: OmniGAIA: Towards Native Omni-Modal AI Agents, and the leaderboard is OmniGAIA Leaderboard – a Hugging Face Space by RUC-NLPIR. The commercial model (Gemini) stands out for its strong performance.