MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents [88.4]
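MMBench-GUI is a hierarchical benchmark for evaluating GUI automation agents across Windows, macOS, Linux, iOS, Android, and Web platforms. It comprises four levels (GUI Content Understanding, Element Grounding, Task Automation, and Task Collaboration) that cover the essential skills a GUI agent needs.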
Reference translation of the paper (metadata) (Fri, 25 Jul 2025 17:59:26 GMT)
A benchmark for evaluating GUI agents. It is organized into four levels: "(1) GUI Content Understanding, (2) GUI Element Grounding, (3) GUI Task Automation, and (4) GUI Task Collaboration." The reported findings, "Finding 1: General-purpose language models excel at task decomposition, planning, and self-reflection but struggle with fine-grained visual interactions." and "Finding 2: Accurate visual grounding significantly determines the success rate of GUI task execution.", are consistent with the current direction of GUI agent development.
The repository is open-compass/MMBench-GUI: Official repo of “MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents”. It can be used to evaluate a GUI agent with a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android and Web.
