A benchmark for evaluating GUI agents. It is organized into four levels: "(1) GUI Content Understanding, (2) GUI Element Grounding, (3) GUI Task Automation, and (4) GUI Task Collaboration." Two of its findings, "Finding 1: General-purpose language models excel at task decomposition, planning, and self-reflection but struggle with fine-grained visual interactions." and "Finding 2: Accurate visual grounding significantly determines the success rate of GUI task execution.", are also consistent with the current direction of GUI agent development.