{"id":7985,"date":"2026-01-07T04:58:00","date_gmt":"2026-01-06T19:58:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7985"},"modified":"2026-01-04T07:22:47","modified_gmt":"2026-01-03T22:22:47","slug":"7985","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7985","title":{"rendered":"OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models\u00a0<\/strong>[54.4]<br>We introduce three core contributions: a scalable data pipeline for cross-platform GUI critic data; a two-stage training paradigm that combines supervised fine-tuning with consistency-preserving group relative policy optimization; and OS-Critic Bench, a comprehensive benchmark for evaluating critic model performance across Mobile, Web, and Desktop platforms. The resulting critic model, OS-Oracle-7B, achieves state-of-the-art performance among open-source VLMs on OS-Critic 
Bench and surpasses proprietary models in the mobile domain.<br><a href=\"http:\/\/arxiv.org\/abs\/2512.16295v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2512.16295v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 18 Dec 2025 08:29:50 GMT)<\/li>\n\n\n\n<li>The authors write: \u201cwe present OS-Oracle, a comprehensive framework for GUI critic models. By introducing a scalable cross-platform data pipeline, we systematically synthesize both positive and negative samples that capture diverse GUI failure modes. Together with a two-stage training recipe combining supervised fine-tuning and consistency-preserving GRPO, our approach enables robust and generalizable critic learning across Mobile, Web, and Desktop environments. 
Extensive experiments demonstrate that our critic model not only achieves impressive performance on the OS-Critic Bench but also effectively enhances the reliability and task success of native GUI agents.\u201d With GUI agents gaining momentum, I think this is an important dataset, model, and benchmark.<\/li>\n\n\n\n<li>Repositories: <a href=\"https:\/\/github.com\/numbmelon\/OS-Oracle\">GitHub &#8211; numbmelon\/OS-Oracle<\/a>, <a href=\"https:\/\/huggingface.co\/datasets\/OS-Copilot\/OS-Critic-Bench\">OS-Copilot\/OS-Critic-Bench \u00b7 Datasets at Hugging Face<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[84,181,517],"class_list":["post-7985","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-critic","tag-gui-agent","tag-517"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7985","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7985"}],"version-history":[{"count":2,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7985\/revisions"}],"predecessor-version":[{"id":7987,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7985\/revisions\/7987"}],"wp:attachment"
:[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7985"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7985"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7985"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}