{"id":5731,"date":"2024-11-11T05:58:00","date_gmt":"2024-11-10T20:58:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=5731"},"modified":"2024-11-11T05:58:00","modified_gmt":"2024-11-10T20:58:00","slug":"agent-k","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=5731","title":{"rendered":"Agent K"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level\u00a0<\/strong>[73.1]<br>We introduce Agent K v1.0, an end-to-end autonomous data science agent. By learning from experience, it manages the entire data science life cycle. It optimizes long- and short-term memory by selectively storing and retrieving key information.<br><a href=\"http:\/\/arxiv.org\/abs\/2411.03562v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2411.03562v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 05 Nov 2024 23:55:23 GMT)<\/li>\n\n\n\n<li>A proposal for an agent system that claims performance on par with a Kaggle Grandmaster: \u201cour results indicate that Agent K v1.0 has reached a performance level equivalent to Kaggle Grandmaster, with a record of 6 gold medals, 3 silver medals, and 7 bronze 
medals\u201d.<\/li>\n\n\n\n<li>The pipeline design, prompts, and other details offer many useful pointers, but some \u201creally?\u201d doubts remain, given the statement \u201cHowever, because this assessment relies on a custom split of the training data rather than the competition\u2019s actual private test set, it remains uncertain whether an agent\u2019s high ranking in this context would align with results on the original Kaggle leaderboard.\u201d and the possibility of leakage.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[42],"class_list":["post-5731","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-autonomous-agent"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5731","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5731"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5731\/revisions"}],"wp:attachment":[{"href":"https:\/\/devnek
o.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5731"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5731"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5731"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}