{"id":5558,"date":"2024-10-07T03:42:00","date_gmt":"2024-10-06T18:42:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=5558"},"modified":"2024-10-07T03:42:00","modified_gmt":"2024-10-06T18:42:00","slug":"evaluation-of-openai-o1-opportunities-and-challenges-of-agi","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=5558","title":{"rendered":"Evaluation of OpenAI o1: Opportunities and Challenges of AGI \/ On The Planning Abilities of OpenAI&#8217;s o1 Models: Feasibility, Optimality, and Generalizability"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Evaluation of OpenAI o1: Opportunities and Challenges of AGI&nbsp;<\/strong>[112.1]<br>o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance. The model excelled at tasks requiring complex reasoning and knowledge integration across diverse fields. Overall, the results indicate significant progress toward artificial intelligence.<br><a href=\"http:\/\/arxiv.org\/abs\/2409.18486v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2409.18486v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Fri, 27 Sep 2024 06:57:00 GMT)<\/li>\n\n\n\n<li>A detailed evaluation of OpenAI o1. The paper rates it highly: \u201cAdvanced Reasoning Capabilities: o1-preview demonstrated exceptional logical reasoning abilities in multiple fields, including high school mathematics, quantitative investing, and chip design\u201d, \u201cDomain-Specific Knowledge: The model exhibited impressive knowledge breadth across 
diverse fields such as medical genetics, radiology, anthropology, and geology.\u201d, and \u201cIt often performed at a level comparable to or exceeding that of graduate students or early-career professionals in these domains.\u201d It also notes limitations: \u201cHowever, it still lacks the flexibility and adaptability of human experts in these fields.\u201d and \u201cIt demonstrated the ability to capture complex expressions like irony and sarcasm, though it still struggles with very subtle emotional nuances.\u201d<\/li>\n\n\n\n<li>With many contributors involved, the detailed evaluation results from other fields are very informative.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>On The Planning Abilities of OpenAI&#8217;s o1 Models: Feasibility, Optimality, and Generalizability&nbsp;<\/strong>[59.7]<br>This work evaluates the planning capabilities of OpenAI\u2019s o1 models on a variety of benchmark tasks. The results show that o1-preview adheres to task constraints better than GPT-4.<br><a href=\"http:\/\/arxiv.org\/abs\/2409.19924v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2409.19924v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Mon, 30 Sep 2024 03:58:43 GMT)<\/li>\n\n\n\n<li>An evaluation of o1 focused on planning ability. It reportedly outperforms GPT-4o.<\/li>\n\n\n\n<li>The findings are organized under 1. Understanding the Problem, 2. Following Constraints, 3. State and Memory Management, and 4. 
Reasoning and Generalization. All four are strong, but on 3. the paper notes \u201cas problem complexity increased, the model\u2019s state management became less reliable, particularly in tasks involving spatial reasoning across multiple dimensions.\u201d and on 4. \u201cWhile o1-preview showed some promise in its generalization ability, particularly in structured environments like Grippers, its performance in more abstract tasks like Termes revealed substantial limitations. The model struggled with reasoning under conditions where actions and outcomes were less directly tied to the natural language representation of the task, highlighting an area for future improvements.\u201d<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>When a language model is optimized for reasoning, does it still show embers of autoregression? 
An analysis of OpenAI o1\u00a0<\/strong>[20.1]<br>o1 is a new system from OpenAI that, unlike previous LLMs, is optimized for reasoning. In many cases o1 substantially outperforms previous LLMs, with especially large improvements on rare variants of common tasks. However, o1 still shows the same qualitative trends that were observed in earlier systems.<br><a href=\"http:\/\/arxiv.org\/abs\/2410.01792v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2410.01792v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 02 Oct 2024 17:50:19 GMT)<\/li>\n\n\n\n<li>The paper observes: \u201cOn many of the tasks we considered, o1 performed substantially better than the LLMs we had previously evaluated, with particularly strong results on rare variants of common tasks. 
However, it still qualitatively showed both of the central types of probability sensitivity discussed in McCoy et al (2023): sensitivity to output probability and sensitivity to task frequency.\u201d<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[232,283],"class_list":["post-5558","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-lrm","tag-o1"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5558","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5558"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5558\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5558"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}