{"id":7421,"date":"2025-09-15T06:00:00","date_gmt":"2025-09-14T21:00:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7421"},"modified":"2025-09-14T09:06:30","modified_gmt":"2025-09-14T00:06:30","slug":"autonomous-code-evolution-meets-np-completeness","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7421","title":{"rendered":"Autonomous Code Evolution Meets NP-Completeness\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Autonomous Code Evolution Meets NP-Completeness\u00a0<\/strong>[9.7]<br>SATLUTION is the first framework to extend LLM-based code evolution to full repository scale. It orchestrates the evolution of solver repositories under strict correctness guarantees and distributed feedback while simultaneously self-evolving its own evolution policies and rules. Starting from the SAT Competition 2024 codebases and benchmark, SATLUTION evolved solvers that decisively outperformed the human-designed winners of SAT Competition 2025.<br><a href=\"http:\/\/arxiv.org\/abs\/2509.07367v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2509.07367v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 09 Sep 2025 03:28:06 GMT)<\/li>\n\n\n\n<li>\u201cStarting from SAT Competition 2024 codebases and benchmark, SATLUTION evolved solvers that decisively outperformed the human-designed winners of the SAT Competition 2025, and also surpassed both 2024 and 2025 champions on 
the 2024 benchmarks.\u201d These results underscore how powerful code generation has become.<\/li>\n\n\n\n<li>The discussion section notes: \u201cHowever, our experiments also revealed limitations. In fully automated operation\u2014what we refer to as our customized \u2018YOLO mode\u2019, distinct from the official CLI tool, the agents often struggled, and the flow proved most effective in a semi-automated setup with targeted human intervention.\u201d<\/li>\n\n\n\n<li>It continues: \u201cIn particular, the agents were prone to failures in SAT\/UNSAT correctness checks and deep memory errors such as segmentation faults, where human intervention remained critical to preserve progress. 
While the planning capabilities of the agents were strong at the level of concrete programming tasks, they lacked sufficient domain-specific knowledge at the idea level, especially for nuanced aspects of SAT solving.\u201d The importance of domain knowledge is thus acknowledged. (That said, I cannot shake the feeling that AI will eventually take over that part as well.)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[468],"class_list":["post-7421","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-468"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7421","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7421"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7421\/revisions"}],"predecessor-version":[{"id":7422,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7421\/revisions\/7422"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7421"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp
%2Fv2%2Fcategories&post=7421"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7421"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}