{"id":7755,"date":"2025-11-12T06:26:00","date_gmt":"2025-11-11T21:26:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7755"},"modified":"2025-11-09T09:30:35","modified_gmt":"2025-11-09T00:30:35","slug":"memsearcher-training-llms-to-reason-search-and-manage-memory-via-end-to-end-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7755","title":{"rendered":"MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning\u00a0<\/strong>[73.3]<br>This paper proposes MemSearcher, an agent workflow that iteratively maintains a memory and combines it with the current turn. At each turn, MemSearcher fuses the user's question with the memory to generate reasoning traces, perform search actions, and update the memory so that it retains only the information needed to solve the task. We introduce multi-context GRPO, an end-to-end RL framework that jointly optimizes the reasoning, search strategies, and memory management of MemSearcher agents.<br><a href=\"http:\/\/arxiv.org\/abs\/2511.02805v1\">Paper<\/a>\u00a0\u00a0<a 
href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2511.02805v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 04 Nov 2025 18:27:39 GMT)<\/li>\n\n\n\n<li>The paper proposes a model trained with reinforcement learning to handle memory-related functions well: \u201cWe introduce MemSearcher, an agentic workflow that leverages the backbone LLM as a memory manager to iteratively maintain a compact memory, preserving only the essential information necessary for answering the user\u2019s question and thereby eliminating the need to append the entire interaction history to the LLM context. \u2022 We develop search agents based on MemSearcher, and utilize multi-context GRPO, a natural extension of GRPO, to optimize LLMs to reason, leverage search engines and manage memory simultaneously.\u201d The authors also confirm its effectiveness: \u201cMemSearcher based on Qwen2.5-3B-Instruct achieves a higher average score than other methods based on Qwen2.5-7B-Instruct.\u201d<\/li>\n\n\n\n<li>Repository: <a href=\"https:\/\/github.com\/icip-cas\/MemSearcher\">GitHub &#8211; 
icip-cas\/MemSearcher<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[244],"class_list":["post-7755","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-memory"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7755","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7755"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7755\/revisions"}],"predecessor-version":[{"id":7756,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7755\/revisions\/7756"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7755"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}