{"id":4606,"date":"2024-03-25T04:54:00","date_gmt":"2024-03-24T19:54:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=4606"},"modified":"2024-03-25T04:54:00","modified_gmt":"2024-03-24T19:54:00","slug":"perl-parameter-efficient-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=4606","title":{"rendered":"PERL: Parameter Efficient Reinforcement Learning"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>PERL: Parameter Efficient Reinforcement Learning from Human Feedback\u00a0<\/strong>[27.7]<br>Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method for aligning large language models with human preferences. This paper studies RLHF in which the underlying models are trained with Low-Rank Adaptation (LoRA), the parameter-efficient method introduced by Hu et al. PERL performs on par with the conventional RLHF setting while training faster and with less memory.<br><a href=\"http:\/\/arxiv.org\/abs\/2403.10704v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2403.10704v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Fri, 15 Mar 2024 21:43:46 GMT)<\/li>\n\n\n\n<li>Low-Rank Adaptation (LoRA) + Reinforcement Learning from Human Feedback (RLHF). The paper reports: \u201cThrough extensive experiments on various datasets, we have shown that this method achieves comparable results to conventional RLHF, for which all the model parameters are tuned, while reducing 
memory usage by approx 50%, and speeding up the training by up to 90% for the Reward Model training, and more modest memory savings of 20%, and speed-up of 10% in the RL loop.\u201d So the approach appears effective. The experiments are extensive and make the paper a very useful reference.<\/li>\n\n\n\n<li>Two datasets rated with \ud83d\udc4d and \ud83d\udc4e have been released: <a href=\"https:\/\/github.com\/google-research-datasets\/Taskmaster\/tree\/master\/TM-4-2024\">Taskmaster\/TM-4-2024 at master \u00b7 google-research-datasets\/Taskmaster \u00b7 GitHub<\/a> and <a href=\"https:\/\/github.com\/google-research-datasets\/Taskmaster\/tree\/master\/TM-3-2020\">Taskmaster\/TM-3-2020 at master \u00b7 google-research-datasets\/Taskmaster \u00b7 GitHub<\/a>.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[227,340,551],"class_list":["post-4606","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-lora","tag-rlhf","tag-551"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/4606","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4606"}],"version-history":[{"count":0,"href":"htt
ps:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/4606\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4606"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4606"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4606"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}