{"id":6777,"date":"2025-05-19T05:56:00","date_gmt":"2025-05-18T20:56:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=6777"},"modified":"2025-05-17T16:12:34","modified_gmt":"2025-05-17T07:12:34","slug":"worldpm-scaling-human-preference-modeling","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=6777","title":{"rendered":"WorldPM: Scaling Human Preference Modeling\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>WorldPM: Scaling Human Preference Modeling\u00a0<\/strong>[130.2]<br>We propose World Preference Modeling (WorldPM) to emphasize this scaling potential. We collect preference data from public forums covering diverse user communities. We conduct extensive training on 15M-scale data with models ranging from 1.5B to 72B parameters.<br><a href=\"http:\/\/arxiv.org\/abs\/2505.10527v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2505.10527v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 15 May 2025 17:38:37 GMT)<\/li>\n\n\n\n<li>The authors write, &#8220;Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling.&#8221; They further claim, &#8220;Through evaluations on 7 benchmarks with 20 subtasks, we find that WorldPM broadly improves the generalization performance across human preference datasets of varying sizes (7K, 100K and 800K samples), with performance gains 
exceeding 5% on many key subtasks.&#8221; The potential of this kind of foundation model is intriguing (though also a little scary).\n<ul class=\"wp-block-list\">\n<li>The Appendix result on filtering is also interesting: &#8220;we argue that applying RM filtering diverges from capturing world preference. Instead of assuming forum data contains noise, we should interpret apparent contradictions as manifestations of genuine human preferences, allowing models to discover underlying commonalities within these surface-level conflicts.&#8221;<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Repository: <a href=\"https:\/\/github.com\/QwenLM\/WorldPM\">GitHub &#8211; QwenLM\/WorldPM<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[153,682,349],"class_list":["post-6777","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-foundation-models","tag-preference","tag-scaling-law"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6777","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6777"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6777\/revis
ions"}],"predecessor-version":[{"id":6778,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6777\/revisions\/6778"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6777"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6777"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6777"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}