{"id":6633,"date":"2025-04-28T04:47:00","date_gmt":"2025-04-27T19:47:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=6633"},"modified":"2025-04-28T04:47:00","modified_gmt":"2025-04-27T19:47:00","slug":"the-bitter-lesson-learned-from-2000-multilingual-benchmarks","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=6633","title":{"rendered":"The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks\u00a0<\/strong>[37.8]<br>This paper examines more than 2,000 multilingual (non-English) benchmarks from 148 countries. English is significantly overrepresented in these benchmarks. Most benchmarks rely on original-language content rather than translations.<br><a href=\"http:\/\/arxiv.org\/abs\/2504.15521v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2504.15521v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 22 Apr 2025 01:47:37 GMT)<\/li>\n\n\n\n<li>A survey of multilingual benchmarks. The finding that \u201cImportantly, simply translating English benchmarks proves insufficient for robust evaluation, localized benchmarks (like CMMLU for Chinese) show substantially higher correlation with human judgments (0.68) than translated equivalents (0.47 and 0.49), highlighting the critical need for culturally and linguistically authentic evaluation 
resources.\u201d is what one would expect, but seeing it backed by numbers makes it convincing.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[267],"class_list":["post-6633","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-multilingual"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6633"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6633\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}