{"id":6445,"date":"2025-03-26T05:38:00","date_gmt":"2025-03-25T20:38:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=6445"},"modified":"2025-03-26T05:38:00","modified_gmt":"2025-03-25T20:38:00","slug":"mmlu-prox-a-multilingual-benchmark-for-advanced-large-language-model-evaluation","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=6445","title":{"rendered":"MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation\u00a0<\/strong>[60.5]<br>MMLU-ProX is a comprehensive benchmark covering 13 typologically diverse languages, with approximately 11,829 questions per language. We evaluate 25 state-of-the-art large language models (LLMs) using 5-shot chain-of-thought (CoT) and zero-shot prompting strategies, and analyze their performance across linguistic and cultural boundaries. Our experiments show consistent performance degradation from high-resource to low-resource languages: the best models achieve over 70% accuracy in English but drop to around 40% for languages such as Swahili.<br><a href=\"http:\/\/arxiv.org\/abs\/2503.10497v1\">Paper<\/a>\u00a0\u00a0<a 
href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2503.10497v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 13 Mar 2025 15:59:20 GMT)<\/li>\n\n\n\n<li>The benchmark is described as follows: \u201cMMLU-ProX extends the challenging MMLU-Pro benchmark to encompass 13 typologically diverse languages: English (EN), Chinese (ZH), Japanese (JA), Korean (KO), French (FR), German (DE), Spanish (ES), Portuguese (PT), Arabic (AR), Thai (TH), Hindi (HI), Bengali (BN), and Swahili (SW).\u201d and \u201cBy carefully translating the same set of questions across all languages, MMLU-ProX facilitates direct comparison of model performance across linguistic boundaries while controlling for question difficulty.\u201d A benchmark that can be evaluated across many languages makes the differences between languages easy to see.<\/li>\n\n\n\n<li>The project site is <a href=\"https:\/\/mmluprox.github.io\/\">MMLU-ProX: A Multilingual Benchmark for Advanced LLM 
Evaluation<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[267,491],"class_list":["post-6445","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-multilingual","tag-491"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6445","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6445"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6445\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6445"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6445"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6445"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}