{"id":5473,"date":"2024-09-23T06:32:00","date_gmt":"2024-09-22T21:32:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=5473"},"modified":"2024-09-23T06:32:00","modified_gmt":"2024-09-22T21:32:00","slug":"qwen-2-5-qwen-2-vl-grin-moe-pixtral","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=5473","title":{"rendered":"Qwen 2.5, Qwen 2 VL, GRIN-MoE, Pixtral"},"content":{"rendered":"\n<p>Various research organizations are building LLMs. Among last week&#8217;s news, the high-performance LLM Qwen 2.5, the highly efficient MoE model GRIN-MoE, and the multimodal extensions Qwen 2 VL and Pixtral stand out.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/qwenlm.github.io\/blog\/qwen2.5\/\">Qwen2.5: A Party of Foundation Models! | Qwen (qwenlm.github.io)<\/a>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/qwenlm.github.io\/blog\/qwen2.5-llm\/\">Qwen2.5-LLM: Extending the boundary of LLMs | Qwen (qwenlm.github.io)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/qwenlm.github.io\/blog\/qwen2.5-coder\/\">Qwen2.5-Coder: Code More, Learn More! 
| Qwen (qwenlm.github.io)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/qwenlm.github.io\/blog\/qwen2.5-math\/\">Qwen2.5-Math: The world&#8217;s leading open-sourced mathematical LLMs | Qwen (qwenlm.github.io)<\/a><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/QwenLM\/Qwen2-VL\">GitHub &#8211; QwenLM\/Qwen2-VL: Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/microsoft\/grin-moe\">GitHub &#8211; microsoft\/GRIN-MoE<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/mistral.ai\/news\/pixtral-12b\/\">Announcing Pixtral 12B | Mistral AI | Frontier AI in your hands<\/a><\/li>\n<\/ul>\n\n\n\n<p>Note that the licenses vary, but the models themselves are publicly available, widening the range of options beyond commercial APIs. Each model pursues its own goals, so an honest head-to-head evaluation is not easy; these days I find myself wanting an AI that suggests the base model and usage approach best suited to what I want to do.<\/p>\n\n\n\n<p>A great deal of information has also been published on model building and fine-tuning, which is very interesting.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Qwen2.5-Coder Technical 
Report&nbsp;<\/strong>[100.7]<br>This report introduces the Qwen2.5-Coder series, a major upgrade from its predecessor CodeQwen1.5. As a code-specific model, Qwen2.5-Coder is built on the Qwen2.5 architecture and pre-trained on a vast corpus of more than 5.5 trillion tokens.<br><a href=\"http:\/\/arxiv.org\/abs\/2409.12186v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2409.12186v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Wed, 18 Sep 2024 17:57:57 GMT)<\/li>\n\n\n\n<li>&#8220;To ensure the quality of the pre-training data, we have curated a dataset by collecting public code data and extracting high-quality code-related content from web texts, while filtering out low-quality data using advanced classifiers.&#8221;<br>The quote emphasizes the importance of filtering. Data synthesis is also touched on, though perhaps it plays a smaller role here than in the math domain because abundant real code data is available?<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement&nbsp;<\/strong>[71.5]<br>This report presents Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B\/7B\/72B. Qwen2.5-Math-Instruct supports both Chinese and English and has advanced mathematical reasoning capabilities.<br><a 
href=\"http:\/\/arxiv.org\/abs\/2409.12122v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2409.12122v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Wed, 18 Sep 2024 16:45:37 GMT)<\/li>\n\n\n\n<li>&#8220;In this report, we introduce Qwen2.5-Math, which features several key technical highlights: (1) extensive use of synthesized mathematical data from Qwen2-Math during the pre-training phase, (2) iterative generation of fine-tuning data and reinforcement training guided by the reward model during the post-training and inference phase and (3) support for bilingual (English and Chinese) queries, along with chain-of-thought and tool-integrated reasoning capabilities.&#8221; The effect of the synthetic data and this self-improvement-style loop is interesting.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Qwen2-VL: Enhancing Vision-Language Model&#8217;s Perception of the World at Any Resolution&nbsp;<\/strong>[82.4]<br>This paper introduces the Qwen2-VL series, an upgrade of the earlier Qwen-VL models. Qwen2-VL introduces a Naive Dynamic Resolution mechanism that allows images of varying resolutions to be processed into different numbers of visual tokens. It also integrates Multimodal Rotary Position Embedding (M-RoPE) to facilitate effective fusion of positional information across text, images, and video.<br><a href=\"http:\/\/arxiv.org\/abs\/2409.12191v1\">Paper<\/a>&nbsp;&nbsp;<a 
href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2409.12191v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Wed, 18 Sep 2024 17:59:32 GMT)<\/li>\n\n\n\n<li>&#8220;Qwen2-VL series introduces naive dynamic resolution and multimodal rotary position embedding (M-RoPE) to fuse information across modals effectively and be capable of understanding videos over 20 minutes in length.&#8221;, &#8220;Furthermore, Qwen2-VL now supports understanding multilingual texts within images, including most European languages, Japanese, Korean, Arabic, Vietnamese, and others.&#8221; A powerful multimodal model with both video support and Japanese support.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>GRIN: GRadient-INformed MoE&nbsp;<\/strong>[132.9]<br>Mixture-of-Experts (MoE) models scale more effectively than dense models thanks to sparse computation via expert routing. The paper introduces GRIN (GRadient-INformed MoE training), which incorporates sparse gradient estimation for expert routing. Their model has only 6.6B activated parameters yet outperforms a 7B dense model and matches the performance of a 14B dense model trained on the same data.<br><a href=\"http:\/\/arxiv.org\/abs\/2409.12136v1\">Paper<\/a>&nbsp;&nbsp;<a 
href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2409.12136v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Wed, 18 Sep 2024 17:00:20 GMT)<\/li>\n\n\n\n<li>&#8220;We propose SparseMixer-v2 to estimate the gradient related to expert routing, while the conventional MoE training treats expert gating as a proxy for the gradient estimation.&#8221;, &#8220;We scale MoE training with neither expert parallelism nor token dropping, while the conventional MoE training employs expert parallelism and deploys token dropping.&#8221; An MoE improvement characterized by these two points.<\/li>\n\n\n\n<li>I recall reading reports that experts in MoE architectures surprisingly fail to specialize, so the statement &#8220;Our study seems to verify our hypothesis that expert networks in GRIN MoE have developed highly-specialized and heterogeneous expertise.&#8221; is interesting.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pixtral 12B\u00a0<\/strong>[56.8]<br>The paper introduces Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents. Unlike many open-source models, Pixtral is also a state-of-the-art text model for its size.<br><a href=\"http:\/\/arxiv.org\/abs\/2410.07073v1\">Paper<\/a>\u00a0\u00a0<a 
href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2410.07073v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 09 Oct 2024 17:16:22 GMT)<\/li>\n\n\n\n<li><a href=\"https:\/\/mistral.ai\/news\/pixtral-12b\/\">Announcing Pixtral 12B | Mistral AI | Frontier AI in your hands<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/mistralai\/mistral-evals\/\">GitHub &#8211; mistralai\/mistral-evals<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Various research organizations are building LLMs. Among last week&#8217;s news, the high-performance LLM Qwen 2.5, the highly efficient MoE model GRIN-MoE, and the multimodal extensions Qwen 2 VL and Pixtral stand out. Note that the licenses vary &hellip; <a href=\"https:\/\/devneko.jp\/wordpress\/?p=5473\" class=\"more-link\"><span class=\"screen-reader-text\">&#8220;Qwen 2.5, Qwen 2 VL, GRIN-MoE, Pixtral&#8221; 
<\/span>Continue reading<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[223,356,390],"class_list":["post-5473","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-llm","tag-self-x","tag-synthetic-data"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5473","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5473"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5473\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}