{"id":5960,"date":"2024-12-30T06:50:00","date_gmt":"2024-12-29T21:50:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=5960"},"modified":"2024-12-30T06:50:00","modified_gmt":"2024-12-29T21:50:00","slug":"deepseek-v3-qvq-72b-preview-yulan-mini","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=5960","title":{"rendered":"DeepSeek v3, QVQ-72B-Preview, YuLan-Mini"},"content":{"rendered":"\n<p>Open models continue to get more capable. DeepSeek v3 is a very large model at 671B parameters (though it is an MoE with 37B active parameters) and claims to be competitive with GPT-4o and Claude 3.5 Sonnet. <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-V3\">GitHub &#8211; deepseek-ai\/DeepSeek-V3<\/a><\/p>\n\n\n\n<p>QVQ-72B-Preview strengthens the reasoning ability of Qwen2 VL (covered in <a href=\"https:\/\/devneko.jp\/wordpress\/?p=5473\">Qwen 2.5, Qwen 2 VL, GRIN-MoE, Pixtral \u2013 Introducing the Latest arXiv Papers<\/a>) and claims performance competitive not only with GPT-4o but, on some tasks, with OpenAI o1. <a href=\"https:\/\/qwenlm.github.io\/blog\/qvq-72b-preview\/\">QVQ: To See the World with Wisdom | Qwen<\/a><\/p>\n\n\n\n<p>YuLan-Mini is relatively small, with 2.42B parameters trained on 1.08T tokens, yet claims performance exceeding that of competing open models. <a href=\"https:\/\/github.com\/RUC-GSAI\/YuLan-Mini\/blob\/main\/README_ja.md\">YuLan-Mini\/README_ja.md at main \u00b7 RUC-GSAI\/YuLan-Mini \u00b7 
GitHub<\/a><\/p>\n\n\n\n<p>My impression is that Chinese research institutions are releasing quite a lot of their models and methods. That is much appreciated.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>YuLan-Mini: An Open Data-efficient Language Model&nbsp;<\/strong>[111.0]<br>YuLan-Mini, a highly capable base model with 2.42B parameters, achieves top-tier performance among models of similar parameter scale. Notably, YuLan-Mini is trained on 1.08T tokens yet achieves performance comparable to industry-leading models that require far more data.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.17743v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.17743v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Mon, 23 Dec 2024 17:47:53 GMT)<\/li>\n\n\n\n<li>The authors state: &#8220;Our approach includes three major contributions to enhance training efficacy: (1) an elaborately designed data pipeline that combines data cleaning with data schedule strategies; (2) a systematic optimization method that can effectively mitigate training instability; (3) an effective annealing approach that integrate targeted data selection and long context training.&#8221;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>DeepSeek-V3 Technical 
Report&nbsp;<\/strong>[147.2]<br>We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages. Comprehensive evaluations show that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.19437v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.19437v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Fri, 27 Dec 2024 04:03:16 GMT)<\/li>\n\n\n\n<li>&#8220;During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pretraining stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. 
Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M.&#8221; That is very cost-effective. That said, the report adds: &#8220;Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.&#8221;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code&nbsp;<\/strong>[123.7]<br>This paper presents Aurora-M, a 15B-parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions. We evaluate Aurora-M across a wide range of tasks and languages, demonstrating robustness against catastrophic forgetting.<br><a href=\"http:\/\/arxiv.org\/abs\/2404.00399v3\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2404.00399v3\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Fri, 27 Dec 2024 03:53:21 GMT)<\/li>\n\n\n\n<li><a href=\"https:\/\/huggingface.co\/aurora-m\/aurora-m-biden-harris-redteamed\">aurora-m\/aurora-m-biden-harris-redteamed \u00b7 Hugging Face<\/a> 
Models like this also exist. Japanese is explicitly listed among the supported languages.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Open models continue to get more capable. DeepSeek v3 is a very large model at 671B parameters (though it is an MoE with 37B active parameters) and claims to be competitive with GPT-4o and Claude 3.5 Sonnet. GitHub &hellip; <a href=\"https:\/\/devneko.jp\/wordpress\/?p=5960\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">&#8220;DeepSeek v3, QVQ-72B-Preview, YuLan-Mini&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[223,251,293],"class_list":["post-5960","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-llm","tag-mllm","tag-oss"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5960","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5960"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5960\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5960
"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5960"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5960"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}