{"id":6792,"date":"2025-05-26T05:19:00","date_gmt":"2025-05-25T20:19:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=6792"},"modified":"2025-05-24T17:39:35","modified_gmt":"2025-05-24T08:39:35","slug":"hunyuan-turbos-advancing-large-language-models-through-mamba-transformer-synergy-and-adaptive-chain-of-thought","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=6792","title":{"rendered":"Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought&nbsp;<\/strong>[190.9]<br>Hunyuan-TurboS\u306f\u3001Transformer-Mamba Mixture of Experts\u306e\u5927\u578b\u30cf\u30a4\u30d6\u30ea\u30c3\u30c9\u30e2\u30c7\u30eb\u3067\u3042\u308b\u3002 \u9ad8\u3044\u30d1\u30d5\u30a9\u30fc\u30de\u30f3\u30b9\u3068\u52b9\u7387\u306e\u30d0\u30e9\u30f3\u30b9\u3092\u4fdd\u3061\u3001\u63a8\u8ad6\u30b3\u30b9\u30c8\u3092\u4f4e\u304f\u6291\u3048\u3066\u3044\u308b\u3002<br><a href=\"http:\/\/arxiv.org\/abs\/2505.15431v1\">\u8ad6\u6587<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2505.15431v1\">\u53c2\u8003\u8a33\uff08\u30e1\u30bf\u30c7\u30fc\u30bf\uff09<\/a>&nbsp; &nbsp;(Wed, 21 May 2025 12:11:53 GMT)<\/li>\n\n\n\n<li>Tencent\u306b\u3088\u308bMamba hybrid\u3001MoE\u3001Adaptive CoT\u3068\u5168\u90e8\u76db\u308a\u611f\u306e\u3042\u308b\u30e2\u30c7\u30eb\uff08<a href=\"https:\/\/devneko.jp\/wordpress\/?p=6471\">Mistral Small 3.1, Hunyuan-T1 \u2013 arXiv\u6700\u65b0\u8ad6\u6587\u306e\u7d39\u4ecb<\/a>\u306b\u3082\u95a2\u9023\uff09\u3002\n<ul class=\"wp-block-list\">\n<li>Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep \u201dthinking\u201d modes for complex problems, optimizing computational resources. 
Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF\/MF block pattern.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>A Mamba (hybrid) architecture model whose benchmark scores are also very high: it reaches the \u201cLMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345)\u201d. (Though one may wonder whether to call it an LLM or an LRM.) On some individual tasks it also surpasses other open-source and commercial models. The open comparison targets are recent ones: Llama-4-Maverick, DeepSeek-V3, and Qwen3-235B-A22B.<\/li>\n\n\n\n<li>\u201cThe inference of the Hunyuan-TurboS model is powered by the AngelHCF Inference Acceleration Framework. 
For the Mamba Hybrid architecture of the TurboS model, we have implemented optimizations across the following three key dimensions, ultimately achieving a 1.8x speedup compared to Hunyuan-Turbo, which is a pure Transformers MoE model\u201d, which also demonstrates the effectiveness of Mamba; overall this looks like a very advanced model.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[223,232,235],"class_list":["post-6792","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-llm","tag-lrm","tag-mamba"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6792","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6792"}],"version-history":[{"count":2,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6792\/revisions"}],"predecessor-version":[{"id":6794,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/6792\/revisions\/6794"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6792"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6792"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dev
neko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6792"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}