Qwen3-Next-80B-A3B, Qwen3-ASR, Hunyuan-MT, MMBERT

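The big news last week was probably the release of Qwen/Qwen3-Next-80B-A3B-Instruct · Hugging Face, a high-performing model with a very sparse configuration (about 80B parameters in total, of which only about 3B are active per token, hence "A3B"). Very sparse MoE designs are in fashion, as DeepSeek and others also show. Qwen also released Qwen3-ASR, a multilingual speech recognition model, so they seem to be building out the adjacent areas thoroughly as well.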
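A minimal sketch of what "sparse" means here, assuming a generic top-k routed MoE layer. The expert count, k, and toy experts below are illustrative only, not Qwen3-Next's actual configuration. The point is that only the k experts chosen by the router are evaluated for a token, so the active parameter count is a small fraction of the total.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.
    Gate weights are renormalized over the selected experts only."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k selected experts' outputs; the other experts are never called."""
    out = 0.0
    for idx, w in top_k_route(router_logits, k):
        out += w * experts[idx](x)
    return out


# Toy setup (hypothetical numbers): 8 scalar "experts", 2 active per token.
num_experts, k = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, num_experts + 1)]
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(num_experts)]
y = moe_forward(1.0, experts, logits, k)
print(f"active experts: {k}/{num_experts} = {k / num_experts:.1%}")

# The model name encodes the same ratio: ~3B active out of ~80B total parameters.
print(f"active parameter ratio: {3 / 80:.2%}")
```

Real MoE layers route vectors rather than scalars and usually add load-balancing losses and sometimes shared experts; those details are omitted here.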

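Hunyuan-MT is a machine translation model built on Hunyuan. As with Preferred Networks' release of "PLaMo Translation" (PLaMo翻訳), a specialized large language model (Preferred Networks Research & Development), LLM-based translation models are very strong.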

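Finally, mmBERT, a multilingual encoder-only model, was also released. Decoder-only LLMs dominate at the moment, but encoder-only models remain an important approach for practical tasks such as classification.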

Hunyuan-MT Technical Report [20.9]
Hunyuan-MT-7B supports bidirectional translation across 33 major languages. Hunyuan-MT-Chimera-7B is a translation model inspired by slow-thinking mode.
Paper, reference translation (metadata) (Fri, 05 Sep 2025 16:11:05 GMT)
The report states "The development of our models follows a holistic training process specifically engineered for multilingual translation, which begins with general and MT-oriented pre-training to build foundational capabilities, proceeds to Supervised Fine-Tuning (SFT) for task-specific adaptation, and culminates in advanced alignment through Reinforcement Learning (RL) and weak-to-strong RL.", and each stage of this pipeline is quite elaborate.
Repository: tencent/Hunyuan-MT-7B · Hugging Face

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning [57.6]
mmBERT is an encoder-only language model pre-trained on 3T tokens of multilingual text. More than 1,700 low-resource languages are added to the training data. The paper shows that mmBERT outperforms previous models on classification and retrieval tasks.
Paper, reference translation (metadata) (Mon, 08 Sep 2025 17:08:42 GMT)
A multilingual BERT: "We do this by pre-training our new model suite, MMBERT, on 3T tokens of multilingual text using an architecture inspired from ModernBERT (Warner et al., 2024)."
Repository: GitHub – JHU-CLSP/mmBERT: A massively multilingual modern encoder language model
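The "Annealed Language Learning" in the mmBERT title suggests that the language mix changes over the course of training. One common way to control a language mix is temperature-based sampling, where language i is drawn with probability proportional to n_i^τ (n_i being its token count). The sketch below assumes that scheme; the function name, corpus sizes, and the τ schedule (0.7, 0.5, 0.3) are hypothetical values for illustration, not figures taken from the paper.

```python
import random


def language_sampling_probs(token_counts, tau):
    """Temperature-based sampling: p_i is proportional to n_i ** tau.
    tau = 1 follows raw data sizes; smaller tau flattens the distribution
    toward uniform, up-weighting low-resource languages."""
    weights = {lang: n ** tau for lang, n in token_counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


# Hypothetical corpus sizes in tokens, chosen only for illustration.
counts = {"en": 1_000_000_000, "ja": 100_000_000, "sw": 1_000_000}

# Hypothetical annealing schedule: the temperature is lowered in later phases.
random.seed(0)
for tau in (0.7, 0.5, 0.3):
    probs = language_sampling_probs(counts, tau)
    # Draw the language for the next batch according to the current mix.
    lang = random.choices(list(probs), weights=list(probs.values()))[0]
    print(tau, {k: round(v, 3) for k, v in probs.items()}, "next batch:", lang)
```

Lowering τ shrinks the gap between languages, so the small `sw` corpus in this toy example is sampled far more often than its raw token share would imply.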
