Grok 4, Phi4-mini-Flash-Reasoning, SmolLM3, Kimi-K2, T5Gemma

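Many models were announced again last week, but the one to watch is probably Grok 4, which claims strong performance across a range of benchmarks (Grok 4 | xAI). Its reported 44.4% on Humanity’s Last Exam looks very strong.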

Among open models, there was a run of interesting releases: Phi4-mini-Flash-Reasoning, notable for its model architecture (Reasoning reimagined: Introducing Phi-4-mini-flash-reasoning | Microsoft Azure Blog; the paper is covered below); SmolLM3, a small model from HuggingFace (SmolLM3, GitHub – huggingface/smollm: Everything about the SmolLM and SmolVLM family of models); and Kimi-K2, a very high-performing model with an extreme MoE configuration of 1T total parameters and 32B active (GitHub – MoonshotAI/Kimi-K2: Kimi K2 is the large language model series developed by Moonshot AI team, Kimi K2). T5Gemma: A new collection of encoder-decoder Gemma models – Google Developers Blog also deserves attention. There are likely many tasks where the strengths of architectures that are not decoder-only show through.
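To put the Kimi-K2 configuration in perspective, the share of parameters active per token follows directly from the two figures quoted above. The snippet below is only a small arithmetic sketch: the function name `active_fraction` is made up for illustration, and the counts are the rounded values from the announcement rather than exact parameter counts.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters used per token in a sparse MoE model."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected total_params > 0 and 0 <= active_params <= total_params")
    return active_params / total_params


# Rounded figures quoted for Kimi-K2: 1T total parameters, 32B active.
kimi_k2_share = active_fraction(1e12, 32e9)
print(f"{kimi_k2_share:.1%} of parameters active per token")
```

In other words, only a few percent of the weights take part in each forward step, which is why the configuration reads as extreme.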

Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation [52.2]
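We study a novel problem: adapting decoder-only large language models into encoder-decoder models. We argue that adaptation not only inherits the capabilities of the decoder-only LLM but also reduces the computational demand. Under a similar inference budget, encoder-decoder LLMs achieve comparable (often better) pretraining performance and substantially better finetuning performance than their decoder-only counterparts.
Reference translation of the paper (metadata) (Tue, 08 Apr 2025 17:13:41 GMT)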

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [129.5]
We introduce the Gated Memory Unit (GMU), a simple and effective mechanism for efficient memory sharing across layers. It is used in a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
Reference translation of the paper (metadata) (Wed, 09 Jul 2025 07:27:00 GMT)
This is the paper behind Phi4-mini-Flash-Reasoning.
The paper describes it as follows: "Our decoder-hybrid-decoder architecture taking Samba [RLL+25] as the self-decoder. Gated Memory Units (GMUs) are interleaved with the cross-attention layers in the cross-decoder to reduce the decoding complexity. As in YOCO [SDZ+24], the full attention layer only need to compute the KV cache during prefilling with the self-decoder, leading to linear computation complexity for the prefill stage." The architecture is favorable in terms of computational cost and looks well suited to LRMs (large reasoning models).
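The idea of gating a shared memory state can be illustrated with a toy sketch. This is not the paper's implementation: it assumes the gate is a SiLU-activated linear projection of the current hidden state, applied element-wise to a memory vector shared from an earlier layer and followed by an output projection. The names `gated_memory_unit`, `w_gate`, and `w_out` are hypothetical, and the exact formulation should be checked against the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def silu(z: float) -> float:
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def gated_memory_unit(x: Vector, memory: Vector, w_gate: Matrix, w_out: Matrix) -> Vector:
    """Assumed form: y = W_out (memory * silu(W_gate x)), with * element-wise."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    if len(gate) != len(memory):
        raise ValueError("gate and memory must have the same dimension")
    gated = [m * g for m, g in zip(memory, gate)]
    return matvec(w_out, gated)


# Toy example: 2-dim hidden state, 3-dim memory shared from an earlier layer.
x = [1.0, -1.0]
memory = [0.5, 2.0, -1.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps hidden (2) to memory dim (3)
w_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]     # maps memory dim (3) back to hidden (2)
y = gated_memory_unit(x, memory, w_gate, w_out)
print(y)
```

Under these assumptions the unit needs only two projections and an element-wise product per token, with no attention over the context, which is consistent with the quoted claim that GMUs reduce decoding complexity.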

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities [1584.5]
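Gemini 2.5 Pro is our most capable model, achieving SoTA performance on frontier coding and reasoning benchmarks. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements. Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and low cost.
Reference translation of the paper (metadata) (Mon, 07 Jul 2025 17:36:04 GMT)
The Gemini 2.5 paper is also out. The number of co-authors is remarkable (more than 3,300).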

SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity’s Last Exam? [47.2]
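This paper introduces X-Master, a tool-augmented reasoning agent that emulates human researchers. X-Masters sets a new record on Humanity’s Last Exam with a score of 32.1%.
Reference translation of the paper (metadata) (Mon, 07 Jul 2025 17:50:52 GMT)
The paper reports reaching 32.1% on Humanity’s Last Exam with an agentic approach plus DeepSeek-R1-0528. It would be interesting to see the score with Grok 4 as the base model.
The repository is GitHub – sjtu-sai-agents/X-Master: Official implementation of X-Master, a general-purpose tool-augmented reasoning agent.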
