Qwen3, Phi-4 reasoning, MiMo 7B, OLMo2 1B, Mellum 4B

Last week brought a lot of news about open models. Among them, Qwen3 is the big story (Qwen3: Think Deeper, Act Faster | Qwen). In addition to the MoE models Qwen3-235B-A22B and Qwen3-30B-A3B, the dense models Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B have been released (Qwen3 – a Qwen Collection). The license is Apache 2.0. Microsoft's release of reasoning models for Phi-4 (Showcasing Phi-4-Reasoning: A Game-Changer for AI Developers | Microsoft Community Hub, huggingface) is also noteworthy.

There were also many SLM announcements. MiMo from Xiaomi (GitHub – XiaomiMiMo/MiMo: MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining) and OLMo from Ai2 (OLMo release notes | Ai2) are interesting. Mellum from JetBrains (Mellum Goes Open Source: A Purpose-Built LLM for Developers, Now on Hugging Face | The JetBrains Blog) is a specialized model, as the announcement itself says: "Mellum doesn’t try to know everything. It’s designed to do one thing really well: code completion. We call it a focal model – built with purposeful depth and not concerned with chasing breadth." For now it is hard to say that Mellum performs well enough, but specializing an SLM to strengthen it and improve cost-effectiveness is a promising direction. The 671B DeepSeek-Prover-V2 is impressive, but I think combinations like its clever use of a 7B model will also become important.

Phi-4-reasoning Technical Report [42.5]
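Phi-4-reasoning is a 14-billion-parameter reasoning model that achieves strong performance on complex reasoning tasks. We also developed Phi-4-reasoning-plus. Both models outperform larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approach the performance level of the full DeepSeek-R1 model.
Reference translation of the paper (metadata) (Wed, 30 Apr 2025 05:05:09 GMT)
An LRM in the Phi-4 series.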

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math [135.1]
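Chain-of-Thought (CoT) significantly enhances the formal reasoning capabilities of large language models (LLMs). However, improving reasoning in Small Language Models (SLMs) remains challenging because of their limited model capacity. This work proposes a systematic training recipe for SLMs consisting of four stages: (1) large-scale mid-training on diverse distilled long-CoT data, (2) supervised fine-tuning on high-quality long-CoT data, (3) Rollout DPO leveraging a carefully curated preference dataset, and (4) reinforcement learning (RL) with verifiable reward.
Reference translation of the paper (metadata) (Wed, 30 Apr 2025 00:04:35 GMT)
Building a reasoning model on top of an SLM. The authors report confirming the effect: "The resulting Phi-4-Mini-Reasoning model exceeds, on math reasoning tasks, much larger reasoning models, e.g., outperforming DeepSeek-R1-Distill-Qwen-7B by 3.2 points and DeepSeek-R1-Distill-Llama-8B by 7.7 points on Math-500."
An interesting result: reasoning is effective even for small models.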

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition [24.5]
We introduce DeepSeek-Prover-V2. The model achieves state-of-the-art performance in neural theorem proving, reaching an 88.9% pass ratio on the MiniF2F-test and solving 49 of the 658 problems in PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich the evaluation, including 15 problems selected from recent AIME competitions.
Reference translation of the paper (metadata) (Wed, 30 Apr 2025 16:57:48 GMT)
"We first prompt DeepSeek-V3 to generate a natural-language proof sketch while simultaneously formalizing it into a Lean statement with sorry placeholders for omitted proof details. A 7B prover model then recursively solves the decomposed subgoals. By combining these subgoal proofs, we construct a complete formal proof for the original complex problem. This composed proof is appended to DeepSeek-V3’s original chain-of-thought, creating high-quality cold-start training data for formal mathematical reasoning."
The repository is GitHub – deepseek-ai/DeepSeek-Prover-V2
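
The decomposition described in the quote above can be pictured with a toy Lean 4 sketch. This is not taken from the paper or the repository: the theorem name `toy_sum_sq_nonneg` and its statement are hypothetical, chosen only to show the shape of a proof sketch in which each intermediate step is a `have` with a `sorry` placeholder that a smaller prover would then be asked to fill in. It assumes only core Lean 4 (no Mathlib) and the core lemma `Int.add_nonneg`.

```lean
-- Hypothetical toy example, for illustration only.
-- The overall goal is split into two subgoals stated with `have`;
-- their proofs are left as `sorry` placeholders.
theorem toy_sum_sq_nonneg (a b : Int) : 0 ≤ a * a + b * b := by
  -- Subgoal 1: the first square is nonnegative (proof omitted).
  have h1 : 0 ≤ a * a := by sorry
  -- Subgoal 2: the second square is nonnegative (proof omitted).
  have h2 : 0 ≤ b * b := by sorry
  -- Combination step: the original goal follows from the two subgoals.
  exact Int.add_nonneg h1 h2
```

Each `sorry` here is an independent, smaller statement. Once both are replaced by actual proofs, the final line already combines them into a complete proof of the original theorem, which is the sense in which solved subgoals are composed.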
