Grok 4.1, Gemini 3Pro, GPT-5.1 Pro / Codex , Nano Banana Pro (Gemini Image Pro), Olmo 3, Step-Audio-R1, Omnilingual ASR

先週はフロンティアモデルレベルでの激戦がよくわかる週であった。Grok 4.1（Grok 4.1 | xAI）、Gemini3 Pro（Gemini 3 Pro – Google DeepMind、GPT-5.1 Pro（XユーザーのOpenAIさん: 「GPT-5.1 Pro is rolling out today to all Pro users. It delivers clearer, more capable answers for complex work, with strong gains in writing help, data science, and business tasks.」 / X）GPT-5.1-Codex-Max（Building more with GPT-5.1-Codex-Max | OpenAI）と大きな発表が相次いだ。公式のベンチマーク結果の他、様々な方が検証を行っていて、個人的にも検証をしているが、LLM/LRMの性能アップはまだいけるのではないか、と期待の持てる結果になっている。

Step-Audio-R1 Technical Report [70.4]
本稿では,音声領域における推論能力の解放に成功した最初の音声推論モデルであるStep-Audio-R1を紹介する。私たちのモデルは、Gemini 2.5 Proを抜いて、最先端のGemini 3 Proに匹敵するパフォーマンスを実現した、強力なオーディオ推論能力を示しています。
論文参考訳（メタデータ） (Wed, 19 Nov 2025 20:12:50 GMT)
Gemini 3 Proとも競合を主張、「Step-Audio-R1, the first audio reasoning model that successfully unlocks reasoning capabilities in the audio domain」
リポジトリはGitHub – stepfun-ai/Step-Audio-R1

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages [76.1]
大規模自動音声認識システムであるOmnilingual ASRを紹介する。自己教師付き事前学習を7Bパラメータに拡張し、堅牢な音声表現を学習する。 ASRが提供しなかった500以上の言語を含む1,600以上の言語にカバー範囲を広げている。
論文参考訳（メタデータ） (Fri, 14 Nov 2025 01:04:28 GMT)
「Omnilingual ASR illustrates how scaling methods, when combined with deliberate data collection and new architectural innovation, can reshape the trajectory of multilingual ASR. The project not only extends coverage to more than 1,600 languages, with over 500 represented for the first time in any ASR system, but also reframes how coverage itself is conceived.」と非常に多くの言語をカバーするモデル
リポジトリはGitHub – facebookresearch/omnilingual-asr: Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages

コメントを残す コメントをキャンセル