November 10, 2025 – arXiv最新論文の紹介 (Introducing the Latest arXiv Papers)

Kimi K2 Thinking, LongCat-Flash-Omni, iFlyBot-VLA, Nemotron Nano V2 VL

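Last week again saw a range of open models and technical reports released. Progress is very fast, and it is remarkable that openly released models are now closing in on frontier models.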

Kimi K2 Thinking (Kimi K2 Thinking, moonshotai/Kimi-K2-Thinking · Hugging Face) is a model that claims to outperform frontier models such as GPT-5 on some benchmarks. It has 1T parameters with 32B active, the same configuration as in the earlier post "Grok 4, Phi4-mini-Flash-Reasoning, SmolLM3, Kimi-K2, T5Gemma – arXiv最新論文の紹介". According to the announcement: "Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity’s Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls."
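To make the idea of a "thinking agent" that interleaves reasoning with sequential tool calls more concrete, here is a minimal sketch of such a loop with a cap on the number of calls. It is not Moonshot AI's implementation or API: `run_agent`, `toy_policy`, `TOOLS`, and the step format are all hypothetical names invented for illustration, and the toy policy merely stands in for a model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step format: ("call", tool_name, argument) or ("final", answer, "").
Step = Tuple[str, str, str]


def run_agent(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_calls: int = 300,  # assumption: a budget in the 200-300 range quoted above
) -> Tuple[str, int]:
    """Alternate between policy decisions and tool calls until a final answer."""
    transcript = [f"user: {question}"]
    calls = 0
    while True:
        kind, name, arg = policy(transcript)
        if kind == "final":
            return name, calls
        if calls >= max_calls:
            return "stopped: tool-call budget exhausted", calls
        result = tools[name](arg)
        calls += 1
        # Each tool result is appended so the next decision can build on it.
        transcript.append(f"tool {name}({arg}) -> {result}")


def toy_policy(transcript: List[str]) -> Step:
    """Stand-in for a model: call the tool until three results are in the transcript."""
    done = sum(line.startswith("tool ") for line in transcript)
    if done < 3:
        return ("call", "add_one", str(done))
    return ("final", transcript[-1].split(" -> ")[1], "")


TOOLS = {"add_one": lambda s: str(int(s) + 1)}

if __name__ == "__main__":
    print(run_agent(toy_policy, TOOLS, "count to three"))
```

The explicit budget is a design choice of this sketch: a long chain of calls needs some stopping condition, and the quoted claim concerns staying stable across hundreds of such steps.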

On the multimodal side, technical reports were published for LongCat-Flash-Omni (meituan-longcat/LongCat-Flash-Omni · Hugging Face), iFlyBot-VLA (iFlyBot-VLA Tech Report, iFlyBot/iFlyBotVLM · Hugging Face), and Nemotron Nano V2 VL (nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 · Hugging Face).

LongCat-Flash-Omni Technical Report [131.5]
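LongCat-Flash-Omni is an open-source omni-modal model with 560 billion parameters. It delivers comprehensive multimodal capabilities while retaining strong unimodal capabilities, and it supports low-latency, real-time audio-visual interaction.
Reference translation of the paper (metadata) (Fri, 31 Oct 2025 21:58:15 GMT)
A multimodal model with 560B total parameters and 27B active; a high-performing open model that surpasses Gemini 2.5 Pro on some benchmarks.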
GitHub – meituan-longcat/LongCat-Flash-Omni: This is the official repo for the paper “LongCat-Flash-Omni Technical Report”
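As a quick arithmetic aid, the sketch below computes what share of the parameters is active per token for the two sparsely activated models mentioned in this post. The parameter counts are the figures quoted here (1T/32B and 560B/27B); the function and variable names are hypothetical.

```python
# Total and active parameter counts in billions, as quoted in this post.
MODELS = {
    "Kimi K2 Thinking": (1000, 32),
    "LongCat-Flash-Omni": (560, 27),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters that are active per token."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


def summarize(models: dict) -> dict:
    """Map each model name to its active share in percent, rounded to 0.1."""
    return {
        name: round(100 * active_fraction(total, active), 1)
        for name, (total, active) in models.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(MODELS).items():
        print(f"{name}: {pct}% of parameters active")
```

By these figures, both models activate only a few percent of their weights per token (about 3.2% and 4.8% respectively).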

iFlyBot-VLA Technical Report [25.3]
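iFlyBot-VLA is a large-scale Vision-Language-Action (VLA) model trained under a novel framework. The main contributions are: (1) a latent action model thoroughly trained on large-scale human and robotic manipulation videos; (2) a dual-level action representation framework that jointly supervises both the Vision-Language Model (VLM) and the action expert during training; and (3) a mixed training strategy that combines robot trajectory data with general QA and spatial QA datasets.
Reference translation of the paper (metadata) (Sat, 01 Nov 2025 06:24:56 GMT)
A VLA model from iFlyTech. According to the report: "The architecture of iFlyBot-VLA consists primarily of a language transformer backbone and an action expert network. The model generates executable robot actions through a combination of explicit and implicit planning."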
iFlyBot/iFlyBotVLM · Hugging Face

NVIDIA Nemotron Nano V2 VL [134.5]
Nemotron Nano V2 VL is built on Nemotron Nano V2, a hybrid Mamba-Transformer LLM. Model checkpoints are released in BF16, FP8, and FP4 formats.
Reference translation of the paper (metadata) (Thu, 06 Nov 2025 00:10:19 GMT)
A multimodal model with a hybrid architecture. According to the report: "Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and training recipes. Nemotron Nano V2 VL builds on Nemotron Nano V2, a hybrid Mamba-Transformer LLM, and innovative token reduction techniques to achieve higher inference throughput in long document and video scenarios."
nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 · Hugging Face

World Simulation with Video Foundation Models for Physical AI

World Simulation with Video Foundation Models for Physical AI [181.8]
We release [Cosmos-Predict2.5] and [Cosmos-Transfer2.5] as versatile tools for scaling embodied intelligence. The source code, pretrained checkpoints, and curated benchmarks are released under the NVIDIA Open Model License.
Reference translation of the paper (metadata) (Tue, 28 Oct 2025 22:44:13 GMT)
A world simulator that can be used for synthetic data for VLA models, autonomous driving, and similar applications; an update to "Cosmos World Foundation Model Platform for Physical AI – arXiv最新論文の紹介". The paper describes "[Cosmos-Predict2.5] and [Cosmos-Transfer2.5], the latest Cosmos video world foundation models for Physical AI".
The project site is Deep Imagination Research | NVIDIA, and the repository is GitHub – nvidia-cosmos/cosmos-predict2.5: Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.

A Survey on Efficient Large Language Model Training: From Data-centric Perspectives

A Survey on Efficient Large Language Model Training: From Data-centric Perspectives [42.9]
This paper presents the first systematic survey of data-efficient LLM post-training from a data-centric perspective. It proposes a taxonomy of data-efficient LLM post-training methods covering data selection, data quality enhancement, synthetic data generation, data distillation and compression, and self-evolving data ecosystems. The authors hope the work inspires further exploration into maximizing the potential of data utilization in large-scale model training.
Reference translation of the paper (metadata) (Wed, 29 Oct 2025 17:01:55 GMT)
A survey that states: "We propose a taxonomy of data-efficient LLM post-training methods, covering data selection, data quality enhancement, synthetic data generation, data distillation and compression, and self-evolving data ecosystems. We summarize representative approaches in each category and outline future research directions."
The repository is GitHub – luo-junyu/Awesome-Data-Efficient-LLM: A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective

Diffusion Language Models are Super Data Learners

Diffusion Language Models are Super Data Learners [61.7]
When unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The authors attribute the gains to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) Monte Carlo augmentation.
Reference translation of the paper (metadata) (Wed, 05 Nov 2025 08:17:42 GMT)
The paper claims: "The main empirical finding is a Crossover: when total training tokens are fixed but the number of unique tokens is limited, DLMs consistently surpass equally sized AR counterparts." The authors add that the crossover is not an isolated artifact and that "it systematically shifts with core factors. With more unique data, it shifts later; with higher data quality, it shifts later; with larger models, the crossover arrives earlier; and it persists across dense and sparse (MoE) architectures (Figures 2, 3, 4). Under compute-bound settings with abundant unique data, AR recovers its edge by fitting the data more rapidly; but in data-bound regimes, which is our focus and, increasingly, the practical reality, DLM is the final winner." This seems consistent with the claims in "Diffusion Beats Autoregressive in Data-Constrained Settings – arXiv最新論文の紹介".
The project site is Diffusion Language Models are Super Data Learners, and the repository is GitHub – JinjieNi/dlms-are-super-data-learners: The official github repo for “Diffusion Language Models are Super Data Learners”.
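The data-bound setting and the "Monte Carlo augmentation" factor can be illustrated with a small sketch. It assumes a simplified masked-token corruption (sample a mask ratio, then mask each token independently), which is a common way to describe masked diffusion training but is not taken from the paper's code. All names, including the `<mask>` token, are hypothetical.

```python
import random
from typing import List, Set, Tuple

MASK = "<mask>"  # hypothetical mask token


def epochs_for_budget(total_tokens: int, unique_tokens: int) -> float:
    """Number of passes over the unique data implied by a fixed token budget."""
    if unique_tokens <= 0:
        raise ValueError("unique_tokens must be positive")
    return total_tokens / unique_tokens


def masked_view(tokens: List[str], rng: random.Random) -> List[str]:
    """Sample a mask ratio t in [0, 1), then mask each token with probability t."""
    t = rng.random()
    return [MASK if rng.random() < t else tok for tok in tokens]


def distinct_views(tokens: List[str], epochs: int, seed: int = 0) -> Set[Tuple[str, ...]]:
    """Collect the different corrupted versions of one sequence seen across epochs."""
    rng = random.Random(seed)
    return {tuple(masked_view(tokens, rng)) for _ in range(epochs)}


if __name__ == "__main__":
    sentence = "diffusion models see many corrupted views".split()
    # Illustrative numbers only: a 96B-token budget over 1B unique tokens.
    print("epochs:", epochs_for_budget(96_000_000_000, 1_000_000_000))
    print("distinct views in 50 epochs:", len(distinct_views(sentence, 50)))
```

With the total budget fixed, fewer unique tokens means more epochs. In this sketch every repeated pass presents the same sequence under a fresh random corruption, whereas a plain next-token objective sees identical targets each time.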

The following paper by the same authors is also interesting.

Training Optimal Large Diffusion Language Models [61.7]
We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs). We hope the results provide short-term practical guidance for DLM training and long-term inspiration for the AI community as a whole.
Reference translation of the paper (metadata) (Wed, 05 Nov 2025 08:32:08 GMT)
The repository is GitHub – JinjieNi/Quokka: The official github repo for “Training Optimal Large Diffusion Language Models”, the first-ever large-scale diffusion language models scaling law.
