January 5, 2026 – Introducing the latest arXiv papers

A.X K1, EXAONE, VAETKI, HyperCLOVAX, Solar Open, IQuest Coder, TeleChat3-MoE, SenseNova-MARS

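Last week again brought big news, such as the acquisition (?) of Manus, but the item I found most interesting was "South Korea's Ministry of Science and ICT holds the first presentation of its independent AI foundation models" – ChosunBiz. It seems the models below were presented.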

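There are also critical reports, such as "Upstage's Solar Open 100B heads to public verification over suspected similarity to a Chinese model" – ChosunBiz, but developing sovereign AI is important, and the effort also deserves close attention as a direction for open models. (What exactly should count as sovereign AI is itself a thorny question.)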

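Separately from the above, high-performance models such as IQuest Coder have been released, and announcements of strong LLMs and reasoning/search frameworks such as TeleChat3 and SenseNova-MARS keep arriving, so this year looks set to stay just as heated.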

IQuest_Coder_Technical_Report
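The IQuest-Coder-V1 series is a new family of code large language models (LLMs) that proposes a multi-stage training paradigm capturing the dynamic evolution of software logic. The models reach advanced code intelligence through pre-training, specialized mid-training, and two post-training paths, and achieve state-of-the-art performance in agentic software engineering and competitive programming. In addition, IQuest-Coder-V1-Loop introduces a recurrent mechanism, an architectural step that improves the trade-off between model capability and deployment footprint.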

Training Report of TeleChat3-MoE [77.9]
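This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. To ensure consistency across hardware platforms, it details a systematic methodology for verifying numerical accuracy at both the operator level and end to end. It also proposes a parallelization framework that uses analytical estimation and integer linear programming to optimize multi-dimensional parallelism configurations.
Reference translation of the paper (metadata) (Tue, 30 Dec 2025 11:42:14 GMT)
Repository: GitHub – Tele-AI/TeleChat3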
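
To make "optimizing a multi-dimensional parallelism configuration from analytical estimates" a little more concrete, here is a minimal sketch. It is not the report's method: the integer linear program is swapped for a brute-force enumeration, and the cost model, its coefficients, and every function name are invented for illustration. Only the pipeline-bubble fraction (p − 1) / (m + p − 1) is a standard estimate.

```python
from typing import Iterator, Optional, Tuple

# (tensor, pipeline, data) parallel degrees; a hypothetical simplification.
Config = Tuple[int, int, int]


def factorizations(n_devices: int) -> Iterator[Config]:
    """Yield every (tp, pp, dp) whose product equals n_devices."""
    for tp in range(1, n_devices + 1):
        if n_devices % tp:
            continue
        rest = n_devices // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                yield tp, pp, rest // pp


def memory_per_device_gb(cfg: Config, model_gb: float) -> float:
    """Toy estimate: model state is sharded over tensor and pipeline ranks."""
    tp, pp, _ = cfg
    return model_gb / (tp * pp)


def step_cost(cfg: Config, micro_batches: int) -> float:
    """Toy relative overhead; the 0.05 and 0.02 coefficients are made up."""
    tp, pp, dp = cfg
    tensor_comm = 0.05 * (tp - 1)                  # assumed all-reduce overhead
    bubble = (pp - 1) / (micro_batches + pp - 1)   # pipeline bubble fraction
    grad_sync = 0.02 * (dp - 1)                    # assumed gradient-sync overhead
    return tensor_comm + bubble + grad_sync


def best_config(n_devices: int, model_gb: float, mem_limit_gb: float,
                micro_batches: int = 8) -> Optional[Config]:
    """Return the cheapest configuration that fits in memory, or None."""
    feasible = [c for c in factorizations(n_devices)
                if memory_per_device_gb(c, model_gb) <= mem_limit_gb]
    if not feasible:
        return None
    return min(feasible, key=lambda c: step_cost(c, micro_batches))


if __name__ == "__main__":
    print(best_config(n_devices=8, model_gb=160.0, mem_limit_gb=64.0))
```

Enumeration is only workable for a handful of dimensions and devices; the appeal of an ILP formulation, as the report describes it, is presumably that it scales to far larger search spaces.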

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning [57.1]
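SenseNova-MARS is a Multimodal Agentic Reasoning and Search framework. It dynamically integrates image search, text search, and image crop tools to tackle knowledge-intensive visual understanding tasks. SenseNova-MARS achieves state-of-the-art performance among open-source models on search and fine-grained image understanding benchmarks.
Reference translation of the paper (metadata) (Tue, 30 Dec 2025 16:31:45 GMT)
Repository: GitHub – OpenSenseNova/SenseNova-MARS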

Training AI Co-Scientists Using Rubric Rewards [36.9]
A key capability of an AI co-scientist is generating a research plan from a set of goals and constraints. This work studies how to use the vast corpus of existing research papers to train language models that produce better research plans. By automatically extracting research goals and goal-specific rubrics from papers across multiple domains, the authors build a scalable and diverse training corpus.
Reference translation of the paper (metadata) (Mon, 29 Dec 2025 18:59:33 GMT)
The paper states: "we leverage existing scientific papers to improve language models at generating research plans for diverse open-ended research goals. We propose a scalable training procedure that uses a language model to extract research goals and grading rubrics from papers, and trains the plan generator with self-grading using the goal-specific rubrics as privileged information." In other words, it uses existing research papers to strengthen an LRM's ability to generate research plans. Given that the model is based on Qwen-3-30B-A3B-Instruct, the result "The obtained performance makes our 30B model competitive with Grok-4-Thinking (xAI, 2025), though it remains behind the best performing model, GPT-5-Thinking (OpenAI, 2025)." looks like a strong showing.
The dataset is publicly available: facebook/research-plan-gen · Datasets at Hugging Face
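
As a rough illustration of what a rubric-based reward could look like, the sketch below scores a plan by the weighted fraction of rubric items it satisfies. Everything here is an assumption made for illustration: the `RubricItem` type, the weights, and especially the keyword grader, which stands in for the language-model self-grading described in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One grading criterion extracted for a research goal (hypothetical shape)."""
    description: str
    weight: float = 1.0


# A grader decides whether a plan satisfies one criterion.
Grader = Callable[[str, str], bool]


def rubric_reward(plan: str, rubric: List[RubricItem], grader: Grader) -> float:
    """Weighted fraction of rubric items the plan satisfies, in [0, 1]."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric
                 if grader(plan, item.description))
    return earned / total


def keyword_grader(plan: str, criterion: str) -> bool:
    """Toy stand-in for an LM judge: the criterion text must appear in the plan."""
    return criterion.lower() in plan.lower()


if __name__ == "__main__":
    rubric = [
        RubricItem("baseline", weight=2.0),
        RubricItem("ablation", weight=1.0),
        RubricItem("human evaluation", weight=1.0),
    ]
    plan = "We compare against a strong baseline and run an ablation."
    print(rubric_reward(plan, rubric, keyword_grader))
```

Keeping the grader as a plain callable means the toy keyword check can be swapped for a model-based judge without touching the reward computation.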

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models [78.7]
The paper introduces Youtu-LLM, a lightweight language model that reconciles native agentic intelligence with high computational efficiency. Youtu-LLM is pre-trained from scratch to systematically cultivate reasoning and planning capabilities.
Reference translation of the paper (metadata) (Wed, 31 Dec 2025 04:25:11 GMT)
The paper states: "Youtu-LLM significantly outperforms existing state-of-the-art models of similar scale across both general-purpose (Figure 2) and agentic benchmarks (Figure 1), and in several settings, rivals substantially larger models. Beyond performance gains, our analyses provide the first systematic evidence that agentic pre-training can unlock agent potential in lightweight LLMs, revealing phenomena such as scalable growth of agent capabilities." So this is a proposal for a small, agent-oriented model. For on-device targets, shrinking the model while preserving agent-related capabilities is important, and as the authors put it, "We propose a principled training paradigm that enhances native agentic capabilities through innovations in tokenizer design, data allocation, and multi-stage learning, guided by an agent-centric philosophy.", it appears these capabilities can be strengthened deliberately.
Repository: GitHub – TencentCloudADP/youtu-tip: Youtu-Tip: Tap for Intelligence, Keep on Device.; models: Youtu – a tencent Collection

Yume-1.5: A Text-Controlled Interactive World Generation Model [78.9]
Yume-1.5 is a new framework designed to generate realistic, interactive, and continuous worlds from a single image or a text prompt. It achieves this with a carefully designed framework for exploring the generated world through keyboard input.
Reference translation of the paper (metadata) (Fri, 26 Dec 2025 17:52:49 GMT)
The paper states: "we present Yume1.5, an interactive world generation model that enables infinite video generation from a single input image through autoregressive synthesis while supporting intuitive keyboard-based camera control." and "The key innovations of Yume1.5 include: (1) a joint temporal-spatial-channel modeling approach that enables efficient long video generation while maintaining temporal coherence; (2) an acceleration method that mitigates error accumulation during inference; and (3) text-controlled world event generation capability achieved through careful architectural design and mixed-dataset training." This is video-generation research that connects to world models. The naming is fun too: Yume (dream) and Sekai (world) (GitHub – Lixsp11/sekai-codebase: [NeurIPS 2025] The official repository of “Sekai: A Video Dataset towards World Exploration”).
Repository: GitHub – stdstu12/YUME: The official code of Yume; model: stdstu123/Yume-5B-720P · Hugging Face