February 23, 2026 – Introducing the latest arXiv papers

GLM-5: from Vibe Coding to Agentic Engineering

GLM-5: from Vibe Coding to Agentic Engineering [223.2]
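GLM-5 is a next-generation foundation model designed to move from the vibe coding paradigm to agentic engineering. Building on the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity.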
Paper / Reference translation (metadata) (Tue, 17 Feb 2026 17:50:56 GMT)
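DSA here is DeepSeek Sparse Attention. To illustrate why sparse attention can cut cost on long contexts, below is a minimal pure-Python sketch of generic top-k sparse attention for a single query: every key is first ranked with a cheap relevance score, and full softmax attention is computed only over the k best keys. The scoring function (a plain dot product here) and the parameter `k` are illustrative assumptions; the actual DSA uses a learned lightweight indexer to pick tokens, and this is not GLM-5's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def topk_sparse_attention(
    q: Vector,
    keys: List[Vector],
    values: List[Vector],
    k: int,
    index_score: Callable[[Vector, Vector], float] = dot,  # hypothetical cheap scorer
) -> Tuple[List[float], List[int]]:
    """Attend from one query to only the k highest-scoring keys."""
    # 1) Cheaply score every key and keep the indices of the top-k.
    scores = [index_score(q, key) for key in keys]
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # 2) Scaled dot-product attention restricted to the selected keys.
    d = len(q)
    weights = softmax([dot(q, keys[i]) / math.sqrt(d) for i in selected])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out, selected
```

Setting `k` to the number of keys recovers ordinary dense attention. With a small fixed `k`, the softmax and weighted sum scale with `k` rather than with the context length, although this sketch still scores every key in step 1.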
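The latest GLM model: 744B total parameters with 40B active, and the pre-training data has grown to 28.5T tokens. Its performance puts it in frontier-model territory. Last week, Qwen/Qwen3.5-397B-A17B · Hugging Face also drew attention. On the commercial side, Gemini 3.1 Pro and Sonnet 4.6 were released, and OpenAI is rumored to be preparing a response. Performance keeps climbing.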
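The "744B / 40B active" figure means only a small fraction of the weights is used for each token, as in Mixture-of-Experts designs. A back-of-the-envelope sketch of what that buys, assuming the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token (an approximation that ignores attention and routing overhead, not a figure from the paper). The Qwen row reads "A17B" as 17B active parameters, following Qwen's naming convention.

```python
# Rough per-token compute comparison for sparse (MoE-style) models.
# Assumption: forward pass ~= 2 FLOPs per active parameter per token.

def forward_flops_per_token(active_params: float) -> float:
    return 2.0 * active_params

def summarize(name: str, total: float, active: float) -> str:
    frac = active / total
    # How much cheaper per token than a dense model of the same total size.
    saving = forward_flops_per_token(total) / forward_flops_per_token(active)
    return f"{name}: {frac:.1%} of parameters active, ~{saving:.1f}x less compute than dense"

for name, total, active in [
    ("GLM-5", 744e9, 40e9),
    ("Qwen3.5-397B-A17B", 397e9, 17e9),
]:
    print(summarize(name, total, active))
# GLM-5: 5.4% of parameters active, ~18.6x less compute than dense
# Qwen3.5-397B-A17B: 4.3% of parameters active, ~23.4x less compute than dense
```

Memory is a different story: all 744B parameters still have to be stored, so the saving applies to compute per token, not to the hardware needed to hold the model.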
The title's "from Vibe Coding to Agentic Engineering" is explained as follows: "We describe the transition from vibe coding (human prompting) to agentic engineering. In vibe coding, a human prompts an AI model to write code. In agentic engineering, AI agents write the code themselves. They plan, implement, and iterate."
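The "plan, implement, and iterate" cycle in that quote can be pictured as a simple control loop in which the agent, rather than a human, feeds evaluation results back into its next attempt. The sketch below assumes planning, implementation, and evaluation are pluggable callables; `agentic_loop`, `plan`, `implement`, `evaluate`, and the toy task are hypothetical placeholders, not GLM-5's agent framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Feedback:
    passed: bool
    message: str = ""

def agentic_loop(
    task: str,
    plan: Callable[[str], List[str]],
    implement: Callable[[List[str], Optional[str], Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_iters: int = 5,
) -> Tuple[Optional[str], int, bool]:
    """Plan once, then implement -> evaluate -> revise until passing or out of budget."""
    steps = plan(task)
    candidate: Optional[str] = None
    feedback: Optional[Feedback] = None
    for i in range(1, max_iters + 1):
        # The agent revises its own previous attempt using the latest feedback;
        # no human prompt is needed between iterations.
        candidate = implement(steps, candidate, feedback)
        feedback = evaluate(candidate)
        if feedback.passed:
            return candidate, i, True
    return candidate, max_iters, False

# Toy usage: "write a function that squares its input".
def toy_plan(task: str) -> List[str]:
    return ["define f(x)", "check f(3) == 9"]

def toy_implement(steps, previous, feedback):
    if feedback is None:
        return "def f(x):\n    return x + x"  # buggy first draft
    return "def f(x):\n    return x * x"      # revised after the failing check

def toy_evaluate(code: str) -> Feedback:
    namespace: dict = {}
    exec(code, namespace)  # acceptable for a toy; never exec untrusted code like this
    ok = namespace["f"](3) == 9
    return Feedback(ok, "" if ok else "f(3) != 9")

code, iterations, passed = agentic_loop("square a number", toy_plan, toy_implement, toy_evaluate)
print(iterations, passed)  # 2 True
```

In vibe coding, the human would play the role of `evaluate` and re-prompt by hand; here that feedback step is automated inside the loop.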
The repository is GitHub – zai-org/GLM-5: GLM-5: From Vibe Coding to Agentic Engineering

World Action Models are Zero-shot Policies [111.9]
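This paper introduces DreamZero, a World Action Model (WAM) built on a pretrained video diffusion backbone. By jointly modeling video and actions, DreamZero effectively learns diverse skills from heterogeneous robot data. Video-only demonstrations from other robots or humans yield a relative improvement of more than 42% in performance on unseen tasks.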
Paper / Reference translation (metadata) (Tue, 17 Feb 2026 15:04:02 GMT)
According to the authors, "By jointly predicting video and action, World Action Models (WAMs) inherit world physics priors that enable 1) effective learning from diverse, non-repetitive data, 2) open-world generalization, 3) cross-embodiment learning from video-only data, and 4) few-shot adaptation to new robots." The approach builds on video generation, so speed is a concern, and the authors report substantial speed-ups: "we enable a 14B autoregressive video diffusion model to perform real-time closed-loop control at 7Hz."
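Running at 7Hz leaves about 1/7 s ≈ 143 ms per control step for observing, predicting, and acting. The sketch below shows a generic fixed-rate closed-loop controller; `predict` stands in for a hypothetical world action model that returns predicted future frames together with an action, and the injectable `clock`/`sleep` functions are an assumption added so the loop can be exercised without hardware. It does not reproduce DreamZero's inference pipeline.

```python
import time
from typing import Any, Callable, List, Tuple

def run_closed_loop(
    predict: Callable[[Any], Tuple[List[Any], Any]],
    observe: Callable[[], Any],
    act: Callable[[Any], None],
    hz: float = 7.0,
    steps: int = 10,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Observe -> predict (video + action) -> act, at a fixed control rate.

    Returns the number of steps whose compute exceeded the period budget.
    """
    period = 1.0 / hz  # 7 Hz -> ~0.143 s per step
    overruns = 0
    for _ in range(steps):
        start = clock()
        obs = observe()
        _frames, action = predict(obs)  # predicted frames are unused in this sketch
        act(action)  # closing the loop: the next observation reflects this action
        elapsed = clock() - start
        if elapsed < period:
            sleep(period - elapsed)  # wait out the rest of the control period
        else:
            overruns += 1  # too slow for real time; start the next step immediately
    return overruns
```

The overrun counter makes the real-time constraint explicit: a model that needs more than the period per step cannot sustain the target rate, which is why inference speed matters so much for video-based policies.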
The project site is DreamZero: World Action Models are Zero-shot Policies

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution [32.9]
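We present Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for fast and smooth real-time execution. Xiaomi-Robotics-0 was pretrained on large-scale cross-embodiment robot trajectories and vision-language data. We evaluate Xiaomi-Robotics-0 extensively on simulation benchmarks and on two challenging real-robot tasks that require precise and dexterous bimanual manipulation.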
Paper / Reference translation (metadata) (Fri, 13 Feb 2026 07:30:43 GMT)
A VLA model from Xiaomi. "Our robot trajectory data are sourced from multiple open-sourced robot datasets (e.g., DROID [23] and MolmoAct [26]) as well as in-house data collected by ourselves. Our in-house data consists of teleoperated trajectories for two challenging tasks: Lego Disassembly and Towel Folding. In total, we collected 338 and 400 hours of data for these two tasks, respectively." In contrast, this work invests in collecting its own data.
The repository is Xiaomi-Robotics-0

Self-evolving Embodied AI [31.5]
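Embodied artificial intelligence (Embodied AI) refers to intelligent systems shaped by agents and their environment through active perception, embodied cognition, and action interaction. This paper introduces self-evolving embodied AI, a new paradigm in which agents operate based on their changing state and environment.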
Paper / Reference translation (metadata) (Wed, 04 Feb 2026 10:40:34 GMT)
An introduction to, and survey of, AI that is both embodied and self-evolving. It is striking that this is no longer a pipe dream.