September 22, 2025 – Introducing the latest arXiv papers

Pre-training under infinite compute

Pre-training under infinite compute [87.0]
This study shows that the data-constrained approaches of increasing the number of epochs and increasing the parameter count eventually overfit. Ensembles of independently trained models achieve a much lower loss asymptote than the regularized recipe. The results suggest that more data-efficient pre-training can be achieved in a compute-rich future.
Reference translation of the paper (metadata) (Thu, 18 Sep 2025 09:36:23 GMT)
The paper reports: "Our best intervention combining epoching, regularization, parameter scaling, and ensemble scaling achieves an asymptote at 200M tokens using 5.17× less data than our baseline, and our data scaling laws predict that this improvement persists at higher token budgets. We find that our data efficiency gains can be realized at much smaller parameter counts as we can distill an ensemble into a student model that is 8× smaller and retains 83% of the ensembling benefit." This result looks like it could be an answer to concerns about running out of training data.
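To make the ensembling idea concrete, here is a minimal sketch of why averaging the predictions of independently trained models can lower the loss. The three next-token distributions are made-up numbers, and averaging probabilities (rather than logits) is an assumption for illustration; the post does not say how the paper combines its models.

```python
import math

def ensemble_nll(member_probs, target):
    """Negative log-likelihood of `target` under the averaged distribution."""
    k = len(member_probs)
    vocab = len(member_probs[0])
    # Average the probability each member assigns to every vocabulary item.
    avg = [sum(p[i] for p in member_probs) / k for i in range(vocab)]
    return -math.log(avg[target])

# Hypothetical next-token distributions from three independently trained models.
members = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]

# Loss of each member on its own, for the correct token at index 0.
single_losses = [-math.log(p[0]) for p in members]
mean_single_loss = sum(single_losses) / len(single_losses)

# Loss of the ensemble on the same token.
ensemble_loss = ensemble_nll(members, target=0)
```

Because the negative logarithm is convex, the ensemble loss can never exceed the average loss of its members on the same token (Jensen's inequality). How large the gap is in practice depends on how much the members disagree.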

MobileLLM-R1, APERTUS

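Last week, OpenAI's ICPC result (https://x.com/MostafaRohani/status/1968360976379703569) and similar news drew a lot of attention. The performance gains of closed models are truly impressive. That said, efforts on open models also remain interesting: Meta's small model MobileLLM-R1 (facebook/MobileLLM-R1-950M · Hugging Face), and APERTUS, which is open, careful about rights issues, and achieves performance competitive with other models. This is a space well worth watching.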

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments [163.7]
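Apertus is a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem. Apertus models are pretrained exclusively on openly available data, respecting robots.txt exclusions and filtering out non-permissive, toxic, and personally identifiable content. Apertus models are also trained on 15T tokens from over 1800 languages, with 40% of the pretraining data allocated to non-English content.
Reference translation of the paper (metadata) (Wed, 17 Sep 2025 17:59:21 GMT)
A model that is open, multilingual, and also pays considerable attention to rights issues: "The models are trained on 15T tokens from 1811 languages with retroactive respect for robots.txt and related opt outs, and with a Goldfish-style objective to curb verbatim reproduction of training text." Its performance is also quite high, which makes it very interesting.
The model is available at swiss-ai/Apertus-70B-Instruct-2509 · Hugging Face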
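As a rough illustration of what a "Goldfish-style objective" could look like, the sketch below computes a next-token loss that skips a subset of positions, so the model never receives a training signal for those tokens and is less able to reproduce a passage verbatim. Dropping every k-th position is a simplifying assumption, as are the function name and the value of `k`; the post does not describe the exact masking rule Apertus uses.

```python
import math

def goldfish_style_loss(token_probs, k=4):
    """Mean negative log-likelihood over tokens, skipping every k-th position.

    token_probs[i] is the probability the model assigned to the correct
    token at position i. Skipped positions contribute nothing to the loss.
    """
    if k < 2:
        raise ValueError("k must be at least 2, otherwise every token is skipped")
    kept = [p for i, p in enumerate(token_probs) if (i + 1) % k != 0]
    if not kept:
        raise ValueError("no tokens left after masking")
    return sum(-math.log(p) for p in kept) / len(kept)

# Hypothetical per-token probabilities for a five-token sequence.
probs = [0.9, 0.8, 0.7, 0.1, 0.6]

# With k=4 the fourth token (probability 0.1) is excluded from the loss.
masked_loss = goldfish_style_loss(probs, k=4)
full_loss = sum(-math.log(p) for p in probs) / len(probs)
```

In this toy example the masked loss is lower than the full loss only because the skipped token happened to be the hardest one; the point of the masking is the missing gradient at those positions, not the loss value itself.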

A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models [22.7]
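Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology, using three families: direct reasoning in a single step, linear chain reasoning with explicit intermediates, and branch-structured reasoning.
Reference translation of the paper (metadata) (Mon, 15 Sep 2025 04:39:50 GMT)
A survey on time series reasoning.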
- Reasoning Topology — execution structures:
  - Direct reasoning (single step)
  - Linear chain reasoning (sequential intermediate steps)
  - Branch-structured reasoning (exploration, feedback, and aggregation)
- Primary Objective — the main intent:
  - Traditional time series analysis (forecasting, classification, anomaly detection, segmentation)
  - Explanation and understanding (temporal QA, diagnostics, structure discovery)
  - Causal inference and decision making (counterfactuals, policy evaluation, decision support)
  - Time series generation (simulation, editing, synthesis)
The repository is GitHub – blacksnail789521/Time-Series-Reasoning-Survey: A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

Self-Improving Embodied Foundation Models

Self-Improving Embodied Foundation Models [21.8]
The paper proposes a two-stage post-training approach for robotics. The first stage, Supervised Fine-Tuning (SFT), fine-tunes pretrained foundation models using both a) behavioral cloning and b) steps-to-go prediction objectives. In the second stage, steps-to-go prediction enables the extraction of a well-shaped reward function and a robust success detector.
Reference translation of the paper (metadata) (Thu, 18 Sep 2025 17:00:08 GMT)
The proposed approach is: "1) Supervised Fine-Tuning (SFT) wherein we fine-tune EFMs using behavioral cloning as well as “steps-to-go” prediction objectives, and 2) Self-Improvement (Online RL) wherein EFMs autonomously practice downstream tasks and rapidly improve via optimizing self-predicted rewards." (EFM = Embodied Foundation Models). The authors report that it was effective: "Finally, we demonstrated that this novel combination uniquely unlocks a capability not possible by current methods: autonomously acquiring new skills that generalize far beyond the tasks covered in the imitation learning datasets. These findings highlight the transformative potential of combining pretrained foundation models with online Self-Improvement to enable autonomous skill acquisition in robotics."
The project site is Anonymous Supplementary Videos for “On the Magic of Online Self-Improvement for Embodied Multimodal Foundation Models”
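One way to see how a steps-to-go prediction can yield both a shaped reward and a success detector is the sketch below. It assumes the reward is the decrease in predicted steps-to-go between consecutive observations and that success means the prediction falls below a threshold. The function names, the threshold value, and the predictions are hypothetical, and the post does not give the paper's exact formulas.

```python
def shaped_reward(steps_to_go_before, steps_to_go_after):
    """Reward progress: positive when the predicted steps-to-go shrinks."""
    return steps_to_go_before - steps_to_go_after

def is_success(steps_to_go, threshold=1.0):
    """Treat the task as done once the predicted steps-to-go is small enough."""
    return steps_to_go <= threshold

# Hypothetical steps-to-go predictions along one episode.
predictions = [10.0, 8.5, 9.0, 4.0, 0.5]

# One reward per transition between consecutive observations.
rewards = [shaped_reward(a, b) for a, b in zip(predictions, predictions[1:])]

# The per-step rewards telescope to the total progress made in the episode.
total_progress = sum(rewards)
succeeded = is_success(predictions[-1])
```

A reward of this form gives a dense signal at every step, including a penalty when the robot moves away from the goal (the second transition above), without requiring hand-written reward code for each task.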

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision [26.9]
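The paper proposes turning exploration into supervision through Compute as Teacher (CaT). CaT synthesizes a single reference from a group of parallel rollouts and then optimizes toward it. As a test-time procedure, CaT improves Gemma 3 4B, Qwen 3 4B, and Llama 3.1 8B.
Reference translation of the paper (metadata) (Wed, 17 Sep 2025 17:59:42 GMT)
It is interesting that the method also works on tasks that are hard to verify: "(i) verifiable tasks use programmatic equivalence on final answers; (ii) non-verifiable tasks use self-proposed rubrics—binary, auditable criteria scored by an independent LLM judge, with reward given by the fraction satisfied." According to the paper, "CaT can be applied at test time for inference-time gains or inside RL (CaT-RL) to improve the policy." The effect is also confirmed in the reinforcement learning setting.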
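The two reward regimes in the quote can be sketched as follows. This is a simplified illustration: exact string matching stands in for "programmatic equivalence", and the rubric judgments are passed in as booleans rather than produced by an actual LLM judge. All function names and sample values are hypothetical.

```python
def verifiable_reward(answer, reference_answer):
    """Return 1.0 if the final answer matches the synthesized reference, else 0.0.

    Whitespace-insensitive string equality is a stand-in for whatever
    programmatic equivalence check a real system would use.
    """
    return 1.0 if answer.strip() == reference_answer.strip() else 0.0

def rubric_reward(judgments):
    """Return the fraction of binary rubric criteria judged as satisfied."""
    if not judgments:
        return 0.0
    return sum(1 for satisfied in judgments if satisfied) / len(judgments)

# Verifiable task: compare a rollout's final answer with the reference.
math_reward = verifiable_reward(" 42 ", "42")

# Non-verifiable task: a judge marked three of four criteria as satisfied.
essay_reward = rubric_reward([True, True, False, True])
```

Binary criteria keep each judgment easy to audit, and the fraction satisfied gives a graded reward even when no single correct answer exists.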
