Foundation Models – Introducing the Latest arXiv Papers

Vision Generalist Model: A Survey

Vision Generalist Model: A Survey [87.5]
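This paper gives an overview of vision generalist models and examines their characteristics and capabilities within the field. It also briefly explores related domains, shedding light on their interconnections and potential synergies.
Reference translation of the paper (metadata) (Wed, 11 Jun 2025 17:23:41 GMT)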

WorldPM: Scaling Human Preference Modeling

WorldPM: Scaling Human Preference Modeling [130.2]
To highlight this scaling potential, we propose World Preference Modeling (WorldPM). Preference data is collected from public forums covering diverse user communities. Extensive training is conducted on 15M-scale data with models ranging from 1.5B to 72B parameters.
Reference translation of the paper (metadata) (Thu, 15 May 2025 17:38:37 GMT)
The authors write: "Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling." They further claim: "Through evaluations on 7 benchmarks with 20 subtasks, we find that WorldPM broadly improves the generalization performance across human preference datasets of varying sizes (7K, 100K and 800K samples), with performance gains exceeding 5% on many key subtasks." The potential of this kind of foundation model is interesting (though also a little scary).
- The Appendix result on filtering is also interesting: "we argue that applying RM filtering diverges from capturing world preference. Instead of assuming forum data contains noise, we should interpret apparent contradictions as manifestations of genuine human preferences, allowing models to discover underlying commonalities within these surface-level conflicts."
The repository is GitHub – QwenLM/WorldPM
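The power-law relationship quoted above can be illustrated with a minimal sketch. It assumes a loss of the form loss = a * size^(-b) and recovers the exponent with an ordinary least-squares fit in log-log space. The function name `fit_power_law` and the data points are made up for illustration; they are not results from the WorldPM paper, and the paper's own fitting procedure may differ.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size ** (-b) by least squares on log(loss) vs. log(size).

    Returns the tuple (a, b).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -b for a power law
    intercept = mean_y - slope * mean_x  # equals log(a)
    return math.exp(intercept), -slope


# Synthetic points (NOT paper data) generated from loss = 2.0 * size ** -0.1.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * s ** -0.1 for s in sizes]
a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real measurements the points would not lie exactly on a line in log-log space, so the fitted exponent should be read together with the residuals rather than on its own.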

The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features

The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features [40.2]
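This paper proposes TabPFN-TS, a simple approach that combines TabPFN with simple feature engineering to improve forecasting performance. Despite its simplicity and only 11M parameters, TabPFN-TS outperforms Chronos-Mini, a model of similar size, and slightly outperforms Chronos-Large, which has 65 times as many parameters.
Reference translation of the paper (metadata) (Mon, 06 Jan 2025 11:38:19 GMT)
A proposal for a tabular foundation model, an area that seems rather difficult. The authors write: "By using a simple set of timestamp-derived features, our approach matches or slightly outperforms Chronos-T5 (Large), which, to our knowledge, is one of the strongest time series foundation models." The model may be capturing the basic dynamics of time-series data, but anyone using it should probably validate it in their own domain.
The repository is GitHub – PriorLabs/tabpfn-client: ⚡ Easy API access to the tabular foundation model TabPFN ⚡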
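To make the idea of "timestamp-derived features" concrete, here is a minimal sketch of how a regular time series could be recast as a tabular regression problem. The feature set (running index, calendar fields, sine/cosine encodings) and the helper names `timestamp_features` and `build_tables` are assumptions for illustration; the exact features used by TabPFN-TS are not described in this post, and the sketch deliberately does not call the TabPFN API.

```python
import math
from datetime import datetime, timedelta


def timestamp_features(ts, index):
    """Map one timestamp to a flat feature row (hypothetical feature set)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return {
        "index": index,                    # running position in the series
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),       # Monday == 0
        "hour_sin": math.sin(hour_angle),  # cyclic encoding of hour of day
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),    # cyclic encoding of day of week
        "dow_cos": math.cos(dow_angle),
    }


def build_tables(start, step, values, horizon):
    """Return (train_rows, train_targets, future_rows) for a regular series."""
    n = len(values)
    train_rows = [timestamp_features(start + i * step, i) for i in range(n)]
    future_rows = [
        timestamp_features(start + i * step, i) for i in range(n, n + horizon)
    ]
    return train_rows, list(values), future_rows


start = datetime(2025, 1, 6, 6)  # a Monday, 06:00
train_rows, train_targets, future_rows = build_tables(
    start, timedelta(hours=1), [10.0, 12.0, 11.0, 13.0], horizon=2
)
```

A tabular regressor would then be fit on `train_rows` and `train_targets` and asked to predict the targets for `future_rows`, which is what turns forecasting into plain tabular regression.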

Cosmos World Foundation Model Platform for Physical AI

Cosmos World Foundation Model Platform for Physical AI [136.1]
Physical AI needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. We present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups.
Reference translation of the paper (metadata) (Tue, 07 Jan 2025 06:55:50 GMT)
The much-discussed World Foundation Model from NVIDIA. "Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers." It is impressive that the models are released with such a comprehensive setup.
One interesting detail of the construction process: "We refine our data by excluding specific video types that could lead to poor generation quality or unrealistic dynamics, such as abstract visual patterns, video game footage, animated content, etc." The concern about unrealistic dynamics seems right.
This is still an early stage and there appear to be many problems, but I look forward to future developments. I am very curious whether this can be achieved by evolving current approaches or whether the underlying model architecture will have to change.
The repository is GitHub – NVIDIA/Cosmos: Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Cosmos is purpose built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts and generate videos.

AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities

AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities [5.8]
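This paper proposes AnySat, a multimodal model based on JEPA and resolution-adaptive spatial encoders. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of 5 multimodal datasets. We then train a single powerful model on these diverse datasets simultaneously.
Reference translation of the paper (metadata) (Wed, 18 Dec 2024 18:11:53 GMT)
A proposal for a foundation model that handles diverse Earth observation data in a unified way. The authors write: "We have presented AnySat, a versatile architecture designed to address the diversity of EO data in terms of resolutions, scales, and modalities." Its effectiveness is also validated.
The repository is GitHub – gastruc/AnySat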

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective [31.5]
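This paper reviews recent advances and discusses future directions for autoregressive vision foundation models. We present the trend toward next-generation vision foundation models that unify understanding and generation in vision tasks. We categorize autoregressive vision foundation models by their vision tokenizers and autoregression backbones.
Reference translation of the paper (metadata) (Tue, 29 Oct 2024 16:48:22 GMT)
A survey of autoregressive vision foundation models. Autoregression models are gaining presence not only in text but also in images, and even in image generation.
The repository is GitHub – EmmaSRH/ARVFM: Awesome autoregressive vision foundation models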

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents [55.4]
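OS-Atlas is a foundational GUI action model that excels at GUI grounding and OOD agentic tasks. We are releasing an open-source, cross-platform GUI grounding corpus that, to date, contains over 13 million GUI elements.
Reference translation of the paper (metadata) (Wed, 30 Oct 2024 17:10:19 GMT)
A proposal for a Foundation Action Model targeting GUIs, an area that is heating up, helped along by Anthropic's announcement. On performance, the authors report: "although GPT-4o with OS-Atlas-Base as the grounding module still lags behind human performance, it significantly outperforms other grounding methods such as SeeClick and Set-of-Mark (SoM)".
The repository is OS-Atlas Homepage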

A Survey of Foundation Models for Music Understanding

A Survey of Foundation Models for Music Understanding [60.8]
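This work is one of the early reviews of the intersection of AI techniques and music understanding. We investigate, analyze, and test recent large-scale music foundation models with respect to their music understanding capabilities.
Reference translation of the paper (metadata) (Sun, 15 Sep 2024 03:34:14 GMT)
The authors write: "This work, to our best knowledge, is one of the early reviews of the intersection of AI techniques and music understanding." A very comprehensive survey.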

Configurable Foundation Models: Building LLMs from a Modular Perspective

Configurable Foundation Models: Building LLMs from a Modular Perspective [115.6]
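There is a growing tendency to decompose LLMs into numerous functional modules, allowing inference with only part of the modules and dynamic assembly of modules to tackle complex tasks. We coin the term "brick" for each functional module and define the modularized structure as configurable foundation models. We present four brick-oriented operations: retrieval and routing, merging, updating, and growing. We find that FFN layers follow modular patterns, with functional specialization of neurons and functional neuron partitions.
Reference translation of the paper (metadata) (Wed, 4 Sep 2024 17:01:02 GMT)
Research and a survey on Configurable Foundation Models, that is, modularized foundation models that can be reconfigured.
I see the usefulness but regard this as a hard problem. Results such as model merging suggest real potential, yet my impression is that, at present, even identifying which regions serve which function is not easy.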

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources [100.2]
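Foundation model development is attracting a rapidly growing body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet.
Reference translation of the paper (metadata) (Wed, 26 Jun 2024 02:19:01 GMT)
A cheatsheet for responsible foundation model development. Although it is called a cheatsheet, its coverage is extensive.
The project site is Resources for Foundation Models – Foundation Model Development Cheatsheet (fmcheatsheet.org)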
