Introducing the Latest arXiv Papers

Biomedical Foundation Model: A Survey

Biomedical Foundation Model: A Survey [84.3]
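Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets. These models can be adapted to a variety of applications, such as question answering and visual understanding. This study explores the potential of foundation models in the biomedical field.
Reference translation of the paper (metadata) (Mon, 03 Mar 2025 22:42:00 GMT)
A survey of foundation models in biology and medicine. Its main targets are "computational biology, drug development, clinical informatics, medical imaging, and public health".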

Transformers without Normalization

Transformers without Normalization [58.8]
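We introduce Dynamic Tanh (DyT), an element-wise operation $\mathrm{DyT}(x) = \tanh(\alpha x)$, as a drop-in replacement for normalization layers in Transformers. We validate the effectiveness of Transformers with DyT across diverse settings, ranging from recognition to generation, supervised to self-supervised learning, and computer vision to language models.
Reference translation of the paper (metadata) (Thu, 13 Mar 2025 17:59:06 GMT)
The authors state: "We introduce Dynamic Tanh (DyT), an element-wise operation DyT(x) = tanh(αx), as a drop-in replacement for normalization layers in Transformers." This is an interesting finding, and the paper also reports an advantage in computational cost: "DyT improves training and inference speed, making it a candidate for efficiency-oriented network design."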
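As a rough illustration of the element-wise operation quoted above, the following is a minimal sketch in plain Python. It implements only DyT(x) = tanh(αx) as written in the abstract. The function name `dyt`, the sample activations, and the value `alpha=0.5` are assumptions chosen for illustration, not the authors' implementation.

```python
import math

def dyt(xs, alpha):
    """Apply DyT(x) = tanh(alpha * x) independently to each element of xs."""
    return [math.tanh(alpha * x) for x in xs]

# Hypothetical activations; alpha=0.5 is an arbitrary illustrative value.
activations = [-10.0, -1.0, 0.0, 1.0, 10.0]
squashed = dyt(activations, alpha=0.5)

# Every output lies strictly between -1 and 1, and no mean or variance
# over the inputs is computed, unlike a normalization layer.
print(squashed)
```

Because each output depends only on its own input and on α, the operation needs no statistics gathered across the vector, which is consistent with the efficiency claim quoted above.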

Simulating the Real World: A Unified Survey of Multimodal Generative Models

Simulating the Real World: A Unified Survey of Multimodal Generative Models [48.4]
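We present a unified survey of multimodal generative models that investigates the progression of data dimensionality in real-world simulation. To the best of our knowledge, this is the first attempt to systematically unify the study of 2D, video, 3D, and 4D generation within a single framework.
Reference translation of the paper (metadata) (Thu, 06 Mar 2025 17:31:43 GMT)
Whether generative AI will lead to simulating the real world is open to debate, but this is a survey that says: "In this survey, we present a unified survey for multimodal generative models that investigate the progression of data dimensionality in real-world simulation."
Research is progressing on many fronts, but my impression is that the hurdles are still quite high.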

You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time

You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time [132.0]
Deep neural networks are prone to various bias issues, jeopardizing their use in high-stakes decision-making applications. To achieve flexible accuracy-fairness trade-offs at inference time, we propose You Only Debias Once (YODO). YODO achieves flexible trade-offs between model accuracy and fairness at ultra-low overhead.
Reference translation of the paper (metadata) (Mon, 10 Mar 2025 08:50:55 GMT)
A proposed debiasing method whose approach is described as follows: "Instead of pursuing one individual fixed point (fairness-optimum) in the weight space, we aim to find a “line” in the weight space that connects the accuracy-optimum and fairness-optimum points using a single model." The point on that line to use is then chosen at inference time.
The procedure is: "After training a model f(x; ω1, ω2, α) with two sets of parameters ω1 and ω2, the prediction procedure for a test sample x is i) Choose the desired trade-off parameter α, which controls the balance between accuracy and fairness, ii) Compute the weighted combination of the two sets of trained weights, (1 − α)ω1 + αω2, to obtain the model parameters for the desired trade-off, iii) Compute the prediction function to the test sample x as f(x; (1 − α)ω1 + αω2), to obtain the predicted output." It is interesting that this works properly.
The repository is GitHub – ahxt/yodo
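The three-step prediction procedure quoted above can be sketched as follows. This is a minimal illustration, not the code from the repository: the model is assumed to be a toy linear function f(x; ω) = ω · x, and the names `interpolate_weights`, `predict`, `omega1`, `omega2`, and all numeric values are hypothetical.

```python
def interpolate_weights(w1, w2, alpha):
    """Step ii: compute (1 - alpha) * w1 + alpha * w2 element-wise."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w1, w2)]

def predict(x, w1, w2, alpha):
    """Step iii: evaluate a toy linear model f(x; w) = w . x at the
    interpolated weights. The linear form is an assumption for illustration."""
    w = interpolate_weights(w1, w2, alpha)
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical trained weight sets: the two endpoints of the "line".
omega1 = [1.0, 0.0]
omega2 = [0.0, 1.0]
x = [2.0, 4.0]

# Step i: choose the trade-off parameter alpha at inference time.
for alpha in (0.0, 0.25, 1.0):
    print(alpha, predict(x, omega1, omega2, alpha))
```

Changing `alpha` requires no retraining here, only one weighted sum over the two stored weight sets, which fits the low-overhead claim in the abstract.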

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [45.7]
This paper introduces Search-R1, an extension of the DeepSeek-R1 model for large language models (LLMs). Search-R1 autonomously generates (multiple) search queries during step-by-step reasoning with real-time retrieval. Experiments show that Search-R1 improves performance over SOTA baselines by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B).
Reference translation of the paper (metadata) (Wed, 12 Mar 2025 16:26:39 GMT)
A proposed framework that advances its reasoning while issuing search queries: "SEARCH-R1, a novel reinforcement learning framework that enables large language models (LLMs) to interleave self-reasoning with real-time search engine interactions."
The repository is GitHub – PeterGriffinJin/Search-R1: Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy [38.6]
We introduce the VacSim framework with 100 generative agents powered by large language models (LLMs). VacSim (1) instantiates a population of agents based on demographic data, (2) connects the agents via a social network and models vaccine attitudes as a function of social dynamics and disease-related information, and (3) designs and evaluates various public health interventions aimed at mitigating vaccine hesitancy.
Reference translation of the paper (metadata) (Wed, 12 Mar 2025 02:54:15 GMT)
A study asking whether LLM-based agents can simulate society (here, attitudes toward vaccines). The results are hard to interpret: "Our results demonstrate that certain LLMs, such as Qwen-2.5-7B-Instruct and Llama-3-8B-Instruct, capture nuanced interactions among agent demographics, social influences, and policy scenarios. These models successfully pass both global and local consistency checks, suggesting that generative agents could become valuable tools for exploring how policy interventions might shape public attitudes." versus "Models such as Claude-3.5-Haiku and Phi-3.5-mini-instruct reveal inconsistencies that compromise simulation desiderata." I doubt that models of this size can respond sensibly (the influence of leakage is a concern), but if realistic simulation is possible, this is an interesting result.
The repository is GitHub – abehou/VacSim: Public code repository for VacSim: A generative multi-agent simulation for vaccine hesitancy.

YuE: Scaling Open Foundation Models for Long-Form Music Generation

YuE: Scaling Open Foundation Models for Long-Form Music Generation [134.5]
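YuE is a family of open foundation models based on the LLaMA2 architecture. It generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and vocal melodies with appropriate accompaniment.
Reference translation of the paper (metadata) (Tue, 11 Mar 2025 17:26:50 GMT)
A proposal for YuE, an open foundation model for music generation. The demo songs, sung with multilingual lyrics (including Japanese), are fun to listen to. It is impressive that a model of this quality is released under these terms: "The YuE model (including its weights) is now released under the Apache License, Version 2.0. We do not make any profit from this model, and we hope it can be used for the betterment of human creativity."
The demo site is YuE, and the repository is GitHub – multimodal-art-projection/YuE: YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open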

Personalized Generation In Large Model Era: A Survey

Personalized Generation In Large Model Era: A Survey [90.8]
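In the era of large models, content generation is gradually shifting toward Personalized Generation (PGen). This paper presents a comprehensive survey of PGen and examines existing research in this rapidly growing field. By bridging PGen research across multiple modalities, the survey serves as a valuable resource for fostering knowledge sharing and interdisciplinary collaboration.
Reference translation of the paper (metadata) (Tue, 04 Mar 2025 13:34:19 GMT)
A survey of Personalized Generation (PGen) that covers a variety of modalities.
The table at the end shows just how many different studies there are.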

Self-Taught Self-Correction for Small Language Models

Self-Taught Self-Correction for Small Language Models [16.5]
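This work explores self-correction in small language models (SLMs) through iterative fine-tuning using only self-generated data. We introduce the Self-Taught Self-Correction (STaSC) algorithm, which incorporates multiple algorithmic design choices. Experimental results on a question-answering task show that STaSC effectively learns self-correction and significantly improves performance.
Reference translation of the paper (metadata) (Tue, 11 Mar 2025 17:57:44 GMT)
A proposal for Self-Taught Self-Correction (STaSC), which builds various forms of self-correction into STaR.
The repository is GitHub – VityaVitalich/STASC: [ICLR 2025 SSI-FM] Self-Taught Self-Correction for Small Language Models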

Gemma3, Command A, OLMo 2 32B, ERNIE 4.5 & X1

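It has felt like a "weekly LLM" news cycle for a long time now, but last week's Gemma 3 announcement was big news (Gemma 3: Google’s new open model based on Gemini 2.0). It is an open model with a permissive license and strong performance. Assuming it will be turned into an LRM (large reasoning model) in the future, there is a lot to look forward to. Cohere also announced Command A (Introducing Command A: Max performance, minimal compute), although it is under a non-commercial (NC) license.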

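Ai2 has announced a 32B version of OLMo 2 (OLMo 2 32B: First fully open model to outperform GPT 3.5 and GPT 4o mini | Ai2). Its distinguishing feature is that much is made public, including the model-building process and the datasets used, which makes it more open than releases that publish only the model.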

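OpenAI has released APIs and tools useful for agent development (New tools for building agents | OpenAI). As this shows, surrounding tools are also very important for putting models to use and operating them, but I still feel that expectations for local LLMs are rising.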

Baidu has announced ERNIE 4.5 and the LRM X1 (Baidu Inc. on X: "We’ve just unveiled ERNIE 4.5 & X1! 🚀 As a deep-thinking reasoning model with multimodal capabilities, ERNIE X1 delivers performance on par with DeepSeek R1 at only half the price. Meanwhile, ERNIE 4.5 is our latest foundation model and new-generation native multimodal model. https://t.co/cLKVHYvbzw" / X), so fierce competition continues among commercial APIs as well.
