July 2025 – Page 6 – arXiv最新論文の紹介 (Introduction to the Latest arXiv Papers)

FlexOlmo: Open Language Models for Flexible Data Use

FlexOlmo: Open Language Models for Flexible Data Use [184.9]
We introduce FlexOlmo, a new language model (LM) that supports distributed training without data sharing. FlexOlmo employs a mixture-of-experts (MoE) architecture in which each expert is trained independently on a closed dataset. We show that a general expert trained on public data can be effectively combined with experts trained independently by other data owners.
Reference translation of the paper (metadata) (Wed, 09 Jul 2025 16:54:21 GMT)
"Standard MoEs train all experts and the router jointly on all data. In contrast, FLEXOLMO trains experts independently by teaching them to coordinate (§3.3.1) and merges them at inference using a domain-informed router (§3.3.2)." The paper proposes, and validates the effectiveness of, a framework in which AIs built at separate sites operate as an integrated whole. This is what federated learning and MoE bring to mind, but it is hard to achieve in practice.
"Organizations in regulated industries require LMs that can leverage their closed datasets while maintaining strict data privacy and access controls. Healthcare institutions, financial firms, and other entities possess valuable domain-specific data but cannot share it externally due to HIPAA, GDPR [14, 15], data sovereignty laws [16], and intellectual property (IP) protections. These organizations need training paradigms that enable AI improvement on their sensitive data while ensuring such sensitive data never leaves certain environments and can be removed from the model after training, e.g., when data usage rights expire. In such settings, modular training approaches, where individual experts are trained independently and asynchronously on locally maintained data, are essential." This is exactly right, and the technique looks very useful.
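The idea quoted above (experts trained separately, merged at inference by a router, and removable afterwards) can be illustrated with a toy sketch. This is not FlexOlmo's implementation: the class `ToyMergedMoE`, the per-expert "domain vectors", and the dot-product softmax router are all assumptions made for illustration, and the experts are plain Python callables standing in for independently trained modules.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyMergedMoE:
    """Toy stand-in for merging independently trained experts.

    Each expert is a callable trained elsewhere; the router keeps one
    hypothetical domain vector per expert and needs no joint training.
    """

    def __init__(self):
        self.experts = {}         # name -> callable(list[float]) -> list[float]
        self.domain_vectors = {}  # name -> list[float] (assumed router row)

    def add_expert(self, name, fn, domain_vector):
        self.experts[name] = fn
        self.domain_vectors[name] = domain_vector

    def remove_expert(self, name):
        # Opting out drops both the expert and its router row.
        del self.experts[name]
        del self.domain_vectors[name]

    def forward(self, x):
        names = list(self.experts)
        weights = softmax([dot(self.domain_vectors[n], x) for n in names])
        out = [0.0] * len(x)
        for name, w in zip(names, weights):
            y = self.experts[name](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, dict(zip(names, weights))


moe = ToyMergedMoE()
moe.add_expert("public", lambda x: list(x), [1.0, 0.0])
moe.add_expert("medical", lambda x: [2.0 * v for v in x], [0.0, 1.0])

_, weights = moe.forward([0.0, 3.0])  # input aligned with the "medical" vector
moe.remove_expert("medical")          # e.g. data usage rights expired
out_after, weights_after = moe.forward([0.0, 3.0])
```

In this toy, removing an expert only deletes its entry and router row, so the remaining experts keep working without retraining. Whether the real system behaves the same way depends on details in the paper that this sketch does not model.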
The project site is Introducing FlexOlmo: a new paradigm for language model training and data collaboration | Ai2, and the repository is GitHub – allenai/FlexOlmo: Code and training scripts for FlexOlmo

Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers [31.5]
LimitGen is the first benchmark for evaluating the ability of LLMs to support early-stage feedback and complement human peer review. The proposed approach enhances the ability of LLM systems to generate limitations of research papers, providing more specific and constructive feedback.
Reference translation of the paper (metadata) (Thu, 03 Jul 2025 15:04:38 GMT)
A benchmark proposal and evaluation: "We propose LIMITGEN, a comprehensive benchmark specifically designed to assess the ability of models to identify and address limitations in scientific research, with a reliable and systematic evaluation framework." According to the paper, "Even the best-performing LLM, GPT-4o, can only identify about half of the limitations that humans consider very obvious. Although MARG leverages multi-agent collaboration and generates more comments, successfully identifying more limitations, the feedback it provides still lacks specificity, which is reflected in the fine-grained scores." MARG is a multi-agent framework.
The repository is GitHub – yale-nlp/LimitGen: Data and Code for ACL 2025 Paper “Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers”

Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact [31.6]
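This paper presents an interdisciplinary synthesis of artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. We identify key scientific, technical, and ethical challenges on the path to artificial general intelligence.
Reference translation of the paper (metadata) (Tue, 01 Jul 2025 16:52:25 GMT)
An overview of what pursuing AGI involves. The paper points out: "Several challenges remains, such as the need for grounded world models, dynamic memory, causal reasoning, robust handling of aleatory and epistemic uncertainty, developing perception of emotional and social contexts and collective agent architectures. Significant advancements have been made, such as Large Concept Models, Large Reasoning Models and Mixture of Experts, which improve LLM performance beyond next-token prediction by incorporating biologically inspired behaviors into output generation."
The technical framing of MoE and some other items feels slightly off, but it is an interesting overview.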

A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools

A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools [15.9]
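Foundation models (FMs) enable scalable, general-purpose, and multimodal AI systems for scientific discovery. This survey provides a comprehensive overview of the foundation models, agentic systems, datasets, and computational tools supporting this growing field.
Reference translation of the paper (metadata) (Wed, 25 Jun 2025 18:10:30 GMT)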

The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure

The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure [25.0]
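We demonstrate the existence of an implicit task-solving → translation pipeline for generation. We test this hypothesis on a word translation task across 108 language pairs. We find that a significant portion of overall failures stems from translation failure.
Reference translation of the paper (metadata) (Sat, 28 Jun 2025 02:09:21 GMT)
The paper notes: "We find that a significant portion of overall failures indeed stems from translation failure, or the model’s inability to translate correctly solved intermediate concepts into the target language. This is especially true for low-resource target languages."
The behavior itself seems plausible, also in light of Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? – arXiv最新論文の紹介. Still, the intermediate language is presumably shaped by whichever language dominated training, and I can't help wondering whether that is acceptable.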

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language [48.8]
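We introduce a new pre-training dataset curation pipeline based on FineWeb. We show that our pipeline can be used to create non-English corpora that produce more performant models than prior datasets. We scale the pipeline to over 1,000 languages using almost 100 Common Crawl snapshots to produce FineWeb2, a new 20 terabyte (5 billion document) multilingual dataset.
Reference translation of the paper (metadata) (Thu, 26 Jun 2025 01:01:47 GMT)
A proposal for a large-scale, multilingual, high-quality dataset. Its handling of duplicate data and its filtering reportedly allow more efficient training than other datasets.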
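To make "deduplication and filtering" concrete, here is a minimal sketch of the general idea. It is not the FineWeb2 pipeline: it uses exact-match deduplication on normalized text plus two crude heuristics, and the function names and thresholds (`min_words`, `max_repeat_ratio`) are hypothetical. Whitespace tokenization is itself a simplification that would not suit every language.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return " ".join(text.lower().split())


def passes_filter(text, min_words=5, max_repeat_ratio=0.5):
    """Crude quality heuristics with hypothetical thresholds."""
    words = normalize(text).split()
    if len(words) < min_words:
        return False
    # Reject documents dominated by a single repeated word.
    most_common = max(words.count(w) for w in set(words))
    return most_common / len(words) <= max_repeat_ratio


def dedup_and_filter(docs):
    """Keep the first copy of each document, then apply the filter."""
    seen = set()
    kept = []
    for doc in docs:
        key = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        if passes_filter(doc):
            kept.append(doc)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",  # duplicate after normalization
    "spam spam spam spam spam spam",                 # repetitive
    "too short",                                     # below min_words
]
kept = dedup_and_filter(docs)
```

A multilingual pipeline would need per-language tokenization and thresholds, which is the kind of adaptation the paper title refers to. This sketch hard-codes one setting for brevity.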
The repository is GitHub – huggingface/fineweb-2, and the dataset is HuggingFaceFW/fineweb-2 · Datasets at Hugging Face

Embodied AI Agents: Modeling the World

Embodied AI Agents: Modeling the World [165.0]
This paper describes research on AI agents embodied in visual, virtual, or physical forms. We propose that the development of world models is central to the reasoning and planning of embodied AI agents. We also propose learning the user's mental world model to enable better human-agent collaboration.
Reference translation of the paper (metadata) (Fri, 27 Jun 2025 16:05:34 GMT)
The paper frames the topic as follows: "We propose that the development of world models is central to reasoning and planning of embodied AI agents, allowing these agents to understand and predict their environment, to understand user intentions and social contexts, thereby enhancing their ability to perform complex tasks autonomously. World modeling encompasses the integration of multimodal perception, planning through reasoning for action and control, and memory to create a comprehensive understanding of the physical world."

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements [87.6]
A crucial capability for scientific progress is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark. We find that recent LLMs combined with SoTA scaffolds struggle to reimplement already-known innovations in our benchmark.
Reference translation of the paper (metadata) (Fri, 27 Jun 2025 17:44:32 GMT)
"We find that recent reasoning LLMs combined with SoTA scaffolds struggle to reimplement already-known innovations in our benchmark, even when given detailed hints." A somewhat surprising result.
The repository is GitHub – facebookresearch/llm-speedrunner: The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in language modeling.

Large Language Models in Argument Mining: A Survey

Large Language Models in Argument Mining: A Survey [15.0]
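Argument Mining (AM) focuses on extracting argumentative structures from text. The advent of large language models (LLMs) has profoundly transformed AM, enabling advanced in-context learning. This survey systematically synthesizes recent advances in LLM-driven AM.
Reference translation of the paper (metadata) (Thu, 19 Jun 2025 15:12:58 GMT)
A survey of Argument Mining with LLMs.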

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent [53.8]
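We introduce MemAgent, a novel agent workflow that reads text in segments and updates its memory using an overwrite strategy. MemAgent can extrapolate from an 8K context trained on 32K text to a 3.5M QA task with a performance loss of less than 5%, and achieves over 95% on the 512K RULER test.
Reference translation of the paper (metadata) (Thu, 03 Jul 2025 03:11:50 GMT)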
A proposal for an agentic framework for handling long texts. Its features are listed as follows (quoted from the project site):
1. Novel memory mechanism: The agent reads text in segments and efficiently updates memory through an overwriting strategy. This design enables the model to process arbitrarily long inputs within a fixed context window, fundamentally overcoming the window length limitations of traditional Transformer architectures.
2. O(n) complexity: By decoupling computation from text length, the complexity of processing long texts is transformed from quadratic growth to linear growth.
3. RL-driven extrapolation: We enhance the DAPO algorithm to support multi-turn training over context-independent conversations. Based on this, the trained model exhibits unprecedented extrapolation performance.
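The first two points above (segment-wise reading with an overwritten, fixed-size memory, and the resulting linear cost) can be sketched as a toy loop. This is an illustration under stated assumptions, not MemAgent's code: `toy_update_memory` is a hypothetical keyword-matching stand-in for the model call that would rewrite the memory, and the RL training from the third point is not modeled at all.

```python
def split_segments(tokens, segment_len):
    """Cut the input into consecutive fixed-length segments."""
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


def toy_update_memory(memory, segment, keywords, memory_len):
    """Hypothetical stand-in for the model call that rewrites memory.

    It sees only the old memory and one segment, and returns a new memory
    of at most memory_len items; older items are overwritten when full.
    """
    candidates = memory + [tok for tok in segment if tok in keywords]
    return candidates[max(0, len(candidates) - memory_len):]


def read_with_memory(tokens, keywords, segment_len=4, memory_len=3):
    """Process an arbitrarily long input with a fixed-size working set."""
    memory = []
    steps = 0
    for segment in split_segments(tokens, segment_len):
        # Each step touches at most segment_len + memory_len items,
        # regardless of how long the full input is.
        memory = toy_update_memory(memory, segment, keywords, memory_len)
        steps += 1
    return memory, steps


tokens = "a key b c d e code f g h 42 i".split()
memory, steps = read_with_memory(tokens, keywords={"key", "code", "42"})
```

Because the per-step cost is bounded by the fixed window and the number of steps is the input length divided by `segment_len` (rounded up), the total work grows linearly with input length. That is the sense in which the quoted list contrasts linear with quadratic growth.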
The project site is MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent
