July 11, 2025 – Introducing the Latest arXiv Papers

Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers [31.5]
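LimitGen is the first benchmark for evaluating the ability of LLMs to support early-stage feedback and complement human peer review. The proposed approach improves the ability of LLM systems to generate the limitations of research papers, so that they provide more specific and constructive feedback.
Reference translation of the paper (metadata) (Thu, 03 Jul 2025 15:04:38 GMT)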
The paper proposes and validates a benchmark: “We propose LIMITGEN, a comprehensive benchmark specifically designed to assess the ability of models to identify and address limitations in scientific research, with a reliable and systematic evaluation framework.” The authors report: “Even the best-performing LLM, GPT-4o, can only identify about half of the limitations that humans consider very obvious. Although MARG leverages multi-agent collaboration and generates more comments, successfully identifying more limitations, the feedback it provides still lacks specificity, which is reflected in the fine-grained scores.” MARG is a multi-agent framework.
The repository is GitHub – yale-nlp/LimitGen: Data and Code for ACL 2025 Paper “Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers”

Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact [31.6]
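This paper presents an interdisciplinary synthesis of artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the roles of modular reasoning, persistent memory, and multi-agent coordination. We identify key scientific, technical, and ethical challenges on the path to artificial general intelligence.
Reference translation of the paper (metadata) (Tue, 01 Jul 2025 16:52:25 GMT)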
An overview of where things stand on the road to AGI. The paper points out: “Several challenges remains, such as the need for grounded world models, dynamic memory, causal reasoning, robust handling of aleatory and epistemic uncertainty, developing perception of emotional and social contexts and collective agent architectures. Significant advancements have been made, such as Large Concept Models, Large Reasoning Models and Mixture of Experts, which improve LLM performance beyond next-token prediction by incorporating biologically inspired behaviors into output generation.”
The technical framing of MoE and similar techniques feels somewhat off in places, but it is an interesting overview.

A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools [15.9]
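Foundation models (FMs) enable scalable, general-purpose, multimodal AI systems for scientific discovery. This survey provides a comprehensive overview of the foundation models, agentic systems, datasets, and computational tools that support this growing field.
Reference translation of the paper (metadata) (Wed, 25 Jun 2025 18:10:30 GMT)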