November 6, 2025 – Introducing the Latest arXiv Papers

Social Simulations with Large Language Model Risk Utopian Illusion

Social Simulations with Large Language Model Risk Utopian Illusion [61.4]
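We propose a systematic framework for analyzing the behavior of large language models in social simulation. Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them along five linguistic dimensions. The results show that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.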
Reference translation of the paper (metadata) (Fri, 24 Oct 2025 06:08:41 GMT)
A report on social simulation with LLMs, which is being tried in many settings. It takes a negative view: "Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it, shaped by the social desirability bias. In particular, LLMs show social role bias, primacy effect, and positivity bias, resulting in “Utopian” societies that lack the complexity and variability of real human interactions."

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark [124.0]
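We conduct an empirical study of whether video models are ready to serve as zero-shot reasoners. We focus on the popular Veo-3. We evaluate its reasoning behavior across 12 dimensions, including spatial, geometric, physical, temporal, and embodied logic.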
Reference translation of the paper (metadata) (Thu, 30 Oct 2025 17:59:55 GMT)
There is also the claim made in "Video models are zero-shot learners and reasoners" (covered previously on this site), but this paper comes from a different team. It reports: "Our findings reveal that while current video models demonstrate promising reasoning patterns on short-horizon spatial coherence, fine-grained grounding, and locally consistent dynamics, they remain limited in long-horizon causal reasoning, strict geometric constraints, and abstract logic. Overall, they are not yet reliable as standalone zero-shot reasoners, but exhibit encouraging signs as complementary visual engines alongside dedicated reasoning models." So the results still suggest some potential.
Project site: Are Video Models Ready as Zero-Shot Reasoners?

DeepAgent: A General Reasoning Agent with Scalable Toolsets [111.6]
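DeepAgent is an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution. To address the challenges of long-horizon interactions, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories. We develop ToolPO, an end-to-end reinforcement learning strategy that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to tool invocation tokens.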
Reference translation of the paper (metadata) (Fri, 24 Oct 2025 16:24:01 GMT)
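An introduction to an agent framework that also supports tool use and related capabilities. Its effectiveness is validated with QwQ-32B as the backbone.
Repository: GitHub – RUC-NLPIR/DeepAgent: 🛠️ DeepAgent: A General Reasoning Agent with Scalable Toolsets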
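
As a rough illustration of the memory folding idea in the DeepAgent abstract (compressing past interactions into episodic, working, and tool memories), here is a minimal Python sketch. It is not the authors' implementation: the names `Interaction`, `FoldedMemory`, `fold`, and the `keep_recent` parameter are hypothetical, and simple truncation stands in for whatever summarization the real system performs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Interaction:
    """One step of agent history (hypothetical record format)."""
    thought: str
    tool: Optional[str] = None    # name of the tool called at this step, if any
    result: Optional[str] = None  # output of that tool call, if any


@dataclass
class FoldedMemory:
    """Three buckets named after the memory types listed in the abstract."""
    episodic: List[str] = field(default_factory=list)   # shortened notes on older steps
    working: List[str] = field(default_factory=list)    # most recent steps, kept in full
    tool: Dict[str, int] = field(default_factory=dict)  # tool name -> number of calls


def fold(history: List[Interaction], keep_recent: int = 2, max_chars: int = 40) -> FoldedMemory:
    """Fold a long history into a compact structure.

    Assumption: older steps are truncated to max_chars characters as a
    stand-in for summarization; the last keep_recent steps stay verbatim.
    """
    memory = FoldedMemory()
    cutoff = max(len(history) - keep_recent, 0)
    for index, step in enumerate(history):
        if step.tool is not None:
            memory.tool[step.tool] = memory.tool.get(step.tool, 0) + 1
        if index < cutoff:
            memory.episodic.append(step.thought[:max_chars])
        else:
            memory.working.append(step.thought)
    return memory
```

In this sketch, calling `fold(history)` on a long interaction list returns a `FoldedMemory` whose `working` list holds only the latest steps, which is the part an agent would keep in its active context, while the other two buckets keep a condensed record of what came before.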