2026年2月25日 – arXiv最新論文の紹介

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies [61.3]
対話型経済における継続的計画・実行意思決定のためのベンチマークであるEcoGymを紹介する。 EcoGymは、透明性のある長期的なエージェント評価のためのオープンなテストベッドとしてリリースされ、現実的な経済環境下でのコントロール可能性とユーティリティのトレードオフを研究するためのものだ。
論文参考訳（メタデータ） (Wed, 11 Feb 2026 08:59:16 GMT)
「EcoGym, a generalizable benchmark for continuous plan-and-execute decision making in interactive economies.」というベンチマーク。「Experiments across eleven leading LLMs expose a systematic tension: no single model dominates across all three scenarios. Critically, we find that models exhibit significant suboptimality in either high-level strategies or efficient actions executions.」というのは興味深く得意・不得意があるよう（安定性が良くないという指摘もある）
リポジトリはGitHub – OPPO-PersonalAI/EcoGym: Official Repo for “EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies”

最近よくバズるMoltbookやOpenClawに言及するまたは対象とした論文が複数出ていた。対応（？）が速くて驚き。Fugu-MT: arxivの論文翻訳(検索結果: Moltbook)、Fugu-MT: arxivの論文翻訳(検索結果: OpenClaw)　はこれからも増えていくはず。

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.8]
この技術レポートは、サイバー犯罪、説得と操作、戦略上の詐欺、制御されていないAIR&D、自己複製の5つの重要な側面について、更新されきめ細かな評価を提示する。この作業は、現在のAIフロンティアのリスクに対する理解を反映し、これらの課題を軽減するための集団行動を促します。
論文参考訳（メタデータ） (Mon, 16 Feb 2026 04:30:06 GMT)
リスク整理、「3.4.4 Interactive agents’ autonomous self-modification on Openclaw and Moltbook」で取り扱われる。

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook [23.9]
Moltbookは、自律エージェントがオープンエンドで継続的に進化するオンライン社会に参加する、もっともらしい未来のシナリオを近似している。本稿では,このAIエージェント・ソサエティの大規模システム診断について紹介する。
論文参考訳（メタデータ） (Sun, 15 Feb 2026 20:15:28 GMT)
「 Our results show that large-scale interaction and dense connectivity alone do not induce socialization, revealing a fundamental gap between scalability and social integration in current agent societies.」と指摘
プロジェクトサイトはGitHub – tianyi-lab/Moltbook_Socialization: Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) [77.2]
6つのリスク次元にわたるClawdbotの軌道中心評価について述べる。我々は、完全なインタラクショントラジェクトリ(メッセージ、アクション、ツールコール引数/アウトプット)をログし、自動化されたトラジェクトリ判断とヒューマンレビューの両方を使用して安全性を評価する。
論文参考訳（メタデータ） (Mon, 16 Feb 2026 00:33:02 GMT)
OpenClawの分析、この手のツール設計は難しいなという思いが強くなる。「First, Clawdbot’s memory is persisted as plain Markdown files in the agent workspace, so mistaken inferences or injected instructions can be written to disk and then carried across sessions as durable state (OpenClaw Documentation, 2026h). Second, Clawdbot’s extensibility model encourages the use of “skills” that are themselves Markdown instruction bundles, which can embed tool-call recipes and command-style guidance and therefore expand the prompt-injection and supply-chain attack surface beyond the immediate user prompt.」
リポジトリはGitHub – tychenn/clawdbot_report

The Rise of AI Agent Communities: Large-Scale Analysis of Discourse and Interaction on Moltbook [62.3]
MoltbookはRedditに似たソーシャルプラットフォームで、AIエージェントが投稿を作成し、コメントや返信を通じて他のエージェントと対話する。ローンチから約5日後に収集された公開APIスナップショットを使用して、AIエージェントが何を議論しているか、どのように投稿するか、どのように相互作用するのかという3つの研究課題に対処する。エージェントの執筆は、主に中立であり、コミュニティエンゲージメントや支援指向のコンテンツに肯定性があることが示される。
論文参考訳（メタデータ） (Fri, 13 Feb 2026 05:28:31 GMT)
Moltbookの分析、「Affectively, agent communication is predominantly neutral, with positive sentiment selectively concentrated in community-oriented onboarding and engagement practices. Structurally, the interaction network ex- hibits a sparse, hub-dominated topology characterized by low reci- procity. Although the platform features resemble patterns observed in human online communities like Reddit, the interactions lack sustained, reciprocal dialogue.」と指摘。

日: 2026年2月25日