Social Implementation – Introducing the Latest arXiv Papers

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report [51.2]
This report presents a comprehensive assessment of frontier risks. It identifies critical risks in seven areas: cyber offense, biological and chemical risks, persuasion and manipulation, uncontrolled autonomous AI R&D, strategic deception and scheming, self-replication, and collusion.
Reference translation of the paper (metadata) (Tue, 22 Jul 2025 12:44:38 GMT)
An assessment of the risks posed by powerful AI. The paper opens with "Guided by the “AI-45° Law,” we evaluate these risks using “red lines” (intolerable thresholds) and “yellow lines” (early warning indicators) to define risk zones: green (manageable risk for routine deployment and continuous monitoring), yellow (requiring strengthened mitigations and controlled deployment), and red (necessitating suspension of development and/or deployment). Experimental results show that all recent frontier AI models reside in green and yellow zones, without crossing red lines." For security, however, the findings are concrete, for example: "However, none could accomplish more complex attacks, such as MH_K, MH_N, or full-chain exploitation. These findings indicate that while current models can execute simple cyber operations, they remain incapable of conducting sophisticated, real-world cyber attacks."
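The zone definition quoted above can be sketched as a simple threshold check. This is only an illustration: the function name `risk_zone`, the numeric score, the threshold values, and the convention that a higher score means more risk are all assumptions, not the report's actual metrics.

```python
def risk_zone(score, yellow_line, red_line):
    """Map a risk score to a zone, given a yellow line and a red line.

    Assumption: higher scores mean higher risk, and reaching a line
    counts as crossing it. The report defines its own per-domain metrics.
    """
    if yellow_line >= red_line:
        raise ValueError("yellow_line must be below red_line")
    if score >= red_line:
        return "red"      # suspend development and/or deployment
    if score >= yellow_line:
        return "yellow"   # strengthened mitigations, controlled deployment
    return "green"        # routine deployment with continuous monitoring


# Hypothetical thresholds, chosen only for illustration.
for s in (0.2, 0.5, 0.9):
    print(s, risk_zone(s, yellow_line=0.5, red_line=0.8))
```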

Your AI, Not Your View: The Bias of LLMs in Investment Analysis

Your AI, Not Your View: The Bias of LLMs in Investment Analysis [55.3]
In finance, large language models (LLMs) frequently face knowledge conflicts arising from discrepancies between their pre-trained parametric knowledge and real-time market data. The paper presents the first quantitative analysis of confirmation bias in LLM-based investment analysis. It observes a consistent preference for large-cap stocks and for contrarian strategies across most models.
Reference translation of the paper (metadata) (Mon, 28 Jul 2025 16:09:38 GMT)
A quantitative analysis of LLM biases in investment decisions.
The paper reports the following biases: "The results show that LLMs are not neutral decision-makers, with distinct preferences for certain financial factors depending on the model. While sector preferences varied significantly across models, showing no overall trend, a common bias towards large-size stocks and a consistent preference for a contrarian investment view over momentum were observed." The models also appear quite stubborn: "While the models correctly reversed their decisions when presented only with counter-evidence, their flexibility sharply decreased in situations where supporting and counter-evidence were mixed and conflicting."
Great care is needed when having an LLM make any kind of judgment.

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra [29.6]
This paper proposes a new framework that uses agent-based modeling to design and evaluate economic policies. At the lower level, boundedly rational worker agents choose their labor supply to maximize text-based utility functions learned in context. At the upper level, a planner agent uses in-context reinforcement learning to propose piecewise-linear marginal tax schedules anchored to the current U.S. federal brackets.
Reference translation of the paper (metadata) (Mon, 21 Jul 2025 17:21:14 GMT)
The paper states: "Our results show that a Llama-3 model can (i) recover the Mirrleesian trade-off between equity and efficiency, (ii) approach Saez-optimal schedules in heterogeneous settings where analytical formulas are unavailable, and (iii) reproduce political phenomena, such as majority exploitation and welfare-enhancing leader turnover, without any hand-crafted rules. Taken together, the experiments suggest that large language models can serve as tractable test beds for policy design long before real-world deployment, providing a bridge between modern generative AI and classical economic theory." This is an interesting result for an LLM-based multi-agent simulation, and it is somewhat surprising that Llama-3.1-8B-Instruct is sufficient, even though the approach looks elaborate.
The repository is sethkarten/LLM-Economist: Official repository of the 2025 paper, LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra.

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games, How large language models judge and influence human cooperation

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games [87.6]
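Understanding how large language models balance self-interest and collective well-being is a key challenge for ensuring alignment, robustness, and safe deployment. The authors adapt a public goods game with institutional choice from behavioral economics to observe how different LLMs navigate social dilemmas. Surprisingly, reasoning LLMs such as the o1 series struggle considerably with cooperation.
Reference translation of the paper (metadata) (Sun, 29 Jun 2025 15:02:47 GMT)
An interesting result: "our findings reveal a surprising pattern: while traditional LLMs demonstrate robust cooperation comparable to human outcomes, reasoning-enhanced models frequently struggle to sustain cooperation." I am very curious whether this happens because they are reasoning models, or whether it stems from model size or training outcomes.
The repository is GitHub – davidguzmanp/SanctSim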
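To make the free-rider incentive concrete, here is a minimal sketch of the payoff rule in a standard linear public goods game. The endowment, the multiplier, and the function name are illustrative assumptions; the paper's setup adds institutional choice on top of this and may use different parameters.

```python
def public_goods_payoffs(contributions, endowment=20.0, multiplier=1.5):
    """Payoffs for a one-shot linear public goods game.

    Each player keeps what they do not contribute and receives an equal
    share of the multiplied common pot. Parameter values are assumptions
    chosen for illustration only.
    """
    n = len(contributions)
    if n == 0:
        raise ValueError("at least one player is required")
    if any(c < 0 or c > endowment for c in contributions):
        raise ValueError("contributions must lie in [0, endowment]")
    pot_share = multiplier * sum(contributions) / n
    return [endowment - c + pot_share for c in contributions]


# Three full contributors and one free rider.
print(public_goods_payoffs([20, 20, 20, 0]))
# Everyone contributes fully.
print(public_goods_payoffs([20, 20, 20, 20]))
```

With these assumed values, the free rider earns 42.5 against 22.5 for each contributor, while universal contribution yields 30 each. That gap between individual and collective incentives is the dilemma the models have to navigate.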

How large language models judge and influence human cooperation [82.1]
The authors evaluate how state-of-the-art language models judge cooperative behavior. They observe notable agreement when the models evaluate cooperation with good opponents. They also show that differences among models significantly affect the prevalence of cooperation.
Reference translation of the paper (metadata) (Mon, 30 Jun 2025 09:14:42 GMT)
A paper examining whether LLMs behave cooperatively. The results make it hard to identify trends, but the paper states: "With some exceptions, most LLM families we tested tend to move from IS towards SS as versions and parameter size increases, indicating a shift towards a higher complexity social norm which makes use of more context, specifically assigned reputations. Moreover, different versions of the same family can have vastly distinct social norms, such as Claude 3.5 Haiku [47] and Claude 3.7 Sonnet [48], despite their similar ethical goals [49]." (IS: cooperating is good, defection is bad. SS: cooperating is always good, defecting against bad individuals is also good.)
I think the following describes how things actually stand: "These results highlight an important concern: LLMs are not explicitly designed with a given social norm in mind, instead emerging as a by-product of their training [4]. While these norms may occasionally align with those of humans, they are neither designed to maintain cooperation and minimize disagreement, nor are they co-created with communities from diverse cultures to reflect their norms and needs [3]." Since there is a fair amount of talk about using LLMs for decision support, caution is warranted.
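The two norms summarized in the parenthetical above (IS and SS) can be written as small judgment functions. This is a sketch under assumptions: the function names, the string labels, and the binary good/bad reputation model are illustrative, and the norm definitions follow only the one-line descriptions given above, not the paper's full formalization.

```python
GOOD, BAD = "good", "bad"
COOPERATE, DEFECT = "cooperate", "defect"


def judge_is(action, recipient_reputation):
    """IS as described above: cooperating is good, defecting is bad.

    The recipient's reputation is ignored.
    """
    return GOOD if action == COOPERATE else BAD


def judge_ss(action, recipient_reputation):
    """SS as described above: cooperating is always good, and defecting
    against a bad individual is also good."""
    if action == COOPERATE:
        return GOOD
    return GOOD if recipient_reputation == BAD else BAD


# The norms disagree only when someone defects against a bad individual.
for action in (COOPERATE, DEFECT):
    for reputation in (GOOD, BAD):
        print(action, reputation,
              judge_is(action, reputation), judge_ss(action, reputation))
```

The only case where the two judgments differ is defection against a bad recipient, which is where SS uses the extra context (assigned reputations) mentioned in the quote.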

Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce

Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce [45.3]
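The paper introduces a new framework for assessing which tasks workers want AI agents to automate or augment. The framework features audio-enhanced mini-interviews to capture nuanced worker desires. The authors build the WORKBank database, which collects the preferences of 1,500 domain workers and capability assessments from AI experts.
Reference translation of the paper (metadata) (Wed, 11 Jun 2025 21:25:21 GMT)
A survey report: "This paper presents the first large-scale audit of both worker desire and technological capability for AI agents in the context of automation and augmentation." Viewed through the four quadrants below, it is hard to say that what workers want and the direction of research are aligned.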
- Automation “Green Light” Zone: Tasks with both high automation desire and high capability. These are prime candidates for AI agent deployment with the potential for broad productivity and societal gains.
- Automation “Red Light” Zone: Tasks with high capability but low desire. Deployment here warrants caution, as it may face worker resistance or pose broader negative societal implications
- R&D Opportunity Zone: Tasks with high desire but currently low capability. These represent promising directions for AI research and development.
- Low Priority Zone: Tasks with both low desire and low capability. These are less urgent for AI agent development.
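The four zones above amount to a two-by-two split on worker desire and technological capability. A minimal sketch follows; the function name, the 1-to-5 rating scale, and the midpoint threshold of 3.0 are assumptions for illustration, not the paper's exact scoring procedure.

```python
def classify_task(desire, capability, threshold=3.0):
    """Place a task in one of the four zones listed above.

    Assumes both ratings are on a hypothetical 1-5 scale and that a
    rating at or above the threshold counts as high.
    """
    high_desire = desire >= threshold
    high_capability = capability >= threshold
    if high_desire and high_capability:
        return 'Automation "Green Light" Zone'
    if high_capability:
        return 'Automation "Red Light" Zone'
    if high_desire:
        return "R&D Opportunity Zone"
    return "Low Priority Zone"


# Hypothetical ratings: (worker desire, expert-rated capability).
for ratings in [(4.5, 4.0), (1.5, 4.2), (4.1, 2.0), (1.8, 1.9)]:
    print(ratings, classify_task(*ratings))
```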
Taken together with the studies below, I wonder whether these tendencies will shift as people keep using AI.

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task [17.6]
This study investigated how the use of large language models (LLMs) in an educational context affects cognitive load. Fifty-four participants were divided into LLM, search engine, and brain-only groups; electroencephalography (EEG) was used to record neural activity, and learning effects were measured. The LLM group showed weaker cognitive network connectivity than the other groups, along with a decline in learning skills. The study aims to provide preliminary guidance for understanding the impact of AI on learning environments.
Reference translation of the paper (metadata) (Tue, 10 Jun 2025 15:04:28 GMT)
An education-related report on how the use of AI affects people. The result is somewhat alarming: "As the educational impact of LLM use only begins to settle with the general population, in this study we demonstrate the pressing matter of a likely decrease in learning skills based on the results of our study. The use of LLM had a measurable impact on participants, and while the benefits were initially apparent, as we demonstrated over the course of 4 months, the LLM group’s participants performed worse than their counterparts in the Brain-only group at all levels: neural, linguistic, scoring."
The project site is Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

Protecting Human Cognition in the Age of AI [2.1]
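The rapid spread of generative AI (GenAI) is having a major impact on human cognition, reshaping how people engage with information, think, and learn. Focusing on novices such as students, this paper emphasizes the importance of understanding effective human-AI interaction and considers how to redesign educational experiences to foster critical thinking. It also explores the effects of GenAI on cognitive abilities and its interaction with societal factors such as information overload.
Reference translation of the paper (metadata) (Fri, 11 Apr 2025 21:14:29 GMT)
A short but survey-like paper.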

Community Moderation and the New Epistemology of Fact Checking on Social Media

Community Moderation and the New Epistemology of Fact Checking on Social Media [124.3]
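Social media platforms have traditionally relied on independent fact-checking organizations to identify and flag misleading content. X (formerly Twitter) and Meta have shifted to community-driven content moderation by launching their own versions of crowdsourced fact-checking. The paper examines current approaches to misinformation detection across major platforms, explores the emerging role of community-driven moderation, and critically evaluates both the promises and the challenges of crowd-checking at scale.
Reference translation of the paper (metadata) (Mon, 26 May 2025 14:50:18 GMT)
A survey and evaluation of fact-checking (and similar checks) as actually practiced by communities.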

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation [38.6]
The authors introduce HumaniBench, a comprehensive benchmark of 32K real-world image-question pairs. HumaniBench evaluates seven Human-Centered AI (HCAI) principles: fairness, ethics, understanding, reasoning, language inclusivity, empathy, and robustness.
Reference translation of the paper (metadata) (Fri, 16 May 2025 17:09:44 GMT)
A benchmark described as follows: "HumaniBench probes seven HCAI principles (fairness, ethics, understanding, reasoning, language inclusivity, empathy, robustness) through seven diverse tasks that mix open- and closed-ended visual question answering (VQA), multilingual QA, visual grounding, empathetic captioning, and robustness tests." Commercial models achieve strong results overall, but open models sometimes score higher on individual elements.
The project site is HumaniBench: A Human-Centric Benchmark for Large Multimodal Models Evaluation

Understanding Gen Alpha Digital Language: Evaluation of LLM Safety Systems for Content Moderation

Understanding Gen Alpha Digital Language: Evaluation of LLM Safety Systems for Content Moderation [8.9]
This study offers a unique evaluation of how AI systems interpret the digital language of Generation Alpha (Gen Alpha, born 2010-2024). Gen Alpha faces new forms of online risk due to immersive digital engagement and a growing mismatch between its evolving communication and existing safety tools. Using a dataset of 100 recent expressions from gaming platforms, social media, and video content, the study reveals critical comprehension failures with direct implications for online safety.
Reference translation of the paper (metadata) (Wed, 14 May 2025 16:46:11 GMT)
A study of the gap with the digital-native generation. It reports that "Most critically, protection systems consistently lagged behind the rapid evolution of expressions, creating windows of vulnerability where concerning interactions went undetected" and that "The resulting trust gap led many Gen Alpha users to avoid reporting concerning interactions, believing adults would misunderstand or minimize their experiences."
Will the gap widen even further in the era of generative AI...
The repository is GitHub – SystemTwoAI/GenAlphaSlang

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt [44.5]
The authors worry that systems which overlook the persistence of moral diversity will provoke resistance, erode trust, and destabilize institutions. The underlying assumption is that, under ideal conditions, rational agents will converge in the limit of conversation on a single ethics. Treating this premise as both optional and doubtful, they propose what they call the appropriateness framework, an alternative approach grounded in conflict theory, cultural evolution, multi-agent systems, and institutional economics.
Reference translation of the paper (metadata) (Thu, 08 May 2025 12:55:07 GMT)
The paper opens with: "This paper traces the underlying problem to an often-unstated Axiom of Rational Convergence: the idea that under ideal conditions, rational agents will converge in the limit of conversation on a single ethics. Treating that premise as both optional and doubtful, we propose what we call the appropriateness framework: an alternative approach grounded in conflict theory, cultural evolution, multi-agent systems, and institutional economics."
I agree with 1. Contextual grounding, 2. Community customization, 3. Continual adaptation, and 4. Polycentric governance. The claim that "it’s recognizing the actual pattern of human history, where we’ve demonstrably managed to live together despite fundamental disagreements, not by resolving them" may well be true in very broad terms (although in reality many bad things also happen). Still, working out how to do this concretely looks like a real headache. My impression is that this paper gives one a lot to think about.
- The paper does say "For the latter, we have to shift from seeking agreement to managing conflict and enabling coexistence through shared practices and norms. This doesn’t imply “anything goes”.", but...

The Leaderboard Illusion

The Leaderboard Illusion [30.2]
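Chatbot Arena has emerged as the leaderboard for ranking the most capable AI systems. The authors identify systematic issues that have resulted in a distorted playing field.
Reference translation of the paper (metadata) (Tue, 29 Apr 2025 15:48:49 GMT)
The paper points out problems with Chatbot Arena and proposes improvements.
The findings "We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired." and "At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release." are certainly problematic.
Designing and operating a leaderboard is very difficult, but I hope for improvements wherever they are feasible.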
