October 14, 2024 – Introducing the latest arXiv papers

Biased AI can Influence Political Decision-Making

Biased AI can Influence Political Decision-Making [64.9]
This paper examines how partisan bias in AI language models affects political decision-making. Participants exposed to politically biased models were significantly more likely to adopt opinions and make decisions aligned with the AI's bias.
Reference translation of the paper (metadata) (Tue, 08 Oct 2024 22:56:00 GMT)
The paper reports that “We found that participants exposed to politically biased models were significantly more likely to adopt opinions and make decisions aligning with the AI’s bias, regardless of their personal political partisanship.” and that “However, we also discovered that prior knowledge about AI could lessen the impact of the bias, highlighting the possible importance of AI education for robust bias mitigation.” Education appears to help, but I suspect the problem will only grow from here.

Data Selection via Optimal Control for Language Models [134.7]
This work aims to improve the capabilities of LMs in downstream use by selecting high-quality pre-training data from a large corpus. It introduces PMP-based Data Selection (PDS), a framework that approximates optimal data selection by solving the PMP conditions. The benefits of PDS extend to a 400B model trained on 10T tokens, as evidenced by extrapolating test loss curves according to scaling laws.
Reference translation of the paper (metadata) (Wed, 09 Oct 2024 17:06:57 GMT)
A proposal for a data selection method that applies control theory: “by treating data selection as the control variables (i.e., whether a data point is included in pre-training), the LM pre-training process as the dynamic system, and the LM’s downstream performance as the objective, we leverage Pontryagin’s Maximum Principle (PMP; 63) to derive the necessary conditions for optimal data selection in theory.” Given that “The overhead of running PDS to select data is only about 1/9 of that of pre-training a 1.7B model.”, it also looks practical.
The project site is Advancing AI for Humanity (thegenerality.com), and the repository is LMOps/data_selection at main · microsoft/LMOps · GitHub
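To make the "data selection as a control variable" framing a little more concrete, here is a toy sketch. It is not the paper's PMP solver and not the code in the LMOps repository. It swaps in a much simpler one-step heuristic: score each candidate training example by how well its gradient aligns with the gradient of a downstream loss, then keep the top-scoring examples. The linear model, the squared-error loss, and the names `grad`, `mean_grad`, and `select` are all assumptions made for illustration.

```python
from typing import List, Tuple

Example = Tuple[List[float], float]  # (features, target)


def grad(w: List[float], ex: Example) -> List[float]:
    """Gradient of the squared error (w.x - y)^2 with respect to w."""
    x, y = ex
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * err * xi for xi in x]


def mean_grad(w: List[float], data: List[Example]) -> List[float]:
    """Average gradient over a set of examples."""
    g = [0.0] * len(w)
    for ex in data:
        for j, gj in enumerate(grad(w, ex)):
            g[j] += gj / len(data)
    return g


def select(w: List[float], pool: List[Example],
           downstream: List[Example], keep: int) -> List[int]:
    """Return indices of the `keep` pool examples whose gradients align best
    with the downstream gradient (hypothetical one-step heuristic).

    A small gradient step on example i changes the downstream loss by roughly
    -lr * (g_i . g_down), so a larger dot product means a more helpful example.
    """
    g_down = mean_grad(w, downstream)
    scores = [sum(a * b for a, b in zip(grad(w, ex), g_down)) for ex in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    w = [0.0, 0.0]
    downstream = [([1.0, 0.0], 1.0)]
    pool = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
    print(select(w, pool, downstream, keep=2))
```

The binary keep/drop decision per example plays the role of the control variable in the quote above; the actual method reasons about the whole training trajectory rather than a single step.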

Agent S: An Open Agentic Framework that Uses Computers Like a Human [31.2]
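We propose Agent S, an open agentic framework that enables autonomous interaction with computers through a graphical user interface (GUI). Agent S aims to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces.
Reference translation of the paper (metadata) (Thu, 10 Oct 2024 17:43:51 GMT)
A proposal for an agent framework that operates a computer the way a person does.
The repository is GitHub – simular-ai/Agent-S: Official codebase for Agent S, a open agentic framework that uses computers like a human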

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models [24.3]
The GSM8K benchmark is widely used to evaluate models' mathematical reasoning on grade-school-level questions. GSM-Symbolic is an improved benchmark generated from symbolic templates. The results show that LLMs exhibit noticeable variance when responding to different instantiations of the same question.
Reference translation of the paper (metadata) (Mon, 07 Oct 2024 17:36:37 GMT)
This is a benchmark paper (“We introduce GSM-Symbolic, an enhanced benchmark that generates diverse variants of GSM8K questions using symbolic templates”), but the finding that “We show that LLMs exhibit more robustness to changes in superficial elements like proper names but are very sensitive to changes in numerical values” is quite striking.
On GSM-NoOp, which adds irrelevant information (“To create the templates, we add seemingly relevant but ultimately inconsequential statements to GSM-Symbolic templates.”), the results get even worse, so the difficulty does not look like something simple data leakage alone would explain.
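As a rough illustration of what "symbolic templates" and the NoOp variant mean in practice, here is a minimal sketch. It is not the paper's generator: the question text, the name list, the value ranges, and the `NOOP` sentence are all made up for this example. The idea is that names and numbers are sampled as variables, the ground-truth answer is computed from them, and an inconsequential sentence can optionally be added without changing that answer.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    question: str
    answer: int


# Hypothetical values; the real benchmark derives its templates from GSM8K questions.
NAMES = ["Sophie", "Liam", "Mina", "Omar"]
NOOP = "Five of the apples are slightly smaller than the rest."


def make_instance(rng: random.Random, add_noop: bool = False) -> Instance:
    """Sample one instantiation of a single question template."""
    name = rng.choice(NAMES)
    per_day = rng.randint(2, 9)
    days = rng.randint(2, 7)
    # Constrain the sampled value so the answer stays positive.
    eaten = rng.randint(1, per_day * days - 1)

    parts = [f"{name} picks {per_day} apples every day for {days} days."]
    if add_noop:
        # Seemingly relevant, but it does not affect the arithmetic.
        parts.append(NOOP)
    parts.append(f"{name} then eats {eaten} of them. How many apples are left?")
    return Instance(" ".join(parts), per_day * days - eaten)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        inst = make_instance(rng)
        print(inst.question, "->", inst.answer)
```

Evaluating a model on many instances drawn from one template, and separately on the `add_noop=True` versions, is the kind of comparison the quoted findings are about.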

A Survey on the Honesty of Large Language Models [115.8]
Honesty is a fundamental principle for aligning large language models (LLMs) with human values. Despite their promise, current LLMs still exhibit significant dishonest behaviors.
Reference translation of the paper (metadata) (Fri, 27 Sep 2024 14:34:54 GMT)
A survey that opens with “Honesty is a fundamental principle for aligning large language models (LLMs) with human values, requiring these models to recognize what they know and don’t know and be able to faithfully express their knowledge.”
The repository is GitHub – SihengLi99/LLM-Honesty-Survey