staka – Page 19 – Introducing the Latest arXiv Papers

Self-Steering Language Models

Self-Steering Language Models [114.0]
DisCIPL is a method for "self-steering" language models. DisCIPL uses a Planner model to generate task-specific inference programs. Our work opens up a design space of highly parallelized Monte Carlo inference strategies.
Reference translation of the paper (metadata) (Wed, 09 Apr 2025 17:54:22 GMT)
Introduces the following approach: "This paper introduces DISCIPL, a method for “self-steering” LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models."
The claim that "By decomposing reasoning into planning and execution, our architecture preserves flexibility while enabling orchestration of highly efficient, parallel search patterns." seems plausible from practical experience as well. It is good to see that it is validated thoroughly.
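The planning/execution split described above can be illustrated with a toy example. This is only a minimal sketch under loud assumptions, not the paper's implementation: `planner_make_program`, `follower_propose`, `run_smc`, the vocabulary, and the "banned word" constraint are all hypothetical, and the resampling loop is a generic sequential-Monte-Carlo-style pattern chosen because the summary mentions Monte Carlo inference strategies.

```python
import random
from typing import Callable, List

# Hypothetical stand-in for a Follower LM: proposes the next word at random.
VOCAB = ["red", "blue", "cat", "dog", "runs", "sleeps"]


def follower_propose(rng: random.Random) -> str:
    return rng.choice(VOCAB)


def planner_make_program(banned: str) -> Callable[[List[str]], float]:
    """Hypothetical Planner output: a task-specific weighting rule.

    The returned function gives weight 0.0 to any partial sequence that
    contains the banned word and 1.0 to everything else.
    """
    def weight(seq: List[str]) -> float:
        return 0.0 if banned in seq else 1.0
    return weight


def run_smc(weight: Callable[[List[str]], float],
            n_particles: int, steps: int, seed: int = 0) -> List[List[str]]:
    """Toy resampling loop: extend every particle, weight it, resample."""
    rng = random.Random(seed)
    particles: List[List[str]] = [[] for _ in range(n_particles)]
    for _ in range(steps):
        # Each particle is extended independently, so this step could be
        # spread across many Followers in parallel.
        particles = [p + [follower_propose(rng)] for p in particles]
        weights = [weight(p) for p in particles]
        if sum(weights) == 0:
            raise RuntimeError("all particles violated the constraint")
        # Resample in proportion to weight so that compute is focused on
        # candidates that still satisfy the constraint.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in chosen]
    return particles
```

Calling `run_smc(planner_make_program("cat"), n_particles=8, steps=3)` returns eight three-word sequences, none of which contains "cat", because zero-weight particles are never resampled.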

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models [51.9]
Recent advances in large language models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks. System 1 reasoning is computationally efficient but yields suboptimal performance. System 2 reasoning often incurs substantial computational cost due to slow thinking, inefficiency, and unnecessary reasoning behavior.
Reference translation of the paper (metadata) (Mon, 31 Mar 2025 17:58:07 GMT)
A survey described as: "In this survey, we provide a comprehensive analysis of reasoning economy in both the post-training and test-time inference stages of LLMs, encompassing"
Repository: GitHub – DevoAllen/Awesome-Reasoning-Economy-Papers: Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? [42.4]
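This study investigates the feasibility of hyperparameter optimization with large language models (LLMs), using a fine-tuned version of Code Llama. The proposed method achieves competitive or superior results in terms of root mean square error (RMSE) while significantly reducing computational overhead. The results confirm that, beyond efficiency, LLMs offer substantial time savings and comparable stability, underscoring their value in advancing machine learning.
Reference translation of the paper (metadata) (Tue, 08 Apr 2025 13:15:47 GMT)
A paper claiming that "Our evaluations reveal that fine-tuned Code Llama often meets or exceeds the accuracy achieved by Optuna, a well-established hyperparameter optimization framework."
An interesting result, though I am left half-wondering why it works. It sort of makes sense and sort of does not...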
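For reference, the RMSE metric mentioned in the summary is the square root of the mean squared difference between targets and predictions. The helper below is a minimal sketch of that standard definition; the function name `rmse` and the sample numbers are illustrative and do not come from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value is better, so a tuner "wins" on this metric when the configuration it selects yields a smaller `rmse` on held-out data.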

A Survey on Unlearnable Data

A Survey on Unlearnable Data [27.3]
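Unlearnable Data (ULD) has emerged as an innovative defense technique that prevents machine learning models from learning meaningful patterns from specific data. We compare and contrast different ULD approaches and analyze their strengths, limitations, and trade-offs with respect to unlearnability, imperceptibility, efficiency, and robustness. The paper discusses key challenges, such as balancing perturbation imperceptibility against model degradation and the computational complexity of ULD generation.
Reference translation of the paper (metadata) (Sun, 30 Mar 2025 17:41:30 GMT)
A survey of the following: "Unlearnable Data (ULD) refers to a category of data that has been deliberately modified through subtle perturbations, preventing models from effectively learning useful representations during training while maintaining perceptual quality for human observers."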

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens [119.6]
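OLMoTrace traces the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace finds and shows verbatim matches between segments of language model output and documents in the training text corpora.
Reference translation of the paper (metadata) (Wed, 09 Apr 2025 17:59:35 GMT)
Proposes a system in which "OLMOTRACE finds and shows verbatim matches between segments of language model output and documents in the training text corpora." and releases an OSS implementation. Although the Limitations section notes that "The retrieved documents should not be interpreted as having a causal effect on the LM output, or as supporting evidence or citations for the LM output." (and although access to the LLM's training data is required), a wide range of applications seems possible.
Repository: GitHub – allenai/infinigram-api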
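The idea of verbatim matching between output segments and training documents can be sketched with a brute-force toy. This is an assumption-laden illustration, not the OLMoTrace implementation: `find_verbatim_spans`, the `min_words` threshold, and the greedy left-to-right scan are hypothetical, and a linear scan like this would not scale to trillions of tokens.

```python
from typing import Dict, List, Tuple


def find_verbatim_spans(output: str, corpus: Dict[str, str],
                        min_words: int = 3) -> List[Tuple[str, List[str]]]:
    """Greedily find the longest word spans of `output` that occur verbatim
    in at least one corpus document. Returns (span_text, doc_ids) pairs."""
    # Normalize whitespace and pad with spaces so matches respect word boundaries.
    padded = {doc_id: " " + " ".join(text.split()) + " "
              for doc_id, text in corpus.items()}
    words = output.split()
    spans: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(words):
        match_end = None
        # Try the longest candidate starting at position i first.
        for j in range(len(words), i + min_words - 1, -1):
            candidate = " ".join(words[i:j])
            hits = [doc_id for doc_id, text in padded.items()
                    if f" {candidate} " in text]
            if hits:
                spans.append((candidate, hits))
                match_end = j
                break
        # Skip past a matched span, otherwise advance by one word.
        i = match_end if match_end is not None else i + 1
    return spans
```

Each returned span carries the identifiers of the documents that contain it. As the paper's own limitation note says, such matches should not be read as causal evidence or as citations for the output.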

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems [133.5]
The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present complex, multifaceted challenges. This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture.
Reference translation of the paper (metadata) (Mon, 31 Mar 2025 18:00:29 GMT)
A very comprehensive survey: "This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture that integrates principles from cognitive science, neuroscience, and computational research."
Repository: GitHub – FoundationAgents/awesome-foundation-agents: About Awesome things towards foundation agents. Papers / Repos / Blogs / …

DeepSeek-R1 Thoughtology: Let’s <think> about LLM Reasoning

DeepSeek-R1 Thoughtology: Let’s <think> about LLM Reasoning [31.8]
This paper examines DeepSeek-R1's thought length, its handling of long or confusing contexts, cultural and safety concerns, and its controllability. DeepSeek-R1 has a reasoning "sweet spot", beyond which extra inference time can impair model performance. DeepSeek-R1 also shows greater safety vulnerabilities than its non-reasoning counterpart.
Reference translation of the paper (metadata) (Wed, 02 Apr 2025 00:36:08 GMT)
An analysis of DeepSeek R1's reasoning. Two findings are interesting: "DeepSeek-R1 exhibits higher safety vulnerabilities compared to its non-reasoning counterpart DeepSeek-V3 (DeepSeek-AI et al., 2025b). We also show that the model’s reasoning capabilities can be used to generate jailbreak attacks that successfully elicit harmful responses from safety-aligned LLMs." and "When presented with moral or cultural questions, DeepSeek-R1 reasons for significantly longer when prompted in English than when prompted in Chinese. It also provides different responses, displaying different sets of cultural values in each language".

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations [45.6]
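We introduce ScholarCopilot, a unified framework designed to enhance existing large language models for academic writing. ScholarCopilot decides when to retrieve scholarly references by generating a retrieval token [RET], and then uses that token's representation to look up relevant citations in a database. To improve efficiency, the generation and citation tasks are jointly optimized within a single framework.
Reference translation of the paper (metadata) (Tue, 01 Apr 2025 14:12:14 GMT)
An LLM for academic papers. Its distinctive feature is the use of a special token, RET: "ScholarCopilot dynamically interleaves retrieval and generation by producing retrieval tokens ([RET]) based on current context, enabling context-aware citation retrieval and optional user refinement."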
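The interleaving of generation and retrieval can be sketched as a loop that watches for the retrieval token. This is a toy under stated assumptions, not the ScholarCopilot code: the token stream is supplied by hand instead of by a model, the database keys (`ref_optim`, `ref_attention`) are made up, and word overlap is swapped in for the learned [RET] representation that the paper uses for retrieval.

```python
from typing import Dict, Iterable, List

RET = "[RET]"


def retrieve(context: List[str], database: Dict[str, str]) -> str:
    """Return the key of the database entry sharing the most words with the
    context. Word overlap is a stand-in for a learned retrieval embedding."""
    context_words = {w.lower().strip(".,") for w in context}

    def overlap(item) -> int:
        _key, description = item
        return len(context_words & set(description.lower().split()))

    return max(database.items(), key=overlap)[0]


def write_with_citations(tokens: Iterable[str], database: Dict[str, str]) -> str:
    """Copy tokens to the output; whenever the retrieval token appears, look up
    a citation using the text written so far and insert its key instead."""
    written: List[str] = []
    for token in tokens:
        if token == RET:
            written.append(f"[{retrieve(written, database)}]")
        else:
            written.append(token)
    return " ".join(written)
```

Because retrieval is triggered at the moment the token is produced, the query always reflects the text written so far, which is the sense in which the citation lookup is context-aware.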

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement [100.9]
ThinkLite-VL improves the average performance of Qwen2.5-VL-7B-Instruct by 7%. Our code, data, and models are available at https://github.com/si0wang/ThinkLite-VL.
Reference translation of the paper (metadata) (Thu, 10 Apr 2025 17:49:05 GMT)
Proposes an efficient method for strengthening the reasoning of vision-language models. It uses little data: "Our model achieves SoTA performance using only 11k data, and without any additional knowledge distillation." The key is said to be data quality: "Our key insight highlights the critical importance of selecting genuinely challenging examples for Reinforcement Fine-Tuning (RFT)."
Repository: GitHub – si0wang/ThinkLite-VL

Towards Trustworthy GUI Agents: A Survey

Towards Trustworthy GUI Agents: A Survey [64.6]
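This survey examines the trustworthiness of GUI agents along five key dimensions. It identifies major challenges, such as vulnerability to adversarial attacks and cascading failure modes in sequential decision-making. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
Reference translation of the paper (metadata) (Sun, 30 Mar 2025 13:26:00 GMT)
A survey on the trustworthiness of GUI agents. It is organized along the axes "Security", "Reliability", "Explainability", "Ethical Alignment", and "Evaluation methodologies".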
