2026年1月23日 – arXiv最新論文の紹介

When Should We Introduce Safety Interventions During Pretraining?

When Should We Introduce Safety Interventions During Pretraining? [100.4]
先行研究は、有害な内容の表現などの事前訓練の介入が、結果のモデルの安全性を大幅に向上させることを示した。介入の導入は一般的に、過度な拒絶率の増加を伴わない、より堅牢なモデルをもたらす。また、より安全な世代に向けたモデルのステアビリティにも明らかなメリットがあると考えています。
論文参考訳（メタデータ） (Sun, 11 Jan 2026 22:38:17 GMT)
「Our experiments show that incorporating safety pretraining interventions indeed help, and the clearest result is that there is much improved robustness after benign finetuning when pretraining interventions are introduced earlier (e g , at 0% or 20% of the pretraining tokens). This also manifests into impacts on the model’s underlying representation geometry; incorporating interventions and metadata earlier in pretraining leads to greater separation of safe vs unsafe content.」とのこと。
タイミングによって結構な差が出ているのが意外。

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents [61.0]
AIエージェントは、悪意のあるコンテンツがエージェントの行動をハイジャックして認証情報を盗んだり、金銭的損失を引き起こすような、インジェクション攻撃に弱い。 CUAのためのシングルショットプランニングでは、信頼できるプランナーが、潜在的に悪意のあるコンテンツを観察する前に、条件付きブランチで完全な実行グラフを生成する。このアーキテクチャ分離は命令インジェクションを効果的に防止するが、ブランチステアリング攻撃を防ぐには追加の対策が必要であることを示す。
論文参考訳（メタデータ） (Wed, 14 Jan 2026 23:06:35 GMT)
コンピュータ利用エージェントに対するセキュリティ向上策の提案、「•Dual-LLM Architecture for CUAs: We design the first Dual-LLM architecture adapted for Computer Use Agents, using Single-Shot Planning with an Observe-Verify-Act paradigm to provide Control Flow Integrity guarantees.」、「Branch Steering & Defenses: We identify Branch Steering as a distinct data-flow threat vector, where attackers manipulate visual cues (e g , fake buttons) to fool the agent into choosing a dangerous, yet valid, path within its pre-written plan. We demonstrate its feasibility, and evaluate redundancy-based mitigation, highlighting the fundamental distinction between control-flow and data-flow security in isolated architectures.」

Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows

Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows [9.4]
FlamePilotは、自動および自己補正CFDによる燃焼モデリング研究を促進するように設計されている。システムは、科学的な記事から学び、初期設定から最適化された結果までシミュレーションを導くための重要な情報を抽出することができる。ケーススタディでは、FlamePilotが研究論文を自動で構成されたシミュレーションに変換し、シミュレーションを実行し、結果を後処理し、エビデンスに基づく改善を提案し、収束のために多段階のパラメータスタディを管理した。
論文参考訳（メタデータ） (Sun, 04 Jan 2026 04:00:28 GMT)
「we introduce FlamePilot, an LLM agent designed to empower combustion modeling research through automated and self-corrective CFD workflows. FlamePilot differentiates itself through an architecture that leverages atomic tools to ensure the robust setup and execution of complex simulations in both OpenFOAM and extended frameworks such as DeepFlame.」とドメインを特化した研究支援エージェント。

月	火	水	木	金	土	日
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31