2026年3月4日 – arXiv最新論文の紹介

Test-Time Computing for Referring Multimodal Large Language Models [143.5]
そこで我々は,新しいテスト時間適応フレームワークである ControlMLLM++ を提案する。学習可能な視覚的プロンプトを凍ったマルチモーダルな大言語モデルに注入する。
論文参考訳（メタデータ） (Mon, 23 Feb 2026 04:42:10 GMT)
「We introduce ControlMLLM++, a novel test- time latent variable optimization framework that injects explicit visual prompts into frozen pre-trained MLLMs to enable referring capabilities without additional training.」とのこと。「ControlMLLM++ falls into this category, performing test-time optimization of latent perturbations to visual tokens to steer attention maps towards the referred region r.」というアプローチ。
リポジトリはGitHub – mrwu-mac/ControlMLLM: [NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models’

Counterfactual Simulation Training for Chain-of-Thought Faithfulness [46.3]
我々は,CST(Counterfactual Simulation Training)と呼ばれるトレーニング手法を導入する。 CSTは、シミュレーターが偽の入力に対してモデルの出力を正確に予測できるCoTに報酬を与える。最大235Bパラメータのモデルによる実験により、CSTはキューベースのカウンターファクトの精度を大幅に向上できることが示された。
論文参考訳（メタデータ） (Tue, 24 Feb 2026 09:15:30 GMT)
CoTの信頼性を向上させるため「we introduce a training method called Counterfactual Simulation Training (CST), which aims to improve CoT faithfulness by rewarding CoTs that enable a simulator to accurately predict a model’s outputs over counterfactual inputs. We apply CST in two settings: (1) CoT monitoring with cue-based counterfactuals, to detect when models rely on spurious features, reward hack, or are sycophantic, and (2) counterfactual simulation over generic model-based counterfactuals, to encourage models to produce more faithful, generalizable reasoning in the CoT.」というアプローチを提案。Reasoningの過程をコントロールするのも重要なのはそうだと思う。
リポジトリはGitHub – peterbhase/counterfactual-simulation-training: Codebase for paper: “Counterfactual Simulation Training for Chain-of-Thought Faithfulness”