2025年10月17日 – arXiv最新論文の紹介

Self-Improving LLM Agents at Test-Time [49.9]
言語モデル(LM)の1つのパラダイムは、大規模なトレーニングデータセットの作成に依存している。実際には、大量のデータを集めることは非効率であり、それらのトレーニングは違法に高価である。テスト時間自己改善(TT-SI)とテスト時間蒸留(TT-D)の2つのバリエーションについて検討する。
論文参考訳（メタデータ） (Thu, 09 Oct 2025 06:37:35 GMT)
「(i) identify uncertain samples via a novel uncertainty estimator, (ii) generate new training instances similar to these samples, and (iii) update the model online.」というステップからなるself improvement。「Test-Time Self-Improvement (TT-SI), where the model trains on self-generated samples using parameter efficient fine-tuning techniques (PEFT) (Hu et al , 2022), and Test-Time Distillation (TT-D) where adaptation is guided by supervision from samples synthesized by a more capable teacher model.」の2種類を検討している（後者はself-improvingなのか若干疑問ではあるが。。）

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond [101.2]
AccidentBenchは、自動車事故シナリオとBeyondドメインを組み合わせた大規模なベンチマークである。このベンチマークには、約2000のビデオと19000以上の人間による質問応答ペアが含まれている。
論文参考訳（メタデータ） (Tue, 30 Sep 2025 17:59:13 GMT)
事故シナリオのベンチマーク、「AccidentBench targets understanding and reasoning across diverse vehicle accident scenarios (83.0%), while also encompassing airspace (10.2%) and waterway (6.8%) domains, in which safety, perception, and decision-making are deeply interdependent. Unlike benchmarks that emphasize isolated skills or single domains, AccidentBench systematically challenges models across several critical understanding and reasoning capabilities: temporal understanding and reasoning (tracking event sequences and causality over extended periods); spatial understanding and reasoning (understanding dynamic spatial relationships and multi-agent trajectories); and intent and goal reasoning (inferring agent intentions and planning goals), which further includes complex strategic and counterfactual reasoning (evaluating higher-order strategies, action implications, and “what-if” scenarios).」
リポジトリはGitHub – SafeRL-Lab/AccidentBench: AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond

月	火	水	木	金	土	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31