September 25, 2025 – Introducing the latest arXiv papers

Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning

Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning [23.2]
Large language models (LLMs) show impressive capabilities across a wide range of tasks, but their ability to perform structured symbolic planning remains limited. The authors propose PDDL-Instruct, a new instruction-tuning framework designed to strengthen the symbolic planning capabilities of LLMs through logical chain-of-thought reasoning.
Reference translation of the paper (metadata) (Sun, 14 Sep 2025 02:42:34 GMT)
"We have presented PDDL-INSTRUCT, a novel framework that significantly enhances the symbolic planning capabilities of Large Language Models through logical chain-of-thought instruction tuning. By decomposing the planning process into verifiable logical reasoning chains and providing explicit verification feedback, our approach enables LLMs to generate valid plans with unprecedented reliability across diverse planning domains." In short, a cleverly designed form of post-training for plan generation.
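To make "verifiable logical reasoning chains" with "explicit verification feedback" more concrete, here is a minimal sketch of step-by-step plan checking over STRIPS-style states. It is an illustration only, not the paper's implementation: the `Action` class, the `validate_plan` function, and the small blocks-world example are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action: facts are plain strings."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def validate_plan(initial_state, goal, plan):
    """Check a plan one step at a time and return (is_valid, log).

    Each log line is the kind of per-step feedback a verifier could
    hand back to a model: which step ran, and which facts were missing.
    """
    state = set(initial_state)
    log = []
    for step, action in enumerate(plan, start=1):
        missing = action.preconditions - state
        if missing:
            log.append(f"step {step}: {action.name} fails, missing {sorted(missing)}")
            return False, log
        # Apply the action: remove deleted facts, then add new ones.
        state = (state - action.delete_effects) | action.add_effects
        log.append(f"step {step}: {action.name} ok")
    unmet = set(goal) - state
    if unmet:
        log.append(f"goal not reached, missing {sorted(unmet)}")
        return False, log
    log.append("goal reached")
    return True, log


# Tiny blocks-world example (assumed for illustration).
pickup_a = Action(
    name="pickup(a)",
    preconditions=frozenset({"clear(a)", "ontable(a)", "handempty"}),
    add_effects=frozenset({"holding(a)"}),
    delete_effects=frozenset({"clear(a)", "ontable(a)", "handempty"}),
)
stack_a_b = Action(
    name="stack(a,b)",
    preconditions=frozenset({"holding(a)", "clear(b)"}),
    add_effects=frozenset({"on(a,b)", "clear(a)", "handempty"}),
    delete_effects=frozenset({"holding(a)", "clear(b)"}),
)
initial = {"clear(a)", "clear(b)", "ontable(a)", "ontable(b)", "handempty"}
goal = {"on(a,b)"}

good_ok, good_log = validate_plan(initial, goal, [pickup_a, stack_a_b])
bad_ok, bad_log = validate_plan(initial, goal, [stack_a_b])
```

Here the two-step plan passes, while the plan that tries to stack without picking up first is rejected at step 1 with the missing precondition named in the log.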

SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs

SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs [35.2]
Large language models (LLMs) have shown strong performance in autonomously invoking a variety of tools in external environments. This paper aims to evaluate the safety of LLM tool utilization while avoiding the irreversible harm that can result from executing tools directly. The authors propose SafeToolBench, the first benchmark to comprehensively evaluate tool utilization security. They also propose SafeInstructTool, a new framework that aims to improve LLMs' awareness of tool utilization security from three perspectives.
Reference translation of the paper (metadata) (Tue, 09 Sep 2025 01:31:25 GMT)
A benchmark for evaluating security in LLM tool use. According to the paper: "we further propose SafeInstructTool, the first framework to evaluate risks across these three perspectives from nine dimensions: User Instruction Perspective (Data Sensitivity, Harmfulness of the Instruction, Urgency of the Instruction, Frequency of Tool Utilization in the Instruction), Tool Itself Perspective (Key Sensitivity, Type of Operation, Impact Scope of the Operation) and Joint Instruction-Tool Perspective (Alignment Between Instruction and Tool, Value Sensitivity). Thus, it can enhance LLMs’ awareness of tool utilization safety, leading to more safer and trustworthy language agents."
Repository: GitHub – BITHLP/SafeToolBench: [2025 EMNLP Findings] SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs
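As a rough way to picture the three perspectives and nine dimensions quoted above, the sketch below stores them as a small rubric and gates tool execution on a summed score. Only the perspective and dimension names come from the paper's text; the numeric scale, the summation, the threshold, and the names `RISK_DIMENSIONS`, `total_risk`, and `is_safe_to_execute` are assumptions made for illustration, not the SafeInstructTool scoring method.

```python
# Perspective -> dimension names, as listed in the quoted passage.
RISK_DIMENSIONS = {
    "User Instruction": [
        "Data Sensitivity",
        "Harmfulness of the Instruction",
        "Urgency of the Instruction",
        "Frequency of Tool Utilization in the Instruction",
    ],
    "Tool Itself": [
        "Key Sensitivity",
        "Type of Operation",
        "Impact Scope of the Operation",
    ],
    "Joint Instruction-Tool": [
        "Alignment Between Instruction and Tool",
        "Value Sensitivity",
    ],
}


def total_risk(scores):
    """Sum per-dimension scores (hypothetical scale, higher = riskier).

    Raises ValueError unless exactly the nine dimensions are scored.
    """
    expected = {dim for dims in RISK_DIMENSIONS.values() for dim in dims}
    if set(scores) != expected:
        raise ValueError("scores must cover exactly the nine dimensions")
    return sum(scores.values())


def is_safe_to_execute(scores, threshold):
    """Assumed gate: allow the tool call only below the threshold."""
    return total_risk(scores) < threshold


# Example: every dimension scored 1 on an assumed 0-3 scale.
example_scores = {
    dim: 1 for dims in RISK_DIMENSIONS.values() for dim in dims
}
example_total = total_risk(example_scores)
```

The point of the gate is that the decision is made before the tool runs, which matches the benchmark's stated goal of avoiding irreversible harm from direct execution.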

MMORE: Massive Multimodal Open RAG & Extraction

MMORE: Massive Multimodal Open RAG & Extraction [35.5]
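MMORE is a pipeline for ingesting, transforming, and retrieving knowledge from heterogeneous document formats at scale. It supports more than 15 file types, including text, tables, images, email, audio, and video, and processes them into a unified format. In processing benchmarks, MMORE shows a 3.8x speedup over single-node baselines and 40% higher accuracy than Docling on scanned PDFs.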
Reference translation of the paper (metadata) (Mon, 15 Sep 2025 13:56:06 GMT)
"MMORE is a scalable, open-source pipeline for retrieval-augmented generation over diverse, real-world data. It supports more than 15 file types, including PDFs, spreadsheets, images, audio, and video, and enables structured, high-throughput integration into LLM workflows." This looks like handy software.
Repository: GitHub – swiss-ai/mmore: Massive Multimodal Open RAG & Extraction A scalable multimodal pipeline for processing, indexing, and querying multimodal documents Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!
