Prompt – Introducing the Latest arXiv Papers

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption [52.0]
This work shows that a wide range of large language models become significantly more robust against reference corruption with a simple method called chain-of-defensive-thought. The improvement is surprising given how simple and widely applicable the method is.
Paper reference translation (metadata) (Tue, 29 Apr 2025 13:50:05 GMT)
Proposes chain-of-defensive-thought: "1. Number the references (if they are not already). 2. Include additional task instructions to firstly identify relevant and reliable contexts. 3. Before responses, insert structured reasoning steps that enunciates the indices of the relevant contexts (I_relevant) and the indices of reliable contexts (I_reliable)."
The authors state: "In particular, we show how a wide range of large language models exhibit significantly improved robustness against reference corruption using a simple method called chain-of-defensive-thought, where only a few exemplars with structured and defensive reasoning are provided as demonstrations."
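The three quoted steps can be sketched as a small prompt builder. This is only an illustrative sketch under assumptions: the instruction wording, the `Exemplar` class, the function names, and the "Relevant contexts / Reliable contexts" labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Exemplar:
    """A hypothetical few-shot demonstration with defensive reasoning."""
    references: List[str]
    question: str
    relevant: List[int]   # indices of relevant contexts (I_relevant)
    reliable: List[int]   # indices of reliable contexts (I_reliable)
    answer: str


# Step 2: additional task instruction (wording is an assumption).
INSTRUCTION = (
    "Answer the question using the numbered contexts. "
    "First identify which contexts are relevant and which are reliable."
)


def number_references(references: Sequence[str]) -> str:
    """Step 1: prefix each reference with a 1-based index."""
    return "\n".join(f"[{i}] {ref}" for i, ref in enumerate(references, start=1))


def defensive_reasoning(relevant: Sequence[int], reliable: Sequence[int], answer: str) -> str:
    """Step 3: state the relevant and reliable indices before the answer."""
    return (
        f"Relevant contexts: {list(relevant)}\n"
        f"Reliable contexts: {list(reliable)}\n"
        f"Answer: {answer}"
    )


def build_prompt(exemplars: Sequence[Exemplar], references: Sequence[str], question: str) -> str:
    """Assemble instruction, demonstrations, and the new query into one prompt."""
    parts = [INSTRUCTION]
    for ex in exemplars:
        parts.append(
            f"{number_references(ex.references)}\n"
            f"Question: {ex.question}\n"
            f"{defensive_reasoning(ex.relevant, ex.reliable, ex.answer)}"
        )
    # The model is expected to continue from the structured reasoning header.
    parts.append(f"{number_references(references)}\nQuestion: {question}\nRelevant contexts:")
    return "\n\n".join(parts)
```

Ending the prompt on "Relevant contexts:" is one way to nudge the model into producing the structured reasoning first; other layouts would work as well.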

Prompt Compression for Large Language Models: A Survey

Prompt Compression for Large Language Models: A Survey [31.6]
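This survey gives an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. It also examines the downstream adaptation of various prompt compression methods.
Paper reference translation (metadata) (Wed, 16 Oct 2024 09:13:23 GMT)
A survey of prompt compression methods.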

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation [22.7]
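Large language models (LLMs) have revolutionized the field of NLP. This work evaluates more than 720 prompt templates for open-source LLM-based metrics on machine translation (MT) and summarization datasets.
Paper reference translation (metadata) (Wed, 26 Jun 2024 17:56:29 GMT)
A large-scale evaluation of prompt templates for machine translation and summarization. It is validated on multiple open LLMs, so the performance differences between LLMs are also informative. I would like to look at the details once the code is released.
Project site: NLLG (nl2g.github.io); repository: GitHub – Gringham/PrExMe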

The Prompt Report: A Systematic Survey of Prompting Techniques

The Prompt Report: A Systematic Survey of Prompting Techniques [42.6]
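This paper establishes a structured understanding of prompts by assembling a taxonomy of prompting techniques and analyzing their use. It presents a comprehensive vocabulary of 33 terms, 58 text-only prompting techniques, and 40 techniques for other modalities.
Paper reference translation (metadata) (Thu, 06 Jun 2024 18:10:11 GMT)
A survey of prompting techniques.
My impression is that there really are a lot of them. And there are still plenty that this survey does not cover...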

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 [26.1]
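This paper introduces 26 principles designed to streamline the process of querying and prompting large language models. Experiments with LLaMA-1/2 (7B, 13B, 70B) and GPT-3.5/4 verify the effectiveness of the proposed principles.
Paper reference translation (metadata) (Tue, 26 Dec 2023 18:59:33 GMT)
A paper that tries out and compares the best-practice-style techniques commonly recommended for prompting LLMs. Many of them appear to be reasonably effective.
Repository: VILA-Lab/ATLAS: Principled instruction dataset on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171 (github.com)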

PromptBench

PromptBench: A Unified Library for Evaluation of Large Language Models [33.8]
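This paper introduces PromptBench, a unified library for evaluating large language models (LLMs). It consists of several key components that researchers can easily use and extend: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attacks, dynamic evaluation protocols, and analysis tools.
Paper reference translation (metadata) (Wed, 13 Dec 2023 05:58:34 GMT)
A framework for evaluating LLMs (and the prompts given to them). It looks easy to use and convenient.
Repository: GitHub – microsoft/promptbench: A unified evaluation framework for large language models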

Program-Aided Reasoners (better) Know What They Know

Program-Aided Reasoners (better) Know What They Know [59.3]
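This work compares the calibration of program-aided language models (PAL) with that of text-based Chain-of-Thought (CoT) prompting across five datasets. The results suggest that PAL leads to improved calibration in 75% of the instances.
Paper reference translation (metadata) (Thu, 16 Nov 2023 04:17:49 GMT)
A comparison of PAL and CoT. The authors report: "Overall, we demonstrate that, in the majority of cases, program-aided reasoners better know what they know than text-based counterparts." It would be interesting to know why.
The repository is reportedly https://github.com/mathuryash5/code-calibrates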
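To make "calibration" concrete, the following is a minimal sketch of a binned expected calibration error (ECE), one common way to quantify how well stated confidence matches actual accuracy. This entry does not specify which metric the paper uses, so the choice of ECE, the function name, and the equal-width binning are all assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    Assumes confidences lie in [0, 1]; bins are equal-width (an illustrative choice).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means the confidences track accuracy more closely; comparing this number for PAL-style and CoT-style outputs on the same questions is the kind of comparison the entry describes.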

Thread of Thought

Thread of Thought Unraveling Chaotic Contexts [133.2]
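The Thread of Thought (ThoT) strategy draws inspiration from human cognitive processes. In experiments, ThoT significantly improves reasoning performance compared with other prompting techniques.
Paper reference translation (metadata) (Wed, 15 Nov 2023 06:54:44 GMT)
Proposes the prompting technique "Thread of Thought" (ThoT) strategy. Given a "chaotic context X and query Q", the approach first prompts with "[X] Q: [Q] Walk me through this context in manageable parts step by step, summarizing and analyzing as we go. A:" and then obtains the answer. It is reported to outperform CoT.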
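The quoted template can be filled in with a few lines of code. This is a minimal sketch: the trigger sentence is taken from the quote above, while the function name and the line breaks between the parts are assumptions.

```python
# Trigger sentence as quoted in the entry above.
THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step, "
    "summarizing and analyzing as we go."
)


def build_thot_prompt(context: str, query: str) -> str:
    """Fill the "[X] Q: [Q] <trigger> A:" template with a context and a query."""
    # Line breaks between the parts are an illustrative layout choice.
    return f"{context}\nQ: {query}\n{THOT_TRIGGER}\nA:"


if __name__ == "__main__":
    print(build_thot_prompt("passage 1 ... passage 2 ...", "Where was the company founded?"))
```

The model's response to this prompt is the step-by-step walk through the context; how the final answer is then extracted from it is not covered by this sketch.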

Everything of Thoughts

Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation [42.5]
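Effective thought design should consider three key perspectives: performance, efficiency, and flexibility. The authors introduce a novel thought prompting approach called Everything of Thoughts (XoT), which defies the law of the "Penrose triangle" of existing thought paradigms.
Paper reference translation (metadata) (Tue, 7 Nov 2023 12:30:36 GMT)
The second (?) wildcard entry in the "of thought" series.
"XOT leverages pretrained reinforcement learning and Monte Carlo Tree Search (MCTS) to incorporate external domain knowledge into thoughts, thereby enhancing LLMs’ capabilities and enabling them to generalize to unseen problems efficiently." So this is a different approach from X-of-Thoughts – arXiv最新論文の紹介 (devneko.jp).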

DePT: Decoupled Prompt Tuning

DePT: Decoupled Prompt Tuning [133.7]
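This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning: the better the tuned model generalizes to the base task, the worse it generalizes to new tasks, and vice versa. The proposed Decoupled Prompt Tuning (DePT) framework decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning.
Paper reference translation (metadata) (Thu, 14 Sep 2023 05:45:40 GMT)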
Repository: GitHub – Koorye/DePT: Offical implemention of paper “Decoupled Prompt Tuning”
