September 15, 2023 – Introducing the latest arXiv papers

On Large Language Models’ Selection Bias in Multi-Choice Questions [117.7]
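In research on large language models (LLMs), multi-choice questions (MCQs) serve as a common yet important task format. Our work shows that LLMs exhibit an inherent "selection bias" in MCQs. To mitigate this selection bias, we propose a new method called PriDe.
Reference translation of the paper (metadata) (Thu, 7 Sep 2023 17:44:56 GMT)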
It is known that LLM performance on multiple-choice questions changes with the position of the answer ("For instance, moving the golden answers to position D degrades the accuracy of gpt-3.5-turbo by 6.3 (from 67.2 to 60.9)"). This paper proposes PriDe (Debiasing with Prior estimation), a method for mitigating that bias.
The statements "It cannot be mitigated via basic prompting strategies (§2.5), such as explicit debiasing instruction (i.e., instructing LLMs to treat each option fairly) and Chain-of-Thought prompting (Wei et al., 2022)." and "We find that removing option IDs can debias LLMs," are also interesting. Debiasing done properly also appears to improve overall performance.
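
The general idea of debiasing with an estimated prior can be illustrated with a small sketch. It assumes, purely for illustration, that the observed probability of each option ID is proportional to a per-ID prior times a content-dependent score, and that the model's probabilities under every cyclic shift of the option contents are available. The function names (`simulate_observed`, `estimate_prior`, `debias`) and all numbers are hypothetical, and this is not the paper's reference implementation.

```python
import math


def simulate_observed(prior, content, shift):
    """Synthetic stand-in for model output (assumption, not real LLM data).

    Position i holds content (i + shift) % n, and the observed probability
    of option ID i is proportional to prior[i] * content score.
    """
    n = len(prior)
    raw = [prior[i] * content[(i + shift) % n] for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def estimate_prior(observed_by_shift):
    """Estimate the per-ID prior from observations over all cyclic shifts.

    Averaging log-probabilities over the shifts cancels the content term
    (every content visits every position once), so a softmax of the mean
    log-probability leaves only the prior over option IDs.
    """
    n = len(observed_by_shift[0])
    k = len(observed_by_shift)
    mean_log = [sum(math.log(row[i]) for row in observed_by_shift) / k
                for i in range(n)]
    top = max(mean_log)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in mean_log]
    total = sum(exps)
    return [e / total for e in exps]


def debias(observed, prior):
    """Divide out the estimated prior and renormalize."""
    scores = [p / q for p, q in zip(observed, prior)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical numbers: the model favors ID "A", the correct content is second.
true_prior = [0.4, 0.3, 0.2, 0.1]
true_content = [0.1, 0.6, 0.2, 0.1]
observed = [simulate_observed(true_prior, true_content, k) for k in range(4)]

prior_hat = estimate_prior(observed)
print([round(p, 3) for p in prior_hat])
print([round(p, 3) for p in debias(observed[0], prior_hat)])
```

On this synthetic data the estimate matches the prior used to generate it, since the data satisfies the proportionality assumption exactly. Real model outputs would only approximately follow that assumption.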

Baseline Defenses for Adversarial Attacks Against Aligned Language Models [109.8]
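We evaluate baseline defense strategies against leading adversarial attacks on large language models. We consider three types of defenses: detection (perplexity-based), input preprocessing (paraphrasing and retokenization), and adversarial training. Surprisingly, we find more success with filtering and preprocessing than would be expected from other domains.
Reference translation of the paper (metadata) (Fri, 1 Sep 2023 17:59:44 GMT)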
A study of defenses against attacks on LLMs, covering detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
The statements "Interestingly, in this initial analysis, we find much more success with filtering and preprocessing strategies than in the vision domain, and that adaptive attacks against such defenses are non-trivial." and "The domain of LLMs is appreciably different from “classical” problems in adversarial machine learning." are striking.
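
Perplexity-based detection can be sketched as a simple threshold filter. The sketch below assumes per-token natural-log probabilities for a prompt have already been obtained from some language model. The function names (`perplexity`, `is_suspicious`), the threshold, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_suspicious(token_logprobs, threshold):
    """Flag a prompt whose perplexity exceeds a chosen threshold.

    The threshold is a free parameter here (assumption); in practice it
    would be tuned on prompts known to be benign.
    """
    return perplexity(token_logprobs) > threshold


# Hypothetical log-probabilities: fluent text vs. a garbled-looking suffix.
fluent = [math.log(0.5)] * 8
garbled = [math.log(0.001)] * 8

print(is_suspicious(fluent, threshold=100.0))
print(is_suspicious(garbled, threshold=100.0))
```

The intuition is that automatically optimized adversarial strings often look unlike natural text to a language model and therefore tend to score a high perplexity. A fixed threshold like this can also reject unusual but benign prompts.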