October 2, 2024 – Introducing the Latest arXiv Papers

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.6]
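This study introduces a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. We develop PatentMIA, a Chinese-language benchmark, to evaluate the performance of detection methods for LLMs on Chinese text.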
Reference translation of the paper (metadata) (Mon, 23 Sep 2024 07:55:35 GMT)
Proposes DC-PDD, a method for pretraining data detection (the task of detecting what was used in pretraining), together with a benchmark. "The pretraining data detection problem can be viewed as an instance of the membership inference attack (MIA) task (Shokri et al., 2017), where the primary objective is to determine if a particular text was part of a target LLM’s training corpus."
According to the paper, "DC-PDD computes the divergence between the token probability distribution and the token frequency distribution for detection."
The repository is GitHub – zhang-wei-chao/DC-PDD
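
To make the idea of comparing token probabilities with token frequencies concrete, here is a minimal sketch. It is not the authors' implementation (see the repository for that). The function names, the Laplace smoothing, the clipping threshold `clip`, and the choice to score only the first occurrence of each token are all assumptions made for illustration. The per-token probabilities `model_probs` are assumed to have been obtained from the target LLM beforehand.

```python
import math
from collections import Counter


def token_frequencies(reference_corpus, smoothing=1.0):
    """Build a smoothed unigram frequency function from a reference corpus.

    reference_corpus: list of token lists (hypothetical stand-in for a
    large reference corpus). Laplace smoothing is an assumption here.
    """
    counts = Counter(tok for doc in reference_corpus for tok in doc)
    total = sum(counts.values())
    vocab = len(counts)

    def freq(token):
        # Unseen tokens still receive a small non-zero probability.
        return (counts.get(token, 0) + smoothing) / (total + smoothing * (vocab + 1))

    return freq


def divergence_score(tokens, model_probs, freq, clip=10.0):
    """Average of p_model(token) * -log(freq(token)) over a text.

    A token that is rare in the reference corpus but predicted with high
    probability by the model contributes a large value, which hints that
    the text may have been seen during pretraining. `clip` (an assumed
    parameter) bounds the influence of any single token.
    """
    if len(tokens) != len(model_probs):
        raise ValueError("tokens and model_probs must have the same length")
    seen = set()
    scores = []
    for tok, p in zip(tokens, model_probs):
        if tok in seen:
            continue  # assumption: score only the first occurrence
        seen.add(tok)
        scores.append(min(-p * math.log(freq(tok)), clip))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    freq = token_frequencies([["the", "cat", "sat"], ["the", "dog"]])
    # A rare token with high model probability scores higher than the
    # same token with low model probability.
    print(divergence_score(["zebra"], [0.9], freq))
    print(divergence_score(["zebra"], [0.1], freq))
```

In practice the resulting score would be compared against a threshold chosen on held-out data to decide whether a text is a likely member of the training corpus; the threshold itself is not specified here.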

Deep Graph Anomaly Detection: A Survey and New Perspectives [86.8]
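Graph anomaly detection (GAD) aims to identify anomalous graph instances (nodes, edges, subgraphs, or graphs). Deep learning approaches, particularly graph neural networks (GNNs), have emerged as a promising paradigm for GAD.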
Reference translation of the paper (metadata) (Mon, 16 Sep 2024 03:05:11 GMT)
A survey of anomaly detection using GNNs.
The repository is GitHub – mala-lab/Awesome-Deep-Graph-Anomaly-Detection: Official repository for survey paper “Deep Graph Anomaly Detection: A Survey and New Perspectives”, including diverse types of resources for graph anomaly detection.