Judging the Judges: A Collection of LLM-Generated Relevance Judgements

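This paper benchmarks and reports the results of LLMJudge, a large-scale automatic relevance-judgment evaluation challenge held at SIGIR 2024. It releases and benchmarks 42 LLM-generated label sets for the TREC 2023 Deep Learning track relevance judgments, produced by eight international teams.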
Paper / reference translation (metadata) (Wed, 19 Feb 2025 17:40:32 GMT)
In the authors' words: "This paper benchmarks and reports on the results of a large-scale automatic relevance judgment evaluation, the LLMJudge challenge at SIGIR 2024, where different relevance assessment approaches were proposed." The paper is a roundup that evaluates a wide range of approaches side by side.
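The post does not describe how the LLM-generated label sets are compared with human judgments. One common way to do this is chance-corrected agreement such as Cohen's kappa. The sketch below is illustrative only: the metric choice, the function name `cohen_kappa`, and the assumed graded labels (e.g. 0 to 3) are assumptions, not the challenge's official evaluation code.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(human: Sequence[int], llm: Sequence[int]) -> float:
    """Unweighted Cohen's kappa between two equal-length label sequences.

    Illustrative sketch: `human` stands for assessor labels and `llm` for
    LLM-generated labels on the same query-passage pairs (an assumption).
    """
    if len(human) != len(llm) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: fraction of items where both label sources agree.
    p_o = sum(h == l for h, l in zip(human, llm)) / n
    # Chance agreement, from each source's marginal label distribution.
    counts_h, counts_l = Counter(human), Counter(llm)
    p_e = sum(counts_h[k] * counts_l[k] for k in counts_h) / (n * n)
    if p_e == 1.0:
        # Both sources used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not data from the paper).
human_labels = [3, 2, 0, 1, 0, 2]
llm_labels = [3, 1, 0, 1, 0, 2]
print(round(cohen_kappa(human_labels, llm_labels), 4))  # 0.7778
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when most passages share the same grade. A weighted kappa, or a rank correlation over the system orderings each label set induces, would be natural alternatives.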
