ReaLMistake – Introducing the Latest arXiv Papers

Evaluating LLMs at Detecting Errors in LLM Responses [30.6]
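This study introduces ReaLMistake, the first error detection benchmark made up of objective, realistic, and diverse errors produced by LLMs. Using ReaLMistake, we evaluate error detectors based on 12 large language models.
Reference translation of the paper (metadata) (Thu, 04 Apr 2024 17:19:47 GMT)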
An error detection benchmark for LLMs. The conclusion, "Our experiments on this benchmark with error detectors based on 12 LLMs show that detecting mistakes in LLMs (GPT-4 and Llama 2 70B) is challenging even for recent LLMs.", is roughly what one would expect. Still, it is interesting that some tasks seem to be both hard for LLMs to solve and hard to check for errors.
The repository is psunlpgroup/ReaLMistake: This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". (github.com)