RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques [59.9]
我々は,Large Language Models (LLMs) の批判能力を評価するために設計された新しいベンチマークを導入する。通常、オープンループ方式で機能する既存のベンチマークとは異なり、我々のアプローチでは、批判から生成された修正の質を評価するクローズドループ手法を採用している。
論文参考訳（メタデータ） (Fri, 24 Jan 2025 13:48:10 GMT)
LLMの批判能力を評価するためのベンチマークの提案、「We investigate three distinct scenarios: self-critique, crosscritique, and iterative critique. Our findings reveal that in nearly all cases, the o1-mini model demonstrates the most impressive performance.」とのこと。
リポジトリはGitHub – tangzhy/RealCritic

コメントを残す

コメントを残す コメントをキャンセル