DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis [52.6]
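This paper introduces DeepScholar-bench, a live benchmark and holistic automated evaluation framework. DeepScholar-bench draws its queries from recent, high-quality ArXiv papers and focuses on a real research synthesis task. The authors also develop DeepScholar-base, a reference pipeline implemented efficiently using the LOTUS API.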
Reference translation of the paper (metadata) (Wed, 27 Aug 2025 16:36:34 GMT)
The paper proposes a benchmark described as follows: "DeepScholar-bench draws queries from recent, high-quality ArXiv papers and focuses on a real research synthesis task: generating the related work sections of a paper by retrieving, synthesizing, and citing prior research." It is a live benchmark.
The project site is GitHub – guestrin-lab/deepscholar-bench: benchmark and evaluate generative research synthesis
