ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows

CS-54k is a high-quality corpus of question-answer pairs in computer science. CS-4k is a benchmark for evaluating AI's ability to assist scientific research. CS-50k is a large-scale training dataset.
Reference translation of the paper (metadata) (Thu, 23 Oct 2025 07:07:35 GMT)
The authors describe the benchmark as follows: "We introduce CS-4k, the first benchmark that systematically evaluates the end-to-end research workflow in computer science through open-ended scientific question answering, offering a rigorous yardstick to assess LLMs’ ability to assist scientific research." They also claim that post-training on this data is effective.
The repository is GitHub – wph6/ResearchGPT: Official repo for ReseachGPT
