Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows [203.4]
PIM(Practical Inquiry Model)に基づく運用SGI定義を提案する。深層研究、アイデア生成、ドライ/ウェット実験、実験推論の4つのタスクを通じて運用しています。私たちのPIMによる定義、ワークフロー中心のベンチマーク、実証的な洞察は、真に科学的な発見に参加するAIシステムの基盤を確立します。
論文参考訳（メタデータ） (Thu, 18 Dec 2025 12:44:36 GMT)
scientific general intelligence (SGI)、「SGI is an AI that can autonomously navigate the complete, iterative cycle of scientific inquiry with the versatility and proficiency of a human scientist」の研究、ベンチマーク等も提案している。「Experiments reveal a consistent pattern: in Deep Research, models show step-level alignment but low exact-match accuracy (10–20%), with brittleness in quantitative reasoning; in Idea Generation, hypotheses are fluent but underspecified and infeasible; in Dry Experiment, code is executable but PassAll@k remains low; in Wet Experiment, sequences show omissions and misordering; and in Experimental Reasoning, causal reasoning outperforms comparative, with persistent multimodal challenges. These highlight gaps between linguistic fluency and integrated scientific cognition.」とあるなど道半ばという感じではあるが非常に流行っている分野だと思う。
SGI-Benchの上位はGemini 3 Pro, Claude Sonnet 4.5, Qwen3 Max, GPT-4.1, GPT-5.2 Proと各社のフロンティアモデルが並ぶ。
リポジトリはSGI-Bench — Scientific General Intelligence

コメントを残す

月	火	水	木	金	土	日
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

コメントを残す コメントをキャンセル

コメントを残すコメントをキャンセル