September 4, 2025 – Introducing the latest arXiv papers

Mimicking the Physicist’s Eye: A VLM-centric Approach for Physics Formula Discovery [98.6]
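VIPER-R1 is a multimodal model that performs visual induction for equation reasoning. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. It consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability.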
Reference translation of the paper (metadata) (Sun, 24 Aug 2025 14:34:21 GMT)
This paper tackles the task of discovering physical equations. With post-training, the model outperforms frontier models. The framework is described as follows: "Our framework draws inspiration from human scientific reasoning and follows a two-stage pipeline. In the first stage, Motion Structure Induction (MSI), the model undergoes Supervised Fine-Tuning (SFT), learning to interpret kinematic evidence under joint supervision of Chain-of-Thought (CoT) rationales and ground-truth equations, before producing initial symbolic hypotheses guided by causal CoT prompts. In the second stage, Reward-Guided Symbolic Calibration (RGSC), reinforcement learning with Group Relative Policy Optimization (GRPO) (Shao et al., 2024) refines these hypotheses using a structural reward function that favors topological correctness over"
The project site is VIPER-R1: Mimicking the Physicist’s Eye
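
The quoted second stage relies on GRPO, which scores each sampled response relative to the other responses drawn for the same prompt. The following is a minimal sketch of that group-relative normalization only, not the paper's implementation. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the reward values are all assumptions made for illustration; the paper's structural reward function is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for one prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an assumed guard against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate equations sampled for the same input.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
```

Candidates that score above the group mean receive a positive advantage and those below it receive a negative one, so no separate value model is needed to provide a baseline.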

A Survey on Large Language Model Benchmarks [45.0]
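General capability benchmarks cover aspects such as core linguistics, knowledge, and reasoning. Domain-specific benchmarks focus on fields such as the natural sciences, the humanities and social sciences, and engineering technology. Target-specific benchmarks address risk, reliability, agents, and similar concerns.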
Reference translation of the paper (metadata) (Thu, 21 Aug 2025 08:43:35 GMT)
A survey of benchmarks: "We systematically review the current status and development of large language model benchmarks for the first time, categorizing 283 representative benchmarks into three categories: general capabilities, domain-specific, and target-specific."
Many different benchmarks have been created to understand LLM behavior broadly, so surveys like this one are very welcome.