June 5, 2025 – Introducing the latest arXiv papers

Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation

Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.1]
This paper introduces FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a new method for detecting hallucinations in RAG output. It also presents a QA dataset annotated for both factuality and faithfulness.
Reference translation of the paper (metadata) (Tue, 27 May 2025 11:56:59 GMT)
Proposes FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), an uncertainty quantification (UQ) method for RAG.

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation [38.6]
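We introduce HumaniBench, a comprehensive benchmark of 32K real-world image-question pairs. HumaniBench evaluates seven Human Centered AI (HCAI) principles: fairness, ethics, understanding, reasoning, language inclusivity, empathy, and robustness.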
Reference translation of the paper (metadata) (Fri, 16 May 2025 17:09:44 GMT)
A benchmark described by its authors as follows: "HumaniBench probes seven HCAI principles—fairness, ethics, understanding, reasoning, language inclusivity, empathy, robustness—through seven diverse tasks that mix open- and closed-ended visual question answering (VQA), multilingual QA, visual grounding, empathetic captioning, and robustness tests." Commercial models achieve the strongest results overall, but open models score higher on some individual components.
The project site is HumaniBench: A Human-Centric Benchmark for Large Multimodal Models Evaluation

MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback [128.3]
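This work introduces the task of experiment-guided ranking, which aims to prioritize hypotheses based on the results of earlier tests. The paper proposes a simulator, built on three domain-informed assumptions, that models hypothesis performance as a function of similarity to a known ground-truth hypothesis. A dataset of 124 chemistry hypotheses is curated from experimental results and used to validate the simulator.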
Reference translation of the paper (metadata) (Fri, 23 May 2025 13:24:50 GMT)
Builds a dataset and proposes a method for "a systematic framework for experiment-guided hypothesis ranking in chemistry". It is impressive that the results look this promising.
The repository is GitHub – wanhaoliu/MOOSE-Chem3