December 24, 2025 – Introducing the latest arXiv papers

Are We on the Right Way to Assessing LLM-as-a-Judge?

Are We on the Right Way to Assessing LLM-as-a-Judge? [16.3]
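We introduce Sage, a new evaluation suite that assesses the quality of LLM judges without requiring human annotation. Inspired by the axioms of rational choice theory, Sage introduces two new lenses for measuring LLM-as-a-Judge: local self-consistency and global logical consistency. Based on Sage, we show that current state-of-the-art LLMs have significant reliability problems when acting as judges, in both scoring and pairwise settings.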
Reference translation of the paper (metadata) (Wed, 17 Dec 2025 23:49:55 GMT)
The paper proposes a benchmark for evaluating LLM-as-a-Judge and makes the following observation: "Our experiments reveal significant robustness deficiencies in current state-of-the-art models. We attribute these inconsistent judgments to a newly identified phenomenon called situational preference where models fail to maintain a stable internal gauging principle across different contexts. To address this, we demonstrate that implementing self-generated rubrics effectively mitigates situational preference and boosts judgment consistency. We also investigate the impact of fine-tuning and explanatory reasoning on evaluation performance."
Repository: Entroplay.ai
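
The two lenses named in the abstract can be illustrated with a toy check. The following is a minimal sketch and not Sage's actual metric or code. It assumes a hypothetical `judge(a, b)` callable that returns the preferred answer, and it reads "local self-consistency" as a verdict surviving a swap of the two answers and "global logical consistency" as transitivity of the preferences. Both readings, and all function names, are assumptions made for illustration.

```python
from itertools import combinations, permutations
from typing import Callable, Sequence

# Hypothetical judge interface: takes two answers, returns the preferred one.
Judge = Callable[[str, str], str]


def swap_consistency(judge: Judge, answers: Sequence[str]) -> float:
    """Fraction of pairs whose verdict is unchanged when the order is swapped."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # nothing to compare, so nothing is inconsistent
    stable = sum(1 for a, b in pairs if judge(a, b) == judge(b, a))
    return stable / len(pairs)


def transitivity_violations(judge: Judge, answers: Sequence[str]) -> int:
    """Count ordered triples with a > b and b > c but not a > c."""
    def prefers(x: str, y: str) -> bool:
        return judge(x, y) == x

    return sum(
        1
        for a, b, c in permutations(answers, 3)
        if prefers(a, b) and prefers(b, c) and not prefers(a, c)
    )


# Toy judges standing in for an LLM (assumptions for illustration only).
def length_judge(a: str, b: str) -> str:
    """Prefers the longer answer; ties go to the first argument."""
    return a if len(a) >= len(b) else b


def first_position_judge(a: str, b: str) -> str:
    """Position-biased: always prefers whichever answer is shown first."""
    return a


BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def cyclic_judge(a: str, b: str) -> str:
    """Order-independent but cyclic (rock-paper-scissors) preferences."""
    return a if (a, b) in BEATS else b


if __name__ == "__main__":
    texts = ["a", "bbb", "cc"]
    hands = ["rock", "paper", "scissors"]
    print(swap_consistency(length_judge, texts))
    print(swap_consistency(first_position_judge, texts))
    print(transitivity_violations(cyclic_judge, hands))
```

The position-biased judge fails the swap check on every pair, while the cyclic judge passes the swap check yet still violates transitivity. This is why the two properties are worth measuring separately.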

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text [101.7]
WorldCanvas is a framework for promptable world events that enables rich, user-directed simulation. By supporting expressive world event generation, WorldCanvas advances world models from passive predictors to interactive, user-shaped simulators.
Reference translation of the paper (metadata) (Thu, 18 Dec 2025 18:59:59 GMT)
The paper starts from this premise: "World models [3, 12, 15, 22, 38, 46] are unlocking their true potential, evolving from passive simulators into interactive canvases for creation. A landmark step in this evolution is the introduction of “promptable world events,” a concept pioneered by models like Genie 3 [3], which transforms the world model into an interactive canvas where text prompts can trigger significant environmental changes." On that basis, the authors build a model with the following capability: "By enabling users to precisely specify what, when, where, and who through intuitive motion trajectories, natural language and ref images, our approach supports semantic actions, complex interactions, object entry/exit and reference-guided appearance."
Project site: The World is Your Canvas

OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations [35.4]
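OnCoCo 1.0 is a new public dataset for fine-grained message classification in online counseling. It is based on a new, integrated category system designed to improve the automated analysis of psychosocial online counseling conversations.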
Reference translation of the paper (metadata) (Wed, 10 Dec 2025 16:18:20 GMT)
This is an online counseling dataset, described by the authors as follows: "Contribution With this publication we introduce OnCoCo 1.0 (Online Counseling Conversations), a new bi-lingual dataset (German and English) for rich content analysis in psychosocial online counseling. Our dataset extends current conversational corpora by providing a detailed and ethically curated dataset for bilingual counseling contexts." A dataset of this kind seems quite rare.
Repository: GitHub – th-nuernberg/oncoco_v1_dataset: OnCoCo 1.0 Dataset for Classification of Psycho-social Counseling Messages