YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models

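We present YuFeng-XGuard, a reasoning-centric guardrail model for large language models (LLMs). Instead of producing opaque binary judgments, YuFeng-XGuard generates structured risk predictions that include explicit risk categories and confidence scores. It introduces a dynamic policy mechanism that decouples risk perception from policy enforcement, so that safety policies can be adjusted without retraining the model.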
Reference translation of the paper abstract (metadata) (Thu, 22 Jan 2026 02:23:18 GMT)
A guardrail that reports its findings in detail. In the paper's words: "Instead of producing opaque binary judgments, YuFeng-XGuard generates structured risk predictions, including explicit risk categories and configurable confidence scores, accompanied by natural language explanations that expose the underlying reasoning process."
The model is available at Alibaba-AAIG/YuFeng-XGuard-Reason-8B · Hugging Face
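
To make the idea of separating risk perception from policy enforcement more concrete, here is a minimal Python sketch. It is not the model's actual output format or API: the `RiskPrediction` and `Policy` classes, the `medical_advice` category name, and the threshold values are all hypothetical, chosen only for illustration. The point is that the structured prediction stays fixed while the decision changes with the policy, so no retraining is involved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RiskPrediction:
    """Hypothetical structured guardrail output (not the real model schema)."""
    category: str       # assumed risk category label
    confidence: float   # assumed score in [0, 1]
    explanation: str    # natural language rationale


@dataclass(frozen=True)
class Policy:
    """Hypothetical enforcement policy: per-category blocking thresholds."""
    thresholds: Dict[str, float]
    # Categories missing from the policy never trigger a block,
    # because no confidence in [0, 1] can reach this default.
    default_threshold: float = 1.01

    def decide(self, predictions: List[RiskPrediction]) -> str:
        """Return "block" if any prediction meets its category threshold."""
        for p in predictions:
            limit = self.thresholds.get(p.category, self.default_threshold)
            if p.confidence >= limit:
                return "block"
        return "allow"


# The same perception result, as if produced once by the guardrail model.
predictions = [
    RiskPrediction("medical_advice", 0.72, "The prompt asks for a drug dosage."),
]

# Two deployments enforce different policies over identical predictions.
strict = Policy({"medical_advice": 0.5})
lenient = Policy({"medical_advice": 0.9})

print(strict.decide(predictions))   # block
print(lenient.decide(predictions))  # allow
```

In this sketch, changing a safety policy only means editing the `thresholds` mapping; the prediction step is untouched.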
