April 30, 2025 – Introducing the latest arXiv papers

(Im)possibility of Automated Hallucination Detection in Large Language Models [40.1]
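We propose a theoretical framework for analyzing the feasibility of automatically detecting hallucinations generated by large language models (LLMs). We investigate whether an algorithm trained on examples drawn from an unknown target language can reliably determine whether an LLM's outputs are correct or constitute hallucinations. We show that using expert-labeled feedback, that is, training the detector on both positive examples (correct statements) and negative examples (explicitly labeled incorrect statements), dramatically changes the conclusion reached for detectors trained on positive examples alone.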
Reference translation of the paper (metadata) (Wed, 23 Apr 2025 18:00:07 GMT)
A report on hallucination, stating that "Automated detection of hallucinations by a detector that is trained only on correct examples (positive examples) is inherently difficult and typically impossible without additional assumptions or signals." and that "Reliable automated hallucination detection is achievable when the detector is trained using both correct (positive) and explicitly labeled incorrect (negative) examples."
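As the paper itself points out, "These findings underscore the critical role of human feedback in practical LLM training.", which is consistent with how models are built today (though it remains to be seen whether the feedback actually needs to come from humans...)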
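
The contrast between the two quoted findings can be illustrated with a very small toy, sketched here under strong simplifying assumptions. It is not the paper's construction: the candidate languages are tiny finite sets, and the names `consistent`, `candidates`, `evens`, and `all_digits` are hypothetical, introduced only for this illustration. The idea is that positive examples alone cannot rule out a candidate language that is a superset of the true one, while a single labeled negative example can.

```python
def consistent(candidates, positives, negatives=()):
    """Return the names of candidate languages that agree with every labeled example."""
    return [
        name
        for name, lang in candidates.items()
        if all(x in lang for x in positives)
        and all(x not in lang for x in negatives)
    ]


# Hypothetical toy collection: the true target is "evens", but the detector
# only knows that the target is one of these candidates.
candidates = {
    "evens": {0, 2, 4, 6, 8},
    "all_digits": set(range(10)),
}

# Positive examples drawn from the target language "evens".
positives = [0, 4, 8]

# Positive examples only: both candidates remain, so the detector cannot
# decide whether the output 3 is correct or a hallucination.
only_pos = consistent(candidates, positives)

# Adding one explicitly labeled incorrect example (3) eliminates "all_digits".
with_neg = consistent(candidates, positives, negatives=[3])

print(only_pos)
print(with_neg)
```

In this toy, no amount of additional positive examples from `evens` would ever eliminate `all_digits`, which is the intuition behind the first finding; the labeled negative example is what breaks the tie.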

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials [70.1]
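We propose TongUI, a framework for building generalized GUI agents by learning from rich multimodal web tutorials. We construct the GUI-Net dataset, which contains 143K trajectories spanning five operating systems and more than 200 applications. We develop the TongUI agent by fine-tuning Qwen2.5-VL-3B/7B models on GUI-Net.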
Reference translation of the paper (metadata) (Thu, 17 Apr 2025 06:15:56 GMT)
Dataset construction from web tutorials, followed by agent development via fine-tuning.
The project site is TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials