April 1, 2025 – Introducing the latest arXiv papers

Can LLMs Automate Fact-Checking Article Writing?

Can LLMs Automate Fact-Checking Article Writing? [69.9]
We extend the typical fact-checking pipeline and argue for the need to automatically generate full fact-checking articles. We developed QRAFT, an LLM-based agentic framework that mimics the writing workflow of human fact-checkers.
Reference translation of the paper (metadata) (Sat, 22 Mar 2025 07:56:50 GMT)
Rather than conventional fact-checking, the paper proposes QRAFT, a framework described as "QRAFT as a multi-agent collaboration that mimics the fact-checking article writing process of human experts."
It outperforms other methods, but it is disappointing that "Our evaluation shows that while QRAFT outperforms several previously proposed text-generation approaches, it lags considerably behind expert-written articles."

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [51.3]
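Large language models (LLMs) have demonstrated remarkable capabilities on complex tasks. Recent advances in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
Reference translation of the paper (metadata) (Thu, 20 Mar 2025 17:59:38 GMT)
A survey on preventing overthinking and on efficient reasoning.
The repository is at GitHub – Eclipsess/Awesome-Efficient-Reasoning-LLMs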

Survey on Evaluation of LLM-based Agents [28.9]
The emergence of LLM-based agents represents a paradigm shift in AI. This paper presents the first comprehensive survey of evaluation methodologies for these agents.
Reference translation of the paper (metadata) (Thu, 20 Mar 2025 17:59:23 GMT)
A survey on evaluating agents: "We systematically analyze evaluation benchmarks and frameworks across four critical dimensions: (1) fundamental agent capabilities, including planning, tool use, self-reflection, and memory; (2) application-specific benchmarks for web, software engineering, scientific, and conversational agents; (3) benchmarks for generalist agents; and (4) frameworks for evaluating agents."