June 26, 2025 – Introducing the latest arXiv papers

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios [30.2]
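The ability of large language models to use external tools has enabled them to handle an increasingly diverse range of tasks. As tasks become more complex and long-horizon, the intricate tool-use process can trigger a variety of unexpected errors. How to handle such errors effectively, including identifying, diagnosing, and recovering from them, has emerged as an important research direction for advancing tool learning.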
Reference translation of the paper (metadata) (Wed, 11 Jun 2025 17:59:18 GMT)
A benchmark described as "CRITICTOOL, the first self-critique evaluation benchmark for tool utilization of LLMs. Distinct from prior result-oriented evaluation methods, we categorize error patterns more finely and evaluate models from multiple perspectives, enabling a deeper exploration of LLMs’ tool-use capabilities in error-prone scenarios." It would be interesting to see how the latest models score on it.
The repository is GitHub – Shellorley0513/CriticTool

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index [124.7]
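Infini-gram mini is a scalable system that makes petabyte-level text corpora searchable. The authors index 46TB of Internet text in 50 days on a 128-core CPU node. They demonstrate an important use case of Infini-gram mini in a large-scale analysis of benchmark contamination.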
Reference translation of the paper (metadata) (Fri, 13 Jun 2025 21:13:57 GMT)
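A report on indexing data at very large scale. The index is used to compute the degree of contamination of various benchmarks (Benchmark Contamination Monitoring System – a Hugging Face Space by infini-gram-mini). Contamination has been pointed out before, but the results suggest that the reliability of some benchmarks is questionable.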
The project site is Home | infini-gram-mini, and the repository is GitHub – xuhaoxh/infini-gram-mini
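To make the idea of exact n-gram counting with an FM-index concrete, here is a minimal in-memory sketch in Python. It is not the Infini-gram mini implementation: the function names `build_fm_index` and `count_occurrences` are hypothetical, the suffix array is built naively, and the occurrence table is stored uncompressed, so it is only suitable for tiny strings. It assumes the text does not contain the `"\0"` character, which is used as a sentinel.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index (hypothetical helper, not the paper's code)."""
    text = text + "\0"  # sentinel, assumed smaller than every other character
    # Naive suffix array: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = [text[i - 1] for i in suffix_array]

    # first[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count exact matches of pattern using backward search."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("abracadabra")
print(count_occurrences(index, "abra"))  # 2
```

The search touches the index once per pattern character, so its cost does not grow with the number of matches; a contamination check in this style would count how often n-grams from a benchmark entry appear in the indexed corpus.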

Institutional Books 1.0: A 242B token dataset from Harvard Library’s collections, refined for accuracy and usability [1.3]
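Institutional Books 1.0 is a collection of public domain books digitized through Harvard Library's participation in the Google Books project, beginning in 2006. Working with Harvard Library, the authors extracted, analyzed, and processed these volumes into an extensively documented dataset of historic texts. The analysis covers the entirety of Harvard Library's collection scanned as part of that project, originally spanning 1,075,899 volumes written in over 250 different languages, for a total of approximately 250 billion tokens.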
Reference translation of the paper (metadata) (Tue, 10 Jun 2025 00:11:30 GMT)
A large-scale dataset: "OCR-extracted text (original and post-processed) as well as the metadata (bibliographic, source, and generated) of the 983,004 volumes, or 242B tokens, identified as being in the public domain have been made available."
The dataset is institutional/institutional-books-1.0 · Datasets at Hugging Face, and the repository is GitHub – instdin/institutional-books-1-pipeline: The Institutional Data Initiative’s pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.