April 15, 2025 – Introducing the latest arXiv papers

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations [45.6]
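We introduce ScholarCopilot, a unified framework designed to enhance existing large language models for academic writing. ScholarCopilot decides when to retrieve scholarly references by generating a retrieval token, [RET], and then uses that token's representation to look up relevant citations in a database. To improve efficiency, it jointly optimizes the generation and citation tasks within a single framework.
Reference translation (metadata) (Tue, 01 Apr 2025 14:12:14 GMT)
An LLM for academic writing. Its distinctive behavior is built around a special token, [RET]: "ScholarCopilot dynamically interleaves retrieval and generation by producing retrieval tokens ([RET]) based on current context, enabling context-aware citation retrieval and optional user refinement."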
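To make the [RET] mechanism concrete, here is a minimal sketch of interleaved generation and retrieval. It is not ScholarCopilot's implementation: the decoding loop is replaced by a hypothetical precomputed list of (token, hidden_state) pairs, the citation database and its 3-dimensional embeddings are made up, and cosine similarity is an assumed scoring function. The joint training of generation and citation and the optional user refinement step are not modeled.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

RET = "[RET]"  # special retrieval token, as described in the summary above


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Sequence[float], db: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Return the k citation keys whose embeddings are most similar to the query."""
    ranked = sorted(db, key=lambda key: cosine(query, db[key]), reverse=True)
    return ranked[:k]


def write_with_citations(
    steps: List[Tuple[str, Optional[List[float]]]],
    db: Dict[str, List[float]],
    k: int = 1,
) -> str:
    """Interleave generation and retrieval.

    `steps` is a hypothetical stand-in for the LLM's decoding loop. Whenever the
    model emits [RET], the hidden state at that position is used as the retrieval
    query and the retrieved citation keys are spliced into the output text.
    """
    out: List[str] = []
    for token, hidden in steps:
        if token == RET and hidden is not None:
            keys = retrieve(hidden, db, k)
            out.append("[" + ", ".join(keys) + "]")
        else:
            out.append(token)
    return " ".join(out)


# Hypothetical citation database: key -> embedding
citation_db = {
    "vaswani2017": [0.9, 0.1, 0.0],
    "devlin2019": [0.1, 0.9, 0.0],
    "he2016": [0.0, 0.1, 0.9],
}
steps = [
    ("Transformers", None),
    ("use", None),
    ("self-attention", None),
    (RET, [0.8, 0.2, 0.1]),  # hypothetical hidden state of the [RET] position
    (".", None),
]
print(write_with_citations(steps, citation_db))
# Transformers use self-attention [vaswani2017] .
```

The point of using the [RET] token's own representation as the query is that retrieval happens exactly where the model decides a citation is needed, conditioned on the text generated so far, rather than in a separate pre-retrieval step.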

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement [100.9]
ThinkLite-VL improves the average performance of Qwen2.5-VL-7B-Instruct by 7%. Our code, data, and model are available at https://github.com/si0wang/ThinkLite-VL.
Reference translation (metadata) (Thu, 10 Apr 2025 17:49:05 GMT)
Proposes an efficient way to strengthen the reasoning of Vision-Language models. It uses very little data: "Our model achieves SoTA performance using only 11k data, and without any additional knowledge distillation." According to the authors, the key is data quality: "Our key insight highlights the critical importance of selecting genuinely challenging examples for Reinforcement Fine-Tuning (RFT)."
The repository is at GitHub – si0wang/ThinkLite-VL
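The summary says only that sample selection is MCTS-guided and that keeping genuinely challenging examples is what matters. A minimal sketch of difficulty-based filtering, assuming each candidate already carries a difficulty score such as the number of MCTS iterations the base model needed to solve it. The `Sample` class, the threshold of 5, and the rule of keeping never-solved samples are illustrative assumptions, not the paper's settings, and the MCTS search itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    sample_id: str
    # Hypothetical difficulty signal: MCTS iterations the base model needed to
    # reach a correct answer; None means it never succeeded within the budget.
    mcts_iterations: Optional[int]


def select_hard_samples(samples: List[Sample], min_iterations: int) -> List[Sample]:
    """Keep only samples that are challenging for the current model.

    A sample is kept if the search needed at least `min_iterations` iterations,
    or if it was never solved (assumed rule). Quickly solved samples are dropped.
    """
    return [
        s for s in samples
        if s.mcts_iterations is None or s.mcts_iterations >= min_iterations
    ]


pool = [
    Sample("q1", 1),     # solved immediately: too easy
    Sample("q2", 3),
    Sample("q3", 12),
    Sample("q4", None),  # never solved within the search budget
    Sample("q5", 7),
]
hard = select_hard_samples(pool, min_iterations=5)
print([s.sample_id for s in hard])  # ['q3', 'q4', 'q5']
```

The filtered subset would then serve as the training set for reinforcement fine-tuning; the idea is that easy samples the model already solves contribute little learning signal.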

Towards Trustworthy GUI Agents: A Survey [64.6]
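This survey examines the trustworthiness of GUI agents along five key dimensions. It identifies major challenges such as vulnerability to adversarial attacks and cascading failure modes in sequential decision-making. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
Reference translation (metadata) (Sun, 30 Mar 2025 13:26:00 GMT)
A survey on the trustworthiness of GUI agents. The organizing axes are "Security", "Reliability", "Explainability", "Ethical Alignment", and "Evaluation methodologies".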

DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning [39.1]
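Document image segmentation is essential for document analysis and recognition. Existing methods handle these tasks separately, which limits generalization and wastes resources. This paper introduces DocSAM, a unified transformer-based framework designed for a variety of document image segmentation tasks.
Reference translation (metadata) (Sat, 05 Apr 2025 07:14:53 GMT)
A method for document image segmentation, which remains important even now that MLLMs dominate: "DocSAM integrates layout analysis, multi-grained text segmentation, and table structure decomposition into a single model, reducing the need for specialized models and enhancing efficiency."
The repository is at GitHub – xhli-git/DocSAM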