August 14, 2025 – Introducing the latest arXiv papers

CoAct-1: Computer-using Agents with Coding as Actions

CoAct-1: Computer-using Agents with Coding as Actions [95.0]
CoAct-1 is a novel multi-agent system that combines GUI-based control with direct programmatic execution. We evaluate the system on the OSWorld benchmark, where CoAct-1 achieves a state-of-the-art success rate of 60.76%.
Reference translation of the paper (metadata) (Tue, 05 Aug 2025 21:33:36 GMT)
"CoAct-1 features an Orchestrator that dynamically delegates subtasks to either a conventional GUI Operator or a specialized Programmer agent, which can write and execute Python or Bash scripts. This hybrid approach allows the agent to bypass inefficient GUI action sequences for tasks like file management and data processing, while still leveraging visual interaction when necessary." A proposal for a GUI agent that makes good use of code generation. The authors claim SoTA on OSWorld.
The project site is CoAct-1
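
The delegation idea in the quote can be sketched roughly as follows. This is only an illustration, not the authors' implementation: in the paper the Orchestrator is an agent, whereas here a keyword heuristic stands in for it. The names `route`, `run_programmer`, `run_gui_operator`, `orchestrate`, and the `SCRIPTABLE_HINTS` list are all hypothetical.

```python
import subprocess
import sys

# Hypothetical keyword list standing in for the Orchestrator's decision.
# A real Orchestrator would reason over the task and the screen state.
SCRIPTABLE_HINTS = ("file", "rename", "csv", "convert", "batch")


def route(subtask: str) -> str:
    """Pick an agent name for a subtask using the stand-in heuristic."""
    text = subtask.lower()
    if any(hint in text for hint in SCRIPTABLE_HINTS):
        return "programmer"
    return "gui_operator"


def run_programmer(script: str) -> str:
    """Run a Python script in a child interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()


def run_gui_operator(subtask: str) -> str:
    """Placeholder: a real GUI Operator would emit clicks and keystrokes."""
    return f"[GUI actions for: {subtask}]"


def orchestrate(subtask: str, script=None):
    """Delegate to the Programmer when a script is available, else to the GUI."""
    agent = route(subtask)
    if agent == "programmer" and script is not None:
        return agent, run_programmer(script)
    return "gui_operator", run_gui_operator(subtask)


if __name__ == "__main__":
    print(orchestrate("Convert the CSV report", "print(1 + 1)"))
    print(orchestrate("Click the Settings icon"))
```

The point of the split is that a single script can replace a long chain of GUI actions for something like file management, while tasks that need visual interaction still fall through to the GUI path.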

MLP Memory: Language Modeling with Retriever-pretrained External Memory [26.0]
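In this work, we propose decoupling memory from the decoder by using a pretrainable external memory. Our architecture shows strong perplexity and performance on downstream tasks, and outperforms on three hallucination benchmarks and nine memory-intensive tasks.
Reference translation of the paper (metadata) (Sun, 03 Aug 2025 16:40:53 GMT)
"In this work, we propose an external memory for LLM that is pretrained to mimic a retriever on the entire pretraining dataset. Specifically, following the RAG setting in kNN-LM [27], this memory learns to map the LLM hidden state at a certain step to a vocabulary distribution matching the output of the kNN retriever. During inference, the LLM’s native output is interpolated with the retriever-pretrained output from the external memory." A proposal for an architecture that separates out the memory (knowledge) part.
It would be interesting if this works well, but I am somewhat skeptical that knowledge and reasoning can really be separated, and I am curious about the effect on the reasoning and generation part.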
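
The inference-time interpolation mentioned in the quote can be sketched as below. This assumes the linear mixing used in kNN-LM, p = λ · p_mem + (1 − λ) · p_lm. The weight `lam` and its default value are hypothetical, and the paper may combine the two distributions differently.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(p_lm, p_mem, lam=0.25):
    """Mix the LLM distribution with the external-memory distribution.

    lam is an assumed hyperparameter: 0 ignores the memory, 1 ignores the LLM.
    """
    if len(p_lm) != len(p_mem):
        raise ValueError("both distributions must cover the same vocabulary")
    return [lam * m + (1.0 - lam) * l for l, m in zip(p_lm, p_mem)]


if __name__ == "__main__":
    p_lm = softmax([2.0, 1.0, 0.1])   # the LLM's native next-token output
    p_mem = softmax([0.1, 3.0, 0.1])  # the memory's retriever-like output
    print(interpolate(p_lm, p_mem, lam=0.5))
```

Because both inputs are probability distributions over the same vocabulary, the mixture is also a valid distribution for any `lam` between 0 and 1.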

Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges [22.1]
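This paper introduces key concepts through a taxonomy of tabular input representations and an introduction to table understanding tasks. Tables are two-dimensional and span formats ranging from structured database tables to complex multi-layered spreadsheets, each serving a different purpose. We highlight several key gaps in the field that point to the need for further research.
Reference translation of the paper (metadata) (Thu, 31 Jul 2025 23:41:31 GMT)
A survey on handling tabular data with LLMs.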
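
To make "tabular input representation" concrete, here is a small sketch that serializes the same table into two text forms that could be placed in a prompt. Markdown and JSON records are chosen purely for illustration and are not taken from the survey's taxonomy. The helper names `to_markdown` and `to_json_records` are hypothetical.

```python
import json


def to_markdown(header, rows):
    """Serialize a table as a Markdown grid, keeping the 2D layout visible."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def to_json_records(header, rows):
    """Serialize a table as a JSON list with one object per row."""
    return json.dumps([dict(zip(header, row)) for row in rows], ensure_ascii=False)


if __name__ == "__main__":
    header = ["city", "population"]
    rows = [["Tokyo", 14000000], ["Osaka", 2700000]]
    print(to_markdown(header, rows))
    print(to_json_records(header, rows))
```

The Markdown form keeps rows and columns visually aligned, while the JSON form repeats the column name next to every value.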