arXiv最新論文の紹介 (Introducing the Latest arXiv Papers)

Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning

Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning [78.4]
Reflective Monte Carlo Tree Search (R-MCTS) is a new test-time algorithm designed to enhance the capabilities of AI agents. R-MCTS extends traditional MCTS with contrastive reflection, which lets the agent learn from its past interactions. Agent performance is further improved by fine-tuning GPT-4o through self-learning.
Paper reference translation (metadata) (Wed, 02 Oct 2024 21:42:35 GMT)
The paper proposes a Monte Carlo tree search variant described as "We propose Reflective Monte Carlo Tree Search (R-MCTS), an extension of classic MCTS that improves the agent’s decision making process on the fly by incorporating reflection over its past task executions, and state estimations using multi-agent-debate", and uses it for SFT to improve benchmark results. It outperforms ToT and plain MCTS.
Repository: jasonyux/RMCTS-self-learning · GitHub
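
For readers unfamiliar with the classic MCTS that R-MCTS builds on, the selection step can be sketched as below. This is a minimal illustration of the standard UCB1 rule only. It does not include the reflection or multi-agent-debate components from the paper, and the function names, the action names, and the exploration constant `c=1.4` are assumptions made for this example.

```python
import math


def ucb1(total_value, visits, parent_visits, c=1.4):
    """UCB1 score of a child node (classic MCTS, not R-MCTS itself).

    Unvisited children get an infinite score so they are tried first.
    """
    if visits == 0:
        return float("inf")
    exploit = total_value / visits  # average value observed so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # uncertainty bonus
    return exploit + explore


def select_child(children, parent_visits, c=1.4):
    """Return the action whose child has the highest UCB1 score.

    `children` maps a hypothetical action name to (total_value, visits).
    """
    return max(children, key=lambda a: ucb1(*children[a], parent_visits, c))
```

With `children = {"click": (3.0, 5), "type": (0.0, 0)}`, the unvisited `"type"` action is selected first. Once every child has been visited, the choice balances average value against the exploration bonus.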

Contextualized Data-Wrangling Code Generation in Computational Notebooks

Contextualized Data-Wrangling Code Generation in Computational Notebooks [131.3]
We propose CoCoMine, an automated approach for mining data-wrangling code generation examples with clear multi-modal contextual dependencies. Using it, we build CoCoNote, a dataset of 58,221 examples for contextualized data-wrangling code generation in notebooks. Experimental results show the importance of incorporating data context in data-wrangling code generation.
Paper reference translation (metadata) (Fri, 20 Sep 2024 14:49:51 GMT)
The paper builds a dataset aimed at code synthesis for the task described as "Data wrangling involves cleaning, structuring, and enriching raw data into a desired format for further analysis [96], such as by removing duplicates, casting types, and extracting features [17]." and proposes DataCoder, which is trained on it. It is also interesting that DataCoder consists of a "Data Encoder" + a "Code + Text Encoder" + a "Decoder", rather than the LLM-based architecture that is commonly seen.
Repository: GitHub – Jun-jie-Huang/CoCoNote: Source Code for ASE-24 paper “Contextualized Data-Wrangling Code Generation in Computational Notebooks”.
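
To make the three operations named in the quote concrete (removing duplicates, casting types, extracting features), here is a small sketch in plain Python. It is not taken from CoCoNote or DataCoder. The field names `id`, `price`, and `date`, and the `YYYY-MM-DD` date format, are assumptions made for this example.

```python
def wrangle(rows):
    """Clean a list of raw records given as dicts of strings.

    Assumes each row has hypothetical fields 'id', 'price', and
    'date' (formatted as YYYY-MM-DD).
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["id"], row["price"], row["date"])
        if key in seen:  # remove exact duplicates
            continue
        seen.add(key)
        cleaned.append({
            "id": int(row["id"]),            # cast types
            "price": float(row["price"]),
            "date": row["date"],
            "year": int(row["date"][:4]),    # extract a feature
        })
    return cleaned
```

The point of the paper's setting is that the right code depends on the data context (column names, value formats). That is why this sketch has to hard-code its assumptions about the input.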

One missing piece in Vision and Language: A Survey on Comics Understanding

One missing piece in Vision and Language: A Survey on Comics Understanding [13.8]
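This survey is the first to propose a task-oriented framework for comics intelligence. It aims to guide future research by addressing critical gaps in data availability and task definition.
Paper reference translation (metadata) (Sat, 14 Sep 2024 18:26:26 GMT)
A survey of comics understanding. I was surprised by how much research has already been done.
Repository: GitHub – emanuelevivoli/awesome-comics-understanding: The official repo of the Comics Survey: “A missing piece in Vision and Language: A Survey on Comics Understanding”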

Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis

Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis [19.4]
Probing techniques for large language models (LLMs) have focused mainly on English, overlooking the vast majority of the world's languages. The authors run experiments on several open-source LLMs and analyze probing accuracy, layer-wise trends, and the similarity between probing vectors across multiple languages.
Paper reference translation (metadata) (Sun, 22 Sep 2024 14:14:05 GMT)
An analysis of multilingual behavior. The reported findings are "(1) a consistent performance gap between high-resource and low-resource languages, with high-resource languages achieving significantly higher probing accuracy; (2) divergent layer-wise accuracy trends, where high-resource languages show substantial improvement in deeper layers similar to English; and (3) higher representational similarities among high-resource languages, with low-resource languages demonstrating lower similarities both among themselves and with high-resource languages."
As I also thought when reading Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? – arXiv最新論文の紹介 (devneko.jp), this kind of behavioral analysis is very interesting.
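
One way to compare probing vectors across languages is a similarity score between the weight vectors of per-language probes. The sketch below assumes cosine similarity as that measure; the summary above does not state which measure the paper uses. The function name is chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors.

    Used here as an assumed stand-in for 'similarity between probing
    vectors'; the paper's exact measure may differ.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)
```

Under this reading, finding (3) would correspond to higher scores for pairs of high-resource languages than for pairs involving a low-resource language.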

Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs

Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs [117.7]
This is the first self-supervised tracker that learns from videos with sparse bounding-box annotations and no tracking labels. Walker is the first self-supervised tracker to be competitive on MOT17, DanceTrack, and BDD100K.
Paper reference translation (metadata) (Wed, 25 Sep 2024 18:00:00 GMT)
A tracking method that claims "Remarkably, our proposal outperforms the previous self-supervised trackers even when drastically reducing the annotation requirements by up to 400x."
Repository: GitHub – mattiasegu/walker: Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs (ECCV 2024)

Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models

Judgment of Thoughts: Courtroom of the Binary Logical Reasoning in Large Language Models [7.5]
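This paper describes a prompt engineering technique designed specifically for binary logical reasoning tasks. In this framework, a judge, a prosecutor, and a lawyer are used to facilitate more reliable and accurate reasoning. Experimental results show that the method significantly outperforms existing methods.
Paper reference translation (metadata) (Wed, 25 Sep 2024 05:28:05 GMT)
The paper proposes a method described as "JoT employs three roles—lawyer, prosecutor, and judge—to facilitate more reliable and accurate reasoning by the model."
It seems to work well on some tasks and not on others. I wonder whether adopting something like a three-tiered court system would improve performance.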

A Survey of Foundation Models for Music Understanding

A Survey of Foundation Models for Music Understanding [60.8]
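This work is one of the early reviews of the intersection of AI techniques and music understanding. It investigates, analyzes, and examines recent large-scale music foundation models with respect to their music understanding capabilities.
Paper reference translation (metadata) (Sun, 15 Sep 2024 03:34:14 GMT)
The authors state "This work, to our best knowledge, is one of the early reviews of the intersection of AI techniques and music understanding." A very comprehensive survey.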

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.6]
This work introduces a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. The authors also developed PatentMIA, a Chinese-language benchmark, to evaluate the performance of detection methods for LLMs on Chinese text.
Paper reference translation (metadata) (Mon, 23 Sep 2024 07:55:35 GMT)
The paper proposes DC-PDD, a method for pretraining data detection (the task of detecting what was used in pretraining), along with a benchmark. "The pretraining data detection problem can be viewed as an instance of the membership inference attack (MIA) task (Shokri et al., 2017), where the primary objective is to determine if a particular text was part of a target LLM’s training corpus."
According to the paper, "DC-PDD computes the divergence between the token probability distribution and the token frequency distribution for detection."
Repository: GitHub – zhang-wei-chao/DC-PDD
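
The idea of calibrating token probabilities against token frequencies can be sketched as follows. This is a simplified illustration, not the authors' implementation. The cross-entropy-style term `-p * log(q)`, the averaging over tokens, and the function name are assumptions made for this example, and any further details of the published method are omitted.

```python
import math


def calibrated_score(token_probs, token_freqs):
    """Average of -p * log(q) over the tokens of one text.

    token_probs: probability the target model assigns to each token (p).
    token_freqs: relative frequency of each token in a reference
                 corpus (q), assumed to be strictly positive.
    A token counts for more when the model is confident about it (high p)
    even though it is rare in general text (low q).
    """
    if len(token_probs) != len(token_freqs) or not token_probs:
        raise ValueError("need two non-empty lists of equal length")
    terms = [-p * math.log(q) for p, q in zip(token_probs, token_freqs)]
    return sum(terms) / len(terms)
```

In a membership-inference setting, a text whose score exceeds some threshold would be flagged as likely seen during pretraining. The threshold has to be chosen on held-out data and is not shown here.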

Deep Graph Anomaly Detection: A Survey and New Perspectives

Deep Graph Anomaly Detection: A Survey and New Perspectives [86.8]
Graph anomaly detection (GAD) aims to identify anomalous graph instances (nodes, edges, subgraphs, or whole graphs). Deep learning approaches, in particular graph neural networks (GNNs), have emerged as a promising paradigm for GAD.
Paper reference translation (metadata) (Mon, 16 Sep 2024 03:05:11 GMT)
A survey of anomaly detection using GNNs.
Repository: GitHub – mala-lab/Awesome-Deep-Graph-Anomaly-Detection: Official repository for survey paper “Deep Graph Anomaly Detection: A Survey and New Perspectives”, including diverse types of resources for graph anomaly detection.

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale [97.2]
LLMs can act as autonomous agents that interact with digital environments and complete specific objectives. Their accuracy is still not sufficient, partly because large-scale direct demonstrations for digital tasks are lacking. We propose Synatra, an approach that turns indirect knowledge into direct supervision at scale.
Paper reference translation (metadata) (Tue, 24 Sep 2024 00:51:45 GMT)
The paper proposes an approach that synthesizes the actions an agent should take for complex tasks. The gist seems to be that using an LLM to fill in vague passages, such as a manual that says only "enter a keyword", contributes to better performance. It makes me feel the limits of agents (how they differ from humans), as well as the effectiveness of synthetic data and the power of LLMs.
Effectiveness is confirmed: "We use 100k such synthetically-created demonstrations to finetune a 7B CodeLlama, and demonstrate that the resulting agent surpasses all comparably sized models on three web-based task benchmarks Mind2Web, MiniWoB++ and WebArena, as well as surpassing GPT-3.5 on WebArena and Mind2Web." The approach is also cost-effective: "In addition, while synthetic demonstrations prove to be only 3% the cost of human demonstrations (at $0.031 each), we show that the synthetic demonstrations can be more effective than an identical number of human demonstrations collected from limited domains."
Repository: Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale (oootttyyy.github.io)
