Synthetic data – Introducing the latest arXiv papers

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention [117.9]
PRISM is a framework for distilling small language model (SLM)-enabled robot planners. PRISM is applied to three LLM-enabled planners covering mapping and exploration, manipulation, and household assistance. The authors show that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o's performance to over 93%.
Reference translation of the paper (metadata) (Fri, 20 Jun 2025 21:44:27 GMT)
A proposal for a distillation framework targeting robot planning: "Given a source LLM-enabled planner, PRISM synthesizes tasks and environments, elicits plans from the LLM-enabled planner in these synthesized environments, and then uses the resulting data to train an SLM-enabled planner that serves as a drop-in replacement for the source model." Intuitively this seems effective, and the results are in fact promising.
The project site is PRISM.
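The three stages in the quoted description (synthesize tasks and environments, elicit plans from the source planner, collect training data for the small model) can be sketched as a data-generation loop. This is a minimal illustration under stated assumptions, not PRISM's implementation: `synthesize_scenarios`, `teacher_planner`, `build_distillation_set`, and `to_sft_record` are hypothetical names, and the teacher is a stub standing in for an LLM-enabled planner.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PlanExample:
    environment: str
    task: str
    plan: List[str]


def synthesize_scenarios(n: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Sample (environment, task) pairs. A real system would generate these with an LLM."""
    rng = random.Random(seed)
    environments = ["kitchen", "hallway", "workshop"]  # hypothetical examples
    tasks = ["map the area", "find the mug", "pick up the box"]  # hypothetical examples
    return [(rng.choice(environments), rng.choice(tasks)) for _ in range(n)]


def teacher_planner(environment: str, task: str) -> List[str]:
    """Stub for the source LLM-enabled planner (assumption: it returns a list of steps)."""
    return [f"explore the {environment}", f"carry out: {task}", "report done"]


def build_distillation_set(
    n: int,
    planner: Callable[[str, str], List[str]] = teacher_planner,
    seed: int = 0,
) -> List[PlanExample]:
    """Elicit one plan from the teacher for every synthesized scenario."""
    return [
        PlanExample(env, task, planner(env, task))
        for env, task in synthesize_scenarios(n, seed)
    ]


def to_sft_record(example: PlanExample) -> Dict[str, str]:
    """Format one example as a prompt/completion pair for supervised fine-tuning."""
    prompt = f"Environment: {example.environment}\nTask: {example.task}\nPlan:"
    return {"prompt": prompt, "completion": "\n".join(example.plan)}


dataset = build_distillation_set(5)
records = [to_sft_record(example) for example in dataset]
```

The resulting records would then be fed to whatever fine-tuning setup is used for the small model; that step is omitted here because the summary does not describe it.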

What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning

What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning [22.4]
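The paper shows how the level of diversity in LLM-generated data affects downstream model performance. It also examines the performance of models trained on data that mixes in different proportions of LLM-generated data.
Reference translation of the paper (metadata) (Tue, 24 Jun 2025 02:44:58 GMT)
A report on the effects of synthetic data, focusing in particular on the degree of diversity.
The authors state: "Our experimental results show that, with minimal distribution shift, moderately diverse LLM-generated data can enhance model performance in scenarios with insufficient labeled data, whereas highly diverse generated data has a negative impact."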
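To make "degree of diversity" concrete, one common lexical proxy is distinct-n: the fraction of n-grams in a corpus that are unique. The sketch below is an illustration only; the summary does not say which diversity measure the paper uses, so distinct-n is an assumption swapped in here, and whitespace tokenization is a simplification.

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams over the whole corpus (0.0 if empty)."""
    total = 0
    unique = set()
    for text in texts:
        grams = ngrams(text.lower().split(), n)
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


repetitive = ["the robot moves left", "the robot moves left"]
varied = ["the robot moves left", "a drone scans the roof"]
low = distinct_n(repetitive, n=2)
high = distinct_n(varied, n=2)
```

A score near 1.0 means almost no repeated n-grams, while a low score indicates near-duplicate generations. Comparing such a score across generated datasets is one way to bucket them into low, moderate, and high diversity.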

Self-Adapting Language Models

Self-Adapting Language Models [44.5]
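Large language models (LLMs) are powerful but static, lacking mechanisms to adapt their weights in response to new tasks, knowledge, or examples. The authors introduce Self-Adapting LLMs (SEAL). Experiments on knowledge incorporation and few-shot generalization show that SEAL is a promising step toward language models capable of self-directed adaptation.
Reference translation of the paper (metadata) (Thu, 12 Jun 2025 17:48:13 GMT)
"We propose Self-Adapting LLMs (SEAL), a framework that enables language models to improve themselves by generating their own synthetic data and optimization parameters (“self-edits”) in response to new data. The model is trained to produce these self-edits directly through token generation with the data provided in the model’s context. Self-edit generation is learned via reinforcement learning (RL) where the model is rewarded for generating self-edits (SE) that, when applied, improve the model’s performance at the target task." This is an approach to self-adaptation, self-evolution, and self-improvement. Its effectiveness is verified on SQuAD and (a subset of) the ARC-AGI benchmark.
My impression is that self-improvement via synthetic data does look effective. (I think it is already practical to a degree, but) in a worldview that includes AGI, the key point may be whether the time constraints can be resolved. (One can imagine a near future in which we say that AI needs sleep too, and this kind of processing runs in the meantime.)
The project site is Self-Adapting Language Models.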

Self-Adapting Improvement Loops for Robotic Learning [30.8]
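Video generative models trained on expert demonstrations have been used as performant text-conditioned visual planners for solving robotic tasks. This work proposes the Self-Adapting Improvement Loop (SAIL), in which an in-domain video model is iteratively updated on self-produced trajectories. Performance was found to improve continuously over multiple iterations on novel tasks not seen during the original in-domain video model training.
Reference translation of the paper (metadata) (Sat, 07 Jun 2025 04:34:37 GMT)
"we highlight that adaptation with large-scale pretrained text-conditioned video models is critical for facilitating self-improvement, by contributing text-conditioned generalization capabilities and motion priors." This one is an approach that leverages video generation models.
The project site is SAIL.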

Self-Challenging Language Model Agents

Self-Challenging Language Model Agents [98.6]
This paper proposes the Self-Challenging framework for training an agent on high-quality tasks that the agent generates itself. The framework achieves a two-fold improvement with Llama-3.1-8B-Instruct.
Reference translation of the paper (metadata) (Mon, 02 Jun 2025 14:23:33 GMT)
"we present the Self-Challenging Agent (SCA) method for self-improvement of general multi-turn tool-use LLM agents. SCA can create its own tasks to challenge itself and learn from them. To do this, it utilizes the Code-as-Task (CaT) formulation which ensures high quality synthetic tasks. Through RL on these self-generated synthetic tasks, SCA can be used to train a Llama-3.1-8B model to achieve an average relative success rate improvement of 95.8% on existing test tasks across four different multi-turn tool-use environments." A report that gives the sense of a future edging closer to AGI. (Though, as the authors note, "While SCA serves as a preliminary step, there remains many research questions for building an effective self-improvement flywheel for general LLM agents.", so in practice there are presumably still various barriers.)
The effective use of code generation is also interesting. Might the stage at which tasks expressible in a formal language can be solved arrive sooner than expected?
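The idea of expressing a task as code can be illustrated with a small data structure. This is a minimal sketch under an assumption: it treats a Code-as-Task as an instruction bundled with a verification function, an example solution, and failure cases used as tests. The field names and the `is_well_formed` filter are hypothetical and are not taken from the paper's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CodeAsTask:
    instruction: str
    verify: Callable[[str], bool]  # returns True when an answer solves the task
    example_solution: str
    failure_cases: List[str] = field(default_factory=list)


def is_well_formed(task: CodeAsTask) -> bool:
    """Keep a synthetic task only if the verifier accepts the example solution
    and rejects every known failure case."""
    accepts_solution = task.verify(task.example_solution)
    rejects_failures = not any(task.verify(bad) for bad in task.failure_cases)
    return accepts_solution and rejects_failures


def reward(task: CodeAsTask, answer: str) -> float:
    """Binary reward for RL on self-generated tasks: 1.0 if verified, else 0.0."""
    return 1.0 if task.verify(answer) else 0.0


good_task = CodeAsTask(
    instruction="Report the total price of two items costing 3 and 4.",
    verify=lambda answer: answer.strip() == "7",
    example_solution="7",
    failure_cases=["12", "", "seven"],
)

# A degenerate verifier that accepts everything is filtered out.
bad_task = CodeAsTask(
    instruction="Report the total price of two items costing 3 and 4.",
    verify=lambda answer: True,
    example_solution="7",
    failure_cases=["12"],
)

kept = [task for task in (good_task, bad_task) if is_well_formed(task)]
```

Because the verifier is executable, a self-generated task can be checked automatically before it is used for training, which is one plausible reading of how a formulation like this keeps synthetic tasks high quality.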

OpenThoughts: Data Recipes for Reasoning Models

OpenThoughts: Data Recipes for Reasoning Models [215.2]
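The goal of the OpenThoughts project is to create open-source datasets for training reasoning models. The OpenThoughts2-1M dataset led to OpenThinker2-32B, described as the first model trained on public reasoning data. The project also presents the OpenThinker3-7B model.
Reference translation of the paper (metadata) (Wed, 04 Jun 2025 17:25:39 GMT)
An open dataset for building large reasoning models (LRMs). It is also a useful reference for directions in data augmentation.
The project site is Open Thoughts.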

SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis

SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis [90.0]
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios. Existing approaches face critical limitations: they lack high-quality training trajectories and suffer from distribution mismatch. This paper introduces SimpleDeepSearcher, a framework that bridges the gap through strategic data engineering rather than complex training paradigms.
Reference translation of the paper (metadata) (Thu, 22 May 2025 16:05:02 GMT)
"Our approach synthesizes high-quality training data by simulating realistic user interactions in live web search environments, coupled with a multi-criteria curation strategy that optimizes the diversity and quality of input and output side." Reportedly, the improvement is large even with a small amount of data.
The project site is GitHub – RUCAIBox/SimpleDeepSearcher: SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis

DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories

DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories [120.3]
DreamGen is a pipeline for training robot policies that generalize across behaviors and environments through neural trajectories. The work establishes a promising new axis for scaling robot learning beyond manual data collection.
Reference translation of the paper (metadata) (Mon, 19 May 2025 04:55:39 GMT)
"This pipeline is designed to be general-purpose across different robots, environments, and tasks. (1) We fine-tune video world models on a target robot to capture the dynamics and kinematics of the specific embodiment; (2) we prompt the model with pairs of initial frames and language instructions to generate large volumes of robot videos, capturing both familiar behaviors from fine-tuning and novel ones in unseen settings; (3) we then extract pseudo-actions using either a latent action model [13] or an inverse dynamics model (IDM)[14]; (4) finally, we use the resulting video-action sequence pairs, dubbed neural trajectories, for training downstream visuomotor policies." A proposal for a data synthesis method that leverages video generation models. It is interesting in that it resembles mental rehearsal (image training).
The project site is DreamGen.

Anyprefer: An Agentic Framework for Preference Data Synthesis

Anyprefer: An Agentic Framework for Preference Data Synthesis [62.4]
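The authors propose Anyprefer, a framework for synthesizing high-quality preference data for aligning the target model. External tools are introduced so that the judge model can evaluate responses accurately. The synthesized data is compiled into Anyprefer-V1, a new preference dataset consisting of 58K high-quality preference pairs.
Reference translation of the paper (metadata) (Sun, 27 Apr 2025 15:21:59 GMT)
A proposal for an agentic data synthesis framework: "To address the challenges of synthesizing high-quality preference data, we propose an automatic framework called Anyprefer, which models the preference data synthesis process as a two-player cooperative Markov game."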

DeepCritic: Deliberate Critique with Large Language Models

DeepCritic: Deliberate Critique with Large Language Models [77.6]
The authors focus on studying and enhancing the math critique ability of large language models (LLMs). They develop a critique model based on Qwen2.5-7B-Instruct.
Reference translation of the paper (metadata) (Thu, 01 May 2025 17:03:17 GMT)
A proposal for a model that performs deep critiques. It is built in two stages: "In Stage 1, we first utilize Qwen2.5-72B-Instruct to generate an initial step-wise critique for each step in the solution, followed by an in-depth critique of the initial critique." and "In Stage 2, we perform RL to the SFT model on either existing human-annotated data or auto-labeled data via Monte Carlo sampling-based correctness estimation, to further stimulate the critique ability of the critic." Critic models are known to be effective at correcting the outputs of other models as well, but the observation that "our 7B critique model is also capable of supervising and correcting the outputs of a 72B generator, demonstrating a potential of weak-to-strong supervision" is interesting.
The repository is GitHub – RUCBM/DeepCritic: Official repository for paper “DeepCritic: Deliberate Critique with Large Language Models”
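The phrase "Monte Carlo sampling-based correctness estimation" in the Stage 2 quote can be sketched as follows: for each prefix of a step-by-step solution, sample several completions and record how often they reach the correct final answer. This is a generic illustration of that labeling idea under assumptions, not DeepCritic's exact procedure. The `rollout` callable, the threshold, and the toy problem are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def estimate_step_correctness(
    steps: Sequence[str],
    rollout: Callable[[Sequence[str], random.Random], str],
    is_correct: Callable[[str], bool],
    num_samples: int = 8,
    seed: int = 0,
) -> List[float]:
    """For each solution prefix, return the fraction of sampled completions
    that end in a correct final answer."""
    rng = random.Random(seed)
    scores = []
    for end in range(1, len(steps) + 1):
        prefix = steps[:end]
        hits = sum(is_correct(rollout(prefix, rng)) for _ in range(num_samples))
        scores.append(hits / num_samples)
    return scores


def first_error_index(scores: Sequence[float], threshold: float = 0.0) -> Optional[int]:
    """Label the first step whose success rate is at or below the threshold as the error."""
    for index, score in enumerate(scores):
        if score <= threshold:
            return index
    return None


def toy_rollout(prefix: Sequence[str], rng: random.Random) -> str:
    """Stand-in for sampling a completion from a generator model.
    Here a wrong intermediate result always propagates to the final answer;
    a real sampler would use rng (or a model) to draw varied completions."""
    return "5" if any("= 5" in step for step in prefix) else "4"


solution = ["compute 2 + 2", "2 + 2 = 5", "so the answer is 5"]
scores = estimate_step_correctness(solution, toy_rollout, lambda answer: answer == "4")
error_step = first_error_index(scores)
```

Labels produced this way need no human annotation, which is why they are attractive as an RL training signal for a critic.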

ReasonIR: Training Retrievers for Reasoning Tasks

ReasonIR: Training Retrievers for Reasoning Tasks [139.5]
ReasonIR-8B is the first retriever specifically trained for general reasoning tasks. It newly achieves 29.9 nDCG@10 without a reranker and 36.9 nDCG@10 with a reranker.
Reference translation of the paper (metadata) (Tue, 29 Apr 2025 09:49:28 GMT)
A proposal for a bi-encoder retriever built using synthetic data: "We trained REASONIR-8B by fine-tuning LLAMA3.1-8B (Touvron et al., 2023) on a combination of public datasets and the synthetic data generated by REASONIR-SYNTHESIZER." It is also interesting that, even with such a method, a hybrid with BM25 remains effective.
The repositories are GitHub – facebookresearch/ReasonIR: Official repository for paper “ReasonIR Training Retrievers for Reasoning Tasks”. and reasonir/ReasonIR-8B · Hugging Face
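For readers unfamiliar with the metric, nDCG@10 compares the discounted gain of the top 10 retrieved documents with that of an ideal ordering. The sketch below computes it with linear gain, and also shows one generic way to combine dense and BM25 scores. Both parts are illustrations under assumptions: the paper's evaluation code and its hybrid scheme are not described in this summary, so min-max normalization with a weight `alpha` is a stand-in, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(relevances: List[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: List[float], k: int = 10) -> float:
    """nDCG@k, assuming the list holds the relevance of every judged document
    in the order the retriever ranked them."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; constant or empty inputs map to 0.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 0.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def hybrid_scores(dense: Dict[str, float], bm25: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Weighted sum of normalized dense and BM25 scores; missing documents count as 0."""
    dense_norm, bm25_norm = min_max(dense), min_max(bm25)
    docs = set(dense) | set(bm25)
    return {
        doc: alpha * dense_norm.get(doc, 0.0) + (1 - alpha) * bm25_norm.get(doc, 0.0)
        for doc in docs
    }


perfect = ndcg_at_k([1.0, 0.0, 0.0])
swapped = ndcg_at_k([0.0, 1.0])
combined = hybrid_scores({"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 2.0}, alpha=0.75)
```

Reported figures such as 29.9 nDCG@10 are this quantity averaged over queries and multiplied by 100.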
