fine tuning – Introducing the Latest arXiv Papers

Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence

Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.4]
We propose context-oriented decomposition adaptation (CorDA), a new method that initializes adapters in a task-aware manner. Thanks to this task awareness, the method offers two optional adaptation modes: knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
Reference translation of the paper (metadata) (Mon, 16 Jun 2025 07:55:14 GMT)
Introduces knowledge-preserved mode (KPM) and instruction-previewed mode (IPM). On results, the paper reports: "Experimental results demonstrate that our method in KPM outperforms LoRA not only in downstream performance but also in maintaining zero-shot capabilities for both large language models and vision language models. Meanwhile, the IPM exhibits superior fine-tuning performance and faster convergence in both standard and quantized adaptation across various tasks."
Sample code is available at peft/examples/corda_finetuning at main · huggingface/peft · GitHub
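To make the two modes concrete, here is a minimal NumPy sketch of a context-oriented decomposition. It rests on assumptions not stated in this post: the weight matrix is decomposed as SVD(W C), where C is the covariance of sampled input activations; IPM uses the largest r components as the adapter; KPM uses the smallest r and freezes the rest. The function name `corda_init` and the toy shapes are hypothetical, and this is not the PEFT implementation.

```python
import numpy as np

def corda_init(W, X, r, mode="ipm", eps=1e-6):
    """Toy context-oriented adapter initialization (assumed formulation).

    W: (d_out, d_in) pre-trained weight.
    X: (d_in, n) input activations sampled from the task of interest.
    r: adapter rank.
    mode: "ipm" takes the top-r components, "kpm" the bottom-r.
    Returns (W_res, B, A) with W_res + B @ A == W.
    """
    d_in = W.shape[1]
    # Covariance of the context activations, lightly regularized for inversion.
    C = X @ X.T / X.shape[1] + eps * np.eye(d_in)
    U, S, Vt = np.linalg.svd(W @ C, full_matrices=False)
    Vt_hat = Vt @ np.linalg.inv(C)  # so that U diag(S) Vt_hat reconstructs W
    if mode == "ipm":
        idx = np.arange(r)
    elif mode == "kpm":
        idx = np.arange(len(S) - r, len(S))
    else:
        raise ValueError("mode must be 'ipm' or 'kpm'")
    root = np.sqrt(S[idx])
    B = U[:, idx] * root              # (d_out, r)
    A = root[:, None] * Vt_hat[idx]   # (r, d_in)
    W_res = W - B @ A                 # frozen residual weight
    return W_res, B, A

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 8))
X = rng.normal(size=(8, 64))
W_res_ipm, B_ipm, A_ipm = corda_init(W, X, r=2, mode="ipm")
W_res_kpm, B_kpm, A_kpm = corda_init(W, X, r=2, mode="kpm")
```

In both modes the residual plus the adapter product reproduces W at initialization, so the model's outputs are unchanged before training. Only the choice of which components are trainable differs.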

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy [48.3]
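We investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. Both scaling strategies yield notable improvements in reasoning performance. Our AceReason-Nemotron-1.1 7B model significantly outperforms AceReason-Nemotron-1.0 and achieves new state-of-the-art performance among Qwen2.5-7B-based reasoning models.
Reference translation of the paper (metadata) (Mon, 16 Jun 2025 09:27:48 GMT)
A paper examining the relationship between SFT and RL, which matters in LRM development. It reports: "Our results show that both scaling strategies substantially improve the reasoning abilities of large language models (LLMs)."
Performance gains in an adjacent (?) domain, as in "Interestingly, even strong SFT models with robust coding abilities benefit substantially from math-only RL training. This leads to further gains in coding performance.", show up in many places in this field, and I find it an interesting property.
The repository is nvidia/AceReason-Nemotron-1.1-7B · Hugging Face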

$\textit{New News}$: System-2 Fine-tuning for Robust Integration of New Knowledge

$\textit{New News}$: System-2 Fine-tuning for Robust Integration of New Knowledge [6.1]
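We introduce $\textit{New News}$, a dataset of hypothetical yet plausible news spanning multiple domains. We explore a suite of self-play data generation protocols for distilling knowledge from the model with context into the weights of the model without context. Our results show that the self-QA protocol of Sys2-FT significantly improves the model's in-weight learning of the news.
Reference translation of the paper (metadata) (Sat, 03 May 2025 12:49:35 GMT)
An analysis of the gap between ICL and FT, and a proposal of a method called Sys2-FT. It reports: "Our results demonstrate that the self-QA protocol of Sys2-FT significantly improves models’ in-weight learning of the news."
The difference between ICL and FT is very interesting and also important in practice.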

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training [127.5]
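Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. This paper studies the difference between SFT and RL in generalization and memorization. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants.
Reference translation of the paper (metadata) (Tue, 28 Jan 2025 18:59:44 GMT)
This paper feels like exactly the information I want right now. It reports: "Through extensive experiments on the GeneralPoints and V-IRL tasks, we demonstrated that RL exhibits superior performance in learning generalizable knowledge, while SFT tends to merely memorize the training data, across both the rule and visual variations."
In addition to the above, "SFT is necessary for RL training when the backbone model does not follow instructions." is very interesting. My impression is that the effective training strategy often depends on base performance in other cases as well (and intuitively that seems right), so this looks like important know-how.
The project site is SFT Memorizes, RL Generalizes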

Knowledge Injection via Prompt Distillation

Knowledge Injection via Prompt Distillation [48.7]
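This paper proposes a new fine-tuning technique for learning new knowledge and shows that it can reach the performance of RAG. The proposed method is based on a self-distillation approach called prompt distillation.
Reference translation of the paper (metadata) (Thu, 19 Dec 2024 15:44:01 GMT)
When an LLM lacks the needed knowledge, RAG is the usual choice; this paper proposes prompt distillation, a fine tuning method that can deliver comparable performance. It can reportedly also be combined with RAG.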
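As a rough illustration of the self-distillation idea, the sketch below computes a per-token KL loss between a teacher and a student. It assumes the teacher is the same model with the new knowledge placed in its prompt and the student is the model without it; that setup, the function names, and the toy logits are all assumptions for illustration, not the paper's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student) over an answer sequence."""
    losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)

# Toy logits for a 2-token answer over a 3-word vocabulary (hypothetical values).
teacher = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]]  # saw the new knowledge in its prompt
student = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]  # no knowledge in the prompt
loss_before = distillation_loss(teacher, student)
loss_matched = distillation_loss(teacher, teacher)
```

Minimizing this loss over generated question-answer pairs pushes the student's weights toward reproducing what the teacher could only do with the knowledge in context.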

Predicting Emergent Capabilities by Finetuning

Predicting Emergent Capabilities by Finetuning [99.0]
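We find that fine-tuning language models can shift the point in scaling at which emergence occurs toward less capable models. We validate the proposed approach on four standard NLP benchmarks. In some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
Reference translation of the paper (metadata) (Mon, 25 Nov 2024 01:48:09 GMT)
It reports: "we found that our specific emergence prediction approach (e.g., emergence law) can accurately predict the point of emergence up to 4x the FLOPS in advance, representing meaningful progress on the challenging unsolved problem of emergence prediction."
Useful research, since there are many situations where one wants to know how far fine tuning can go (though whether it is practical at this point is somewhat doubtful).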

LoRA vs Full Fine-tuning: An Illusion of Equivalence

LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.1]
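This study examines how different fine-tuning methods change pre-trained models by analyzing the models' weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions have very different structure. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
Reference translation of the paper (metadata) (Mon, 28 Oct 2024 17:14:01 GMT)
Analyzes the differences between weights obtained with LoRA and weights obtained with full fine-tuning. It reports: "More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions. Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially."
I think it is an interesting property, and since evaluating robustness is hard, it is a little scary that the problem seems easy to overlook.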
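One way to picture an intruder dimension is a top singular vector of the fine-tuned weight that has low cosine similarity to every singular vector of the pre-trained weight. The sketch below applies that check to toy matrices. The threshold `eps`, the top-`k` cutoff, the function name, and the synthetic weights are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def intruder_dimensions(W_pre, W_ft, k=10, eps=0.5):
    """Indices (among the top-k left singular vectors of W_ft) whose maximum
    absolute cosine similarity to any left singular vector of W_pre is below eps."""
    U_pre, _, _ = np.linalg.svd(W_pre)
    U_ft, _, _ = np.linalg.svd(W_ft)
    # Columns are unit vectors, so dot products are cosine similarities.
    cos = np.abs(U_pre.T @ U_ft[:, :k])
    max_cos = cos.max(axis=0)
    return [j for j in range(k) if max_cos[j] < eps]

n = 64
# Toy "pre-trained" weight with well-separated singular values 64, 63, ..., 1.
W_pre = np.diag(np.arange(n, 0, -1, dtype=float))

# Toy LoRA-like update: one large rank-1 direction unrelated to the old basis.
u = np.ones((n, 1)) / np.sqrt(n)
W_lora_like = W_pre + 1000.0 * (u @ u.T)

# Toy small dense update: singular vectors barely move.
W_small = W_pre + 0.001 * np.ones((n, n))

intruders_lora = intruder_dimensions(W_pre, W_lora_like)
intruders_small = intruder_dimensions(W_pre, W_small)
```

In this toy setup the top singular vector of the rank-1 update case is flagged, while the small dense update produces no flagged vectors. Real checkpoints would need the comparison run per weight matrix.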

Empirical Insights on Fine-Tuning Large Language Models for Question-Answering

Empirical Insights on Fine-Tuning Large Language Models for Question-Answering [50.1]
Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets. We categorize supervised fine-tuning (SFT) data based on the extent of knowledge memorized by the pre-trained LLM. Experiments show that as few as 60 data points in the SFT stage can activate the knowledge encoded during pre-training, enabling LLMs to perform the QA task.
Reference translation of the paper (metadata) (Tue, 24 Sep 2024 07:38:38 GMT)
Points noted: "To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes." and "What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes, implying that a simple post-processing calibration would bring back the pre-trained model’s capability and at the same time unveil the feature improvement over all classes."
The repository is GitHub – OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated
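The "simple post-processing calibration" in the quote can be pictured as adding a constant offset to the logits of classes that were absent from fine-tuning. The sketch below is a guess at the simplest form of that idea; the function names, the offset `gamma`, and the toy logits are hypothetical and not taken from the repository.

```python
def calibrate_logits(logits, finetuned_classes, gamma):
    """Add a bias gamma to every class that was NOT part of fine-tuning."""
    return [
        x if i in finetuned_classes else x + gamma
        for i, x in enumerate(logits)
    ]

def predict(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)

# Toy example: classes 0 and 1 were fine-tuned and have inflated logits;
# the true class is 2, which was absent from fine-tuning.
logits = [5.0, 4.5, 3.8, 1.0]
before = predict(logits)
after = predict(calibrate_logits(logits, finetuned_classes={0, 1}, gamma=1.5))
```

Here the prediction moves from class 0 to class 2 once the offset is applied. In practice `gamma` would have to be chosen on held-out data.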

The representation landscape of few-shot learning and fine-tuning in large language models

The representation landscape of few-shot learning and fine-tuning in large language models [43.8]
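In-context learning (ICL) and supervised fine-tuning (SFT) are two common strategies for improving the performance of modern large language models (LLMs). We analyze the probability landscape of the hidden representations in the two cases. ICL and SFT create very different internal structures, in both cases undergoing a sharp transition in the middle of the network.
Reference translation of the paper (metadata) (Thu, 5 Sep 2024 16:15:12 GMT)
An analysis of how ICL and SFT behave differently. It reports: "we compare how LLMs solve the same question-answering task, finding that ICL and SFT create very different internal structures, in both cases undergoing a sharp transition in the middle of the network.", so their behavior appears to differ considerably.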

A Survey of PEFT

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey [57.5]
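Parameter-Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adapting large models to various downstream tasks. PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task while minimizing the number of additional parameters and the computational resources required. This survey is an essential resource for researchers aiming to understand both PEFT algorithms and their system implementations, offering detailed insights into recent advances and practical applications.
Reference translation of the paper (metadata) (Thu, 21 Mar 2024 17:55:50 GMT)
A survey of PEFT
This is an area producing a very large number of research results, so a survey like this is truly appreciated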
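For a feel of what "minimizing the number of additional parameters" means, the short calculation below compares a LoRA-style rank-r adapter with full fine-tuning of one linear layer. The layer size and rank are hypothetical values chosen for illustration.

```python
def full_param_count(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out

def lora_param_count(d_in, d_out, r):
    """Trainable parameters of a rank-r adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)

d_in, d_out, r = 4096, 4096, 8   # hypothetical layer size and rank
full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, r)
ratio = lora / full
print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.4%}")
```

With these numbers the adapter trains 65,536 parameters against 16,777,216 for the full matrix, which is roughly 0.39% of the layer.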
