model merging – Introduction to the latest arXiv papers

Model Merging for Knowledge Editing

Model Merging for Knowledge Editing [53.8]
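Large language models (LLMs) require continuous updates to keep their knowledge accurate and current as the world evolves. Existing knowledge editing approaches offer various solutions for updating knowledge, but they often struggle in sequential editing scenarios. This paper proposes a two-stage framework that combines robust supervised fine-tuning (R-SFT) with model merging.
Reference translation of the paper (metadata) (Sat, 14 Jun 2025 07:42:39 GMT)
Knowledge editing via SFT and model merging.
The repository is GitHub – Applied-Machine-Learning-Lab/MM4KE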

Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [104.0]
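Model merging aims to combine multiple expert models into a single model, reducing storage and serving costs. Prior work has focused mainly on merging visual classification models or on merging large language models (LLMs) for code and math tasks. This paper introduces a model merging benchmark for MLLMs that covers multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding.
Reference translation of the paper (metadata) (Mon, 26 May 2025 12:23:14 GMT)
Introduces a benchmark for multimodal model merging.
The repository is GitHub – WalkerWorldPeace/MLLMerging: Official implementation of “Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging”.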
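As a rough illustration of what "combining multiple expert models into a single model" can mean in its simplest form, the sketch below averages same-named parameters across experts. It is not the benchmark's method: the `Weights` type, the `average_merge` function, and the toy expert dictionaries are all hypothetical stand-ins for real model state dicts, and it assumes every expert shares the same architecture and parameter names.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_merge(experts: List[Weights]) -> Weights:
    """Merge experts by element-wise averaging of same-named parameters."""
    if not experts:
        raise ValueError("need at least one expert")
    merged: Weights = {}
    for name in experts[0]:
        # Group the i-th value of this parameter across all experts.
        columns = zip(*(expert[name] for expert in experts))
        merged[name] = [sum(values) / len(experts) for values in columns]
    return merged


# Toy experts (assumed values, for illustration only).
vqa_expert = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ocr_expert = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = average_merge([vqa_expert, ocr_expert])
print(merged)
```

The single merged dictionary replaces the two experts, which is where the storage and serving savings come from; practical merging methods usually weight or filter parameters rather than averaging uniformly.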

An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging [12.1]
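This paper aims to improve the reasoning capabilities of language-specific large language models (LLMs). DeepSeek R1 excels at reasoning, but it mainly benefits high-resource languages such as English and Chinese. Low-resource languages remain underserved because of the dominance of English-centric training data and model optimization.
Reference translation of the paper (metadata) (Thu, 13 Feb 2025 08:10:45 GMT)
Model merging plus SFT to improve the reasoning capabilities of LLMs. The authors state: "We demonstrate that, with only publicly available datasets and a computational budget of $120, it is possible to enhance the reasoning capabilities of language-specific LLMs to match the level of DeepSeek R1, without compromising their performance on target language tasks."
rinna Co., Ltd. appears to take a similar approach in its release of the "Qwen2.5 Bakeneko 32B" series, Japanese large language models built with Qwen2.5 and DeepSeek R1.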

Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging [102.2]
Adapting general-purpose language models to new skills is currently an expensive process. The paper studies the effectiveness of adding new skills to an existing model by training on the new skills in isolation and then merging the result with the general model.
Reference translation of the paper (metadata) (Wed, 16 Oct 2024 18:23:50 GMT)
A comparison of "continued finetuning (CFT)", "retraining (RT)", and "parallel train then merge (PTM)" in the setting described as "As training datasets targeting new skills are constructed, it is an open question how best to patch preexisting models to incorporate the new skills represented by those datasets."
The authors conclude: "We find that PTM is an efficient and effective method of augmenting preexisting models, enabling the addition of new skills with a fraction of the compute required compared to other common methods."
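The "parallel train then merge" idea can be sketched as follows, assuming the merge step is a weighted task-vector addition (the difference between a skill-tuned model and the shared base, scaled and added to the general model). That assumption, the function names `task_vector` and `merge_skill`, the mixing `weight`, and the toy values are all illustrative and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Parameter delta that fine-tuning on the new skill added to the base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge_skill(general: Weights, skill_vec: Weights, weight: float) -> Weights:
    """Patch the general model by adding a scaled skill delta (no retraining)."""
    return {
        k: [g + weight * d for g, d in zip(general[k], skill_vec[k])]
        for k in general
    }


# Toy checkpoints (assumed values, for illustration only).
base = {"w": [1.0, 1.0]}        # shared starting point
general = {"w": [2.0, 1.0]}     # base after general-purpose tuning
skill_only = {"w": [1.0, 3.0]}  # base tuned only on the new-skill data

patched = merge_skill(general, task_vector(skill_only, base), weight=0.5)
print(patched)
```

Because the skill model is trained separately from the general model, the only cost added at merge time is this element-wise arithmetic, which is consistent with the compute savings the authors report for PTM.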

Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models [105.0]
We propose a multi-lingual ability extraction and transfer approach named MAET. Our key idea is to decompose and extract language-agnostic ability-related weights from large language models. Experimental results show that MAET can effectively extract and transfer advanced abilities and outperforms training-based baseline methods.
Reference translation of the paper (metadata) (Thu, 10 Oct 2024 11:23:18 GMT)
Proposes MAET (Multi-lingual Ability Extraction and Transfer), an approach for extracting multilingual abilities and merging them into models: "Our key idea is to decompose and extract language-agnostic ability-related weights from LLMs, and transfer them across different languages by simple addition and subtraction operations without training." The authors report: "Our approach MAET achieves better performance than the competitive baseline methods (e.g., continual pre-training and model merging with task vector) in multi-lingual complex reasoning tasks, including mathematical reasoning tasks and scientific reasoning tasks."
The repository is https://github.com/RUCAIBox/MAET
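To make "simple addition and subtraction operations without training" concrete, here is a generic weight-arithmetic sketch. It is not MAET's actual procedure (see the repository for that): it simply assumes that subtracting a language-only delta from an ability-plus-language delta leaves something closer to a language-agnostic ability delta, which is then added to a target-language model. All names and values are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def sub(a: Weights, b: Weights) -> Weights:
    """Element-wise a - b over same-named parameters."""
    return {k: [x - y for x, y in zip(a[k], b[k])] for k in a}


def add(a: Weights, b: Weights) -> Weights:
    """Element-wise a + b over same-named parameters."""
    return {k: [x + y for x, y in zip(a[k], b[k])] for k in a}


# Toy checkpoints (assumed values, for illustration only).
base = {"w": [0.0, 0.0, 0.0]}
src_ability = {"w": [1.0, 2.0, 0.0]}   # base tuned on ability data in the source language
src_language = {"w": [1.0, 0.0, 0.0]}  # base tuned on general source-language text
tgt_language = {"w": [0.0, 0.0, 4.0]}  # base tuned on general target-language text

# Subtraction: remove the source-language component from the ability delta.
ability_delta = sub(sub(src_ability, base), sub(src_language, base))
# Addition: graft the remaining delta onto the target-language model.
transferred = add(tgt_language, ability_delta)
print(transferred)
```

In this toy case the source-language component (first coordinate) cancels out, and only the ability component (second coordinate) is carried over to the target-language model.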

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.4]
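Model merging is an efficient empowerment technique in the machine learning community. There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
Reference translation of the paper (metadata) (Wed, 14 Aug 2024 16:58:48 GMT)
A survey on model merging, a topic that has come up frequently of late.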