Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging

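Adapting general-purpose language models to new skills is currently an expensive process. The paper investigates how effectively new skills can be added to an existing model by training on each new skill in isolation and then merging the result with the general model.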
Reference translation of the paper (metadata) (Wed, 16 Oct 2024 18:23:50 GMT)
The paper compares "continued finetuning (CFT)", "retraining (RT)", and "parallel train then merge (PTM)" in the following setting: "As training datasets targeting new skills are constructed, it is an open question how best to patch preexisting models to incorporate the new skills represented by those datasets."
It concludes: "We find that PTM is an efficient and effective method of augmenting preexisting models, enabling the addition of new skills with a fraction of the compute required compared to other common methods."
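To make the "parallel train then merge" idea concrete, here is a minimal sketch of one common way to merge weights: task arithmetic, where the difference between a skill-tuned model and its base is added to the general model. This summary does not say which merging method the paper uses, so the choice of task arithmetic, the `weight` parameter, and all names below (`task_vector`, `merge`, the toy parameter `"w"`) are assumptions made for illustration only. Parameters are plain lists of floats rather than real model tensors.

```python
from typing import Dict, List

# Toy stand-in for model weights: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Per-parameter difference between a skill-tuned model and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(general: Params, skill_vector: Params, weight: float = 1.0) -> Params:
    """Add a scaled skill vector to the general model (hypothetical merge rule)."""
    return {
        name: [g + weight * d for g, d in zip(general[name], skill_vector[name])]
        for name in general
    }


# Illustrative values only, not results from the paper.
base = {"w": [0.0, 1.0, 2.0]}      # shared starting checkpoint
general = {"w": [0.5, 1.0, 2.5]}   # base tuned on the general data mix
skill = {"w": [0.0, 2.0, 2.0]}     # base tuned only on the new skill, in parallel

merged = merge(general, task_vector(skill, base), weight=0.5)
print(merged)  # {'w': [0.5, 1.5, 2.5]}
```

In this sketch the skill model is trained independently of the general model, so adding a skill costs only the skill-specific training run plus a cheap element-wise merge, which is the kind of saving over CFT and RT that the quoted conclusion describes.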
