Mixture-of-Transformers – Introducing the Latest arXiv Papers

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models [112.0]
Mixture-of-Transformers (MoT) is a sparse multi-modal transformer architecture. MoT decouples the model's non-embedding parameters by modality. MoT is evaluated across multiple settings and model scales.
Reference translation of the paper (metadata) (Thu, 07 Nov 2024 18:59:06 GMT)
In contrast to Mixture of Experts, whose performance depends on the router, this paper proposes Mixture-of-Transformers, taking the following approach: "MoT extends the standard transformer architecture by incorporating modality-specific weights for all non-embedding model parameters, including feed-forward networks, attention matrices, and layer normalization." The authors claim effectiveness: "In the Chameleon 7B setting (autoregressive text-and-image generation), MoT matches the dense baseline’s performance using only 55.8% of the FLOPs."
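To make "modality-specific weights" concrete, here is a minimal sketch, not the authors' implementation. It assumes each token carries a known modality tag and covers only the feed-forward block. The class name `ModalityFFN`, the helper functions, the dimensions, and the ReLU activation are all hypothetical choices made for illustration. The point is that the weight set is picked by the modality tag rather than by a learned router.

```python
import random


def make_linear(d_in, d_out, rng):
    """Return a d_out x d_in weight matrix with small random entries."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]


def apply_linear(weight, x):
    """Matrix-vector product: one output value per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class ModalityFFN:
    """Feed-forward block with one independent weight set per modality.

    Hypothetical illustration; names and sizes are not from the paper.
    """

    def __init__(self, modalities, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        self.params = {
            m: (make_linear(d_model, d_hidden, rng),
                make_linear(d_hidden, d_model, rng))
            for m in modalities
        }

    def forward(self, tokens, modality_ids):
        out = []
        for x, m in zip(tokens, modality_ids):
            # Weights are selected by the token's modality tag; no learned router.
            w_in, w_out = self.params[m]
            hidden = [max(0.0, h) for h in apply_linear(w_in, x)]  # ReLU (assumed)
            out.append(apply_linear(w_out, hidden))
        return out


ffn = ModalityFFN(["text", "image"], d_model=4, d_hidden=8)
x = [1.0, -0.5, 0.25, 2.0]
outputs = ffn.forward([x, x], ["text", "image"])
same_modality = ffn.forward([x, x], ["text", "text"])
```

Each token only ever touches the parameters of its own modality, so the per-token compute matches a single dense block of the same size while the total parameter count grows with the number of modalities. Following the quoted description, attention projections and layer normalization would be split per modality in the same way; that part is not shown here.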
