November 13, 2024 – Introducing the latest arXiv papers

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models [112.0]
Mixture-of-Transformers (MoT) is a sparse multimodal transformer architecture. MoT decouples the model's non-embedding parameters by modality. MoT is evaluated across multiple settings and model scales.
Reference translation of the paper (metadata) (Thu, 07 Nov 2024 18:59:06 GMT)
In contrast to Mixture of Experts, whose performance depends on its router, this paper proposes Mixture of Transformers, an approach in which "MoT extends the standard transformer architecture by incorporating modality-specific weights for all non-embedding model parameters, including feed-forward networks, attention matrices, and layer normalization." The authors back its effectiveness with the claim that "In the Chameleon 7B setting (autoregressive text-and-image generation), MoT matches the dense baseline’s performance using only 55.8% of the FLOPs."
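To make the idea of "modality-specific weights for all non-embedding parameters" concrete, here is a minimal pure-Python sketch of one MoT-style block. It is not the authors' implementation. Assumptions: a single attention head, a causal mask (matching the autoregressive setting), gain-only layer norm, a ReLU feed-forward network, and self-attention that still runs over the whole mixed text/image sequence. The names `make_params` and `mot_block` are hypothetical. Each token is routed deterministically by its modality tag, with no learned router.

```python
import math
import random

MODALITIES = ("text", "image")


def rand_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def layer_norm(x, gain, eps=1e-5):
    """Layer norm with a learnable gain (bias omitted for brevity)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) for v, g in zip(x, gain)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_params(d, d_ff, rng):
    """Hypothetical helper: one full set of non-embedding weights for ONE modality."""
    return {
        "ln1": [1.0] * d,
        "ln2": [1.0] * d,
        "wq": rand_matrix(d, d, rng),
        "wk": rand_matrix(d, d, rng),
        "wv": rand_matrix(d, d, rng),
        "wo": rand_matrix(d, d, rng),
        "w1": rand_matrix(d_ff, d, rng),
        "w2": rand_matrix(d, d_ff, rng),
    }


def mot_block(xs, modalities, params):
    """Hypothetical MoT-style block: each token uses its own modality's weights,
    while causal self-attention mixes information across all modalities."""
    d = len(xs[0])
    # Modality-specific pre-attention norm and Q/K/V projections.
    qs, ks, vs = [], [], []
    for x, m in zip(xs, modalities):
        p = params[m]
        h = layer_norm(x, p["ln1"])
        qs.append(matvec(p["wq"], h))
        ks.append(matvec(p["wk"], h))
        vs.append(matvec(p["wv"], h))

    out = []
    for i, (x, m) in enumerate(zip(xs, modalities)):
        p = params[m]
        # Shared (cross-modality) causal attention over positions 0..i.
        scores = [sum(a * b for a, b in zip(qs[i], k)) / math.sqrt(d) for k in ks[: i + 1]]
        weights = softmax(scores)
        attn = [sum(w * v[j] for w, v in zip(weights, vs[: i + 1])) for j in range(d)]
        # Modality-specific output projection plus residual.
        h = [a + b for a, b in zip(x, matvec(p["wo"], attn))]
        # Modality-specific FFN (ReLU for simplicity) plus residual.
        hidden = [max(0.0, z) for z in matvec(p["w1"], layer_norm(h, p["ln2"]))]
        out.append([a + b for a, b in zip(h, matvec(p["w2"], hidden))])
    return out


D, D_FF = 8, 16
rng = random.Random(0)
params = {m: make_params(D, D_FF, rng) for m in MODALITIES}
xs = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(5)]
mods = ["text", "text", "image", "image", "text"]
out = mot_block(xs, mods, params)
print(len(out), len(out[0]))  # 5 8
```

In this sketch each token touches only its own modality's weights, so per-token compute matches a dense block of the same width, while the total parameter count grows with the number of modalities. Swapping out the image weights leaves the outputs of the leading text-only tokens unchanged.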

A Survey of Small Language Models [104.8]
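Small language models (SLMs) are becoming increasingly important because they perform language tasks efficiently and effectively with minimal computational resources. This paper presents a comprehensive survey of SLMs, focusing on their architectures, training techniques, and model compression techniques.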
Reference translation of the paper (metadata) (Fri, 25 Oct 2024 23:52:28 GMT)
A survey of Small Language Models (though intuitively they feel more like small-scale LLMs).
As the authors put it, "The inherent difficulty of a survey of small language models is that the definitions of “small” and “large” are a function of both context and time. GPT2, a “large language model” in 2019 at 1.5B parameters, is smaller than many “small” language models covered in this survey." The big open question is what "small" actually means.
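To make the "1.5B parameters" figure concrete, the sketch below counts the parameters of a GPT-2-style decoder from its configuration. It assumes the standard GPT-2 layout (learned position embeddings, a 4x MLP, layer norms with gain and bias, output head tied to the token embedding) and the GPT-2 XL hyperparameters: 48 layers, width 1600, a vocabulary of 50257, and a context of 1024. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_ctx):
    """Count parameters of a GPT-2-style decoder (output head tied to token embeddings)."""
    per_layer = (
        2 * d_model                              # ln_1 (gain + bias)
        + d_model * 3 * d_model + 3 * d_model    # fused Q/K/V projection
        + d_model * d_model + d_model            # attention output projection
        + 2 * d_model                            # ln_2 (gain + bias)
        + d_model * 4 * d_model + 4 * d_model    # MLP up-projection (4x width)
        + 4 * d_model * d_model + d_model        # MLP down-projection
    )
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position embeddings
    final_ln = 2 * d_model                               # final layer norm
    return n_layer * per_layer + embeddings + final_ln


total = gpt2_param_count(n_layer=48, d_model=1600, vocab_size=50257, n_ctx=1024)
print(f"{total:,}")  # 1,557,611,200
```

The per-layer term is dominated by roughly 12·d_model² weights, so parameter count grows quadratically with width and linearly with depth. A model that counted as "large" at this size in 2019 sits below many models that the survey now classifies as "small".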