Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

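We introduce Megalodon, a neural network for efficient sequence modeling with unlimited context length. Compared with Llama2, Megalodon is more efficient than the Transformer at the scale of 7 billion parameters and 2 trillion training tokens.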
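Reference translation of the paper (metadata) (Fri, 12 Apr 2024 20:28:14 GMT)
A proposal for an architecture claimed to be more efficient than the Transformer. It inherits from MEGA (exponential moving average with gated attention). Surprisingly, it appears to outperform Llama2 at the same scale.
The repository is XuezheMax/megalodon: Reference implementation of Megalodon 7B model (github.com)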
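
For readers unfamiliar with the "exponential moving average" in MEGA's name, the following is a minimal sketch of a classic scalar EMA over a sequence. It is not taken from the Megalodon repository and is not the model's actual layer; the function name `exponential_moving_average` and the parameter `alpha` are assumptions made for illustration, and the variant used inside MEGA and Megalodon is more elaborate than this.

```python
def exponential_moving_average(xs, alpha):
    """Classic EMA: y[t] = alpha * x[t] + (1 - alpha) * y[t-1], with y[-1] = 0.

    Illustrative only: a plain scalar recurrence, not the layer used by
    MEGA or Megalodon.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ys = []
    prev = 0.0  # running state carried from one step to the next
    for x in xs:
        # Blend the new input with the decayed previous state.
        prev = alpha * x + (1.0 - alpha) * prev
        ys.append(prev)
    return ys


if __name__ == "__main__":
    print(exponential_moving_average([1.0, 1.0, 1.0], 0.5))
```

The point of the sketch is that the state is a single running value updated once per step, so the cost per token does not grow with how much context came before.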
