Mamba – Page 3 – Introducing the Latest arXiv Papers

MambaOut: Do We Really Need Mamba for Vision?

MambaOut: Do We Really Need Mamba for Vision? [70.6]
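Mamba, an architecture with an RNN-like token mixer based on the state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism. This paper conceptually concludes that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. We construct a series of models named MambaOut by stacking Mamba blocks while removing their core token mixer, the SSM.
Reference translation of the paper (metadata) (Mon, 13 May 2024 17:59:56 GMT)
The paper characterizes Mamba with the statement "Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics.", argues that it is therefore unnecessary for classification problems, and demonstrates this empirically. On the other hand, it also says "the potential of Mamba for visual detection and segmentation tasks, which align with the long-sequence characteristic, merits further exploration.", so it is important to take the characteristics of the task into account.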
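To make the "RNN-like token mixer" idea concrete, here is a minimal sketch of a scalar linear state space recurrence. It is an illustration only: the function names and the fixed parameters `a`, `b`, `c` are assumptions made for this example, and Mamba's actual selective SSM uses input-dependent parameters and a multi-dimensional state. The sketch shows the two properties the paper's argument rests on: the state has a fixed size (one update per token, so cost grows linearly with length), and each output depends only on past inputs (autoregressive), whereas full self-attention scores every pair of tokens.

```python
from typing import List


def ssm_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    Hypothetical toy model; a, b, c are fixed here, unlike Mamba's
    input-dependent (selective) parameters.
    """
    h = 0.0  # fixed-size state, independent of sequence length
    ys = []
    for x in xs:
        h = a * h + b * x  # one constant-cost update per token
        ys.append(c * h)   # y_t depends only on x_1..x_t (causal)
    return ys


def attention_pair_count(n: int) -> int:
    """Number of query-key pairs that full self-attention scores for n tokens."""
    return n * n


outputs = ssm_scan([1.0, 0.0, 0.0, 2.0])
print(outputs)                     # [1.0, 0.5, 0.25, 2.125]
print(attention_pair_count(4))     # 16
```

Image classification sees the whole input at once and, at typical resolutions, uses relatively short token sequences, which is the intuition behind the paper's claim that neither property is needed there.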

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation [16.3]
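We introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation. By using a Siamese encoder and innovating a Mamba fusion mechanism, it effectively selects essential information from different modalities. The method is rigorously evaluated on RGB-Thermal and RGB-Depth segmentation tasks.
Reference translation of the paper (metadata) (Fri, 05 Apr 2024 17:59:44 GMT)
A proposal for a Mamba-based multi-modal semantic segmentation model. I wonder whether applications in the image domain are promising too.
The repository is zifuwan/Sigma: Python implementation of Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation (github.com)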

RS-Mamba

RS-Mamba for Large Remote Sensing Image Dense Prediction [58.1]
We propose Remote Sensing Mamba (RSM) for dense prediction tasks in very-high-resolution (VHR) remote sensing. RSM is designed to model the global features of remote sensing images with linear complexity, so that it can process large VHR images efficiently. RSM achieves state-of-the-art performance on dense prediction tasks in VHR remote sensing.
Reference translation of the paper (metadata) (Wed, 03 Apr 2024 12:06:01 GMT)
An application of Mamba to remote sensing. Attention tends to go to text, but the paper says "We proposed a Remote Sensing Mamba for dense prediction tasks in ultra-high resolution remote sensing imagery, addressing the limitations of CNN-based models in global context information modeling and the challenges of transformer-based models handling large remote sensing images.", so the model appears suited to uses that are hard for transformers.
The repository is walking-shadow/Official_Remote_Sensing_Mamba: Official code of Remote Sensing Mamba (github.com)

MambaByte

MambaByte: Token-free Selective State Space Model [71.9]
MambaByte is a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences. MambaByte outperforms state-of-the-art subword Transformers on language modeling tasks.
Reference translation of the paper (metadata) (Wed, 03 Apr 2024 02:36:27 GMT)
Mamba applied to bytes. As the paper puts it, "Due to their recurrent nature, SSMs enable significantly faster text generation to Transformer models.", so Mamba may be well suited to token-free, byte-level models.
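As a rough illustration of what "token-free" means at the input level, the sketch below turns text into UTF-8 byte IDs and back. The helper names are hypothetical and this is not the MambaByte preprocessing code; it only shows that the vocabulary is fixed at 256 symbols and that sequences get longer than the character count for non-ASCII text, which is why a model with linear-time sequence processing is attractive here.

```python
from typing import List

BYTE_VOCAB_SIZE = 256  # every possible byte value is a symbol; no learned tokenizer


def to_byte_ids(text: str) -> List[int]:
    """Encode text as a list of UTF-8 byte values (0..255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: List[int]) -> str:
    """Decode a list of UTF-8 byte values back into text."""
    return bytes(ids).decode("utf-8")


ids = to_byte_ids("héllo")
print(ids)                          # [104, 195, 169, 108, 108, 111]
print(len("マンバ"), len(to_byte_ids("マンバ")))  # 3 characters become 9 bytes
print(from_byte_ids(ids))           # héllo
```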

Jamba: A Hybrid Transformer-Mamba Language Model

Jamba: A Hybrid Transformer-Mamba Language Model [36.5]
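We present Jamba, a new base large language model built on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families.
Reference translation of the paper (metadata) (Thu, 28 Mar 2024 23:55:06 GMT)
This is the paper for Jamba, which was covered in "DBRX, Jamba, Grok-1.5, RWKV Finch – Introducing the Latest arXiv Papers (devneko.jp)". It describes details such as the model architecture. It states that "Combining Transformer, Mamba, and MoE elements allows flexibility in balancing among the sometimes conflicting objectives of low memory usage, high throughput, and high quality.": the model has 52B total parameters but only 12B active ones, and its KV cache is 4GB (at 256K context), which is very lightweight. For comparison, Mistral has 7.2B parameters, all 7.2B active, and a 32GB KV cache, while Mixtral has 46.7B, 12.9B, and 32GB respectively. (I can't help thinking the context length is a bit too long, though.) Performance is a close match for Mixtral, which makes it very efficient.
The repository is ai21labs/Jamba-v0.1 · Hugging Face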
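The KV cache figures quoted above can be reproduced with a back-of-the-envelope calculation. The sketch below is an assumption-laden estimate, not code from the paper: the layer counts, KV head counts, and head dimension are configuration values not stated in this post (32 attention layers for Mistral versus 4 attention layers out of 32 for Jamba, 8 KV heads, head dimension 128, 16-bit values), and `kv_cache_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3
CONTEXT_LEN = 256 * 1024  # 256K tokens


def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size: keys and values for every attention layer."""
    # The leading 2 accounts for storing both a key and a value per position.
    return 2 * attn_layers * kv_heads * head_dim * context_len * bytes_per_value


# Assumed configurations (not given in the post):
mistral = kv_cache_bytes(attn_layers=32, kv_heads=8, head_dim=128, context_len=CONTEXT_LEN)
jamba = kv_cache_bytes(attn_layers=4, kv_heads=8, head_dim=128, context_len=CONTEXT_LEN)

print(mistral / GIB)  # 32.0
print(jamba / GIB)    # 4.0
```

Under these assumptions the 8x saving comes entirely from the number of attention layers: Mamba layers keep a fixed-size state instead of a cache that grows with context length.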

DBRX, Jamba, Grok-1.5, RWKV Finch

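Last week again brought plenty of news around LLMs. The highlight is DBRX from Databricks (& formerly MosaicML), which performs very well for an open model (the license is a custom one). Announcing "DBRX": A Standard for Open-Source Large Language Models | Databricks Blog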

Jamba is a Mamba MoE + transformer model that is billed as a production-grade SSM hybrid. The base model is released under the Apache-2 license. Introducing Jamba: AI21’s Groundbreaking SSM-Transformer Model

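Among the non-transformer options, RWKV-6 Finch (RWKV-x060-World-1B6-v2.1-20240328-ctx4096) is now available to try on Hugging Face. Long-text translation still seems to have a way to go, but I would like to try fine-tuning and the like.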
RWKV-Gradio-1 – a Hugging Face Space by BlinkDL

Grok-1.5 (and 2) were also announced and are worth watching.
Announcing Grok-1.5 (x.ai)
Elon Musk on X: "Should be available on 𝕏 next week. Grok 2 should exceed current AI on all metrics. In training now." / X (twitter.com)

I hope the range of options beyond API-based ones such as GPT-4, Gemini, and Claude keeps growing.

Is Mamba Effective for Time Series Forecasting

Is Mamba Effective for Time Series Forecasting? [30.2]
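State space models (SSMs) have attracted attention for their ability to capture complex dependencies within sequences. This paper introduces two simple SSM-based models for time series forecasting (TSF). S-Mamba and D-Mamba achieve strong performance while saving GPU memory and training time.
Reference translation of the paper (metadata) (Sun, 17 Mar 2024 08:50:44 GMT)
An application of Mamba to time series forecasting. The experiments use two configurations that differ in that "S-Mamba employs one Mamba block to process VC, while D-Mamba incorporates an additional mamba block compared to S-Mamba for VC." (VC = variates correlations), and the paper reports that both are effective.
It claims that "The results prove Mamba possesses robust capabilities and exhibits remarkable potential to replace Transformer in the TSF tasks.", but I can't help wondering whether that is really so, and I am curious how the results should be interpreted.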

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding [49.9]
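The state space model Mamba shows promising properties for extending its success in long-sequence modeling to video modeling. We conduct a comprehensive study in which Mamba takes on various roles in modeling video, investigating the diverse tasks where Mamba shows superiority. The experiments reveal Mamba's strong potential on both video-only and video-language tasks, along with promising efficiency-performance trade-offs.
Reference translation of the paper (metadata) (Thu, 14 Mar 2024 17:57:07 GMT)
An application of Mamba to the video domain. The result is positive: "Our comprehensive evaluation of Mamba within the video understanding domain showcases its potential as a viable alternative to traditional transformers".
The repository is OpenGVLab/video-mamba-suite (github.com)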

GSSMs vs transformer and Black Mamba

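A comparison of GSSMs (Generalized State Space Models) with transformers, plus an MoE approach. As noted in yesterday's post, "Mamba's ICL (In Context Learning) Performance – Introducing the Latest arXiv Papers (devneko.jp)", their characteristics differ considerably, so using them in an MoE-like way may be a viable option.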

Repeat After Me: Transformers are Better than State Space Models at Copying [57.4]
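We show that generalized state space models, while promising in terms of inference-time efficiency, are limited compared with transformer models on tasks that require copying from the input context.
Reference translation of the paper (metadata) (Thu, 1 Feb 2024 21:44:11 GMT)
A comparison of GSSMs and transformers on simple cases. Perhaps unsurprisingly, "transformer models dramatically outperform state space models at copying and retrieving information from context."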
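One intuition for why a fixed-size state limits copying is a simple counting argument: a state with a given number of bits can distinguish only so many distinct input strings, so it cannot reproduce every string beyond a certain length. The sketch below illustrates that bound under assumptions of this post's making (the helper name and the example sizes are hypothetical, and this is not the paper's formal statement or experimental setup).

```python
def max_copy_length(state_bits: int, vocab_size: int) -> int:
    """Largest n such that all vocab_size**n strings fit in 2**state_bits states.

    Illustrative counting bound only: to copy every possible string of
    length n exactly, the state must take a distinct value for each one.
    """
    capacity = 2 ** state_bits
    n = 0
    combos = vocab_size  # number of distinct strings of length n + 1
    while combos <= capacity:
        n += 1
        combos *= vocab_size
    return n


print(max_copy_length(state_bits=16, vocab_size=4))      # 8
print(max_copy_length(state_bits=1024, vocab_size=256))  # 128
```

A transformer does not face this particular bound because its KV cache grows with the context, so the whole input remains available to attend to, at the cost of memory that scales with length.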

BlackMamba: Mixture of Experts for State-Space Models [10.2]
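State space models (SSMs) have recently shown performance competitive with transformers on large-scale language modeling benchmarks. MoE models have shown remarkable performance while significantly reducing compute and latency costs. We introduce BlackMamba, a novel architecture that combines the Mamba SSM with MoE.
Reference translation of the paper (metadata) (Thu, 1 Feb 2024 07:15:58 GMT)
The repository is Zyphra/BlackMamba: Code repository for Black Mamba (github.com), and the model is also released: Zyphra/BlackMamba-2.8B · Hugging Face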

Mamba's ICL (In Context Learning) Performance

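Two papers on Mamba's ICL performance have appeared. The upshot seems to be that it depends on the task. At the very least, there seems to be little doubt that Mamba has a certain degree of ICL capability. As for the hybrid architecture proposed in the first paper, it is hard to say whether it is a good idea or whether it dilutes what makes Mamba attractive.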

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks [26.2]
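State space models (SSMs) have been proposed as an alternative to transformer networks in language modeling. In this work, we evaluate the ICL performance of SSMs, focusing on Mamba, against transformer models across various tasks. The results show that SSMs perform comparably to transformers on standard regression ICL tasks and outperform them on tasks such as sparse parity learning, but fall short on tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model that combines Mamba with attention blocks and surpasses the individual models on tasks where each struggles on its own.
Reference translation of the paper (metadata) (Tue, 6 Feb 2024 18:56:35 GMT)
This one reports that the outcome depends on the task: "Our results show that SSMs perform comparably to Transformers in standard regression ICL tasks, while outperforming them in tasks like sparse parity learning. However, SSMs fall short in tasks involving non-standard retrieval functionality."
In response, it proposes a hybrid architecture called MambaFormer.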
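For readers unfamiliar with sparse parity, the sketch below generates examples of the task in one common formulation: the label is the parity (XOR) of a small, hidden subset of the input bits, and the remaining bits are distractors. The function names, the 0/1 bit encoding, and the sizes are assumptions made for illustration; the paper's exact data format and prompt construction may differ.

```python
import random
from typing import List, Sequence, Tuple


def parity_label(x: Sequence[int], relevant: Sequence[int]) -> int:
    """Return the XOR (sum mod 2) of the bits of x at the relevant indices."""
    return sum(x[i] for i in relevant) % 2


def make_sparse_parity_examples(n_bits: int, relevant: Sequence[int],
                                n_examples: int, seed: int = 0) -> List[Tuple[List[int], int]]:
    """Sample random bit vectors and label each with its sparse parity."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        examples.append((x, parity_label(x, relevant)))
    return examples


# Hypothetical setup: 10 input bits, of which only bits 1, 4, and 7 matter.
examples = make_sparse_parity_examples(n_bits=10, relevant=[1, 4, 7], n_examples=5)
for x, y in examples:
    print(x, y)
```

In an ICL setting, a sequence of such (input, label) pairs would be placed in the context and the model asked to predict the label of a new input, which requires inferring the hidden subset from the examples alone.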

Is Mamba Capable of In-Context Learning? [68.3]
Mamba is a newly proposed selective state space model. We show that Mamba matches the performance of transformer models in in-context learning.
Reference translation of the paper (metadata) (Mon, 5 Feb 2024 16:39:12 GMT)
This one reports that "Mamba matches the performance of transformer models for ICL."
The observation that "Mamba appears to solve ICL problems by incrementally refining its internal representations in a manner akin to an iterative optimization strategy, as transformer do." is also interesting.
