RNN – Introducing the Latest arXiv Papers

It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization [26.4]
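We reconceptualize neural architectures as associative memory modules that learn a mapping of keys and values using an internal objective, referred to as attentional bias. We present three novel sequence models (Moneta, Yaad, and Memora) that go beyond the power of existing linear RNNs while maintaining a fast, parallelizable training process. For example, certain instances of Miras achieve exceptional performance on specific tasks such as language modeling, commonsense reasoning, and recall-intensive tasks, even outperforming Transformers and other modern linear recurrent models.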
Reference translation of the paper (metadata) (Thu, 17 Apr 2025 17:59:33 GMT)
Google explores new architectures and proposes the Miras framework: "Building upon our formulation of memory and forget gate, we present Miras, a fundamental framework to design novel sequence modeling architectures by four choice of: (1) Attentional bias (i.e., memory objective), (2) Retention gate, (3) Memory architecture, and (4) Memory learning algorithm (i.e., optimizer)."
Moneta, Yaad, and Memora are selected as promising architectures and their performance is evaluated. The scale, up to 1.3B, is on the small side, but the results look very promising.
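
To make the four design choices quoted above more concrete, here is a minimal sketch of one possible combination. Everything in it is an illustrative assumption rather than the paper's Moneta, Yaad, or Memora: the memory is a single matrix, the attentional bias is a squared error, the retention gate is a scalar decay, and the learning algorithm is one plain gradient-descent step per token. The names `memory_step` and `memorize` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Matrix-vector product M @ x using plain Python lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def memory_step(M: Matrix, k: Vector, v: Vector, lr: float, retain: float) -> Matrix:
    """One online update of an associative memory that maps key k to value v.

    Assumed (not from the paper) choices for the four components:
      (1) attentional bias:  0.5 * ||M k - v||^2
      (2) retention gate:    scalar decay `retain` in [0, 1]
      (3) memory architecture: a single matrix M
      (4) learning algorithm: one gradient-descent step with rate `lr`
    """
    err = [p - t for p, t in zip(matvec(M, k), v)]  # prediction error M k - v
    # The gradient of the squared error with respect to M is outer(err, k).
    return [
        [retain * M[i][j] - lr * err[i] * k[j] for j in range(len(k))]
        for i in range(len(M))
    ]


def memorize(keys: List[Vector], values: List[Vector],
             lr: float = 1.0, retain: float = 1.0) -> Matrix:
    """Stream (key, value) pairs through the memory, starting from zeros."""
    M = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    for k, v in zip(keys, values):
        M = memory_step(M, k, v, lr, retain)
    return M


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 3.0], [-1.0, 4.0]]
M = memorize(keys, values)        # no forgetting
print(matvec(M, keys[0]))         # recalls the first value
M_decay = memorize(keys, values, retain=0.5)
print(matvec(M_decay, keys[0]))   # the older association has faded
```

With orthonormal keys and `retain=1.0`, each stored value is recalled exactly. Lowering `retain` shrinks older associations, which is the role a retention (forget) gate plays in this toy setting.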

RWKV-TS

RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks [42.3]
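Traditional recurrent neural network (RNN) architectures have historically held a prominent position in time series tasks. Recent advances in time series forecasting, however, have shifted away from RNNs toward Transformers and CNNs. We designed RWKV-TS, an efficient RNN-based model for time series tasks.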
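Reference translation of the paper (metadata) (Wed, 17 Jan 2024 09:56:10 GMT)
An improvement to RNN-family models for time series forecasting; reportedly fast and high-performing.
The repository is howard-hou/RWKV-TS: RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks (github.com)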

RWKV: Reinventing RNNs for the Transformer Era

RWKV: Reinventing RNNs for the Transformer Era [27.3]
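This paper proposes a novel model architecture that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. The approach leverages a linear attention mechanism and allows the model to be formulated as either a Transformer or an RNN, which parallelizes computation during training and maintains constant computational and memory complexity during inference. Our experiments show that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models.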
Reference translation of the paper (metadata) (Mon, 22 May 2023 13:57:41 GMT)
The paper on RWKV, the RNN-based model rumored to perform well.
The statement "While many alternatives to Transformers have been proposed with similar claims, ours is the first to back up those claims with pretrained models with tens of billions of parameters." is striking, and rightly so: I think it is important to demonstrate effectiveness at a practical scale and on well-known benchmarks.
The repository is GitHub – BlinkDL/RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it’s combining the best of RNN and transformer – great performance, fast inference, saves VRAM, fast training, “infinite” ctx_len, and free sentence embedding.
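
The Transformer-or-RNN duality mentioned in the abstract can be illustrated with a generic, unnormalized causal linear attention. This is only a sketch under that assumption: it is not RWKV's actual operator (which is not described on this page), and the function names `attention_pairwise` and `attention_recurrent` are hypothetical. The first function computes each output from all earlier positions independently, which is the form that parallelizes over time during training. The second carries a fixed-size state from token to token, so per-token cost and memory stay constant during inference.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pairwise(qs: List[Vector], ks: List[Vector],
                       vs: List[Vector]) -> List[Vector]:
    """Transformer-style form: o_t = sum over s <= t of (q_t . k_s) * v_s.

    Each output depends only on the inputs, so positions are independent
    of one another and could be computed in parallel.
    """
    outs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):
            score = dot(q, ks[s])
            out = [o + score * v for o, v in zip(out, vs[s])]
        outs.append(out)
    return outs


def attention_recurrent(qs: List[Vector], ks: List[Vector],
                        vs: List[Vector]) -> List[Vector]:
    """RNN-style form: keep a running state S = sum of outer(k_s, v_s).

    The state has a fixed size (d_k x d_v) no matter how long the
    sequence is, and each output is o_t = S^T q_t.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        outs.append([sum(q[i] * state[i][j] for i in range(d_k))
                     for j in range(d_v)])
    return outs


qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
vs = [[1.0], [2.0], [3.0]]
print(attention_pairwise(qs, ks, vs))
print(attention_recurrent(qs, ks, vs))  # same outputs, computed token by token
```

Both functions return the same sequence because the sum over earlier positions can be folded into the running state, which is what lets one model be trained in the first form and run in the second.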
