A proposed improvement that can be integrated into the Transformer: "We introduced Virtual Width Networks (VWN) as a practical mechanism to decouple representational width from the quadratic compute typically associated with widening. With a modest 1.5× expansion, we observe consistent improvements. When scaling to 8× virtual width, optimization accelerates markedly: next-token prediction loss converges more than 2× faster and multi-token prediction loss more than 3× faster relative to the baseline width. Beyond these discrete points, the performance of VWN exhibits a clear scaling behavior." The authors note constraints on the communication and memory side, but the statement "In practice, virtual width expansions in the 1.5×–4× range are more feasible on today’s stacks," is encouraging.
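To make the "quadratic compute typically associated with widening" concrete, here is a minimal sketch. It assumes a square d × d dense projection whose parameter count and matmul FLOPs both scale with d². The function name `dense_cost_ratio` is hypothetical. The sketch only shows the cost of naively widening the backbone, which is what VWN is said to avoid; it is not the VWN mechanism itself.

```python
def dense_cost_ratio(width_factor: float) -> float:
    """Relative cost of a d x d dense projection after scaling d by width_factor.

    Assumption: cost (parameters and matmul FLOPs per token) is proportional
    to d * d, so scaling d by k scales the cost by k ** 2.
    """
    return width_factor ** 2


# Width factors mentioned in the quoted passage: 1.5x, 4x, and 8x.
for k in (1.5, 4.0, 8.0):
    print(f"{k}x naive width -> {dense_cost_ratio(k):.2f}x dense-layer cost")
```

Under this assumption, naively widening by 1.5×, 4×, and 8× would multiply the dense-layer cost by 2.25×, 16×, and 64× respectively, which is why decoupling representational width from backbone width matters.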