1. Novel memory mechanism: The agent reads text segment by segment and updates a fixed-size memory through an overwriting strategy (see the sketch after this list). This design lets the model process arbitrarily long inputs within a fixed context window, fundamentally overcoming the window-length limitation of traditional Transformer architectures.
2. O(n) complexity: By decoupling per-step computation from total text length, the cost of processing long texts grows linearly rather than quadratically with input length (a worked bound follows the list).
3. RL-driven extrapolation: We enhance the DAPO algorithm to support multi-turn training over conversations with mutually independent contexts (a training sketch follows the list). The model trained this way exhibits unprecedented extrapolation performance.
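To make the overwriting strategy in item 1 concrete, here is a minimal sketch of the segmented-reading loop. The interface (`model.generate`, the prompt wording, and `segment_len`) is an illustrative assumption, not the system's actual API:

```python
def process_long_text(model, text, segment_len=4096, memory_init=""):
    """Read `text` segment by segment, carrying a fixed-size memory.

    Each step sees only (memory + one segment), so the context the
    model must fit never grows with the total length of `text`.
    """
    memory = memory_init
    for start in range(0, len(text), segment_len):
        segment = text[start:start + segment_len]
        # Overwriting strategy: the model rewrites the entire memory,
        # keeping task-relevant information and discarding the rest.
        # (Assumed prompt interface; the real one is model-specific.)
        memory = model.generate(
            f"Memory:\n{memory}\n\nNew segment:\n{segment}\n\n"
            "Rewrite the memory, retaining all task-relevant information."
        )
    return memory
```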
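The linear bound in item 2 follows from a short calculation. Writing $c$ for the segment length and $m$ for the memory size (notation introduced here; both are fixed constants independent of the input length $n$), each step attends over at most $c+m$ tokens:

```latex
\underbrace{O\big((c+m)^2\big)}_{\text{cost per step}}
\times
\underbrace{\Big\lceil \tfrac{n}{c} \Big\rceil}_{\text{number of steps}}
= O\!\left(\frac{(c+m)^2}{c}\, n\right)
= O(n),
\qquad \text{vs. } O(n^2) \text{ for full self-attention over } n \text{ tokens.}
```

Because $c$ and $m$ do not grow with $n$, the per-step cost is constant, and only the number of steps scales with the input.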
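A hedged sketch of the multi-turn setup in item 3, under the assumption that each rollout reads the document through several context-independent conversations that all share one end-of-rollout outcome reward, to which a DAPO-style group-normalized advantage is applied. The function name and the normalization details are illustrative, not the paper's exact recipe:

```python
import numpy as np

def multi_conv_advantages(group_rewards, convs_per_rollout):
    """Group-normalized advantage, broadcast to every independent
    conversation inside each rollout.

    group_rewards: one scalar outcome reward per rollout in the group.
    convs_per_rollout: conversations (segments read) per rollout.
    Returns one advantage per conversation.
    """
    r = np.asarray(group_rewards, dtype=np.float64)
    # Normalize within the sampled group (GRPO/DAPO-style baseline).
    adv = (r - r.mean()) / (r.std() + 1e-6)
    # Every conversation of a rollout receives that rollout's advantage,
    # even though the conversations share no context with one another.
    return np.repeat(adv, convs_per_rollout)

# e.g. 4 rollouts, each reading the document in 3 separate conversations
print(multi_conv_advantages([1.0, 0.0, 1.0, 0.0], convs_per_rollout=3))
```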