pp. 4283–4295

S&M3800 Research Paper of Special Issue
https://doi.org/10.18494/SAM5309
Published: October 11, 2024

A Flexible State Space Model for Large Language Models: The GroupMamba Approach

Xiling Liu, Qunsheng Ruan, Yingjia Wu, Kai Chen, and Cheng-Fu Yang

(Received August 14, 2024; Accepted September 27, 2024)

Keywords: transformers, state space models (SSMs), GroupMamba, natural language processing (NLP)
Transformers have consistently excelled in large language models owing to their exceptional scalability, efficient parallel processing, superior contextual comprehension, and versatility across a wide range of tasks. In recent years, state space models (SSMs) have also advanced notably, with the Mamba model standing out for its efficient parallel processing and low computational complexity. Despite these strengths, however, SSMs, including Mamba, often struggle to match the performance of transformers in tasks that require deep contextual understanding and the handling of high-dimensional data. In this paper, we introduce GroupMamba, a novel group-based SSM designed to optimize the trade-off between computational complexity and parallelism by strategically grouping SSM modules. These groupings can be customized to suit various tasks, effectively blending the strengths of the Mamba and transformer architectures. Experimental results demonstrate that GroupMamba achieves significant improvements across diverse tasks, including a 2% increase in accuracy on public benchmark tests. This work marks a significant step toward the integration of SSMs and transformers, offering a more adaptable, scalable, and efficient solution for complex natural language processing challenges.
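The grouping idea described above, partitioning the model's channels into independent SSM groups to trade recurrence cost against parallelism, can be sketched in a few lines of PyTorch. The code below is an illustrative toy, not the authors' GroupMamba implementation: the names ToySSMBlock and GroupedSSM are hypothetical, and the per-channel linear recurrence (a plain sequential scan) stands in for a full Mamba selective-scan kernel.

```python
import torch
import torch.nn as nn


class ToySSMBlock(nn.Module):
    """Simplified diagonal state space block (stand-in for a Mamba layer).

    Runs a per-channel linear recurrence h_t = a * h_{t-1} + b * x_t and
    projects the hidden states back to the block dimension. Illustrative only.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))  # per-channel decay (pre-sigmoid)
        self.b = nn.Parameter(torch.ones(dim))       # per-channel input gain
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        a = torch.sigmoid(self.log_a)                # keep the recurrence stable
        h = torch.zeros(x.shape[0], x.shape[2], device=x.device)
        outs = []
        for t in range(x.shape[1]):                  # sequential scan, for clarity
            h = a * h + self.b * x[:, t]
            outs.append(h)
        return self.out(torch.stack(outs, dim=1))


class GroupedSSM(nn.Module):
    """Channel-grouped SSM (hypothetical reading of the grouping idea).

    Splits the model dimension into num_groups independent groups, runs one
    SSM block per group, and concatenates the results. Narrower groups mean
    cheaper recurrences; independence means groups can run in parallel.
    """

    def __init__(self, dim: int, num_groups: int):
        super().__init__()
        assert dim % num_groups == 0
        group_dim = dim // num_groups
        self.blocks = nn.ModuleList(
            [ToySSMBlock(group_dim) for _ in range(num_groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(len(self.blocks), dim=-1)   # split along channels
        return torch.cat(
            [blk(c) for blk, c in zip(self.blocks, chunks)], dim=-1
        )


# Usage: 8 groups over a 512-dim model; each group scans 64 channels.
layer = GroupedSSM(dim=512, num_groups=8)
y = layer(torch.randn(2, 16, 512))                   # -> shape (2, 16, 512)
print(y.shape)
```

Because the groups are independent along the channel dimension, they can be dispatched concurrently, and the group count becomes the tunable knob between per-group recurrence cost and overall parallelism, which is the trade-off the abstract describes.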
Corresponding author: Cheng-Fu Yang

This work is licensed under a Creative Commons Attribution 4.0 International License.

Cite this article: Xiling Liu, Qunsheng Ruan, Yingjia Wu, Kai Chen, and Cheng-Fu Yang, A Flexible State Space Model for Large Language Models: The GroupMamba Approach, Sens. Mater., Vol. 36, No. 10, 2024, pp. 4283–4295.