Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model

Sangjune Park¹, Inhyeok Choi¹, Donghyeon Soon², Youngwoo Jeon¹, Kyungdon Joo¹^†

¹Artificial Intelligence Graduate School, UNIST, ²Department of Computer Science, DGIST

WACV 2026


Kendrick Lamar - Not Like Us
00:02 ~ 01:19

K-pop Demon Hunters - Golden
00:47 ~ 02:04

Aespa - Black Mamba
02:06 ~ 02:49

MambaDance generates 3D dance from in-the-wild music. For copyright reasons, all results are muted; please refer to the timestamps given in each video description.

Abstract

Dance is a form of human motion characterized by emotional expression and communication, and it plays a role in fields such as music, virtual reality, and content creation. Existing methods for dance generation often fail to adequately capture the inherently sequential, rhythmical, and music-synchronized characteristics of dance. In this paper, we propose a new dance generation approach that leverages a Mamba-based diffusion model. Mamba, which is specialized for handling long and autoregressive sequences, is integrated into our diffusion model as an alternative to the off-the-shelf Transformer. Additionally, considering the critical role of musical beats in dance choreography, we propose a Gaussian-based beat representation to explicitly guide the decoding of dance sequences. Experiments on the AIST++ dataset show that our proposed method effectively reflects essential dance characteristics and improves performance over state-of-the-art methods.

Method

Overall architecture of MambaDance. We extract a music feature $m$ and a novel beat representation $b$ derived from the binary beat mask of that feature (blue box). A two-stage diffusion architecture enables length-agnostic generation in a single inference pass (green box). The diffusion decoder consists of the proposed Mamba-based modules, e.g., Single-Modal Mamba (SMM), Cross-Modal Mamba (CMM), and Adaptive Linear Modulation (ADaLM) (gray box).
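To make the idea of a Gaussian-based beat representation concrete, the following is a minimal sketch of how a binary beat mask could be turned into a smooth per-frame signal. It is an illustration under stated assumptions, not the paper's formulation: the function name `gaussian_beat_representation`, the width parameter `sigma`, and the choice to take the maximum over per-beat Gaussian bumps are all hypothetical.

```python
import numpy as np


def gaussian_beat_representation(beat_mask, sigma=2.0):
    """Map a binary beat mask of shape (T,) to a smooth signal in [0, 1].

    Assumption (not from the paper): each beat frame contributes a Gaussian
    bump of width `sigma` frames, and overlapping bumps are merged by max.
    """
    beat_mask = np.asarray(beat_mask, dtype=bool)
    frames = np.arange(beat_mask.shape[0], dtype=np.float64)
    beat_frames = frames[beat_mask]
    if beat_frames.size == 0:
        # No beats detected: the representation is identically zero.
        return np.zeros_like(frames)
    # Signed distance from every frame to every beat, shape (T, num_beats).
    dist = frames[:, None] - beat_frames[None, :]
    bumps = np.exp(-0.5 * (dist / sigma) ** 2)
    # The strongest bump wins, so the value peaks at exactly 1 on each beat.
    return bumps.max(axis=1)


# Toy example: 16 frames with beats at frames 4 and 12.
mask = np.zeros(16, dtype=int)
mask[[4, 12]] = 1
b = gaussian_beat_representation(mask, sigma=2.0)
```

Compared with the raw binary mask, a signal of this kind is non-zero on the frames around each beat, so frames near a beat still carry information about how close the beat is.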

Comparisons on FineDance Dataset

Qualitative comparison of MambaDance against state-of-the-art methods on the FineDance dataset. Please unmute the video to evaluate how well the generated dance is synchronized to the music beats.

Comparisons on AIST++ Dataset

Qualitative comparison of MambaDance against state-of-the-art methods on the AIST++ dataset. Please unmute the video to evaluate how well the generated dance is synchronized to the music beats.

BibTeX (TBD)


    @InProceedings{Park_2026_WACV,
        author    = {Park, Sangjune and Choi, Inhyeok and Soon, Donghyeon and Jeon, Youngwoo and Joo, Kyungdon},
        title     = {Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model},
        booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
        month     = {March},
        year      = {2026},
        pages     = {1767--1776}
    }