MNET++: Music-Driven Pluralistic Dancing Toward Multiple Dance Genre Synthesis

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15036-15050. doi: 10.1109/TPAMI.2023.3312092. Epub 2023 Nov 3.

Abstract

Numerous task-specific variants of autoregressive networks have been developed for dance generation. Nonetheless, a severe limitation remains: existing algorithms can produce repetitive motion patterns for a given initial pose, degrading output quality. We analyze several key challenges of previous works and propose changes to both the model architecture (namely MNET++) and the training methods to address them. In particular, we devise a beat synchronizer and a dance synthesizer. First, the generated dance should be locally and globally consistent with the given music beats, avoid repetitive patterns, and look realistic. To achieve this, the beat synchronizer implicitly captures the rhythm, keeping the generated dance in sync with the music. The dance synthesizer then infers dance motions in a seamless patch-by-patch manner conditioned on the music. Second, to generate diverse dance sequences, adversarial learning is performed by leveraging the transformer architecture. Furthermore, MNET++ learns a dance genre-aware latent representation that is scalable to multiple domains, providing fine-grained user control over the dance genre. Compared with state-of-the-art methods, our method synthesizes plausible and diverse outputs across multiple dance genres and produces remarkable dance sequences both qualitatively and quantitatively.
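To make the high-level pipeline in the abstract concrete, the following is a minimal toy sketch of genre-conditioned, patch-by-patch autoregressive motion generation. It is not the authors' implementation: every function here (`music_features`, `genre_embedding`, `synthesize_patch`) is a hypothetical stand-in for a learned network, and the linear update rule is purely illustrative of how pose, music condition, genre code, and a noise vector (for pluralistic outputs) can combine at each step.

```python
import numpy as np

rng = np.random.default_rng(0)

def music_features(t, n_feats=8):
    # Hypothetical per-patch music condition (e.g. a beat/onset embedding);
    # here a deterministic sinusoidal toy signal instead of a learned encoder.
    return np.sin(np.arange(n_feats) * (t + 1) * 0.1)

def genre_embedding(genre_id, dim=8):
    # Toy one-hot genre code standing in for the genre-aware latent
    # representation that provides fine-grained user control.
    e = np.zeros(dim)
    e[genre_id % dim] = 1.0
    return e

def synthesize_patch(prev_pose, music, genre, noise):
    # Toy "dance synthesizer" step: the next pose depends on the previous
    # pose, the music condition, the genre code, and a noise vector.
    return 0.9 * prev_pose + 0.05 * music + 0.03 * genre + 0.02 * noise

def generate_dance(init_pose, genre_id, n_patches=16):
    # Autoregressive patch-by-patch generation: each new patch is
    # conditioned on the previously generated pose.
    poses = [init_pose]
    for t in range(n_patches):
        z = rng.standard_normal(init_pose.shape)  # sampling z varies the output
        poses.append(synthesize_patch(
            poses[-1], music_features(t), genre_embedding(genre_id), z))
    return np.stack(poses)

# Same initial pose, two different genre codes -> two different sequences,
# illustrating the escape from the single-output-per-initial-pose limitation.
seq_a = generate_dance(np.zeros(8), genre_id=0)
seq_b = generate_dance(np.zeros(8), genre_id=3)
```

The noise vector `z` is what makes the toy generator pluralistic (repeated calls with the same inputs diverge), while `genre_id` mimics the genre-level control described in the abstract.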