Stay on topic with Classifier-Free Guidance
In this work, we propose SymphonyNet, a permutation-invariant language model,
as a solution for symbolic symphony music generation. We introduce a novel
Multi-track Multi-instrument Repeatable (MMR) representation for symphonic
music and model the music sequence using a Transformer-based auto-regressive
language model with a dedicated 3-D positional embedding. To avoid length
overflow when modeling extra-long symphony token sequences, we also propose a
modified Byte Pair Encoding algorithm (Music BPE) for music tokens and introduce
a novel linear transformer decoder architecture as the backbone. Meanwhile, we train the
decoder to learn automatic orchestration as a joint task by masking instrument
information from the input. We also introduce a large-scale symbolic symphony
dataset to advance symphony generation research. Empirical results show
that the proposed approach can generate coherent, novel, complex, and harmonious
symphonies, making it a pioneering solution for multi-track multi-instrument
symbolic music generation.
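
To illustrate how a Byte Pair Encoding step can shorten a music token sequence, the following is a minimal sketch of the standard BPE merge loop applied to note tokens. It is not the modified Music BPE algorithm proposed in this work; the token names (e.g. "C4"), the "+" joining convention, and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the most frequent adjacent token pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    """Greedily learn up to `num_merges` merges; return (merges, merged sequences)."""
    merges = []
    seqs = [list(s) for s in sequences]
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        # Hypothetical naming convention for the merged token.
        new_token = pair[0] + "+" + pair[1]
        seqs = [merge_pair(s, pair, new_token) for s in seqs]
        merges.append(pair)
    return merges, seqs


if __name__ == "__main__":
    corpus = [["C4", "E4", "G4", "C4", "E4"], ["C4", "E4", "A4"]]
    merges, merged = learn_bpe(corpus, num_merges=1)
    print(merges)
    print(merged)
```

Each merge replaces a frequent adjacent pair with a single token, so the encoded sequence becomes shorter at the cost of a larger vocabulary. This trade-off is what makes BPE-style encodings useful when sequence length is the bottleneck.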