Notable other

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

Kyuil Lee, Dezhi Yu, Yongkang Huang

Published
Jun 11, 2026 — 17:39 UTC

Problem
This study addresses the gap in generative modeling of symbolic music, specifically focusing on Bach-style compositions. Despite the growing interest in music generation, there is limited comparative analysis of different model families—autoregressive, latent-variable, and adversarial—on their ability to produce coherent and stylistically accurate music. The work is presented as a preprint, indicating that it has not yet undergone peer review.

Method
The authors employ three distinct model families to generate Bach-style symbolic piano music from a shared MIDI corpus. The first model is an autoregressive Long Short-Term Memory (LSTM) network enhanced with attention mechanisms, which allows for better handling of long-range dependencies in music sequences. The second model family consists of latent-variable approaches, specifically recurrent Variational Autoencoders (VAEs) and vector-quantized VAEs (VQ-VAEs). The VQ-VAE is particularly noted for its ability to mitigate posterior collapse, a common issue in VAEs. Lastly, the study includes a generative adversarial network (GAN) designed to capture local pitch patterns. The training process for these models is not explicitly detailed in terms of compute resources or hyperparameters, but the focus is on their comparative performance in generating musically coherent outputs.

Results
The results indicate that the autoregressive LSTM with attention outperforms the other models in generating musically coherent samples, demonstrating superior ability to maintain stylistic integrity. The VQ-VAE shows promise in producing more structured outputs compared to conventional recurrent VAEs, effectively addressing the posterior collapse issue. In contrast, the GAN approach, while capable of capturing local pitch patterns, struggles with training stability and generalization to Bach’s style, resulting in less reliable outputs. The paper quantifies these findings, although specific numerical results and effect sizes are not provided in the summary.

Limitations
The authors acknowledge several limitations, including the challenges associated with training GANs, which can lead to instability and poor generalization. They also note that while the autoregressive model excels in coherence, it may not capture the full complexity of Bach’s style as effectively as desired. Additionally, the study is limited to a specific corpus of MIDI files, which may not encompass the full diversity of Bach’s compositions. The lack of peer review may also imply that the findings should be interpreted with caution until validated by the community.

Why it matters
This comparative study provides valuable insights into the strengths and weaknesses of different generative modeling approaches for symbolic music, which can inform future research in music generation and machine learning. The findings suggest that while autoregressive models are currently the most effective for generating coherent music, latent-variable models like VQ-VAEs offer potential for improved structure, and adversarial models may require further refinement. This work contributes to the ongoing discourse in the field of generative modeling, as published in arXiv.

Turing Wire

By Turing Wire editorial staff · Jun 11, 2026 · Editorial standards →

Source: arXiv cs.LG