Adam Roberts is a software engineer and machine learning researcher at Google Brain, working on music and art generation as part of the Magenta project. He got his PhD in Computational Biology at UC Berkeley and, before joining Google Brain, worked on the Google Play Music Knowledge team. I met Adam in November at the ISMIR conference, where he was presenting some of the music generation work together with his colleagues from Magenta, and I took the opportunity to ask him a few questions.
I was really excited by progress in many areas of MIR research that I believe will help us improve music generation in the future. For example, there were multiple papers releasing very useful datasets (e.g., SUPRA [1] and the Harmonix Set [2]) that can be used for training generative models. There were also improvements in areas such as beat detection (e.g., Böck et al. [3]) and transcription, which will help provide even more useful data. As far as generation specifically, I was really excited to see Donahue et al. applying transfer learning to fine-tune Music Transformer on their NES dataset [4]. Finally, I’m always happy to see people building really useful tools for creators, like Smith et al.’s Unmixer [5].
Yes, there are a ton of trade-offs! Transformers seem to capture long-term structure and expressivity better than RNNs, but they tend to be slower at inference time, which makes them difficult to use in a real-time setting. VAEs (see our MusicVAE model) are one way of enabling more control over the generative process on top of continuation – trivially provided by language models – but their implementations in the symbolic music space have so far been limited to RNNs. Thus, the quality of their outputs has been constrained by the architecture, in addition to the trade-off between supporting accurate reconstructions and better sampling. GANs are another model type that so far has had limited success in the symbolic generation space, but the “perceptual” loss they provide could lead to much better outputs than the standard autoregressive maximum-likelihood/reconstruction losses typically used.
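The trade-off between accurate reconstructions and better sampling mentioned above can be illustrated with a generic, beta-weighted VAE objective. This is only a minimal sketch under stated assumptions: the function names, the diagonal-Gaussian posterior, and the `beta` parameter are illustrative and are not taken from the interview or claimed to be MusicVAE's actual training code.

```python
import math


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to a standard normal prior.

    Assumes a diagonal-Gaussian posterior; `mu` and `log_var` are
    equal-length sequences of floats (hypothetical encoder outputs).
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def beta_vae_loss(reconstruction_nll, mu, log_var, beta=1.0):
    """Negative ELBO with a weight `beta` on the KL term.

    A small `beta` favors accurate reconstructions; a large `beta` pulls
    the posterior toward the prior, which tends to help sampling from
    the prior at the cost of reconstruction accuracy.
    """
    return reconstruction_nll + beta * gaussian_kl(mu, log_var)


# Same encoder outputs and reconstruction error, two different weightings.
low_beta = beta_vae_loss(2.0, [1.0, -1.0], [0.0, 0.0], beta=0.1)
high_beta = beta_vae_loss(2.0, [1.0, -1.0], [0.0, 0.0], beta=1.0)
```

With a posterior that sits away from the prior, raising `beta` increases the penalty, so the optimizer is pushed to match the prior more closely rather than to encode more detail about the input.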
We have gotten a lot of positive feedback from early adopters [6], but I think it’s a bit too early to know what effect these types of tools will have on music composition and production. YACHT composed an album using the same MusicVAE models before Magenta Studio existed and have spoken a bit about their experience in various publications. I’d imagine more professionals have used the software but may not be as eager to advertise that it has become a part of their technique. In the end, I think there will be a huge benefit in setting up this type of feedback loop with creators in discovering what tools they want and also sparking collaborations and data sharing.
I think it’s up to the (co-)creators and consumers to decide when art is good. Human evaluation and curation, at a minimum, will be necessary for the foreseeable future. If art is created but nobody is there to enjoy it, is it still even art?
I think you can’t really remove the human from the loop here either. The best art to come out of these systems has a lot of human involvement, both in the development and training stage as well as in curation and post-production. I’m not sure if it will be possible or desirable to have completely machine-generated art that doesn’t get stale over time. It needs to move with the cultural zeitgeist. Also, if such a system were to exist, who is actually the artist: the model or the people who created it?
[1] Zhengshan Shi, Craig Sapp, Kumaran Arul, Jerry McBride, Julius Smith. “SUPRA: Digitizing the Stanford University Piano Roll Archive.” In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019. https://doi.org/10.5281/zenodo.3527858
[2] Oriol Nieto, Matthew McCallum, Matthew Davies, Andrew Robertson, Adam Stark, Eran Egozy. “The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music.” In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019. http://doi.org/10.5281/zenodo.3527870
[3] Sebastian Böck, Matthew Davies, Peter Knees. “Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other.” In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019. http://doi.org/10.5281/zenodo.3527850
[4] Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison Cottrell, Julian McAuley. “LakhNES: Improving Multi-instrumental Music Generation with Cross-domain Pre-training.” In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019. http://doi.org/10.5281/zenodo.3527902
[5] Jordan Smith, Yuta Kawasaki, Masataka Goto. “Unmixer: An Interface for Extracting and Remixing Loops.” In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019. http://doi.org/10.5281/zenodo.3527938
[6] Adam Roberts, Jesse Engel, Yotam Mann, Jon Gillick, Claire Kayacik, Signe Nørly, Monica Dinculescu, Carey Radebaugh, Curtis Hawthorne, Douglas Eck. “Magenta Studio: Augmenting Creativity with Deep Learning in Ableton Live.” In Proceedings of the International Workshop on Musical Metacreation (MUME), 2019. https://ai.google/research/pubs/pub48280