Another company at the forefront of AI research, driving the transition from Artificial Narrow Intelligence (ANI) towards Artificial General Intelligence (AGI), is Google and its parent company Alphabet. In AI-generated music in particular, Google created early applications with its Magenta project, and WaveNet, developed by its subsidiary DeepMind, ultimately opened the door to Artificial General Intelligence in music creation. Both developments are explored in this part of the blog series.
AI in the Music Industry – Part 12: Google’s Magenta Studio and WaveNet
The following blog post was published on 1 June 2016: “We’re happy to announce Magenta, a project from the Google Brain team that asks: Can we use machine learning to create compelling art and music? If so, how? If not, why not?”[1] Google’s TensorFlow framework, which had already been used in Google Photos, Google Maps and Gmail, also became the technological basis for Magenta’s AI music creation.[2] On top of it, the Magenta team has built a number of applications over the years, many of them based on recurrent neural networks (RNNs), including a browser-based real-time piano keyboard,[3] the melody mixer MusicVAE[4] and GANSynth, which generates audio in parallel rather than one sample at a time, making it dramatically faster than sequential approaches.[5]

An important step towards the widespread use of AI for creating and modifying music was the release of Magenta Studio, an open-source AI toolbox consisting of five applications: Generate creates original musical phrases from scratch, Continue extends existing pieces, Interpolate blends two pieces into one, Groove adjusts the timing and velocity of a drum pattern to give it a human feel, and Drumify derives a matching drumbeat from any rhythm. The tools are very easy to use, either as standalone applications or as plug-ins for the Ableton Live music software.[6]
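To give a flavour of how these tools can be driven programmatically, here is a minimal sketch using Magenta’s open-source Python API for MusicVAE, the model behind the Interpolate tool. The checkpoint path is a placeholder (pretrained checkpoints are published by the Magenta team), and the configuration name is one of the published two-bar melody configs; this is an illustrative sketch, not the code behind Magenta Studio itself.

```python
# Minimal sketch: sampling and interpolating melodies with MusicVAE
# (pip install magenta). The checkpoint path below is a placeholder.
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

model = TrainedModel(
    configs.CONFIG_MAP['cat-mel_2bar_big'],   # 2-bar melody configuration
    batch_size=4,
    checkpoint_dir_or_path='/path/to/cat-mel_2bar_big.ckpt',  # placeholder
)

# "Generate": sample two new 2-bar melodies from the latent space.
samples = model.sample(n=2, length=32, temperature=1.0)

# "Interpolate": blend the two melodies in five steps through latent space.
blended = model.interpolate(samples[0], samples[1], num_steps=5, length=32)

# Write the results out as standard MIDI files.
for i, sequence in enumerate(blended):
    note_seq.sequence_proto_to_midi_file(sequence, f'melody_{i}.mid')
```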
Magenta Studio made waves at the Google I/O 2019 developer conference by collaborating with the band The Flaming Lips to create an AI-assisted concert performance. Magenta’s ‘Piano Genie’, an AI-based real-time instrument, was used to create the song ‘Fruit Genie’, which was then performed live in concert alongside the AI.[7] The ‘Bach Doodle’ also attracted media attention. Internet users were able to send their self-composed melodies to an AI which, using a deep learning algorithm, harmonised and visualised them in the style of Bach chorales, somewhat reminiscent of David Cope’s early AI experiments.[8] Since then, Magenta Studio has released a number of other applications, including ‘Maestro’,[9] an AI-assisted vocal coach; ‘Tone Transfer’,[10] which allows a melody played on a piano to be “translated” to many other instruments; and ‘Chamber Ensemble Generator’, which creates realistic chamber music.[11] But all these applications are just examples of the power of Google’s AI.
In addition to TensorFlow,[12] which underpins the Magenta Studio applications, there is Google’s WaveNet: a deep learning model that can generate audio recordings, including music, on its own and without human supervision. WaveNet was developed by DeepMind,[13] the AI start-up that became famous with AlphaGo, an artificial intelligence that learned the game of Go, which is very popular in East Asia, and, to the surprise of even its own developers, defeated the South Korean Go world champion Lee Sedol in March 2016.[14]
Around the time Google integrated DeepMind into its group in 2014, DeepMind’s developers began working on what became WaveNet. The original aim was a text-to-speech (TTS) algorithm that could convert written text into natural-sounding human speech. It soon became clear, however, that the same technology could also be used to generate music. WaveNet is based on the convolutional neural network (CNN), an architecture normally used for image recognition. Instead of the two-dimensional convolutions applied to images, WaveNet uses a single stack of one-dimensional, causal convolutional layers between input and output, with dilations that grow exponentially from layer to layer. This lets the network take a long stretch of audio context into account with relatively few layers, so it requires less computing power and is correspondingly more efficient.
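To make the idea concrete, here is a minimal, illustrative Keras sketch of such a stack of dilated causal convolutions with gated activations, as described in the WaveNet paper. The layer sizes and block count are assumptions chosen for readability, not DeepMind’s actual configuration.

```python
# Illustrative WaveNet-style stack in Keras (not DeepMind's implementation).
# Causal padding ensures each output depends only on past samples; dilations
# that double at every layer widen the receptive field at low cost.
import tensorflow as tf
from tensorflow.keras import layers

def build_wavenet(num_blocks=8, filters=64, quantization=256):
    inputs = layers.Input(shape=(None, 1))        # raw waveform, one channel
    x = layers.Conv1D(filters, 2, padding='causal')(inputs)
    skips = []
    for i in range(num_blocks):
        dilation = 2 ** i                         # 1, 2, 4, ..., 128
        tanh = layers.Conv1D(filters, 2, dilation_rate=dilation,
                             padding='causal', activation='tanh')(x)
        sigm = layers.Conv1D(filters, 2, dilation_rate=dilation,
                             padding='causal', activation='sigmoid')(x)
        gated = layers.Multiply()([tanh, sigm])   # gated activation unit
        skip = layers.Conv1D(filters, 1)(gated)
        skips.append(skip)
        x = layers.Add()([x, skip])               # residual connection
    out = layers.Activation('relu')(layers.Add()(skips))
    out = layers.Conv1D(filters, 1, activation='relu')(out)
    # 256-way softmax over mu-law quantized amplitude values per sample.
    out = layers.Conv1D(quantization, 1, activation='softmax')(out)
    return tf.keras.Model(inputs, out)

model = build_wavenet()
```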
WaveNet’s learning algorithm works in a similar way to AlphaGo’s:[15] it learns by trial and error. The audio data is treated as uninterpreted raw material; WaveNet does not analyse its training data in terms of music theory, as some other AIs do, but simply tries out new combinations and forms new connections. The learning process therefore takes a very long time and requires enormous computing power, because the model works through the waveform one sample at a time: every second of generated audio consists of 16,000 individual samples, each predicted from those that came before it. It can therefore take months to produce the first usable results from a given dataset. However, once the AI has learnt the basic laws of music, the self-learning process becomes faster and faster. The audio quality of the generated pieces also improves along the way, and WaveNet is now able to imitate the human voice so naturally that audio deepfakes become possible. But WaveNet’s capabilities go far beyond this. The AI can learn to create original pieces of music without human intervention. It becomes an autonomous composer, creating music on its own and applying the rules it has learnt. For the first time, the limits of Artificial Narrow Intelligence (ANI) are overcome and the boundary to Artificial General Intelligence (AGI) is crossed. AI thus has the potential to be a creative music maker, producing unique musical works.
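The arithmetic behind this cost can be illustrated with the 8-bit mu-law encoding described in the WaveNet paper, which compresses every audio sample into one of 256 discrete values the network must predict, 16,000 times per second of audio. The NumPy sketch below is illustrative, not DeepMind’s code.

```python
# Why generation is expensive: audio at 16 kHz means 16,000 predictions per
# second, each a choice among 256 mu-law quantization levels.
import numpy as np

def mu_law_encode(audio, mu=255):
    """Map float audio in [-1, 1] to integer codes in [0, 255]."""
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return np.round((compressed + 1) / 2 * mu).astype(np.int32)

def mu_law_decode(codes, mu=255):
    """Invert the encoding back to float audio in [-1, 1]."""
    signal = 2 * (codes.astype(np.float32) / mu) - 1
    return np.sign(signal) * np.expm1(np.abs(signal) * np.log1p(mu)) / mu

one_second = np.random.uniform(-1, 1, 16000)  # 16,000 samples = 1 s at 16 kHz
codes = mu_law_encode(one_second)
assert codes.min() >= 0 and codes.max() <= 255
```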
Endnotes
[1] Magenta, “Welcome to Magenta!”, June 1, 2016, accessed: 2024-04-22.
[2] TensorFlow, “Why TensorFlow?”, n.d., accessed: 2024-04-22.
[3] Magenta, “Real-time Performance RNN in the Browser”, October 5, 2017, accessed: 2024-04-22.
[4] Magenta, “MusicVAE: Creating a palette for musical scores with machine learning”, May 15, 2018, accessed: 2024-04-22.
[5] Magenta, “GANSynth: Making music with GANs”, February 25, 2019, accessed: 2024-04-22.
[6] Magenta, “Magenta Studio”, February 12, 2019, accessed: 2024-04-22.
[7] Magenta, “Magenta + Deeplocal + The Flaming Lips = Fruit Genie”, May 13, 2019, accessed: 2024-04-22.
[8] Magenta, “Visualizing the Bach Doodle Dataset”, July 16, 2019, accessed: 2024-04-22.
[9] Magenta, “Maestro: An AI-guided vocal coach”, January 26, 2021, accessed: 2024-04-22.
[10] Magenta, “Tone Transfer”, October 1, 2020, accessed: 2024-04-22.
[11] Magenta, “The Chamber Ensemble Generator and CocoChorales Dataset”, September 30, 2022, accessed: 2024-04-22.
[12] Magenta, “Magenta Studio 2.0”, August 24, 2023, accessed: 2024-04-22.
[13] Demis Hassabis, Shane Legg and Mustafa Suleyman founded DeepMind Technologies in London in 2010; the company was bought by Google for US $500 million in 2014. See: TechCrunch, “Google Acquires Artificial Intelligence Startup DeepMind For More Than $500M”, January 27, 2014, accessed: 2024-04-22.
[14] The Go game between Lee Sedol and AlphaGo is described in detail in chapter 3 of Marcus du Sautoy’s book “The Creativity Code” (2019).
[15] How the WaveNet learning algorithm works is described in detail in Martin Clancy’s doctoral thesis “Reflections on the Financial and Ethical Implications of Music Generated by Artificial Intelligence” in chapter 4.11.