AI in the Music Industry – Part 2: How AI Works

Peter Tschmuck
Feb 12, 2024

Updated: Aug 2, 2024

In part 1 of the “AI in the music industry” series, we defined artificial intelligence. This leads us, in a second step, to the basics of how AI works. Part 2 of the series therefore explains the principles behind the AI systems that are already in use in practice and indicates what potential they have for the music industry.


We cannot go into the technical details and programming of algorithms for AI applications here, as this is beyond the scope of this article. Nevertheless, we should attempt to understand how AI works in principle. Common software-based AI is based on propositional logic.[1] Any statement can be true or false. The statement “The sun is shining today” can be easily verified or falsified by looking out of the window. For a digital AI, the statement takes the value 1 if the sun is shining (true) or the value 0 if it is not (false). Statements can be easily expressed mathematically and linked with AND, OR and IF-THEN rules. The latter are the basis of any algorithm, which might read: “IF the water temperature in the boiler rises to 80° Celsius, THEN switch off the heating coil”. These links can be used to create very complicated chains of statements for the AI to work with.
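
To make this concrete, the following minimal Python sketch encodes statements as truth values and links them with AND, OR and an IF-THEN rule. The variable and function names are illustrative assumptions and do not come from any particular AI system.

```python
# Statements are encoded as truth values: True (1) or False (0).
sun_is_shining = True    # "The sun is shining today"
it_is_warm = False       # a second, hypothetical statement

# AND and OR link statements into new statements.
sunny_and_warm = sun_is_shining and it_is_warm
sunny_or_warm = sun_is_shining or it_is_warm


def heating_coil_on(water_temp_celsius: float, limit_celsius: float = 80.0) -> bool:
    """IF the water temperature reaches the limit, THEN switch off the coil."""
    if water_temp_celsius >= limit_celsius:
        return False  # rule fires: heating coil is switched off
    return True       # rule does not fire: heating coil stays on


print(int(sun_is_shining), sunny_and_warm, sunny_or_warm)
print(heating_coil_on(65.0), heating_coil_on(80.0))
```

The program evaluates the rule correctly without any notion of what sunshine or hot water actually is.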

This is how an AI “thinks” and draws conclusions in order to make predictions. However, it does so purely by means of mathematical calculations based on a set of rules. The set of rules that the AI uses to combine statements correctly is called syntax. It is the grammar by which the AI learns to derive correct statements. However, the AI cannot understand the content of the statement, the semantics. It simply processes input information according to predefined rules and generates an output without understanding what it is doing.[2]

AI needs huge amounts of data, also known as big data, in which it can uncover hidden relationships. AI is therefore a subfield of data science, which is concerned with extracting knowledge from large amounts of data at the interface of computer science, mathematics and statistics. As we have seen, one area of AI is machine learning. Unlike conventional programming, machine learning does not apply a hand-written algorithm to a collected set of data to produce a specific result. Instead, the AI system is fed a data set and the corresponding output, and the computer learns by generating an algorithm that relates the two.[3]

Figure 1: AI, machine learning and deep learning

Source: After Choi et al., 2020, “Introduction to Machine Learning, Neural Networks, and Deep Learning”, Translational Vision Science & Technology, vol 9(4), article 14, p 2

Machine learning can be supervised, unsupervised or a hybrid form known as semi-supervised learning. Supervised machine learning is based on labelled data, where both the input and output values are known. The data must be labelled by humans in advance so that the AI system can be trained on it. On this basis, the artificial intelligence can identify patterns and relationships in the data and make predictions. To do this, the data set is divided into three subsets for training, validation and testing. The training data set is used to show the AI system how the input and output data are related, which can be represented in a mathematical function from which the learning algorithm[4] is derived. The resulting model is then applied to the validation data set, and its parameters are adjusted on the basis of the results. How well the learning algorithm works is finally measured on the test data set, which is kept separate from the training data set. In this way, the AI system learns to adapt the algorithm to changes and form hypotheses about the output value. The learning process is called supervised because the results generated by the algorithm are always compared with the known correct results in the labelled data.[5]
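
The division into training, validation and test data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labelled data are synthetic (a hypothetical rule y = 2x + 10 plus noise), the 60/20/20 split is arbitrary, and the two candidate models are simple straight-line fits rather than anything a production system would use.

```python
import random

random.seed(0)

# Labelled data: each input x comes with a known output y.
# Assumption for illustration: the hidden rule is y = 2x + 10 plus a little noise.
data = [(x, 2.0 * x + 10.0 + random.gauss(0.0, 0.1)) for x in range(50)]
random.shuffle(data)

# Split into training (60%), validation (20%) and test (20%) subsets.
train, val, test = data[:30], data[30:40], data[40:]


def fit_through_origin(pairs):
    """Least-squares fit of y = w * x (no intercept)."""
    w = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return lambda x: w * x


def fit_with_intercept(pairs):
    """Least-squares fit of y = w * x + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
         / sum((x - mean_x) ** 2 for x, _ in pairs))
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def mean_squared_error(model, pairs):
    """Average squared gap between predicted and known outputs."""
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)


# Training: derive candidate functions from the training subset only.
candidates = {
    "through_origin": fit_through_origin(train),
    "with_intercept": fit_with_intercept(train),
}

# Validation: keep the candidate that predicts the held-out labels best.
best_name = min(candidates, key=lambda name: mean_squared_error(candidates[name], val))
best_model = candidates[best_name]

# Testing: measure the chosen model once on data it has never seen.
test_error = mean_squared_error(best_model, test)
print(best_name, round(test_error, 4))
```

Because every prediction is compared with a known correct label, the procedure counts as supervised at each of the three stages.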

In unsupervised learning, the correct output values are not known, so there is no labelled training data set from which the learning algorithm could be derived, and there is no need to label the data. Instead, the algorithm recognises structures in the data set by means of statistical methods such as variance or cluster analysis. When a new data set is added, the AI system can build on the results of previous analyses and further develop the algorithm.[6] Semi-supervised learning combines the two methods mentioned above, provided that the data set allows it and certain correlations between input and output data are known in advance.[7] The advantage is that less labelled data is needed to train the AI, which reduces the training time.
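
Cluster analysis can be illustrated with a small Python sketch. It uses a simple k-means procedure with two clusters, which is one possible clustering method and not necessarily the one any given system applies. The input values are hypothetical song tempos in beats per minute, and no labels are supplied.

```python
def k_means_1d(values, iterations=10):
    """Group unlabelled numbers into two clusters (k-means with k = 2)."""
    centres = [min(values), max(values)]  # simple deterministic starting guess
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearest cluster centre.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Move each centre to the mean of the values assigned to it.
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical tempos (beats per minute); nobody has told the program
# which songs are "slow" and which are "fast".
tempos = [60, 124, 62, 120, 64, 128]
centres, clusters = k_means_1d(tempos)
print(centres)
print(clusters)
```

The algorithm finds a slow and a fast group on its own; what these groups mean is left to the human who interprets the result.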

Reinforcement learning is the third major form of machine learning, alongside supervised and unsupervised learning. Here the AI system learns a strategy to maximise the rewards it receives and avoid penalties. The AI system is connected to its environment and learns how to behave through feedback.[8] So there is not just one correct solution to a problem, but a multitude of solutions that are tested through trial and error until the best solution is found.
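
The trial-and-error principle can be sketched with a so-called multi-armed bandit, a standard toy problem for reinforcement learning. The following Python example is a minimal sketch under stated assumptions: the three actions, their success probabilities, the reward of +1, the penalty of -1 and the exploration rate are all invented for illustration.

```python
import random


def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the most reward."""
    rng = random.Random(seed)
    estimates = [0.0] * len(success_probs)  # learned value of each action
    counts = [0] * len(success_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(success_probs))
        else:
            # Exploit: choose the action that currently looks best.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # Feedback from the environment: reward (+1) or penalty (-1).
        reward = 1.0 if rng.random() < success_probs[action] else -1.0
        counts[action] += 1
        # Update the running average reward of the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: the third action succeeds most often.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

The agent is never told which action is correct. It only receives rewards and penalties, and over many attempts its strategy shifts towards the action that pays off most.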

Machine learning is used where problems are clearly structured and can be easily modelled mathematically and statistically. Such problems are difficult for humans to process intellectually, but can be solved quickly and efficiently by computers; think of robotics. Things get more complicated when problems are easy for humans to solve intuitively, but difficult to put into mathematical rules for machines, as in speech and image recognition. Artificial neural networks (ANNs), which consist of nodes (neurons) that communicate with each other, have been developed to enable machines to solve such poorly structured problems. An ANN is modelled on the human brain, in which neurons exchange information with each other via synapses. The ANN’s counterparts to neurons are learning algorithms known as perceptrons. Arranged in several layers as a multi-layer perceptron, they link multiple input factors (input layer) to output factors (output layer) through a number of invisible layers (hidden layers). The layers allow the algorithms to adapt to and learn from new inputs. Artificial neural networks can be either supervised, where the relationship between input and output is known and each predicted output is compared with a known output, or unsupervised, where the relationship between input and output is unknown and the algorithm has to recognise structures in the data set in order to learn from them. Reinforcement learning is also possible, where the algorithm has a goal that it tries to achieve through a trial-and-error process in interaction with the environment. Such an artificial neural network is referred to here as a reinforced neural network (RNN); it should not be confused with the recurrent neural network, for which the abbreviation RNN is more commonly used.
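
How a single perceptron learns can be shown with a short Python sketch. It uses the classic perceptron learning rule to learn the logical AND of two inputs in a supervised way. The training data, the learning rate and the number of passes are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """A perceptron fires (1) if the weighted sum of its inputs is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust weights and bias whenever a prediction differs from the label."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Labelled examples of the logical AND: the output is 1 only if both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(predictions)
```

A single perceptron can only separate its inputs with a straight line, which is why more difficult problems require many perceptrons arranged in layers.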

Machine learning using ANN and RNN is called deep learning because of the many layers between the input and output layers. These layers interact with each other in such a complex way that a human operator is no longer able to understand how the AI’s learning processes work.[9] A deep learning algorithm can therefore perform tasks by rearranging and structuring information from large amounts of raw data. However, the difference from traditional machine learning lies not only in the amount of data used, which is much larger and available in unstructured form, but also in the extraction of information. Traditional machine learning requires a human to tell the AI system what information to extract, whereas with deep learning, the AI system itself decides which information is important and which is not.[10]

Image recognition is a good example for understanding how deep learning works. Suppose the AI is trained on images of cats. The input layer of the neural network processes the raw data in the form of millions of pixels showing features of a cat and passes its output values in numerical form to the next layer, which passes them on to the next layer, and so on. The values in the hidden layers become more abstract with each processing step and can no longer be traced back to the input data. The AI system must therefore decide for itself how to explain the relationships in the observed data and how to arrive at a specific output in the final layer. As a result, it is no longer possible for the human operator to understand how the AI arrives at a particular result and identifies a cat as such.[11]
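
The passing of values from layer to layer can be sketched in Python as follows. The example is deliberately tiny and rests on assumptions: the "image" has only four pixels, the network has two hidden layers, and all weights are hand-picked, hypothetical numbers, whereas a real network would learn millions of them during training.

```python
import math


def sigmoid(z):
    """Squash any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: each neuron forms a weighted sum of all inputs and squashes it."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(pixels, layers):
    """Pass the raw input through every layer and keep all intermediate values."""
    activations = [pixels]
    for weights, biases in layers:
        activations.append(layer_forward(activations[-1], weights, biases))
    return activations


# Hypothetical network: 4 pixels -> 3 hidden neurons -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1, 0.4], [-0.3, 0.8, -0.5, 0.2], [0.7, 0.1, -0.6, -0.1]],
     [0.0, 0.1, -0.1]),
    ([[1.2, -0.7, 0.3], [-0.4, 0.9, 0.5]], [0.05, -0.05]),
    ([[1.5, -1.1]], [0.2]),
]

pixels = [0.0, 0.9, 0.8, 0.1]  # brightness values of a four-pixel "image"
activations = forward(pixels, layers)
for depth, values in enumerate(activations):
    print(depth, [round(v, 3) for v in values])
```

Even in this toy network, the numbers in the hidden layers no longer resemble the pixel values, and the final output (a score between 0 and 1 that could be read as "cat" or "not cat") cannot be explained by looking at any single weight.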

Image recognition, in particular, still requires a special approach to achieve reliable results, because it is very difficult to teach an AI to filter out the features that make up a cat from the huge number of images it is fed. For this reason, the first layer of such a network is equipped with a large number of filters, each of which analyses only small sections of an image and calculates a probability for a particular characteristic of a cat. A higher hierarchical layer of the neural network analyses these probabilities and links them to more complex combinations of characteristics that make up a cat. This process can continue through many hierarchical neural levels until an AI can not only distinguish a cat from a non-cat, but also distinguish between cat breeds. It is important to understand that the AI itself decides which characteristics make up a cat. Artificial neural networks arranged hierarchically in this way are called convolutional neural networks (CNN) because their filters, known as convolutional filters, are slid across the image in a mathematical operation called convolution.[12]
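
What a single filter does can be illustrated with a few lines of Python. The sketch slides a small filter over a tiny image and records how strongly each section responds. The 4x4 image and the hand-made edge filter are assumptions for illustration; in a real CNN the filter values are learned, and the raw responses are processed further before they can be read as probabilities.

```python
def convolve2d(image, kernel):
    """Slide a square filter over the image and record its response at each position.

    As in most deep learning software, the filter is applied without flipping it.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4x4 greyscale image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-made 2x2 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The filter responds only in the middle column, exactly where the dark and bright halves of the image meet. Higher layers of a CNN combine many such responses into more complex features.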

Regardless of whether ANN, RNN or CNN is used in AI systems, it is important to understand that with all these forms of deep learning it is no longer possible to trace how the AI generates an output from input factors. The learning system is complex and the result can no longer be predicted. If the result is a piece of music, then it can certainly be described as creative, even if the AI only produces it pseudo-creatively. In the next part of the series we will take a closer look at early applications of algorithmic computing in the music business, such as music recognition and music recommendation.

Endnotes

[1] Ralf Otte, 2021, Maschinenbewusstsein. Die neue Stufe der KI – wie weit wollen wir gehen? Frankfurt/New York: Campus Verlag, Kindle edition, pos 624-641.

[2] Ibid., pos 787-821.

[3] Choi et al., 2020, “Introduction to Machine Learning, Neural Networks, and Deep Learning”, Translational Vision Science & Technology, vol 9(4), article 14, p 3.

[4] An algorithm is a well-defined step-by-step procedure for solving a problem, expressed in simple if-then propositions.

[5] See Batta Mahesh, 2018, “Machine Learning Algorithms – A Review”, International Journal of Science and Research, vol 9(1), pp 381-383.

[6] Ibid., p 383.

[7] Ibid., p 384.

[8] Ibid., p 385.

[9] Choi et al., 2020, “Introduction to Machine Learning, Neural Networks, and Deep Learning”, Translational Vision Science & Technology, vol 9(4), article 14, pp 7-9.

[10] Blog of Fraunhofer-Institute for Industrial Engineering, “Spielarten der Künstlichen Intelligenz: Maschinelles Lernen und Künstliche Neuronale Netze”, May 24, 2019, accessed: 2024-01-31.

[11] Kai-Fu Lee and Qiufan Chen, 2021, AI 2041: Ten Visions for Our Future, Taipei: Taiwan Commonwealth Publishing, pp 50-54.

[12] Ibid., pp 96-98.

#supervisedlearning #RNN #reinforcementlearning #artificialintelligence #reinforcedneuralnetwork #semisupervisedlearning #artificialneuralnetworks #CNN #deeplearning #ANN #AI #convolutionalneuralnetworks #machinelearning #unsupervisedlearning
