The Future of Music: Will AI Take Over?
Creativity has always been the prerogative of humanity. We may have already conceded the superiority of artificial intelligence in cognitive tasks: we use it in computing and information processing, for instance. But when it comes to "human" activities that involve creativity, AI remains inferior to us 🛸
Artificial intelligence found its way into music decades ago. Experts expect that sooner or later it will learn to generate music that is almost indistinguishable from what people create. Yet AI is unlikely to supplant composers and performers: its output is derivative and cannot give listeners the same emotions.
Modern musicians have found a variety of uses for AI in music creation. For now, it cannot create a complete piece from scratch, but it can be used to improve a composition in different ways.
The evolution of music production: How does artificial intelligence make it simpler?
When we talk about creating music with a computer, we can mean either an assistive system, a computer environment that helps musicians (composers, arrangers, producers), or an autonomous system aimed at creating original music. Both types of systems can involve neural networks and deep learning.
We can also look at the different stages of making music into which artificial intelligence can be built: composition, arrangement, orchestration, and others. When people compose, they rarely create a new piece from scratch; they reuse or adapt musical elements they have heard before. Likewise, a computer assistant can be switched on at various stages of a work's creation and complement a human composer in different ways.
The traditional approach is to generate music in symbolic form: a musical score, a MIDI event sequence, a melody, a chord progression, and so on. That is, the artificial intelligence produces a symbolic representation that is then used to play the piece.
In other words, it sidesteps the difficult task of generating raw sound: the model outputs an "instruction" instead of a stream of audio samples. The benefit is a drastic reduction in the amount of information the algorithm must produce, which turns synthesis into a far more tractable problem and allows the efficient use of simple machine learning models.
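To see why the symbolic route is so much cheaper, compare the data volumes involved. The figures below are illustrative assumptions (a melody of about four notes per second, CD-quality mono audio), not taken from any specific model:

```python
# Rough data-volume comparison: symbolic melody vs. raw audio.
# All numbers are illustrative assumptions, not from a real system.

# Symbolic: a one-minute melody at ~4 notes per second,
# each note stored as (MIDI pitch, start tick, duration) = 3 bytes.
notes = 4 * 60
symbolic_bytes = notes * 3

# Raw audio: one minute of CD-quality mono sound,
# 44,100 samples per second at 16 bits (2 bytes) each.
audio_bytes = 44_100 * 60 * 2

print(symbolic_bytes)                 # 720
print(audio_bytes)                    # 5292000
print(audio_bytes // symbolic_bytes)  # 7350: the raw-audio model must emit
                                      # thousands of times more data
```

Even with these toy assumptions, a raw-audio generator has to produce roughly three orders of magnitude more output than a symbolic one for the same minute of music.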
This approach has, for example, made it possible to generate music in the style of Bach. Another example is MuseNet, a neural network from OpenAI that appeared in April 2019. MuseNet can compose four-minute pieces for ten instruments and combine various musical styles. It was trained on a huge collection of MIDI recordings.
Another example is Jukebox, a neural network that generates music in various genres. It can even produce a rudimentary singing voice, as well as various musical instruments. Jukebox generates the audio signal directly, bypassing the symbolic representation. Such raw-audio models have far greater capacity and complexity than their symbolic counterparts, which means much higher computational requirements for training.
The “Science” behind AI in music: How exactly do neural networks create music?
There is one general principle: a neural network "looks" at a huge number of examples and learns to generate something similar. These algorithms are usually based on autoencoders and generative adversarial networks (GANs).
An autoencoder is a neural network that learns to represent a complex, high-dimensional dataset in a "simplified" form and then recreate the original data from that simplified representation. An autoencoder-based music generation model first compresses the raw sound into a lower-dimensional space, then learns to generate sound in that compressed space, and finally upscales the result back to the original sound space.
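The compress-then-reconstruct idea can be sketched with a toy linear autoencoder in NumPy. The 8-dimensional "audio frames", the 2-dimensional latent space, and all other sizes here are made-up illustrations, not the dimensions of any real music model:

```python
import numpy as np

# Toy linear autoencoder: compress 8-dim "audio frames" to 2 dims and back.
# Dimensions and data are illustrative, not from any real music model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))               # fake dataset: 200 frames, 8 dims

W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder: 8 -> 2 (compression)
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder: 2 -> 8 (reconstruction)
lr = 0.02

def reconstruction_error():
    Z = X @ W_enc          # compress into the low-dimensional latent space
    X_hat = Z @ W_dec      # reconstruct back into the original space
    return float(np.mean((X - X_hat) ** 2))

initial_error = reconstruction_error()
for _ in range(2000):      # plain gradient descent on the reconstruction error
    Z = X @ W_enc
    X_hat = Z @ W_dec
    G = 2.0 * (X_hat - X) / X.shape[0]      # gradient w.r.t. X_hat
    g_dec = Z.T @ G                         # gradient w.r.t. W_dec
    g_enc = X.T @ (G @ W_dec.T)             # gradient w.r.t. W_enc (chain rule)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final_error = reconstruction_error()
print(final_error < initial_error)   # True: reconstruction improved
```

A real audio autoencoder replaces the two matrices with deep convolutional or recurrent networks, but the training objective, minimizing the reconstruction error through a narrow latent bottleneck, is the same.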
A generative adversarial network can be pictured metaphorically as a contest between a "counterfeiter" and an "investigator". The counterfeiter, the generative model, tries to create realistic data instances out of noise: a face image, for example, or, in our case, a musical sequence. The "investigator", the discriminator, tries to distinguish real data from the "fakes" produced by the generator. Competing with each other, both models improve their "skills". As a result, the generative model learns to create believable data examples.
Should we listen to AI music in 2021? Is it even worth the time?
How do we know whether a piece of music created by a machine really deserves our attention? To probe artificial intelligence systems, researchers borrow the Turing test, originally proposed by Alan Turing. Its idea is that a person interacts with both a computer program and another person, asks both of them questions, and tries to determine which is which. The program passes the test if we cannot tell it apart from the person.
In music generation, a "musical Turing test" is sometimes used. Take the DeepBach algorithm: as the name suggests, it generates notes in the style of Bach. In one study, more than 1,200 people (both experts and non-musicians) tried to distinguish real Bach from the artificial version. It turned out to be very difficult: participants could hardly tell chorales composed by Bach from those created by DeepBach.
In raw audio generation, the success is not yet as impressive. Jukebox does represent a bold leap forward in musical quality, audio length, and the ability to imitate an artist or genre. Yet the differences between artificial music and human-made pieces are still noticeable.
Jukebox's melodies contain traditional chords and impressive solos, but we do not hear larger musical structures such as repeating choruses.
The artificial pieces also contain audible noise related to the way the models work, and generation is still slow: it takes about nine hours to render one minute of audio with Jukebox. For now, that rules out interactive applications.