Voice Cloning’s Future, and How It Will Change the Game

December 07, 2021

Liubov Aras


Cloning your voice with artificial intelligence is both tedious and simple, a sign that the technology is nearly mature and ready for widespread use.

You read a script into a microphone as precisely as you can for about 30 minutes. Then, after re-recording your flubs and mumbles hundreds of times, you send the resulting audio files off to be processed and, within a few hours, are told that a duplicate of your voice is ready and waiting.

Then you type whatever you want into a text box, and your AI clone says it back to you, with audio realistic enough to fool even friends and family, at least for a few moments. Many people may be unaware that such a service exists, and I don't believe we've fully considered the implications of easy access to this technology.


Speech synthesis has improved vastly in recent years thanks to breakthroughs in machine learning. Traditionally, the most lifelike synthetic voices were made by recording a voice actor, chopping their speech into its component sounds, and splicing those sounds together to form new words, much like letters in a ransom note. Now neural networks can be trained on unsorted recordings of a target voice to generate raw audio of that person speaking from scratch. The process is quicker and easier, and the final product is more realistic. The quality isn't flawless right out of the machine (though human tuning can help), but it will only get better over time.
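To make the older "ransom note" approach concrete, here is a minimal Python sketch of splicing pre-recorded sound units into a new utterance. Everything in it is an assumption made for illustration: the unit names, the `UNIT_INVENTORY` table, and the sine tones standing in for snippets of a voice actor's recordings. It does not describe how any particular product works, and it does not cover the newer neural approach.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this toy example)


def make_unit(freq_hz, duration_s=0.1):
    """Stand-in for a recorded sound unit: a short sine tone."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical inventory: in a real system these would be snippets cut
# from a voice actor's recordings, not generated tones.
UNIT_INVENTORY = {
    "HH": make_unit(220.0),
    "EH": make_unit(330.0),
    "L": make_unit(440.0),
    "OW": make_unit(550.0),
}


def splice(units, overlap=160):
    """Join units end to end, crossfading `overlap` samples at each seam."""
    out = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(out), len(unit))
        base = len(out) - k  # index where the crossfade region starts
        # Blend the tail of the output with the head of the next unit.
        for i in range(k):
            w = (i + 1) / (k + 1)  # fade-in weight for the incoming unit
            out[base + i] = (1 - w) * out[base + i] + w * unit[i]
        out.extend(unit[k:])
    return out


def synthesize(unit_names):
    """Look up each named unit and splice them into one waveform."""
    return splice([UNIT_INVENTORY[name] for name in unit_names])


audio = synthesize(["HH", "EH", "L", "OW"])
```

The crossfade at each seam is one simple way to hide the joins. Even so, a voice assembled this way can only say what its inventory of recorded units allows, which is part of why generating audio from scratch with a trained network sounds more natural.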

How to Make Voice Clones Like a Professional

Making these clones requires no secret skills, which is why dozens of firms are already selling comparable services. Simply Google "AI voice synthesis" or "AI voice deepfakes," and you'll see how widespread the technology is. It is available from specialist firms that focus on speech synthesis, like Resemble.AI and Respeecher, and has also been integrated into larger platforms, such as Veritone (where the technology is used in advertising) and Descript (which uses it in the software it makes for editing podcasts).

Celebrity Voice Cloning and How Celebrities Benefit from It

In the coming years, celebrity voice cloning is expected to be the most popular use. Companies expect celebrities to clone and rent out their voices as an effortless way to supplement their income. Veritone, for example, said earlier this year that it would allow influencers, athletes, and actors to license their AI voices for endorsements and radio idents without ever having to enter a studio.

Although such applications aren't yet widely used (or, if they are, aren't widely discussed), this looks like an obvious way for celebrities to make money.

Today's speech synthesis technology is already built into products such as Descript's eponymous podcast editing software. The company's "Overdub" function lets podcasters build an AI clone of their voice, so producers can make quick changes to their audio alongside the program's transcription-based editing.

However, the initial shock of hearing your own voice cloned does not mean that human voices are obsolete. Not at all. With a bit of human editing, you can improve the quality of voice deepfakes, but in their automated form they still can't match the range of inflection and intonation offered by professionals. While AI voices may be fine for rote speech work (corporate communication systems, automated official statements, and the like), they can't compete with people in many use cases, according to voice artists.

How to Get the Most Out of Voice Cloning

What does this technology mean for the general population, though? For those of us who aren't famous enough to profit from it and whose livelihoods aren't threatened by its advance? Well, there is a wide range of possible uses. It's easy to imagine a video game whose character creation screen includes a voice clone option, so that it sounds as if the player is speaking all of their character's dialogue. Alternatively, there may be an app for parents that copies their voices to read bedtime stories to their children even when they aren't present.

The Other Side of Voice Cloning

There are also dangers to be aware of. Fraudsters have already used voice clones to deceive businesses into transferring funds to their accounts, and further nefarious uses are undoubtedly on the way. Consider a high school student secretly recording a classmate to produce a voice clone, then fabricating audio of that person disparaging a teacher to get them in trouble. If the use of visual deepfakes is any indication, where concerns about political misinformation have proven largely unfounded but the technology has wreaked havoc by enabling nonconsensual pornography, it's these kinds of incidents that pose the greatest danger.

One thing is certain: in the future, anyone will be able to create an AI voice clone of themselves. The script for this chorus of computerized voices, however, has yet to be written.