How I scripted, recorded and published 3 podcast series assisted by AI. | by Sre Chakra Yeddula | Generative AI

A quick introduction for folks not following my earlier posts. I’m on a journey to use AI to finish all my unfinished hobbies. You can follow the intro post here and January’s post here, where I used AI to finish my 3 children’s books. This article covers my February 2023 journey, where I use AI to complete my podcasting projects.

Most impactful moment for this month

Can AI immortalize your loved ones?
This past month, while I was working on my February projects, one of my wife’s aunts was affected by long COVID. One of the symptoms of her condition was a struggle to use her voice. She was scared that there was a chance she might lose her voice permanently. When she heard what I was doing for February, a thought hit all of us: why not use AI to immortalize her voice? We can train a model on her voice so that it will always be there for her children and grandchildren, preserved in perpetuity.

This is the power of AI when applied in a human context.

Video Tutorial for AI-assisted Podcasting

For February 2023 we’re focusing on audio and video #generativeai

When I started this month’s project of launching my podcasts, I didn’t know where to start. With a lot of research, I was able to gather some knowledge on how the process works. Before we go any further, let me lay out the process here.

Pre-Production: This is the phase where we decide the topic and content of the podcast.

Production: This phase involves recording audio with the help of microphones, etc.

Post-production: This involves editing audio clips together and adding music or sound effects. We may also adjust or process the audio to remove any unwanted artifacts.

Publishing: Once your podcast is ready to be heard by the world, in this stage you pick a hosting provider and publish it to the different directories (Apple Podcasts, Spotify, etc.).

This process can seem daunting and even expensive when you consider the myriad options in mics and podcast setups. And that does not include the post-production costs of hiring someone to clean and enhance your audio. Not anymore. I’ll walk through in detail below how AI can help you every step of the way.

I wanted to use AI across all 3 production phases of getting my podcast out there. Here is how I laid it out.

In Pre-Production I used AI for topic ideas, research, content production, and format design. In Production, for some of the podcasts, I used AI text-to-speech to generate audio with good quality. And lastly, in Post-Production, I leveraged the power of AI to dynamically adjust sound levels, add effects, match music to sections, and even do noise cancellation and background detection.

So, let’s get into detail on each of these sections and see what it takes to take your idea from a home recording to a studio-grade podcast.

For February I wanted to complete the projects below.

1-Produce and publish a podcast series to help kids learn some of the famous speeches in history.

2-Produce and publish a podcast series for an adult audience (kid friendly too) that covers one of my favorite rabbit-hole topics: “obscure references.”

3-Produce and publish a video tutorial series for kids with easy-to-understand explainers on basic science concepts.

In this article I’ll discuss the tech and the process I used to complete 1&2. Requirement 3 is now covered in a separate article.

Now that I had my requirements, I set out to lay out a rough plan of what I wanted to do and what kind of technologies and tools I wanted to use.

I tagged the requirements for 1&2 under these broad categories.

AI Voice generation during production:
I wanted to explore and contrast different AI tools out there that could generate audio, and also pit them against an actual human voice actor. I also wanted to experiment with voice cloning and compare that with the other generated voices.

Here is a small sample of what this sounds like (more details in the sections below).
Here the voices are repeating the same paragraph using different tools and techniques (including a real human actor). I’ll let you all try to figure out which part sounded the best.

AI assisted Postproduction:
AI tools to enhance audio: techniques like speech enhancement and noise reduction, together with AI-enabled equalization, to produce crisp audio.

Here is a small sample of what this sounds like (more details in the sections below).

In this section, we’ll explore what audio generative AI is, look at some of the popular tools out there, discuss my experience in using them for podcasting purposes, and finally compare their performance with each other and with that of a real human voice actor. So get ready for a mix of technical jargon and casual insights as we examine how this revolutionary technology stacks up against organic talent!

Introducing the technology

Let me start with a bold statement. Audio generated by generative AI, more specifically TTS (Text-to-Speech), is going to revolutionize communication and media consumption in the next few years. So, what is this technology? Let’s explore.

This subset of tech is more popularly known as TTS. There are many models used in this technology, but the most popular are neural network TTS models.

Neural networks are a type of machine learning algorithm inspired by the structure and function of the human brain. Like the brain, a neural network is made up of many interconnected nodes (or neurons) that work together to process information.

In the case of text-to-speech (TTS), a neural network is trained on a large dataset of speech recordings and corresponding text transcripts. The network learns to map from the input text to the output speech by adjusting the strength of connections between nodes to minimize the difference between the predicted speech and the actual speech in the training data.

Once the neural network has been trained, it can be used to generate speech from new input text. The input text is fed into the network, which then applies the learned mapping to produce the corresponding speech output.
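To make the "adjust the connections to shrink the error" idea concrete, here is a toy sketch. It is not a TTS model and not how any of the tools in this article are built: it is a single made-up "neuron" with one weight and one bias, trained by plain gradient descent on invented numbers. The names train and predict and the training data are all hypothetical, chosen only for illustration.

```python
# Toy illustration only: one "neuron" learning the mapping y = 2x + 1.
# A real TTS network maps text to audio features with millions of weights,
# but the core loop (predict, measure the error, nudge the weights) is the same idea.

def train(samples, epochs=2000, lr=0.05):
    """Fit weight w and bias b so that w * x + b approximates each target y."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y        # predicted output minus actual output
            grad_w += 2 * error * x / n    # gradient of the mean squared error w.r.t. w
            grad_b += 2 * error / n        # gradient of the mean squared error w.r.t. b
        w -= lr * grad_w                   # adjust the "connection strength"
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Apply the learned mapping to a new input."""
    return w * x + b


# Hypothetical training pairs (input, target) that follow y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(training_data)
print(round(w, 3), round(b, 3), round(predict(w, b, 4.0), 3))
```

After training, predict is the stand-in for "feeding new text into the network": it applies the learned mapping to an input the model never saw.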

An instance of a neural network-based Audio AI TTS mannequin

TTS has been around for a number of years. The big leap in terms of where this technology could go took place when this paper was released: https://arxiv.org/abs/1806.04558. In it, the authors describe a process to generate natural-sounding speech that even preserves the distinct character of the voices used to train the model. This opened up a sea of new features around voice cloning, re-voicing, etc.

With it came a lot of services and tools capable of producing lifelike, natural-sounding voices. How do these services work? Here is a chart where we try to flesh out their process.

Typical process flow for a GenAI audio SaaS tool

What were the benefits to me of using this technology?

Speech generated using these models and algorithms can be produced quickly and with minimal effort and cost. Gone are the days when you needed to invest in high-quality recording equipment or deal with maintaining a quiet environment. I can now produce a podcast while I’m in the same room as a screaming kid or a barking dog. So it’s no surprise that more people are discovering its value.

With the availability of these platforms on mobile and web, speech synthesis has become more accessible than ever before.

Exploring the Popular Tools and Platforms available

There are a lot of generative AI audio services available today. All 3 big cloud vendors have their own TTS services that interested folks can use: Amazon’s AWS has Polly, Microsoft’s Azure has Cognitive Services, and Google Cloud has the Cloud Text-to-Speech API. My use case, though, required me to pick a more consumer-facing tool, so I whittled the list down to the tools below.

Murf AI - Murf AI is a great tool for those looking to generate speech from text quickly. Their UI is very user friendly and makes it very easy to get up and running with their service in minutes. It has all the features that you would expect from a modern TTS platform, like the ability to customize voices, languages, etc.

Synthesys AI - Synthesys AI is another great tool for those looking to generate speech from text. They offer a wide variety of natural-sounding voices and the ability to control pitch, speed, and volume.

Resemble AI - Resemble AI is a great tool for those who want to generate speech from text with natural-sounding voices. They offer features like punctuation control, intonation control, and custom dialects.

Podcastle AI - Podcastle AI is a powerful tool for generating speech from text. They offer voice commands, automated sentence construction, and support for multiple languages. They also offer a great podcast-building platform. They had the best voice cloning service on the market.

So, I narrowed this list down to 4 tools and used each of them for an episode of my podcast series. This allowed me to run a comprehensive real-world comparison and even pit them against a real human voice actor whom I hired for one of the podcast episodes. All the episodes are available in the series “History speaks to my 5-year-old”.

History Speaks to my 5-year-old

Epic showdown comparing voices generated by the different tools and a Real Human Voice Actor

Here is a summary of my results.

Comparing the various generative AI audio tools

Understanding the results:

The ease of use and flexibility offered by many of today’s audio generative AI tools are quite impressive! It was truly exciting to experience firsthand how accurately a deep neural network can imitate a real human’s cadence, intonation, and accent, though only when tweaked just right. Of course, there remains an incomparable naturalness to the vocal talents of an experienced professional voice actor when you want that special something for a project. All in all, it was an epic showdown between human capabilities and computer algorithms, one that left both sides feeling victorious in their own special way! I think in a few years’ time the ability to distinguish between real and generated audio will cease to exist.

In terms of closeness to a human voice, the VO actor won the round, but in terms of ease of use and ability to modify and make edits, the generative AI audio services were clearly the winner. In terms of cost, it was no contest. I could generate really good audio at a fraction of the cost.

As for which tool performed the best, it is up to the user to choose. I found that I was able to get better results when I applied a little post-processing to good AI-generated audio. I do however lean towards better UI, and Murf AI was a clear winner there. I’m interested in the community’s opinion, so listen to the podcast series and weigh in with your thoughts.

I will however state that the voice cloning system is not yet ready for production quality, but it is incredible for regular everyday use. You can actually pit my cloned voice against my actual voice by comparing episode 4 vs. episode 1 of my other podcast, Obscurate. I’ll cover that below in the AI post-production stage.

In conclusion, the tools are getting better and better, but it’s up to the user to decide which tool works for them. Every user will have different needs and preferences. Trying them out is a great way to explore what you need and want in a tool before making an informed decision.

In this section we’ll cover the AI features I used during the postproduction phase. For this experiment, I recorded my second podcast series with a regular everyday mic in a semi-noisy environment. I live in a house with 4 adults, 2 kids and a dog, so finding a quiet place is like finding a needle in a haystack while on a moving train.

So, I had to focus on noise reduction and improving the quality of my recordings. There are some really high-end audio tools and software out there that a savvy postproduction engineer can use, like Adobe Audition CC, iZotope RX 8 and iZotope Neutron 3. These required me to either know how postproduction works (equalization, spectral analysis, etc.) or hire someone.

This prompted me to find more affordable and equally advanced tools that did a lot of this on their own and didn’t require me to do a lot of equalizing. The full results can be found on my podcast “Obscurate”.

Obscurate

Let me explain how AI noise cancellation works.

AI noise cancellation works by using advanced algorithms to detect and identify the type of noise in a recording. Once identified, the algorithm then filters out that specific sound and reduces it to a level that is imperceptible. This allows for recordings with very low background noise levels, even when talking in noisy environments like airports, cafes or busy streets.
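The "identify the noise, then turn it down" idea can be sketched with a much simpler, non-AI stand-in: a classic noise gate. To be clear, this is not what Podcastle, Buzzsprout or Adobe actually run; it is a hand-written rule that measures the level of a noise-only snippet and attenuates any stretch of audio that is not clearly louder than that. The function names, the sample values and the parameters (frame_size, margin, attenuation) are all assumptions made up for this example.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gate(samples, noise_profile, frame_size=4, margin=2.0, attenuation=0.1):
    """Turn down frames whose level is close to the measured noise floor.

    noise_profile is a short noise-only snippet (e.g. room tone before speaking).
    Frames quieter than margin * rms(noise_profile) are scaled by attenuation;
    louder frames (assumed to contain speech) pass through unchanged.
    """
    threshold = rms(noise_profile) * margin
    cleaned = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        gain = 1.0 if rms(frame) > threshold else attenuation
        cleaned.extend(s * gain for s in frame)
    return cleaned


# Hypothetical data: four samples of background hiss, then four samples of "speech".
noise_only = [0.01, -0.02, 0.015, -0.01]
recording = noise_only + [0.5, -0.6, 0.55, -0.4]
cleaned = noise_gate(recording, noise_only)
print(cleaned)
```

The AI-based tools go further by learning what many kinds of noise sound like, so they can remove noise that overlaps with speech instead of only muting the quiet gaps.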

This is different from your noise-cancelling earphones, because that is hardware-driven noise cancellation and cannot adaptively change.

I used the tools and services below to improve my sound quality.

Podcastle’s Magic Dust - This service uses AI models to detect and reduce noise in a recording. It also offers improved clarity and reduces background noise. More details here.

BuzzSprout’s Magic Mastering - This is a service that uses AI to boost sound quality. It can analyze the sound and add audio effects like compression, EQ, and reverb for improved clarity. More details here.

Adobe Podcast’s Speech Enhancer - This tool uses AI to reduce background noise, improve clarity, and add natural reverberation. It can also adjust the levels of sound sources automatically. More details here.
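For readers who have never touched these effects, here is a rough sketch of two of the building blocks mentioned above: level adjustment (peak normalization) and compression. This is only a textbook-style illustration under my own assumptions, not the processing any of these services actually applies, and the function names and default values are invented for the example.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale the whole clip so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty clip: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def compress(samples, threshold=0.5, ratio=4.0):
    """Very simple compressor: reduce how far samples exceed the threshold.

    Anything above threshold is pulled back so that only 1/ratio of the
    overshoot remains; quieter samples are left untouched.
    """
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out


# Hypothetical clip: one quiet passage and one loud peak.
clip = [0.1, -0.45]
louder = peak_normalize(clip)            # quiet clip brought up to a 0.9 peak
evened = compress([0.9, -0.9, 0.2])      # loud peaks tamed, quiet sample unchanged
print(louder, evened)
```

A mastering tool typically chains steps like these (plus EQ) and picks the settings for you, which is the part I was happy to hand over to the AI.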

The tech is great, but it’s still not a home run. I had to do a tiny bit of adjustment, and that required me to learn a little bit of equalization. This article will explain it in very simple terms, and if I was able to do it, so can anyone else.

This part of the process is less AI and more human. So here is how you can take the podcast you created to all the popular services. I’m going to simplify this for you and save you hours of research.

Here is an article that does a great job explaining it in more detail: How to Submit a Podcast to Apple, Google and Spotify in 2023 | Voices

Services like Apple Podcasts, Spotify and Google Podcasts all operate independently. They are what you would call directory services, which means they act as a gateway to the podcast and are responsible for distributing that content to their users.

In order to get your podcast listed in these places, you need to host your podcast and provide an RSS feed link to each of them. Essentially, what you are doing is providing a web page which has basic information on your podcast and a link to where it is stored.
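If you are curious what that "web page" looks like under the hood, here is a minimal sketch that builds a bare-bones RSS 2.0 feed with Python’s standard library. The show details, the example.com URLs and the build_feed helper are placeholders I made up; a real feed generated by a hosting service will carry more fields, and directories such as Apple Podcasts typically expect extra tags (for example from the itunes: namespace) that are left out here.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, episodes):
    """Return a minimal RSS 2.0 document (as a string) describing a podcast."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # The enclosure tag is the "link to where it is stored": the audio file itself.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["audio_url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")


# Placeholder values for illustration only.
feed_xml = build_feed(
    title="My Example Podcast",
    link="https://example.com/podcast",
    description="A placeholder show used to illustrate an RSS feed.",
    episodes=[
        {
            "title": "Episode 1",
            "audio_url": "https://example.com/audio/ep1.mp3",
            "size_bytes": 1234567,
        }
    ],
)
print(feed_xml)
```

Each directory simply re-reads this file on a schedule, which is why publishing a new episode only requires updating the feed in one place.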

There are a lot of hosting options for generating an RSS link. I would however recommend using services like Buzzsprout that are built to take the legwork out of hosting your own RSS feed.

Once you have it hosted, you can then integrate or push the RSS feed onto the directories.

I used Buzzsprout to do my hosting. You can see below the different options to integrate from one place.

I was able to get 2 quality podcast series up and running with no investment in a studio or high-end equipment. I was also able to get my ideas into speech faster than ever before, using multiple voices for a conversational podcast right from my computer.

Podcast 1: History speaks to my 5-year-old

This podcast is aimed at teaching kids some of the famous speeches in history. It talks about the importance of the speeches and even has a sound bite of the speech itself. MLK Jr., Gandhi, Abraham Lincoln and Churchill are among the speakers covered.

History Speaks to my 5-year-old

Podcast 2: Obscurate — Curating the Obscure

This podcast is aimed at a general audience. It is a topic-a-week podcast that talks about some of the most obscure references on a topic each week. If you love pop culture, languages and sports, this podcast is a must-listen.

Obscurate

Stay up to date with the latest news and updates in the creative AI space: follow the Generative AI publication.

How I scripted, recorded and published 3 podcast series assisted by AI. | by Sre Chakra Yeddula | Generative AI | Mar, 2023
