
I tried YouTube shorts automation with AI

February 05, 2024 - Drygast
AI C# Software Dev SQLite

As with so many other things in my life - this started with me asking myself a question - "how would I do that"? This quest began after watching videos where ChatGPT was used in automating content creation. However, the existing methods involved manual intervention - posing questions, copying responses, and pasting them into another online service to generate videos. This wasn't my vision; I wanted a "generate content" button, requiring minimal input from me, beyond initiating the process.

I was really interested in figuring out how to solve it, and decided to get started. The first thing to figure out was what kind of content I wanted to create. I settled on daily dad jokes - why? Well, that remains a mystery.

For those eager to jump straight to the results, I've compiled a playlist on my YouTube channel featuring all the jokes (about 180) I generated. Additionally, I've shared a video describing the process on my channel.

In order to automate my dad-jokes the way I wanted to, I figured that I needed a few things:

  • A service to generate the jokes
  • Local storage for the jokes
  • A service to generate voicelines
  • A video background generator
  • Automatic upload to YouTube
  • Custom software to seamlessly integrate everything

A quick search led me to various services and potential solutions for each requirement.

Crafting a Plan

The first challenge was generating the dad jokes. Through a series of test queries to ChatGPT, I found a format that allowed for easy parsing of responses - an approach that proved effective, most of the time.

You:
Give me 2 jokes (dad-jokes if possible) with "tech" as theme. Format the result as json using this template: { "jokes": [{theme:"", "part1": "","part2": "","part3": "" }] }

ChatGPT:
Certainly! Here are 2 jokes with a tech theme in the requested JSON format:
{
  "jokes": [
    {
      "theme": "tech",
      "part1": "Why don't programmers like nature?",
      "part2": "It has too many bugs!",
      "part3": ""
    },
    {
      "theme": "tech",
      "part1": "Why did the computer go to the doctor?",
      "part2": "Because it had a virus!",
      "part3": ""
    }
  ]
}

The formatting aspect proved to be crucial, serving as a foundation for later code parsing of the jokes. I also needed API access to ChatGPT, and a brief search assured me that this wouldn't pose a challenge.

Moving on, the next requirement was generating voicelines. After exploring various AI voice generator services, I settled on play.ht. What caught my attention was its free tier access to an API, and after conducting successful tests in a preliminary project, I found it to be efficient, eliminating the need for further exploration.

As for the remaining components - video background generation, YouTube uploading, and the custom software - I assumed these tasks would be straightforward and didn't delve deeper into them.

The plan

With these considerations in mind, the plan took shape:

  • Develop a small C# project (given my preference for C# and the availability of example code for API access in C#).
  • Use ChatGPT to supply me with dad-jokes formatted for easy parsing.
  • Save the jokes in a local SQLite database, implementing checks for duplicates and other considerations.
  • Utilize the Play.ht API to generate voicelines and download corresponding files.
  • Create a random custom background and text with FFMpegCore.
  • Merge the custom background and text with the voicelines.
  • Upload to YouTube.

Implementing

The journey commenced with the creation of a fundamental console application in Visual Studio.

namespace ai_shorts
{
    class Program
    {
        static async Task Main(string[] args)
        {
            System.Diagnostics.Debug.WriteLine($"Main(args: {String.Join(",", args)})");
            VideoGenerator theThing = new VideoGenerator();

            System.Diagnostics.Debug.WriteLine($"Init all the things");
            if (!await theThing.Init(args))
            {
                Console.WriteLine("Init failed   :(");
                return;
            }

            System.Diagnostics.Debug.WriteLine($"Generate video");
            if (!theThing.GenerateVideo())
            {
                Console.WriteLine("GenerateVideo failed   :(");
                return;
            }

            System.Diagnostics.Debug.WriteLine($"Publish video");
            if (!await theThing.PublishVideo())
            {
                Console.WriteLine("PublishVideo failed   :(");
                return;
            }

            Console.WriteLine("Done!");
        }
    }
}

I like to keep things in the older style and also add a bunch of debug info everywhere. Anyway - now for some more complicated things.

ChatGPT and dad-jokes

There were a couple of different NuGet packages for accessing OpenAI/ChatGPT, but I ended up using Betalgo.OpenAI, which you can read more about at https://github.com/betalgo/openai. I basically used their example code and modified it to suit my needs. Here is the part of the code that actually asks ChatGPT for jokes. There is a lot more code surrounding this, but I'm not going to give away the full code of this project.

var chatCompletionResult = await openAiService.ChatCompletion.CreateCompletion(new ChatCompletionCreateRequest
{
    Messages = new List<ChatMessage> {
        ChatMessage.FromUser("Hi ChatGPT! " +
        "Could you please give me 2 jokes (dad-jokes if possible) of type \"DadJoke\", \"SoftwareDevelopment\", \"Tech\", or \"Gaming\"? Try to use one of the four types, but if you cannot figure out the type, use \"DadJoke\" as default. " +
        "It would be a big help if you format your response as json using this template: { \"jokes\": [{type:\"\", \"part1\": \"\",\"part2\": \"\",\"part3\": \"\" }] } and respond with the json only. " +
        "And don't tell me that you cannot respond in JSON format - I know you can and I believe in you!")
    },
    Model = Models.ChatGpt3_5Turbo,
    // MaxTokens = 50 //optional
});
if (chatCompletionResult.Successful)
{
    System.Diagnostics.Debug.WriteLine("chatResult: " + chatCompletionResult.Choices.First().Message.Content);
    string chatGPTResponse = chatCompletionResult.Choices.First().Message.Content;
    // Grab everything from the first '{' to the last '}', including line breaks
    Regex regex = new Regex(@"\{(.|\s)*\}", RegexOptions.Multiline);
    Match match = regex.Match(chatGPTResponse);
    if (match.Success)
    {
        string jsonString = match.Groups[0].Value;
        System.Diagnostics.Debug.WriteLine("jsonString: " + jsonString);
        var data = JsonConvert.DeserializeObject<MyJokes>(jsonString);
        return data?.Jokes;
    }
    else
    {
        Console.WriteLine(chatGPTResponse);
    }
}

As you can see, I had to encourage ChatGPT after it suddenly stopped responding in json format. There were a few other strange responses and basically - it was not consistent at all. But in the end I got good responses most of the time.
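The MyJokes type used above is not shown, so here is a minimal sketch of what the parsing step could look like. It assumes the response follows the template from the prompt. The names Joke, JokeList and JokeParser are hypothetical stand-ins, and it uses the built-in System.Text.Json instead of the JsonConvert call from the snippet above, so it is stricter about malformed JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for the MyJokes type referenced in the article.
public sealed class Joke
{
    [JsonPropertyName("type")]  public string Type  { get; set; } = "";
    [JsonPropertyName("part1")] public string Part1 { get; set; } = "";
    [JsonPropertyName("part2")] public string Part2 { get; set; } = "";
    [JsonPropertyName("part3")] public string Part3 { get; set; } = "";
}

public sealed class JokeList
{
    [JsonPropertyName("jokes")] public List<Joke> Jokes { get; set; } = new();
}

public static class JokeParser
{
    // Returns the parsed jokes, or an empty list when no usable JSON is found.
    public static List<Joke> Parse(string reply)
    {
        // Take everything from the first '{' to the last '}' so that any
        // chatty text around the JSON is ignored.
        Match match = Regex.Match(reply, @"\{.*\}", RegexOptions.Singleline);
        if (!match.Success) return new List<Joke>();
        try
        {
            JokeList? parsed = JsonSerializer.Deserialize<JokeList>(match.Value);
            return parsed?.Jokes ?? new List<Joke>();
        }
        catch (JsonException)
        {
            // Malformed JSON is treated the same as "no jokes this time".
            return new List<Joke>();
        }
    }
}
```

Returning an empty list instead of throwing makes it easy to simply ask again when a response is unusable, which matters when the responses are as inconsistent as described above.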

SQLite

After coding and testing with ChatGPT for a while, I realized that I needed a mechanism to address duplicates. ChatGPT was not the best source of jokes since it exhibited a tendency to recycle a specific set of jokes, occasionally introducing a handful of new ones. So I implemented a small SQLite database that stored the jokes and checked for duplicates. I also decided to save the generated voicelines there, so that I would not request the same voiceline multiple times (when something went wrong). Here is a glimpse into the basic structure I used:

SQLiteCommand cmd;
cmd = conn.CreateCommand();
cmd.CommandText = "CREATE TABLE IF NOT EXISTS Jokes (Part1 VARCHAR(128), Part2 VARCHAR(128), Part3 VARCHAR(128), Type VARCHAR(32), Status INTEGER, PublishNumber INTEGER)";
cmd.ExecuteNonQuery();
cmd.CommandText = "CREATE UNIQUE INDEX IF NOT EXISTS IX_Joke ON Jokes (Part1, Part2, Part3)";
cmd.ExecuteNonQuery();
cmd.CommandText = "CREATE TABLE IF NOT EXISTS Voicelines (TextToRead VARCHAR(128), Voice VARCHAR(32), Filename VARCHAR(128), Duration REAL)";
cmd.ExecuteNonQuery();
cmd.CommandText = "CREATE UNIQUE INDEX IF NOT EXISTS IX_Voiceline ON Voicelines (TextToRead, Voice)";
cmd.ExecuteNonQuery();
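
With the unique index in place, the duplicate check can be left to SQLite itself. This is a minimal sketch of one way to do that, not the code from the project: it assumes the System.Data.SQLite NuGet package, an open connection and the Jokes table created above. JokeStore and TryInsertJoke are hypothetical names, and the Status value 0 is just a placeholder.

```csharp
using System.Data.SQLite; // NuGet package System.Data.SQLite (assumed)

public static class JokeStore
{
    // Returns true when the joke was new and got stored, false when the
    // unique index IX_Joke rejected it as a duplicate.
    public static bool TryInsertJoke(SQLiteConnection conn,
        string part1, string part2, string part3, string type)
    {
        using SQLiteCommand cmd = conn.CreateCommand();
        // INSERT OR IGNORE silently skips rows that violate a unique index.
        cmd.CommandText =
            "INSERT OR IGNORE INTO Jokes (Part1, Part2, Part3, Type, Status, PublishNumber) " +
            "VALUES (@p1, @p2, @p3, @type, 0, NULL)"; // Status 0 is a placeholder value
        cmd.Parameters.AddWithValue("@p1", part1);
        cmd.Parameters.AddWithValue("@p2", part2);
        // Store "" rather than NULL: SQLite treats NULLs as distinct in a
        // unique index, so NULL values would let duplicates through.
        cmd.Parameters.AddWithValue("@p3", part3 ?? "");
        cmd.Parameters.AddWithValue("@type", type);
        // One affected row means the joke was inserted, zero means it was ignored.
        return cmd.ExecuteNonQuery() == 1;
    }
}
```

Note that this only catches exact text matches. A joke that ChatGPT rephrases slightly still counts as new.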

Play.HT

Now that the joke from ChatGPT is stored in the local database, it's time for some voicelines. First I had to create an account and also request an API secret key. The Play.HT API was well documented, but did not contain any C# examples at the time. Anyway - it was not that difficult to write something that worked for my application:

internal class PayloadData
{
    [JsonProperty(PropertyName = "text")]
    internal string Text { get; set; }
    [JsonProperty(PropertyName = "voice")]
    internal string Voice { get; set; }
    internal PayloadData(string text, string voice)
    {
        Text = text;
        Voice = voice;
    }
}

...

internal async Task<bool> GenerateVoiceover(string text, string voice)
{
    PayloadData payloadData = new PayloadData(text, voice);
    string json = JsonConvert.SerializeObject(payloadData);
    var data = new StringContent(CleanText(json), Encoding.UTF8, "application/json");
    data.Headers.ContentType = new MediaTypeHeaderValue("application/json");
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("AUTHORIZATION", $"Bearer {_apiSecretKey}");
        client.DefaultRequestHeaders.Add("X-USER-ID", $"{_apiUsername}");
        client.DefaultRequestHeaders.Add("accept", "text/event-stream");
        // Await the request instead of blocking on .Result
        var response = await client.PostAsync(_apiEndpoint, data);
        var stream = await response.Content.ReadAsStreamAsync();
        using (var reader = new StreamReader(stream))
        {
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                if (!String.IsNullOrEmpty(line))
                {
                    // Progress events carry no json; print a dot and skip them
                    if (line.Equals("event: generating"))
                    {
                        Console.Write(".");
                        line = "";
                    }
                    // Ignore every other line without a json payload
                    else if (!line.Contains('{'))
                    {
                        line = "";
                    }
                    // Keep only the json response
                    // Example line: data: {"id":"1n2AuS0Acz7iMmkVcH","progress":1,"stage":"complete","url":"https://peregrine-results.s3.amazonaws.com/pigeon/1n2AuS0Acz7iMmkVcH_0.mp3","duration":2.368,"size":49005}
                    if (line.StartsWith("data: {"))
                    {
                        line = line.Replace("data: {", "{");
                    }
                }
                // Deserialize server sent event and handle it
                if (!String.IsNullOrEmpty(line))
                {
                    ApiEvent? apiEvent = JsonConvert.DeserializeObject<ApiEvent>(line);
                    if (apiEvent != null)
                    {
                        if (!String.IsNullOrEmpty(apiEvent.ErrorMessage))
                        {
                            Console.WriteLine("Error: " + apiEvent.ErrorMessage);
                            return false;
                        }
                        else
                        {
                            // Example line: {"id":"1n2AuS0Acz7iMmkVcH","progress":1,"stage":"complete","url":"https://peregrine-results.s3.amazonaws.com/pigeon/1n2AuS0Acz7iMmkVcH_0.mp3","duration":2.368,"size":49005}
                            if (apiEvent.Stage.Equals("complete"))
                            {
                                Console.Write(Environment.NewLine);
                                Console.WriteLine("Downloading...");
                                string downloadUrl = !String.IsNullOrEmpty(apiEvent.Url) ? apiEvent.Url : "";
                                string localFilePath = Path.Combine(_outputDirectory, $"{apiEvent.Id}.mp3");
                                float duration = apiEvent.Duration;
                                if (String.IsNullOrEmpty(downloadUrl))
                                {
                                    Console.WriteLine("downloadUrl is null or empty, download aint gonna happen...");
                                    return false;
                                }
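                                using (var downloadClient = new HttpClient())
                                {
                                    // Await the stream itself so that it (and not the task) is disposed
                                    using (var s = await downloadClient.GetStreamAsync(downloadUrl))
                                    {
                                        // FileMode.Create truncates any older file with the same name
                                        using (var fs = new FileStream(localFilePath, FileMode.Create))
                                        {
                                            await s.CopyToAsync(fs);
                                        }
                                    }
                                }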
                                _repository?.InsertVoiceline(text, voice, localFilePath, duration);
                            }
                        }
                    }
                }
            }
        }
    }
    return true;
}

private string CleanText(string input)
{
    // Replace typographic apostrophes with plain ASCII ones
    string output = input.Replace("\u2019", "'");
    return output;
}

Again - this is not the full code, just the important bits. I'm not going to write every single line of code in this article.
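
The line filtering inside the loop can also be pulled out into a small helper. This is a minimal sketch, assuming the stream only contains lines like the examples in the code above ("event: generating" and "data: {...}"). SseLine and ExtractJson are hypothetical names and not part of the Play.HT API.

```csharp
using System;

public static class SseLine
{
    private const string DataPrefix = "data:";

    // Returns the json payload of a "data: {...}" line, or null for
    // anything else (blank lines, "event: ..." lines and so on).
    public static string? ExtractJson(string? line)
    {
        if (string.IsNullOrWhiteSpace(line)) return null;
        if (!line.StartsWith(DataPrefix, StringComparison.Ordinal)) return null;

        // Drop the prefix and any spaces between it and the payload.
        string payload = line.Substring(DataPrefix.Length).TrimStart();
        return payload.StartsWith("{", StringComparison.Ordinal) ? payload : null;
    }
}
```

A helper like this keeps the read loop down to "skip when null, otherwise deserialize", and it can be tested without calling the API.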

Custom video background

I used the NuGet packages FFmpegArgs.Executes and FFMpegCore to generate two different random animated backgrounds (starfield and growing circles). I first wanted to use actual videos that I filmed out in nature somewhere, but later I decided to use generated videos instead. Honestly - there is just too much stuff going on in there to write about it here, but I might write a separate blogpost about it some day.

Merging audio

I used FFmpeg to merge the audio files with the backgrounds like this:

private void MergeAudio()
{
	System.Diagnostics.Debug.WriteLine($"VideoGenerator.MergeAudio()");
	Console.WriteLine("Merge video and audio...");

	string audioDirectory = @ConfigurationManager.AppSettings["PlayHT_OutputDirectory"] ?? String.Empty;
	string videoDirectory = @ConfigurationManager.AppSettings["OutputPath"] ?? String.Empty;
	int counter = 0;

	// Add silent audio
	var render = FFmpegRender.FromArguments(
				  $"-f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -i {_outputFile} -c:v copy -c:a aac -shortest -y temp.mp4",
				  new FFmpegRenderConfig()
					// .WithFFmpegBinaryPath("path to ffmpeg") // Default from PATH or current dir
					.WithWorkingDirectory(System.IO.Path.GetDirectoryName(Assembly.GetEntryAssembly()?.Location))
				  );
	var result = render.Execute();
	if (result.ExitCode != 0)
	{
		throw new Exception($"ERROR: {String.Join(",", result.ErrorDatas)}");
	}

	// string inputVideoFile = videoDirectory + $"\ai-short-{_joke?.CurrentMaxPublishNumber + 1}.mp4";
	string args = $"-i temp.mp4 ";
	foreach (Voiceline v in _voicelines)
	{
		args += $"-i {v.Filename} ";
	}
	args += "-filter_complex \"";
	foreach (VideoScriptItem vsi in _videoScript._videoScriptItems)
	{
		if (vsi.VideoScriptType == VideoScriptItem.ItemType.WriteText)
		{
			counter++;
			int theDelayInMs = (int)(((float)vsi.StartFrame / (float)_videoFps) * 1000) + (_voicelinePaddingFrames * 25 * 2);
			args += $"[{counter}]adelay={theDelayInMs}[aud{counter}]; ";
		}
	}
	counter = 0;
	foreach (Voiceline v in _voicelines)
	{
		counter++;
		args += $"[aud{counter}]";
	}
	args += $"amix={counter}:normalize=false,apad[a]; ";
	args += "[0:a][a]amerge[a]";
	args += "\" ";

	args += $"-map 0:v -map \"[a]\" -codec:v copy -codec:a aac -ac 2 -y {_outputFile}";

	var render2 = FFmpegRender.FromArguments(
				  args,
				  new FFmpegRenderConfig()
					// .WithFFmpegBinaryPath("path to ffmpeg") // Default from PATH or current dir
					.WithWorkingDirectory(System.IO.Path.GetDirectoryName(Assembly.GetEntryAssembly()?.Location))
				  );

	var result2 = render2.Execute();
	if (result2.ExitCode != 0)
	{
		throw new Exception($"ERROR: {String.Join(",", result2.ErrorDatas)}");
	}
}
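
To make the delay and mix part of the filter easier to follow, here is a minimal sketch of the same idea in isolation. It assumes one voiceline per audio input (input 0 is the video) and takes the delays as plain numbers. AudioFilter, DelayMs and BuildDelayAndMix are hypothetical names, and the padding is passed in as milliseconds rather than frames.

```csharp
using System.Collections.Generic;
using System.Text;

public static class AudioFilter
{
    // Converts a start frame to a delay in milliseconds and adds some padding.
    // Example: frame 45 at 30 fps is 1500 ms into the video.
    public static int DelayMs(int startFrame, int fps, int paddingMs = 0) =>
        (int)(startFrame * 1000L / fps) + paddingMs;

    // Delays each audio input by its own amount and mixes them into [a].
    // For the delays 1000 and 3000 this returns:
    // [1]adelay=1000[aud1]; [2]adelay=3000[aud2]; [aud1][aud2]amix=2:normalize=false,apad[a]
    public static string BuildDelayAndMix(IReadOnlyList<int> delaysMs)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < delaysMs.Count; i++)
        {
            // Input 0 is the video, so the audio inputs start at index 1.
            sb.Append($"[{i + 1}]adelay={delaysMs[i]}[aud{i + 1}]; ");
        }
        for (int i = 0; i < delaysMs.Count; i++)
        {
            sb.Append($"[aud{i + 1}]");
        }
        sb.Append($"amix={delaysMs.Count}:normalize=false,apad[a]");
        return sb.ToString();
    }
}
```

One thing to keep in mind: a single adelay value only delays the first channel, so for stereo voicelines the filter would need a value per channel (or the all=1 option).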

Upload to YouTube

Finally I got to the part where I just needed to upload to YouTube. The concept was straightforward, and Google describes the process in detail, although it did involve quite a bit of code. One thing I found out was that a free account has a limit on how many uploads it can do in a day. Basically - 6 uploads is the limit. And that includes possible failures - simply calling the API endpoint costs a number of tokens. After 6 attempts you are out of tokens.
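
The limit of 6 matches how the YouTube Data API quota works, as far as I can tell. This is a small sketch of the arithmetic. The two numbers are assumptions based on the defaults Google documented around the time of this post (10,000 quota units per day and 1,600 units per upload call), so check the current documentation before relying on them. UploadQuota is a hypothetical name.

```csharp
public static class UploadQuota
{
    // Assumed defaults from Google's quota documentation at the time of this post.
    public const int DailyQuotaUnits = 10_000;
    public const int UnitsPerUpload = 1_600;

    // Integer division: a partially affordable upload does not count.
    public static int MaxUploadsPerDay(
        int dailyUnits = DailyQuotaUnits, int unitsPerUpload = UnitsPerUpload)
        => dailyUnits / unitsPerUpload;
}
```

With these values, 6 upload calls use 9,600 units and the remaining 400 are not enough for a seventh call, whether the earlier calls succeeded or not.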

End result

I managed to create a few (180-ish) videos and overall it was a lot of fun figuring the whole thing out. I have a playlist on my YouTube channel with all the jokes. I also made a video describing the process and pushed it to my channel.

While I did have fun figuring all this out, I'm done after my 180 videos. ChatGPT was not a great source of jokes: it returned a lot of duplicates and later started to modify old jokes instead of supplying me with new ones. There were so many duplicates that I had to build a full loop into the program in order to handle them, and eventually it got boring. ChatGPT also basically made up jokes by explaining the punchline (technically not the same joke I guess), and let's just say that ChatGPT does not have a sense of humor.

In the end - I had fun, but that's it. No more AI dad-jokes from me.