How to Generate a Random Video ID like YouTube in Python





In this article, we show how to generate a random video id in Python, like the ones YouTube uses.

YouTube identifies each video with a randomly generated id, as in https://www.youtube.com/watch?v=FXSuEIMrPQk

The part after ?v= (here, FXSuEIMrPQk) is the random video id that YouTube generates.

In case you weren't aware, YouTube generates an 11-character video id every time a user uploads a video. This id uniquely identifies the video.

YouTube does this with base-64 encoding.

In base-64, each position can hold any one of 64 possible characters, so each of the 11 characters in the id has 64 possible values.
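Python's standard library has a URL-safe base-64 encoder, base64.urlsafe_b64encode, whose alphabet is exactly the 64 characters listed below (A-Z, a-z, 0-9, - and _). As a rough illustration, and not a claim about how YouTube itself builds its ids, encoding 8 random bytes gives 12 characters ending in a single = padding character; stripping the padding leaves an 11-character id. The function name base64_video_id is hypothetical.

```python
import base64
import os

def base64_video_id() -> str:
    """Hypothetical helper: encode 8 random bytes with the URL-safe base-64 alphabet."""
    raw = os.urandom(8)                           # 8 random bytes = 64 bits
    encoded = base64.urlsafe_b64encode(raw)       # 12 ASCII bytes, the last one is '='
    return encoded.rstrip(b"=").decode("ascii")   # drop the padding -> 11 characters

print(base64_video_id())
```

Because 8 bytes hold only 64 bits while 11 base-64 characters can hold 66, the last character of an id made this way only takes 16 of the 64 possible values.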

Since YouTube uses 11 characters, there are 64 x 64 x 64 x 64 x 64 x 64 x 64 x 64 x 64 x 64 x 64, or 64^11, which equals 73,786,976,294,838,206,464 possible combinations.
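Python's integers have arbitrary precision, so the figure is easy to check directly. Since 64 = 2^6, an 11-character id carries 6 x 11 = 66 bits, which the assertion confirms:

```python
# Each of the 11 positions can hold any one of 64 characters.
combinations = 64 ** 11

print(f"{combinations:,}")  # 73,786,976,294,838,206,464

# 64 = 2 ** 6, so 11 characters carry 6 * 11 = 66 bits.
assert combinations == 2 ** 66
```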

So which 64 characters does YouTube use?

The 64 characters that YouTube uses are the digits 0-9 (10 characters), the lowercase letters a-z (26 characters), the uppercase letters A-Z (26 characters), the hyphen (-), and the underscore (_), for a total of 10 + 26 + 26 + 2 = 64.

So the characters are 0-9, a-z, A-Z, -, and _.

These are all the possible characters that YouTube uses to create a video id.
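In Python, the same 64-character alphabet can be assembled from constants in the standard string module instead of typing it out by hand. The ordering (digits, lowercase, uppercase, -, _) simply follows the list above; nothing here depends on it.

```python
import string

# 0-9, a-z, A-Z, then the hyphen and the underscore.
characters = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-_"

print(len(characters))  # 64
```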

Again, with just 11 characters, base-64 encoding gives 73,786,976,294,838,206,464 possible video ids, which is over 73 quintillion.

So how can we generate a random 11-character video id like YouTube's with Python?

The following code generates a random 11-character video id.
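A minimal sketch of such a generator, assuming Python's built-in random module supplies the random picks. The variable names characters, result and youtubeurl match the walkthrough that follows.

```python
import random

# The 64 characters the ids are drawn from: 0-9, a-z, A-Z, - and _
characters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_"

result = ""
# range(0, 11) yields 0..10, so the loop body runs 11 times.
for i in range(0, 11):
    result += random.choice(characters)  # append one randomly chosen character

youtubeurl = "https://www.youtube.com/watch?v=" + result
print(result)
print(youtubeurl)
```

A randomly generated id will almost never belong to an existing video; the URL only shows the format.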



The Python code above generates an 11-character random string from the characters stored in the characters variable. This variable contains 0-9, a-z, A-Z, -, and _.

Those are the 64 characters that YouTube uses to make a video id.

We then create a variable named result and set it to an empty string ('').

Next, a for loop runs over range(0, 11). Because range stops just before its end value, this produces 0 through 10, so the loop runs 11 times, once for each character of the id.

On each pass, one character is picked at random from the characters variable (any of 0-9, a-z, A-Z, -, or _) and appended to result.

After the loop, result contains the randomly generated 11-character string.
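The random module is fine for a demonstration, but Python's documentation warns that it should not be used for security purposes and points to the secrets module instead. If the ids should also be hard to guess, secrets.choice is a drop-in swap. This is a suggested alternative, not something the code above or YouTube is known to use, and secure_video_id is a hypothetical name.

```python
import secrets
import string

characters = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-_"

def secure_video_id(length: int = 11) -> str:
    # secrets.choice draws from the most secure randomness source the OS provides.
    return "".join(secrets.choice(characters) for _ in range(length))

print(secure_video_id())
```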

To attach the id to a YouTube URL, we use the line youtubeurl = "https://www.youtube.com/watch?v=" + result, which follows the same /watch?v= format as the example URL at the start of this article.

In YouTube's case, this is the full URL of the uploaded video, which users can visit to watch it.

So this is pretty much it in a nutshell.

My guess is that, to avoid offensive URLs, YouTube most likely keeps a list of words that it does not want to appear in a video id, such as profanity or insults like idiot and fool.

So how can we rewrite the code above so that, if a generated video id contains an offensive word, a new video id is generated automatically?

The code below does this.
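A minimal sketch of that loop, assuming the same characters string and a two-word blocklist of fool and idiot. The loop keeps generating while result is still empty or contains either word:

```python
import random

characters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_"

result = ""
# Keep going while there is no id yet, or the id contains a blocked word.
# str.find() returns -1 when the substring is absent.
while not result or result.find("fool") != -1 or result.find("idiot") != -1:
    result = ""  # start each attempt from an empty string
    for i in range(0, 11):
        result += random.choice(characters)

# Final check before using the id.
if result and result.find("fool") == -1 and result.find("idiot") == -1:
    print(result)
```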



Now we have a while loop.

Because result starts out as an empty string (''), the loop condition begins with not result, which is true while result is still empty (an empty string counts as false in Python). This guarantees that the loop body runs at least once. The condition also checks for the 2 offensive words, fool and idiot, so that the loop runs again whenever either one appears in the id.

Python's str.find() method returns the index at which a substring first occurs (0 or a positive number, depending on where it is found), or -1 if the substring is not found. We use this fact in the loop condition.

result.find('fool') != -1 is true when the string 'fool' appears somewhere in the generated id, in which case the loop discards that id and generates a new one.

result.find('idiot') != -1 does the same for the string 'idiot'.
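A quick illustration of find() on a made-up id (xfoolAb3-_Q is invented for this example), alongside the in operator, which answers the same found/not-found question as a boolean:

```python
video_id = "xfoolAb3-_Q"  # made-up 11-character id

print(video_id.find("fool"))   # 1  -> found, starting at index 1
print(video_id.find("idiot"))  # -1 -> not found

# The `in` operator gives the same yes/no answer directly.
print("fool" in video_id)      # True
print("idiot" in video_id)     # False
```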

We then have to do another check.

An if statement confirms that a result was generated (it is no longer an empty string) and that it contains neither 'fool' nor 'idiot'; only then do we print the result.

Of course, in YouTube's case, instead of printing the id, the code would connect to a database, check whether the video id has already been taken, and if not, save it to the database, linking this video id to the video that has just been uploaded.
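As a hypothetical sketch of that flow, a Python set named taken_ids can stand in for the database table of ids already in use. Both taken_ids and the function name new_video_id are illustrative, not part of any real API. A real database would typically enforce uniqueness itself, for example with a unique constraint on the id column.

```python
import random

characters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_"
BLOCKED_WORDS = ("fool", "idiot")

# Hypothetical stand-in for the database table of ids already in use.
taken_ids = set()

def new_video_id() -> str:
    """Return an 11-character id that is unused and contains no blocked word."""
    while True:
        candidate = "".join(random.choice(characters) for _ in range(11))
        if any(word in candidate for word in BLOCKED_WORDS):
            continue  # offensive: try again
        if candidate in taken_ids:
            continue  # already taken: try again
        taken_ids.add(candidate)  # "save" it so it cannot be handed out twice
        return candidate

print(new_video_id())
```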

However, this code still has one more problem.

Notice that the characters variable contains both uppercase and lowercase letters, so a generated id can contain any mix of the two. This means that searching for 'fool' alone will not catch every form of the word. To truly filter out fool, we would have to catch 'fool', 'FOOL', 'Fool', 'FOol', 'FOOl', and so on. Since fool has 4 letters and each one can be uppercase or lowercase, the word can appear in 2^4 = 16 different forms.
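To make the count concrete, this sketch uses itertools.product to build every capitalization of fool, then confirms that lowercasing any of them gives back the plain word:

```python
import itertools

word = "fool"
# For each letter, offer both its lowercase and its uppercase form.
choices = [(letter.lower(), letter.upper()) for letter in word]
# itertools.product picks one form per position: 2 * 2 * 2 * 2 combinations.
variants = {"".join(combo) for combo in itertools.product(*choices)}

print(len(variants))  # 16
# Lowercasing any variant gives back the plain word.
print(all(v.lower() == word for v in variants))  # True
```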

We need to rewrite the above code to account for all the different cases that the word, fool, can appear in.

We can do this by creating another variable that holds the lowercase version of the randomly generated video id. We then check whether this lowercase copy contains the lowercase words. This way, a single check covers every capitalization a word can appear in.

This is shown in the code below.
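A minimal sketch of the case-insensitive version, assuming the same characters string and blocklist as before. Only the lowercase copy, lowerresult, is searched, while result keeps its original mixed case:

```python
import random

characters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_"

result = ""
lowerresult = ""
# Regenerate while there is no id yet or the lowercase copy contains a blocked word.
while not result or lowerresult.find("fool") != -1 or lowerresult.find("idiot") != -1:
    result = ""
    for i in range(0, 11):
        result += random.choice(characters)
    lowerresult = result.lower()  # 'FoOl' becomes 'fool', so one check covers every case

if result and lowerresult.find("fool") == -1 and lowerresult.find("idiot") == -1:
    print(result)  # the id itself keeps its mixed case
```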



So now we filter out all forms of the word fool, including 'FOOL', 'fool', 'Fool', 'FOol', 'fooL', 'FOOl', 'fOol', etc.

We created a second variable, lowerresult, which is set to result.lower(). This is the lowercase version of the randomly generated video id.

We then check whether lowerresult contains 'fool' or 'idiot'; if it does, a new id is generated.

So this is how we can generate a random 11-character video id in Python, just like YouTube does.




