How to Extract Only Non-Alphanumeric Characters from a String in Python Using Regular Expressions 


In this article, we show how to extract only non-alphanumeric characters from a string in Python using regular expressions.

Say we have the string "The Knicks game yesterday was great!!! The Knicks won 112-92 at MSG".

And we just want to extract the non-alphanumeric characters. These are characters that are neither a number nor a letter of the alphabet. We can do this in Python with a basic regular expression.

We simply write a regular expression that matches only non-alphanumeric characters, so no other characters are returned.

The regular expression that returns only non-alphanumeric characters is r'\W+'.



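Here \W matches any character that is not a "word" character (a letter, digit, or underscore), and the + makes each match span a whole run of such characters. So letters and digits are never returned. Note that, because the underscore counts as a word character, underscores are not returned either.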
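To see what the + adds, here is a small comparison of \W and \W+ on a short sample string (the string "great!!! 112-92" is just an illustrative assumption, taken from part of the article's phrase):

```python
import re

sample = "great!!! 112-92"

# \W matches one non-word character at a time
single = re.findall(r'\W', sample)

# \W+ matches each whole run of consecutive non-word characters
runs = re.findall(r'\W+', sample)

print(single)  # ['!', '!', '!', ' ', '-']
print(runs)    # ['!!! ', '-']
```

Without the +, every character is a separate list item; with it, adjacent characters such as "!!! " are grouped together.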

To get the full picture, let's walk through a complete example.
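A minimal reconstruction of the complete example, based on the walkthrough that follows. The names phrase, patterns, p, and match are taken from that description; the article's exact original code may differ in small details.

```python
import re

# The string we want to extract from
phrase = "The Knicks game yesterday was great!!! The Knicks won 112-92 at MSG"

# \W+ matches runs of characters that are not letters, digits, or underscores
patterns = [r'\W+']

for p in patterns:
    # findall returns every non-overlapping match as a list of strings
    match = re.findall(p, phrase)
    print(match)
```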



The first thing to know is that in order to use regular expressions in Python, you have to import the re module, so importing re is the first step.

Next, we define the phrase to extract from: "The Knicks game yesterday was great!!! The Knicks won 112-92 at MSG".

Our goal is to write a regular expression that gets all non-alphanumeric characters from this string.

The expression to do so is r'\W+', which we store in a list called patterns: patterns = [r'\W+']

Next, we loop over patterns with a for loop. The loop is not actually required to use a regular expression; since patterns holds a single pattern here, it runs just once, but the same structure makes it easy to test several patterns against one string.

The matching itself is done by re.findall(), which scans the entire string "The Knicks game yesterday was great!!! The Knicks won 112-92 at MSG" and returns every non-overlapping match rather than stopping at the first one. So it does not just test the string as a whole; it finds every place in the string that matches the pattern.
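The difference between returning the first match and returning every match can be seen by comparing re.search() with re.findall() on the same phrase. This is a small illustrative sketch; re.search() is not used in the original example.

```python
import re

phrase = "The Knicks game yesterday was great!!! The Knicks won 112-92 at MSG"

# re.search stops at the first match (the space after "The")
first = re.search(r'\W+', phrase)
print(repr(first.group()))  # ' '

# re.findall keeps scanning and collects every match
all_matches = re.findall(r'\W+', phrase)
print(len(all_matches))  # 12
```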

Inside the loop, we create a variable called match and set it equal to re.findall(p, phrase), where p is the current pattern.

This line collects every run of non-alphanumeric characters in the phrase.

re.findall() always returns a list of strings: one entry per match, or an empty list if there are no matches.
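As a quick check of the no-match case, here is a sketch using an assumed string made only of letters and digits; findall still returns a list, just an empty one, which can be tested directly in an if statement.

```python
import re

# A string made only of letters and digits has no non-alphanumeric characters
result = re.findall(r'\W+', "MSG112")

print(result)  # []

# An empty list is falsy, so it can be checked directly
if not result:
    print("No non-alphanumeric characters found")
```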

We then print out the result.

The result we get is the following list:

[' ', ' ', ' ', ' ', ' ', '!!! ', ' ', ' ', ' ', '-', ' ', ' ']



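Since spaces, exclamation points, and hyphens are non-alphanumeric characters, they are returned in the output. Because of the +, the three exclamation points and the space that follows them come back together as the single string '!!! '.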
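One caveat on which characters come back: \W treats the underscore as a word character, so \W+ will not return underscores. If underscores should count as non-alphanumeric too, a negated character class such as [^a-zA-Z0-9]+ can be swapped in. This is an alternative to the article's pattern, not the one it uses, and it only treats ASCII letters and digits as alphanumeric, so accented letters like "é" would be returned as well. The sample string below is an assumption for illustration.

```python
import re

text = "final_score: 112-92"

# \W+ treats the underscore as a word character, so it is skipped
word_based = re.findall(r'\W+', text)

# A negated character class excludes only ASCII letters and digits
strict = re.findall(r'[^a-zA-Z0-9]+', text)

print(word_based)  # [': ', '-']
print(strict)      # ['_', ': ', '-']
```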

We can also wrap the code in a function, so that we can just call the function to get the results.
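A minimal sketch of the function version. The name extract_non_alphanumeric and its single phrase parameter are hypothetical, since the original function's name and signature are not shown.

```python
import re

def extract_non_alphanumeric(phrase):
    """Return a list of every run of non-word characters in phrase."""
    # \W+ matches one or more characters that are not letters, digits, or underscores
    return re.findall(r'\W+', phrase)

matches = extract_non_alphanumeric(
    "The Knicks game yesterday was great!!! The Knicks won 112-92 at MSG"
)
print(matches)
```

Returning the list instead of printing it inside the function lets the caller decide what to do with the results.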



Called with the same phrase, the function returns the same list as the loop version.

So this is how to extract non-alphanumeric characters from a string in Python using regular expressions.

