How to Remove Certain Characters from a String in Python Using Regular Expressions





In this article, we show how to remove certain characters from a string in Python using regular expressions.

There are many reasons why we may want to exclude certain characters from a string in Python.

For example, you may run a website where users post questions or comments, such as Stack Overflow or Quora.

Users may ask questions such as "How do you compute factorials in Python?"

Usually the title of the post becomes part of the URL. However, characters such as question marks (?), periods (.), and exclamation points (!) do not work well in URLs.

Therefore, if a user types "How do I find the derivative of a function in Python? Help!!!!!!", we do not want this text to become the URL as it stands.

We want to remove the question mark and exclamation points first.

So how can this be done in Python using regular expressions?

The following pattern filters out periods (.), exclamation points (!), and question marks (?).
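The original snippet is not reproduced here, so the line below is a reconstruction based on the description in this article. The variable name pattern and the sample string "Help!!!" are assumptions made for illustration.

```python
import re

# Assumed pattern, reconstructed from the article's description:
# one or more characters that are NOT '!', '.', or '?'
pattern = r'[^!.?]+'

# Quick check on a short sample string
print(re.findall(pattern, "Help!!!"))  # ['Help']
```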



Placing ^ at the start of a character set [], as in [^!.?], negates the set, so the pattern matches every character except the ones listed.

Every character listed after the ^ is one that we exclude from the result.

In this example, we exclude the characters !, ., and ?

Last, we add a + after the set, so that it matches one or more consecutive allowed characters, and then close the quotes.

To get the full picture, let's look at a complete example.

This is shown in the code below.
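The original listing is missing from this copy of the article, so the code below is a sketch rebuilt from the walkthrough that follows. The variable names phrase, patterns, p, and match come from the text. Storing the pattern in a one-item list is an assumption based on the for loop the text describes.

```python
import re

# The phrase we want to clean up
phrase = "How do you do it???????"

# Assumed: the pattern is stored in a list, which is why a for loop is used
patterns = [r'[^!.?]+']

for p in patterns:
    # findall returns every run of characters that are not '!', '.', or '?'
    match = re.findall(p, phrase)
    print(match)  # ['How do you do it']
```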



To use regular expressions in Python, you have to import the re module, so this is the first thing we do in the code.

Next, we have the phrase that we want to clean up, "How do you do it???????"

Our goal is to write a regular expression that removes the question marks, as well as other symbols such as exclamation points and periods. The regular expression described above does so, and we assign it to the variable patterns.

Next, we use a for loop to go through each pattern stored in patterns. Here there is only one pattern, so the loop runs once.

Inside the loop we use re.findall(), which does not test the string just once as a whole. It scans the entire string and returns every non-overlapping match. This matters when the excluded characters appear in the middle of the string. In that case, each run of allowed characters between them is returned as a separate item rather than as one combined string.
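To see how the matches are split, here is a small sketch with punctuation in the middle of the string. The sample phrase is made up for illustration. The last line uses re.sub(), a different technique from the findall approach in this article, which returns one cleaned string directly.

```python
import re

phrase = "Wait. What? Help!"  # hypothetical input for illustration

# findall returns each run of allowed characters as its own list item
pieces = re.findall(r'[^!.?]+', phrase)
print(pieces)   # ['Wait', ' What', ' Help']

# Joining the pieces gives a single cleaned string
cleaned = "".join(pieces)
print(cleaned)  # Wait What Help

# A different technique: re.sub replaces every listed character with ''
print(re.sub(r'[!.?]', '', phrase))  # Wait What Help
```

If the goal is a single string for a URL, joining the pieces or using re.sub() is likely the more convenient choice.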

We then create a variable called match and set it equal to re.findall(p, phrase).

With this line, we look for every part of the phrase that is made up of characters other than !, ., and ?

Python returns the matches as a list.

We then print out the result.

The result we get is ['How do you do it'].



We can also wrap the code in a function, so that we can just call the function and get the result back.

This is shown below.
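The original function is not included in this copy of the article, so the version below is a minimal sketch. The function name remove_characters and its parameter name are hypothetical.

```python
import re

def remove_characters(phrase):
    """Return the runs of characters in phrase that are not '!', '.', or '?'.

    Hypothetical helper: the name and signature are assumptions.
    """
    return re.findall(r'[^!.?]+', phrase)

print(remove_characters("How do you do it???????"))  # ['How do you do it']
```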



The code above returns the same value of ['How do you do it'].

So this is how we can remove certain characters from a string in Python using regular expressions.


