How to Find the Index Locations of a Match or Matches of a Regular Expression in Python





In this article, we show how to find the index location of a match or matches of a regular expression in Python.

Let's say you are looking for a particular word or phrase within a larger string of text, and you use regular expressions to do so.

For example, we might be looking for the word 'hot' within a string of text.

In Python, we can find the index locations of matches, so we know exactly where in the string each match occurs.

Let's say we have the string "The hotdog was delicious".

The start index location of 'hot' is 4.

And the end index location of 'hot' is 7.

The end index is a little harder to understand. The last character of the match, 't', is at index 6, yet the end index is 7. This is because Python treats the end index as exclusive: it points one position past the last character of the match, which is index 7.

Therefore, the match of 'hot' within the string has a start index location of 4 and an end index location of 7, so it spans the index locations 4 to 7.
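To see why the end index is 7 rather than 6, here is a small sketch that uses plain string indexing and slicing (no regular expression yet) on the same string:

```python
phrase = "The hotdog was delicious"

# 't', the last character of 'hot', sits at index 6
print(phrase[6])    # t

# A slice stops just before its end index, so the end must be 7, not 6
print(phrase[4:7])  # hot
print(phrase[4:6])  # ho
```

This exclusive end is the same convention that a regular expression match uses when it reports where it ends.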

So let's now do this example in code, so that you can see how it's done.
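The original code listing does not survive in this copy of the article. The following is a minimal sketch reconstructed from the walkthrough that follows, assuming the variable names it mentions (phrase, indexlocation, startindex, endindex, wholematch); the author's exact code and print statements may differ.

```python
import re

phrase = "The hotdog was delicious"

# re.finditer() yields one match object per non-overlapping occurrence of 'hot'
for i in re.finditer('hot', phrase):
    indexlocation = i.span()   # (start, end) tuple for this match
    print(indexlocation)       # (4, 7)

    startindex = i.start()     # index of the first character of the match
    print(startindex)          # 4

    endindex = i.end()         # index one past the last character of the match
    print(endindex)            # 7

    # Rebuild the match by slicing the original string with the span values
    wholematch = phrase[indexlocation[0]:indexlocation[1]]
    print(wholematch)          # hot
```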



So let's now go over this code.

re is the module in Python that allows us to use regular expressions, so we first have to import re in our code.

After this, we have a variable, phrase, which contains the string that we want to search using regular expressions.

We then have a for loop that loops through every match of the word 'hot' in phrase.

The re.finditer() function finds all non-overlapping instances of the pattern 'hot' in the string and yields a match object for each one.

Since phrase contains only one instance of 'hot', the loop runs only once.
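Because re.finditer() returns an iterator of match objects rather than a list of strings, a quick way to inspect what it found is to turn it into a list. This is a small sketch on the same string; the second pattern, 'cold', is just an illustrative non-matching example.

```python
import re

phrase = "The hotdog was delicious"

# re.finditer() returns an iterator of match objects, not a list of strings
matches = list(re.finditer('hot', phrase))
print(len(matches))                       # 1
print([m.span() for m in matches])        # [(4, 7)]

# When the pattern doesn't occur, the iterator is simply empty
print(list(re.finditer('cold', phrase)))  # []
```

An empty result means the for loop body never runs, so no index locations are printed.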

Within the for loop, we create a variable, indexlocation, which is set equal to i.span(), where i is the match object for the current match.

We then print out the variable, indexlocation.

The variable indexlocation contains the span of the match, 'hot', as a tuple of its start and end index locations.

Since 'hot' has a start index of 4 and an end index of 7, the span() method returns (4, 7).

We then create a variable, startindex, which contains the start index of the match, obtained with the start() method.

We then create a variable, endindex, which contains the end index of the match, obtained with the end() method.
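The span() value is simply start() and end() packed into a tuple. The sketch below checks this; it uses re.search(), which returns only the first match (or None), instead of the article's re.finditer() loop, purely to keep the example short.

```python
import re

phrase = "The hotdog was delicious"

match = re.search('hot', phrase)  # first match only, or None if there is no match
if match is not None:
    # span() is simply (start(), end())
    print(match.span() == (match.start(), match.end()))  # True
    print(match.start(), match.end())                    # 4 7
```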

Lastly, we have the variable wholematch, which reconstructs the match by slicing phrase with the values obtained from the span() method.

indexlocation[0] is the start index location of the match. Since 'hot' begins at index 4, phrase[indexlocation[0]] returns 'h'.

indexlocation[1] is the end index location of the match. Although the last character of 'hot' is at index 6, a slice excludes its end index, so we specify 7 to get the whole match returned.

Thus, phrase[indexlocation[0]:indexlocation[1]] returns 'hot'.
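As an aside not covered in the original walkthrough, a match object can also return the matched text directly through its group() method. This sketch checks that it agrees with the slice built from span():

```python
import re

phrase = "The hotdog was delicious"

for i in re.finditer('hot', phrase):
    start, end = i.span()
    sliced = phrase[start:end]   # rebuild the match from its span, as described above
    direct = i.group()           # or ask the match object for the matched text
    print(sliced, direct, sliced == direct)  # hot hot True
```

Slicing with the span is still useful when you need the surrounding text as well, for example phrase[start - 4:end].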

Again, the re.finditer() function finds all non-overlapping matches of the regular expression that you specify.

So if we change the string to have multiple instances of 'hot', re.finditer() will find all of them.

Below, we change the string to have 2 instances of 'hot'. The results are shown.
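The article's own two-match string and its output are not shown in this copy. As a stand-in, here is a sketch using a hypothetical string that contains 'hot' twice; the loop is the same as before, with each span also collected into a list.

```python
import re

# Hypothetical string with two instances of 'hot' (not the article's original)
phrase = "The hotdog was delicious, but it was too hot to eat"

spans = []
for i in re.finditer('hot', phrase):
    indexlocation = i.span()
    spans.append(indexlocation)
    print(indexlocation, phrase[indexlocation[0]:indexlocation[1]])

# (4, 7) hot
# (41, 44) hot
```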



So you can see that re.finditer() was able to return multiple instances of the term we were searching for with our regular expression.

And this is how we can find the index locations of a match or matches of a regular expression in Python.

