How to Combine Look-behind and Look-ahead Matching in a Regular Expression in Python





In this article, we show how to combine look-behind and look-ahead matching in a regular expression in Python.

A look-behind match is a match that looks behind, or to the left of, an element.

A look-ahead match is a match that looks ahead, or to the right of, an element.

If we use both look-behind and look-ahead matching in a regular expression, we can return the content in between the look-behind match and the look-ahead match.

So as an example, we have the string, string1= "<ol> <li>Underwear </li> <li>Socks </li> <li>T-shirts </li> </ol>"

The element we want is the content of a list item, such as Underwear

The look-behind match of this element is, <li>

The look-ahead match of this element is, </li>

How can we combine look-behind and look-ahead matching in a regular expression to return the content in between the look-behind and look-ahead matching expressions?

We have an actual example shown below in the code.
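The original listing is not reproduced here, so the following is a minimal sketch reconstructed from the walkthrough that follows. It assumes the string and pattern are exactly as quoted in the text and that re.findall() is the function used to collect the matches.

```python
import re

# The list from the article; the spacing around the tags is kept as written.
string1 = "<ol> <li>Underwear </li> <li>Socks </li> <li>T-shirts </li> </ol>"

# Look-behind for " <li>", lazily match the content, look-ahead for " </li>".
regex = re.compile(r"(?<= <li>).+?(?= </li>)")

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(regex, string1)

print(matches)  # ['Underwear', 'Socks', 'T-shirts']
```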



So let's now go over this code.

re is the module in Python that allows us to use regular expressions. So we first have to import re in our code, in order to use regular expressions.

After this, we have a variable, string1, which holds the HTML of an ordered list of clothing items: underwear, socks, and T-shirts.

We then have a regex variable, which is set equal to, re.compile(r"(?<= <li>).+?(?= </li>)")

(?<= <li>) is a look-behind expression that looks for <li>

.+? matches the content itself. The dot matches any character except a newline, and the plus means 1 or more of them. The question mark (?) makes the expression lazy, meaning it grabs the smallest possible match instead of the largest, so each match stops at the first </li> it reaches rather than the last.

(?= </li>) is a look-ahead expression that looks for </li>
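To see why the lazy quantifier matters, here is a small sketch comparing the lazy pattern with a greedy variant. The greedy pattern is an illustration only and is not part of the article's code.

```python
import re

string1 = "<ol> <li>Underwear </li> <li>Socks </li> <li>T-shirts </li> </ol>"

# Lazy: each match stops at the first " </li>" that follows it.
lazy = re.findall(r"(?<= <li>).+?(?= </li>)", string1)

# Greedy: the single match runs on to the last " </li>" in the string.
greedy = re.findall(r"(?<= <li>).+(?= </li>)", string1)

print(lazy)    # ['Underwear', 'Socks', 'T-shirts']
print(greedy)  # ['Underwear </li> <li>Socks </li> <li>T-shirts']
```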

We then have a variable, matches, which finds all content in between <li></li> tags.

We then output all content in between the <li></li> tags.

And this is how we can combine look-behind and look-ahead matching in a regular expression in Python.

The same idea works with a look-behind on its own. In that case, we have a regex variable, which is set equal to, re.compile(r"(?<=\d\))\w+")

?<= means that this is a look-behind regular expression.

The look-behind looks for a digit (\d) followed by a closing parenthesis (\)). The backslash escapes the parenthesis so that it is matched literally. This whole part is enclosed in parentheses.

We then have, \w+, which looks for 1 or more word characters (letters, digits, or underscores) in the elements.
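Note that \w covers letters, digits, and the underscore, but not hyphens or spaces. A quick sketch, using a made-up sample string that is not from the article:

```python
import re

# Hypothetical sample: an underscore, a digit, and a hyphen.
sample = "item_1 T-shirts"

# \w+ keeps "item_1" together but splits "T-shirts" at the hyphen.
words = re.findall(r"\w+", sample)

print(words)  # ['item_1', 'T', 'shirts']
```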

We then check for all matches using the re.findall() function.

We then output the found matches.
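The string this pattern was run against is not shown, so the sketch below uses a hypothetical numbered list as input. Only the pattern and the use of re.findall() come from the text.

```python
import re

# Hypothetical input; the original string is not given in the article.
string1 = "1)Apples 2)Oranges 3)Bananas"

# Look-behind for a digit and a closing parenthesis, then match word characters.
regex = re.compile(r"(?<=\d\))\w+")

matches = re.findall(regex, string1)

print(matches)  # ['Apples', 'Oranges', 'Bananas']
```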

This produces the element after each number and closing parenthesis in the string.

Remember that look-behind regular expressions do not return the pattern inside the look-behind (in this case a number followed by a closing parenthesis). They only return the element after the pattern. The pattern is there just to show where to look for each element.
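One way to check this is to run the same pattern with and without the look-behind wrapper. The input string here is made up for illustration.

```python
import re

# Hypothetical input used only to compare the two patterns.
text = "1)Milk 2)Eggs"

# With the look-behind, the "1)" prefix is required but not returned.
with_lookbehind = re.findall(r"(?<=\d\))\w+", text)

# Without it, the prefix becomes part of each match.
without_lookbehind = re.findall(r"\d\)\w+", text)

print(with_lookbehind)     # ['Milk', 'Eggs']
print(without_lookbehind)  # ['1)Milk', '2)Eggs']
```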

And this is all that is needed to set up a look-behind regular expression in Python.




