How to Use Named Groups with Regular Expressions in Python





In this article, we show how to use named groups with regular expressions in Python.

Groups are used in Python to capture and reference parts of a regular expression match.

By default, groups without names are referenced by number, in order, starting with 1.

Let's say we have a regular expression that has 3 subexpressions.

A user enters a birthdate as a month, a day, and a year.

Let's say the user must first enter the month, then the day, and then the year.

Using the group() method of a match object in Python, without named groups, the first group (the month) would be referenced using the statement, group(1). The second group (the day) would be referenced using the statement, group(2). The third group (the year) would be referenced using the statement, group(3).
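As a minimal sketch of numbered groups, the snippet below uses a sample date string and a pattern with three unnamed groups. The variable names and the sample string are assumptions made for illustration only.

```python
import re

# Sample input and pattern, chosen only for illustration.
birthdate = "June 15, 1987"
pattern = r"^(\w+)\s(\d+),?\s(\d+)"

match = re.search(pattern, birthdate)
if match:
    month = match.group(1)  # first parenthesized group
    day = match.group(2)    # second parenthesized group
    year = match.group(3)   # third parenthesized group
    print(month, day, year)
```

Nothing in group(1), group(2), or group(3) says which part of the date each one holds; you have to count parentheses in the pattern to find out.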

Now, with named groups, we can name each group in the regular expression. So instead of referencing groups with numbers (group(1), group(2), etc.), we can reference them with names, such as group('month'), group('day'), group('year').

Named groups make the code more organized and more readable.

When you see group(1), you don't really know what it represents.

But if you see group('month') or group('year'), you know it's referencing the month or the year.

So named groups make code more readable and more understandable than the default numerical referencing.

So let's go over some code and see a real-world example of named groups in Python.

This is shown in the code below.
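The following is a sketch reconstructed from the walkthrough that follows: the variable names string1, regex, and matches, the sample date, and the group names month, day, and year all come from that walkthrough. The print labels are assumptions.

```python
import re

string1 = "June 15, 1987"

# Three named groups: month, day, and year.
regex = r"^(?P<month>\w+)\s(?P<day>\d+)\,?\s(?P<year>\d+)"

matches = re.search(regex, string1)

# Each group is referenced by name instead of by number.
print("Month:", matches.group('month'))
print("Day:", matches.group('day'))
print("Year:", matches.group('year'))
```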



So let's now go over this code.

re is the module in Python that allows us to use regular expressions. So we first have to import re in our code, in order to use regular expressions.

After this, we have a variable, string1, which is set equal to a date, June 15, 1987.

We then have a variable, regex, which is set equal to, r"^(?P<month>\w+)\s(?P<day>\d+)\,?\s(?P<year>\d+)"

Let's break this regular expression down now.

So when we want to create a named group, the expression to do so is, (?P<namedgroup>content), where the name of the named group is namedgroup and its content is where you see content.
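To see the (?P<name>...) syntax in isolation, here is a small sketch with a single named group. The group name word and the sample text are hypothetical, chosen only for illustration.

```python
import re

# One named group called "word" that captures one or more word characters.
pattern = r"(?P<word>\w+)"
m = re.search(pattern, "hello world")

print(m.group('word'))  # referenced by name
print(m.group(1))       # the same group is still reachable by number

# groupindex maps each group name to its group number.
print(dict(re.compile(pattern).groupindex))
```

A named group does not lose its number, so name-based and number-based references can be mixed in the same code.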

In our regular expression, the first named group is month, and this consists of 1 or more word characters (\w matches letters, digits, and underscores, not only alphabetical characters).

A whitespace character then follows.

The second named group is day. This consists of 1 or more digits.

This is followed by an optional comma and a whitespace character.

The third named group is year. This consists of 1 or more digits.
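The same breakdown can be written directly into the pattern. The sketch below assumes the re.VERBOSE flag, which lets a pattern span several lines with comments; the variable name verbose_regex is hypothetical, and the unneeded backslash before the comma is dropped.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a comment.
verbose_regex = re.compile(r"""
    ^(?P<month>\w+)   # month: one or more word characters
    \s                # one whitespace character
    (?P<day>\d+)      # day: one or more digits
    ,?                # optional comma
    \s                # one whitespace character
    (?P<year>\d+)     # year: one or more digits
""", re.VERBOSE)

m = verbose_regex.search("June 15, 1987")

# Passing several names to group() returns a tuple of the matched text.
print(m.group('month', 'day', 'year'))
```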

We then look for a match with the statement, matches = re.search(regex, string1)

The result, a match object (or None if the pattern does not match), gets stored in the variable, matches.

We then can output the month by the statement, matches.group('month')

We can output the day by the statement, matches.group('day')

We can output the year by the statement, matches.group('year')
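Because re.search returns None when the pattern does not match, it can be worth checking the result before calling group(). The sketch below wraps this in a hypothetical helper, parse_date, and uses groupdict() to get all named groups at once.

```python
import re

regex = r"^(?P<month>\w+)\s(?P<day>\d+),?\s(?P<year>\d+)"

def parse_date(text):
    """Hypothetical helper: return the named groups as a dict, or None."""
    match = re.search(regex, text)
    if match is None:
        # No match: calling match.group(...) here would raise AttributeError.
        return None
    return match.groupdict()

print(parse_date("June 15, 1987"))
print(parse_date("no date here"))
```

The dictionary keys are the group names, which is another place where naming the groups pays off.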

The advantage of named groups is that they add readability and understandability to the code, so that you can easily see which part of a regular expression match is being referenced.

And this is how we can use named groups with regular expressions in Python.




