How to Find All Hyperlinks on a Web Page in Python Using BeautifulSoup

In this article, we show how to get all hyperlinks on a webpage in Python using the BeautifulSoup module.

Companies such as Google make widespread use of web scrapers, such as web crawlers (also called web spiders), to search the web for new hyperlinks so that the pages they point to can be indexed.

BeautifulSoup makes it very easy to obtain hyperlinks, or anchor tags, on a web page.

On this page, we'll write code that fetches a web page and then finds and prints out the hyperlinks on it. In doing so, it ignores all other HTML elements, such as paragraph tags, header tags, and tables.

The code below shows how to find and print out all hyperlinks on this website's home page, http://www.learningaboutelectronics.com:

import requests
from bs4 import BeautifulSoup

getpage = requests.get('http://www.learningaboutelectronics.com')
getpage_soup = BeautifulSoup(getpage.text, 'html.parser')
all_links = getpage_soup.findAll('a')
for link in all_links:
    print(link)

So let's go over this code now.

First, we import the requests module.

We then import BeautifulSoup from bs4.

We create a variable called getpage and set it equal to requests.get('http://www.learningaboutelectronics.com'). This sends an HTTP GET request to that URL and stores the response.

This is the page whose hyperlinks (anchor tags) we want to get: the learningaboutelectronics.com home page.
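The request step can fail if the site is unreachable or returns an error page. As a minimal sketch, the download could be wrapped in a small helper that sets a timeout and raises on HTTP error codes. The function name fetch_html and the 10-second default are assumptions made for illustration, not part of the original code.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    # timeout stops the call from hanging indefinitely on a slow server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text


# Example usage (performs a real network request):
# html = fetch_html('http://www.learningaboutelectronics.com')
```

The returned string can be passed to BeautifulSoup in the same way as getpage.text.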

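We then create the variable getpage_soup, which holds the parsed HTML of the page we are extracting data from. It is built by passing getpage.text, the HTML returned by the request, to BeautifulSoup along with the 'html.parser' parser.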

We then create a variable, all_links, which we set equal to getpage_soup.findAll('a'). This returns a list-like collection of every a tag on the page.

If you're familiar with HTML, you know that the a tag stands for anchor tag. The anchor tag in HTML is what produces hyperlinks. By finding all a tags, we find all hyperlinks on the page.

We then create a for loop to loop through all of the links contained in all_links.

We print out each link.
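Printing a link prints the whole tag, including the markup around it. If only the URL and the link text are wanted, they can be read from each tag instead. The sketch below runs on a small made-up HTML string rather than a live page, so the sample_html content and the link_targets name are assumptions for illustration. It uses find_all(), which is the same method as findAll() under its current name.

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for a downloaded page (made up for illustration).
sample_html = """
<p>Intro text</p>
<a href="http://www.learningaboutelectronics.com/Articles">Articles</a>
<a href="http://www.learningaboutelectronics.com/Projects">Projects</a>
<a name="top">Anchor without an href</a>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

link_targets = []
for link in soup.find_all('a'):
    href = link.get('href')  # returns None when the tag has no href attribute
    if href is not None:
        # Store the visible link text together with its URL
        link_targets.append((link.get_text(strip=True), href))

for text, href in link_targets:
    print(text, '->', href)
```

Using link.get('href') rather than link['href'] avoids a KeyError on anchor tags that have no href attribute.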

Running this code on the learningaboutelectronics.com home page gives us the following output.

<a href="http://www.learningaboutelectronics.com">Learning about Electronics</a>
<a href="http://www.learningaboutelectronics.com">Home</a>
<a href="http://www.learningaboutelectronics.com/Articles">Articles</a>
<a href="http://www.learningaboutelectronics.com/Projects">Projects</a>
<a href="http://www.learningaboutelectronics.com/Programming">Programming <font color="red" size="3">coding</font></a>
<a href="http://www.learningaboutelectronics.com/Calculators">Calculators</a>
<a href="http://www.learningaboutelectronics.com/Contact">Contact</a>
<a href="http://www.learningaboutelectronics.com/Articles">Articles</a>
<a href="http://www.learningaboutelectronics.com/Articles/How-to-connect-an-adjustable-voltage-regulator">How to Connect an Adjustable Voltage Regulator</a>
<a href="http://www.learningaboutelectronics.com/Articles/What-is-a-LM7805-voltage-regulator">What is a LM7805 Voltage Regulator?</a>
<a href="http://www.learningaboutelectronics.com/Articles/How-to-test-a-capacitor">How to Test a Capacitor</a>
<a href="http://www.learningaboutelectronics.com/Articles/High-pass-filter.php">High Pass Filter</a>
<a href="http://www.learningaboutelectronics.com/Articles/Low-pass-filter.php">Low Pass Filter</a>
<a href="http://www.learningaboutelectronics.com">Home</a>
<a href="http://www.learningaboutelectronics.com/Articles">Articles</a>
<a href="http://www.learningaboutelectronics.com/Projects">Projects</a>
<a href="http://www.learningaboutelectronics.com/Programming">Programming</a>
<a href="http://www.learningaboutelectronics.com/Calculators">Calculators</a>
<a href="http://www.learningaboutelectronics.com/Contact">Contact</a>

So, we get all links on the page.
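The output contains repeats, since the same pages are linked from more than one place on the home page. One possible way to keep each URL only once, while preserving the order in which it first appears, is sketched below. It works on a few anchors copied from the output above instead of the live page, and the names hrefs and unique_hrefs are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# A few anchors copied from the output above, including repeats.
sample_html = """
<a href="http://www.learningaboutelectronics.com">Home</a>
<a href="http://www.learningaboutelectronics.com/Articles">Articles</a>
<a href="http://www.learningaboutelectronics.com/Projects">Projects</a>
<a href="http://www.learningaboutelectronics.com">Home</a>
<a href="http://www.learningaboutelectronics.com/Articles">Articles</a>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# href=True restricts the search to a tags that have an href attribute
hrefs = [a['href'] for a in soup.find_all('a', href=True)]

# dict keys are unique and keep insertion order, so this removes repeats
unique_hrefs = list(dict.fromkeys(hrefs))

for href in unique_hrefs:
    print(href)
```

A set would also remove the repeats, but it would not keep the links in page order.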

BeautifulSoup provides great functionality for scraping web pages for various information. It can scrape data from any type of HTML tag. To find all instances of a certain HTML element, you use the findAll() method (named find_all() in current BeautifulSoup releases, with findAll() kept as an older alias), just as we've done in this code.
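To illustrate the same call on other elements, here is a short sketch that collects paragraph and heading text from a made-up HTML string. The sample_html content and the variable names are assumptions for illustration. Passing a list of tag names to find_all() matches any of them, in document order.

```python
from bs4 import BeautifulSoup

# Sample HTML made up for illustration.
sample_html = """
<h1>Main title</h1>
<p>First paragraph.</p>
<h2>Subsection</h2>
<p>Second paragraph.</p>
<a href="/Contact">Contact</a>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# All paragraph tags
paragraphs = [p.get_text() for p in soup.find_all('p')]

# A list of names matches any of the listed tags
headings = [h.get_text() for h in soup.find_all(['h1', 'h2'])]

print(paragraphs)
print(headings)
```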

And this is how all hyperlinks on a web page can be found in Python using BeautifulSoup.

