How to Get the Contents of a Web Page in Python Using the Requests and BeautifulSoup Modules

In this article, we show how to get the contents of a web page in Python using the requests and BeautifulSoup modules.

So let's go through the steps necessary to get the contents of a web page in Python.

We can get the contents of a web page using just the requests module.

We do this with the following code.

import requests
getpage = requests.get('http://www.learningaboutelectronics.com')
getpage.text

We have gotten the contents of the web page using just the requests module (no BeautifulSoup).
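In a longer script, it often helps to check that the request actually succeeded before using the text. Below is a minimal sketch of one way to do that. The function name fetch_html and the 10-second timeout are our own choices for illustration and are not part of the requests module.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of the page at url as a string.

    fetch_html and the default timeout are illustrative choices,
    not something defined by the requests module.
    """
    # timeout keeps the call from waiting forever on a slow server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on a 4xx or 5xx response
    response.raise_for_status()
    return response.text
```

Calling fetch_html('http://www.learningaboutelectronics.com') would then return the same kind of text as getpage.text above, or raise an error if the page could not be retrieved.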

However, if you run the code, the output is hard to read: the HTML comes back as one long, unstructured string.

This is where BeautifulSoup comes in. BeautifulSoup can make the HTML much more presentable and readable. It can also parse the data so that we can extract just the parts we want.

So let's just start with how to prettify the text, so that it can be more structured and readable.

Below is the code to do so.

>>> import requests
>>> from bs4 import BeautifulSoup
>>> getpage = requests.get('http://www.learningaboutelectronics.com/Articles/')
>>> getpage_soup = BeautifulSoup(getpage.text, 'html.parser')
>>> print(getpage_soup.prettify())
ï»¿ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title> Articles Page of Learning about Electronics </title> <meta content="Articles, Learning about Electronics, equipment, components, software, ICs" name="keywords"/> <meta content="The Articles Page of Learning about Electronics" name="description"/> <meta content="width=device-width, initial-scale=1" name="viewport"/> <link href="default66.css" media="screen" rel="stylesheet" type="text/css"/> </head> <body>  <div id="header"> <div id="logo">  <h2> <a href="http://www.learningaboutelectronics.com"> Learning about Electronics </a> </h2>  </div>   </div>   ï»¿ <script src="jquery.js" type="text/javascript"> </script> <script src="nav.js" type="text/javascript"> </script> <div id="spancolumns"> <br/> <hr/> <span class="menu-trigger"> <img src="/images/mobile.png"/> </span> <largetext> <form action="search.php" id="thisinline" method="POST"> <input name="search_entered" size="15" type="text"/> <input name="submit" type="submit" value="Search"/> </form> </largetext> <br/> <hr/> </div> <div class="nav-menu"> <ul> <li> <a href="http://www.learningaboutelectronics.com"> Home </a> </li> <li class="current_page_item"> <a href="http://www.learningaboutelectronics.com/Articles"> Articles </a> </li> <li> <a href="http://www.learningaboutelectronics.com/Projects"> Projects </a> </li> <li> <a href="http://www.learningaboutelectronics.com/Programming"> Programming <font color="red" size="3"> coding </font> </a> </li> <li> <a href="http://www.learningaboutelectronics.com/Calculators"> Calculators </a> </li> <li> <a 
href="http://www.learningaboutelectronics.com/Contact"> Contact </a> </li> </ul> </div>   <div id="page">  <div id="rightads"> <script type="text/javascript">  </script> <script src="http://pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript"> </script> </div> <div class="entry"> <br/> <br/> <p id="para1"> <b> Electronic Components </b> <br> <a href="http://www.learningaboutelectronics.com/Resistors/"> Resistors </a> <br/> <a href="http://www.learningaboutelectronics.com/Potentiometers/"> Potentiometers </a> <br/> <a href="http://www.learningaboutelectronics.com/Wires-and-cables/"> Wires and Cables </a> <br/> <a href="http://www.learningaboutelectronics.com/Batteries/"> Batteries </a> <br/> <a href="http://www.learningaboutelectronics.com/Switches/"> Switches </a> <br/> <a href="http://www.learningaboutelectronics.com/Capacitors"> Capacitors </a> <br/> <a href="http://www.learningaboutelectronics.com/Inductors/"> Inductors </a> <br/> <a href="http://www.learningaboutelectronics.com/Diodes/"> Diodes </a> <br/> <a href="http://www.learningaboutelectronics.com/LEDs/"> LEDs </a> <br/> <a href="http://www.learningaboutelectronics.com/LEDDisplays/"> LED Displays </a> <br/> <a href="http://www.learningaboutelectronics.com/Relays"> Relays </a> <br/> <a href="http://www.learningaboutelectronics.com/Transistors/"> Transistors </a> <br/> <a href="http://www.learningaboutelectronics.com/Thyristors/"> Thyristors </a> <br/> <a href="http://www.learningaboutelectronics.com/Microphones"> Microphones </a> <br/> <a href="http://www.learningaboutelectronics.com/Speakers/"> Speakers </a> <br/> <a href="http://www.learningaboutelectronics.com/HeatSinks/"> Heat Sinks </a> <br/> <a href="http://www.learningaboutelectronics.com/Rectifiers/"> Rectifiers </a> <br/> <a href="http://www.learningaboutelectronics.com/Fuses/"> Fuses </a> <br/> <a href="http://www.learningaboutelectronics.com/LCDs/"> LCDs </a> <br/> <a 
href="http://www.learningaboutelectronics.com/SolarCells/"> Solar Cells </a> <br/> <a href="http://www.learningaboutelectronics.com/Transformers/"> Transformers </a> <br/> <a href="http://www.learningaboutelectronics.com/Motors/"> Motors </a> <br/> <a href="http://www.learningaboutelectronics.com/Sensors/"> Sensors </a> <br/> <a href="http://www.learningaboutelectronics.com/ICs/"> ICs </a> <br/> <a href="http://www.learningaboutelectronics.com/MicrocontrollerBoards/"> Microcontroller Boards </a> <br/> <br/> <b> Electronic Products </b> <br> <a href="http://www.learningaboutelectronics.com/ElectronicProducts/"> Electronic Products </a> <br/> <a href="http://www.learningaboutelectronics.com/Robotics/"> Robotics </a> <br/> <br/> <b> Electronic Concepts </b> <br/> <a href="http://www.learningaboutelectronics.com/Audio/"> Audio Concepts </a> <br/> <a href="http://www.learningaboutelectronics.com/CircuitConcepts/"> Circuit Concepts </a> <br/> <a href="http://www.learningaboutelectronics.com/ElectricalConcepts/"> Electrical Concepts </a> <br/> <a href="http://www.learningaboutelectronics.com/Math/"> Math Concepts </a> <br/> <a href="http://www.learningaboutelectronics.com/Health/"> Health/Medical Electronics </a> <br/> <br/> <b> Electronic Reference Tools </b> <br/> <a href="http://www.learningaboutelectronics.com/Calculators/"> Calculators </a> <br/> <a href="http://www.learningaboutelectronics.com/Programming/"> Programming </a> <br/> <a href="http://www.learningaboutelectronics.com/Schematics/"> Schematics </a> <br/> <a href="http://www.learningaboutelectronics.com/Datasheets/"> Datasheets </a> <br/> <a href="http://www.learningaboutelectronics.com/Downloads/"> Downloads </a> <br/> <a href="http://www.learningaboutelectronics.com/Videos/"> Videos </a> <br/> <a href="http://www.learningaboutelectronics.com/UserContent/"> User Content </a> <br/> <a href="http://www.learningaboutelectronics.com/OnlineRetailers/"> Online Electronics Retailers </a> <br/> <a 
href="http://www.learningaboutelectronics.com/Blog/"> Blog </a> <br/> <a href="http://www.learningaboutelectronics.com/Services/"> Services </a> <br/> <a href="http://www.learningaboutelectronics.com/Languages/"> This Website In Other Languages </a> </br> </br> </p> </div> </div> </body> <script>'undefined'=== typeof _trfq || (window._trfq = []);'undefined'=== typeof _trfd && (window._trfd=[]),_trfd.push({'tccl.baseHost':'secureserver.net'},{'ap':'cpbh-mt'},{'server':'p3plmcpnl487010'},{'dcenter':'p3'},{'cp_id':'8437534'},{'cp_cache':''},{'cp_cl':'8'}) // Monitoring performance to make your website faster. If you want to opt-out, please contact web hosting support.</script><script src='https://img1.wsimg.com/traffic-assets/js/tccl.min.js'></script></html>    ï»¿ <div id="footer"> <div class="fcenter"> <a href="http://www.learningaboutelectronics.com"> Home </a> | <a href="http://www.learningaboutelectronics.com/Articles"> Articles </a> | <a href="http://www.learningaboutelectronics.com/Projects"> Projects </a> | <a href="http://www.learningaboutelectronics.com/Programming"> Programming </a> | <a href="http://www.learningaboutelectronics.com/Calculators"> Calculators </a> | <a href="http://www.learningaboutelectronics.com/Contact"> Contact </a> </div> <br>  </br> </div>
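The ï»¿ characters that appear in the output above look like a UTF-8 byte order mark (the bytes EF BB BF) that was decoded as Latin-1 text. This is a guess based on the output, not something we verified against the server. The sketch below reproduces the effect on a made-up byte string and shows one way to avoid it, by decoding the bytes with Python's utf-8-sig codec. With requests, the raw bytes of a response are available as getpage.content.

```python
from bs4 import BeautifulSoup

# Made-up sample: a UTF-8 byte order mark followed by UTF-8 encoded HTML
raw = b"\xef\xbb\xbf<p>caf\xc3\xa9</p>"

# Decoding the bytes as Latin-1 turns the byte order mark into three
# visible characters at the start of the text
wrong = raw.decode("latin-1")

# The utf-8-sig codec decodes UTF-8 and drops a leading byte order mark
right = raw.decode("utf-8-sig")

soup = BeautifulSoup(right, "html.parser")

print(repr(wrong[:3]))    # the three stray characters
print(soup.p.get_text())  # the text of the <p> tag, decoded correctly
```

In the example from this article, the equivalent would be to pass getpage.content.decode('utf-8-sig') to BeautifulSoup instead of getpage.text, assuming the page really is UTF-8.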

So the requests module is able to get the text from a web page, and BeautifulSoup is able to structure and prettify the text, making it much more human-readable.

The 'html.parser' argument tells BeautifulSoup to parse the HTML text with Python's built-in HTML parser.
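As a small illustration of what the parsed result lets us do, here is a sketch that parses a short HTML string with html.parser and pulls out the title and the links. The HTML snippet is made up for this example (it only imitates the structure of the page above), so the code runs without a network connection.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates a small part of the page above
html = """
<html>
<head><title>Articles Page</title></head>
<body>
<a href="http://www.learningaboutelectronics.com/Resistors/">Resistors</a>
<a href="http://www.learningaboutelectronics.com/Batteries/">Batteries</a>
</body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# The text inside the <title> tag
page_title = soup.title.get_text()

# (link text, href) pairs for every <a> tag that has an href attribute
links = [(a.get_text(), a['href']) for a in soup.find_all('a', href=True)]

print(page_title)
for text, href in links:
    print(text, href)
```

The same calls should work on getpage_soup from the example above, since it is also a BeautifulSoup object.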

The prettify() method in BeautifulSoup returns the parsed document as a string, with each tag on its own line and indented by nesting level, which makes it much more human-readable.
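To see what prettify() does without downloading a page, here is a minimal sketch on a tiny, made-up HTML string. Each tag and each piece of text ends up on its own line, indented according to how deeply it is nested.

```python
from bs4 import BeautifulSoup

# A made-up, compact HTML string with everything on one line
compact = '<div><p><b>world</b></p></div>'

soup = BeautifulSoup(compact, 'html.parser')

# prettify() returns a new string; it does not change the soup object
pretty = soup.prettify()

print(compact)  # one line
print(pretty)   # several lines, one tag or text item per line
```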

So this is how we can get the contents of a web page using the requests module and use BeautifulSoup to structure the data, making it cleaner and better formatted.

