Beautiful Soup is a Python package for parsing HTML and XML documents. On Debian it is shipped in a package named python-bs4. However, python-bs4 is built for Python 2, the default Python version on a Debian Linux system. Therefore, if you intend to use Python 3 as your default environment, you will also need to install Python 3 and the matching Beautiful Soup package, python3-bs4. Let's start with the python3 installation:
# apt-get install -y vim python3
After the python3 package has been installed, make sure that python3 is set as the default:
# update-alternatives --install /usr/bin/python python /usr/bin/python3.4 2
update-alternatives: using /usr/bin/python3.4 to provide /usr/bin/python (python) in auto mode
Confirm that Python 3 is now the default version:
# python --version
Python 3.4.2
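To see which interpreter the python command actually resolves to after update-alternatives, the symlink chain can also be followed from Python itself. The following is a minimal sketch using only the standard library; resolve_command is a hypothetical helper name chosen for illustration, not part of any Debian tool.

```python
import os
import shutil


def resolve_command(name):
    """Return the real path of the executable `name` found on PATH, or None."""
    path = shutil.which(name)  # first match on PATH, or None if not found
    if path is None:
        return None
    # Follow symlinks such as /usr/bin/python -> /etc/alternatives/python -> ...
    return os.path.realpath(path)


if __name__ == "__main__":
    for command in ("python", "python3"):
        print(command, "->", resolve_command(command))
```

If both commands print the same path, python and python3 point at the same interpreter.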
All that remains is to install the Beautiful Soup package that matches Python 3:
# apt-get install python3-bs4
All done. Test Beautiful Soup with the following example script:
#!/usr/bin/env python3
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser
html = urlopen("http://www.gnu.org")
bsObj = BeautifulSoup(html.read(), "html.parser")
print(bsObj.title)
Save the above code into a file, e.g. scrapetest.py, and make it executable:
$ chmod +x scrapetest.py
Once ready, execute the scrapetest.py script:
$ ./scrapetest.py
<title>The GNU Operating System and the Free Software Movement</title>
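The test script above needs network access. As a rough offline sketch of the same idea, the markup can be handed to BeautifulSoup as a string instead. The function name extract_title and the sample markup are made up for illustration, and Python's built-in "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup


def extract_title(markup):
    """Return the text of the <title> element, or None if there is none."""
    soup = BeautifulSoup(markup, "html.parser")
    if soup.title is None:  # document has no <title> tag
        return None
    return soup.title.get_text(strip=True)


if __name__ == "__main__":
    sample = "<html><head><title>Example page</title></head><body></body></html>"
    print(extract_title(sample))
```

Checking for None first avoids an AttributeError on documents that have no title element.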
Troubleshooting
Traceback (most recent call last):
  File "scrapetest.py", line 2, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named 'bs4'
Either your Python and bs4 versions do not match or bs4 is not installed. Make sure that bs4 is installed and that it corresponds to your Python version. For Python 3 on Debian that means the python3-bs4 package.
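One way to narrow this down is to ask the interpreter itself whether it can see bs4. The sketch below uses only the standard library and assumes Python 3.4 or newer; bs4_status is a hypothetical helper name, not something provided by Beautiful Soup.

```python
import importlib.util
import sys


def bs4_status():
    """Report the running interpreter and whether it can import bs4."""
    spec = importlib.util.find_spec("bs4")  # None when bs4 is not importable
    return {
        "interpreter": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "bs4_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in bs4_status().items():
        print(key + ":", value)
```

Run it with the same interpreter as the failing script. A bs4_location of None means that this interpreter cannot find Beautiful Soup, even if it is installed for another Python version.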