Beautiful Soup is a Python package for parsing HTML and XML documents, and on Debian it is provided by a package named python-bs4. However, python-bs4 is the package for Python 2, which is the default Python version on the Debian Linux system. Therefore, if you intend to use Python 3 as your default environment, you will also need to install Python 3 and its corresponding version of BS4, python3-bs4. Let's start with the Python 3 installation:
# apt-get install -y vim python3
After a successful installation of the python3 package, make sure that python3 is set as the default:
# update-alternatives --install /usr/bin/python python /usr/bin/python3.4 2
update-alternatives: using /usr/bin/python3.4 to provide /usr/bin/python (python) in auto mode
Confirm that Python 3 is now the default version:
# python --version
Python 3.4.2
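The same check can also be run from inside Python. The following is a minimal sketch, assuming you save it under a hypothetical name such as checkpython.py and run it with `python checkpython.py`; the helper is_python3 is illustrative and not part of any package. It deliberately uses a plain `python` shebang and avoids Python 3-only syntax, so it still runs and reports correctly if the alternatives change did not take effect.

```python
#!/usr/bin/env python
# Minimal sketch: report which interpreter the "python" command resolves to
# after update-alternatives, and exit with an error if it is not Python 3.
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple describes Python 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    # String concatenation keeps the output identical under Python 2 and 3.
    print("Interpreter: " + sys.executable)
    print("Version: " + sys.version.split()[0])
    if not is_python3():
        sys.exit("The default python is not Python 3")
```

On the system above this should print the path of the interpreter followed by `Version: 3.4.2`.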
All that remains is to install the Beautiful Soup package built for Python 3:
# apt-get install python3-bs4
All done. Test the Beautiful Soup installation with the following example script, which fetches a web page and prints its title:
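#!/usr/bin/env python3
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Fetch the page and parse it with Python's built-in HTML parser
html = urlopen("http://www.gnu.org")
bsObj = BeautifulSoup(html.read(), "html.parser")
# Print the page's <title> element
print(bsObj.title)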
Save the above code into a file, e.g. scrapetest.py, and make it executable:
$ chmod +x scrapetest.py
Once ready, execute the scrapetest.py script:
$ ./scrapetest.py
<title>The GNU Operating System and the Free Software Movement</title>
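Because the test script depends on network access to www.gnu.org, an offline check can help separate parsing problems from connectivity problems. The following is a minimal sketch, assuming python3-bs4 is installed; the SAMPLE_HTML document and the parse_page helper are hypothetical and used only for illustration. It uses "html.parser", the parser bundled with Python, so no additional parser package is required.

```python
#!/usr/bin/env python3
# Offline variant of scrapetest.py: parse an inline HTML string instead of
# fetching a page, so Beautiful Soup can be tested without network access.
from bs4 import BeautifulSoup

# Hypothetical sample document used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <a href="https://www.gnu.org/">GNU</a>
    <a href="https://www.debian.org/">Debian</a>
  </body>
</html>
"""


def parse_page(markup):
    """Return the page title text and the list of link targets."""
    soup = BeautifulSoup(markup, "html.parser")
    # soup.title is None when the document has no <title> element.
    title = soup.title.string if soup.title else None
    # Collect the href attribute of every <a> element in document order.
    links = [a.get("href") for a in soup.find_all("a")]
    return title, links


if __name__ == "__main__":
    title, links = parse_page(SAMPLE_HTML)
    print(title)
    for link in links:
        print(link)
```

Running it should print `Sample Page` followed by the two link targets, one per line.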
Troubleshooting
Traceback (most recent call last):
File "scrapetest.py", line 2, in <module>
from bs4 import BeautifulSoup
ImportError: No module named 'bs4'
This error means that either bs4 is not installed or the installed bs4 package does not match your Python version. Make sure that bs4 is installed for the Python version you use to run the script (python3-bs4 for Python 3).
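To see which of the two causes applies, it can help to ask the interpreter directly whether it can locate bs4. The following is a minimal diagnostic sketch; the module_available helper is a hypothetical name, and importlib.util.find_spec is the standard-library call (available since Python 3.4) that looks a module up without importing it.

```python
#!/usr/bin/env python3
# Minimal diagnostic sketch: report which interpreter is running and whether
# it can locate the bs4 module, without actually importing it.
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can find the named module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python " + sys.version.split()[0] + " at " + sys.executable)
    if module_available("bs4"):
        print("bs4 is importable by this interpreter")
    else:
        print("bs4 not found: install the bs4 package matching this Python version")
```

If it reports that bs4 is not found under Python 3, installing python3-bs4 as shown earlier should resolve the ImportError.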