How to install a Python 3 Beautiful Soup environment on Debian Linux

Beautiful Soup is a Python package for parsing HTML and XML documents. On Debian it is packaged as python-bs4, but that package targets Python 2. If you intend to use Python 3 as your default environment, you need to install Python 3 together with the matching Beautiful Soup package, python3-bs4. Let’s start with the Python 3 installation:

# apt-get install -y vim python3

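Once the python3 package is installed, make sure that Python 3 is set as the default interpreter. The command below assumes Python 3.4; adjust the path to match the version installed on your system: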

# update-alternatives --install /usr/bin/python python /usr/bin/python3.4 2
update-alternatives: using /usr/bin/python3.4 to provide /usr/bin/python (python) in auto mode

Confirm that Python 3 is now the default version:

# python --version
Python 3.4.2
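
The same check can be made from inside Python, which is handy when a script should refuse to run under Python 2. The following is a minimal sketch; the helper name is_python3 is hypothetical and not part of any package:

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later."""
    # version_info[0] is the major version number, e.g. 3 for Python 3.4.2.
    return version_info[0] >= 3


if __name__ == "__main__":
    # Print the version of the interpreter that is running this script.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if not is_python3():
        sys.exit("This script requires Python 3")
```

Passing the version tuple as a parameter keeps the helper easy to check against other versions, for example is_python3((2, 7, 9)).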

All that remains is to install the Beautiful Soup package that matches Python 3:

# apt-get install python3-bs4

All done. Test Beautiful Soup with the following example script, which fetches a web page and prints its title:

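#!/usr/bin/env python3

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it with Python's built-in HTML parser
html = urlopen("http://www.gnu.org")
bsObj = BeautifulSoup(html.read(), "html.parser")

# Print the page's <title> element
print(bsObj.title)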

Save the above code into a file, e.g. scrapetest.py, and make it executable:

$ chmod +x scrapetest.py

Once ready, execute the scrapetest.py script:

$ ./scrapetest.py 
<title>The GNU Operating System and the Free Software Movement</title>
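
Beautiful Soup can also be tried without network access by parsing an HTML string directly. The following is a minimal sketch; the sample markup and the names SAMPLE_HTML and extract_links are made up for illustration, and the built-in "html.parser" backend is assumed so that no extra parser package is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used instead of fetching a live page.
SAMPLE_HTML = """
<html>
  <head><title>Test page</title></head>
  <body>
    <a href="http://www.gnu.org">GNU</a>
    <a href="http://www.debian.org">Debian</a>
  </body>
</html>
"""


def extract_links(markup):
    """Return the href attribute of every <a> tag that has one."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    # Text content of the <title> element
    print(soup.title.string)
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

If this runs cleanly, the bs4 installation itself is working, and any failure in scrapetest.py is more likely related to the network request.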

Troubleshooting

Traceback (most recent call last):
  File "scrapetest.py", line 2, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named 'bs4'

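This error means that bs4 is either not installed or installed for a different Python version than the one running the script. Make sure that bs4 is installed and that it matches your Python version; for Python 3 on Debian, that is the python3-bs4 package.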
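
To see which interpreter is actually running and whether it can find bs4, a short diagnostic such as the following may help. It is a sketch that relies only on the standard library; the helper name module_available is hypothetical:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path shows which Python is being used.
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    if module_available("bs4"):
        print("bs4 is available to this interpreter")
    else:
        print("bs4 is missing; on Debian, install python3-bs4 for Python 3")
```

Run it with the same command used to start scrapetest.py, so that the report describes the interpreter that raised the ImportError.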