Extract all URLs using Beautiful Soup and Python3

The following script will extract all URLs from a given web page.

#!/usr/bin/env python3

# Python Version:  3.4.2
# bs4 version: 4.3.2-2

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://gnu.org") # Insert the URL you want to extract links from
bsObj = BeautifulSoup(html.read(), "html.parser")

for link in bsObj.find_all('a'):
    print(link.get('href'))
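Note that many href attributes on a page are relative paths. As a possible extension (not part of the original script), the sketch below resolves them to absolute URLs with urllib.parse.urljoin and skips anchor tags that have no href attribute; the base URL used here is only an example.

#!/usr/bin/env python3

from urllib.parse import urljoin
from urllib.request import urlopen
from bs4 import BeautifulSoup

base_url = "http://gnu.org"  # example base URL; replace with the page to scan
html = urlopen(base_url)
bsObj = BeautifulSoup(html.read(), "html.parser")

for link in bsObj.find_all('a'):
    href = link.get('href')
    if href:  # skip <a> tags without an href attribute
        # urljoin turns relative paths into absolute URLs against base_url
        print(urljoin(base_url, href))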

Save the above script into a file, e.g. extract-url.py, and make it executable:

$ chmod +x extract-url.py
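Alternatively, if you prefer not to change the file permissions, the script can be run through the Python 3 interpreter directly:

$ python3 extract-url.py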

Run the script:

$ ./extract-url.py

