The following script will extract all URLs from a given web page.
#!/usr/bin/env python3
# Python version: 3.4.2
# bs4 version: 4.3.2-2
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://gnu.org")  # Insert the URL you want to extract links from
bsObj = BeautifulSoup(html.read(), "html.parser")  # Specify a parser explicitly
for link in bsObj.find_all('a'):
    print(link.get('href'))
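Note that `link.get('href')` returns relative links (such as `/philosophy/`) exactly as they appear in the page. If you need absolute URLs, the standard library's `urllib.parse.urljoin` can resolve each href against the page's base URL. A minimal sketch, using a hypothetical relative href for illustration:

```python
from urllib.parse import urljoin

base = "http://gnu.org"               # the page the links were extracted from
href = "/philosophy/philosophy.html"  # example relative href from link.get('href')

absolute = urljoin(base, href)
print(absolute)  # http://gnu.org/philosophy/philosophy.html
```

`urljoin` also leaves already-absolute hrefs untouched, so it is safe to apply to every extracted link.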
Save the above script into a file, e.g. extract-url.py, and make it executable:
$ chmod +x extract-url.py
Run the script:
$ ./extract-url.py
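If installing BeautifulSoup is not an option, the same link extraction can be done with only the standard library's html.parser module. A sketch of that approach (the class and function names here are illustrative, not from the script above):

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def extract_links(html_text):
    """Return all hrefs found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html_text)
    return parser.links


print(extract_links('<a href="https://gnu.org">GNU</a> <a href="/about">About</a>'))
# ['https://gnu.org', '/about']
```

This is less forgiving of malformed HTML than BeautifulSoup, but it avoids the third-party dependency entirely.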