HTML Scraper


I'm starting to understand Python and how it works, but I have some ghosts from my DOS days that I can't get out of my head.

For example, in DOS if you saw 'myfile.bat' it was a batch file, identified by the .bat extension, and you could move (move *.bat c:\mydir) or delete (del *.bat) whole groups of them with wildcards. So I'm having a very hard time getting used to the fact that in Python, something like 'requests.get' isn't a .get file. It's the get function that belongs to the requests module, with the dot joining the two. The dot works like a possessive, so 'requests.get' reads as "requests's get".
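To see the two kinds of dot side by side, here is a small sketch that uses only the standard library (math and os.path stand in for requests, which works the same way). A dot in code means "look inside this thing for that name", while a dot inside a string is just a character, like a DOS extension.

```python
import math
import os.path

# In code, the dot means "look inside": sqrt belongs to the math module
print(math.sqrt(16))  # 4.0

# getattr does the same lookup by name, showing the dot is attribute access
same_function = getattr(math, "sqrt")
print(same_function is math.sqrt)  # True

# Inside a string, a dot is just a character, like a DOS file extension
name, extension = os.path.splitext("myfile.bat")
print(name, extension)  # myfile .bat

# Both kinds at once: upper belongs to the string, the ".bat" is just text
print("myfile.bat".upper())  # MYFILE.BAT
```

So requests.get is the same idea as math.sqrt: find the thing called get inside requests, then call it.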

I get that this is a slightly controversial program, and scraping stuff that's not yours (code, prose, etc.) is not ideal, but this little program helped me see things a bit more clearly, and I'm sure 'Auntie' doesn't mind. Hey, it takes nearly £200 a year from me and vomits out reality show garbage, so I think we're even.


The Python Code


import requests  # imports the 'requests' library for handling HTTP
from bs4 import BeautifulSoup  # imports the BeautifulSoup (bs4) HTML parser

# Use get from requests to fetch the page; give up after 5 seconds
response = requests.get("https://bbc.co.uk", timeout=5)
response.raise_for_status()  # stop with an error if the server sent back an error code (4xx/5xx)
# print(response.status_code)  # testing the connection: if it prints 200 it's working

# The heavy lifting: parse the raw HTML text into a searchable 'soup' object
soup = BeautifulSoup(response.text, "html.parser")

headlines = soup.find_all("h3")  # finds every <h3> tag
for headline in headlines:
    print(headline.get_text(strip=True))  # prints the headline text (duh!) without extra whitespace
    # link = headline.find("a")  # this bit looks for a link inside the headline
    # if link is not None:  # not every <h3> contains a link
    #     print(link["href"])  # this bit prints the link's URL

# print(response.text)  # prints the ENTIRE raw HTML of the page, not just the headlines
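If you want to see roughly what find_all("h3") and the commented-out link lines are doing, without installing bs4 or hitting the BBC every time, here is a sketch that swaps BeautifulSoup for Python's built-in html.parser. The class name HeadlineParser and the sample HTML are made up for illustration; real pages are much messier, which is exactly why BeautifulSoup is worth having.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Hypothetical helper: collect the text and first link of every <h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False   # are we currently inside an <h3>?
        self.headlines = []  # list of (text, href or None)
        self._text = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self._text = []
            self._href = None
        elif tag == "a" and self.in_h3 and self._href is None:
            self._href = dict(attrs).get("href")  # first link inside the <h3>

    def handle_data(self, data):
        if self.in_h3:
            self._text.append(data)  # text can arrive in several pieces

    def handle_endtag(self, tag):
        if tag == "h3" and self.in_h3:
            text = "".join(self._text).strip()
            self.headlines.append((text, self._href))
            self.in_h3 = False


# Made-up sample HTML standing in for the downloaded page
sample_html = """
<h3><a href="/news/one">First headline</a></h3>
<h3>Second headline (no link)</h3>
<p>Not a headline</p>
"""

parser = HeadlineParser()
parser.feed(sample_html)
for text, href in parser.headlines:
    print(text, href)
```

The second headline comes back with None for its link, which is the same situation the `if link is not None` check in the scraper guards against.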