Fixing The Facebook Ad Library (Part I): Scraping Can Save It

What is the ad library?

Double standards

Scraping the ad library

Sounds cool, how do we do it?

How do you scrape pages?
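```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

searchDriver = webdriver.Chrome()
searchDriver.get("")  # URL of the page with the Facebook login form goes here

# Fill in the login form.
usernameBox = searchDriver.find_element(By.NAME, "email")
usernameBox.send_keys("your Facebook email")
passwordBox = searchDriver.find_element(By.NAME, "pass")
passwordBox.send_keys("your Facebook password")

# The login button's markup varies, so try its id first and fall back to its name.
try:
    loginBox = searchDriver.find_element(By.ID, "loginbutton")
except NoSuchElementException:
    loginBox = searchDriver.find_element(By.NAME, "login")
loginBox.click()

# Dump the rendered HTML of the current page.
html = searchDriver.page_source
print(html)
```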
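Printing `page_source` dumps the raw markup, much of which is scripts and styles rather than ad text. As a minimal sketch of one possible next step (not the article's own method), Python's standard-library `html.parser` can pull out just the visible text fragments. `VisibleTextExtractor` and `visible_text` are hypothetical names, and because Facebook's markup changes often, a real scraper would probably need more targeted selectors than "all visible text".

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes from an HTML string, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose text we ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the non-empty visible text fragments of an HTML document, in order."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

With the logged-in driver from above, this would be called as `visible_text(searchDriver.page_source)`. On a small sample such as `"<p>Sponsored</p><script>var x = 1;</script><div>Example ad text</div>"` it returns `["Sponsored", "Example ad text"]`.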

Two improvements

```python
import time


def scrollDown(driver, pause=2):
    # pause: seconds to wait after each scroll (2 is an assumed default; tune as needed).
    # Get scroll height.
    lastHeight = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to the bottom.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait to load the page.
        time.sleep(pause)
        # Calculate new scroll height and compare with last scroll height.
        newHeight = driver.execute_script("return document.body.scrollHeight")
        # If the page hasn't grown any more (i.e. it's reached the end), stop.
        if newHeight == lastHeight:
            break
        lastHeight = newHeight
```
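The scrolling loop only stops once the page height stops changing, so on a feed that keeps loading results it could run for a very long time. A minimal variant, assuming a hypothetical `max_scrolls` cap (not part of the original code), is sketched below. `FakeDriver` is also hypothetical: a stand-in that mimics just the `execute_script` calls the loop makes, so the termination logic can be checked without launching a browser.

```python
import time


def scroll_to_bottom(driver, pause=0.0, max_scrolls=50):
    """Scroll until the page height stops growing or max_scrolls is reached.

    Returns the number of scroll steps performed. `pause` and `max_scrolls`
    are hypothetical tuning knobs, not values from the original article.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for step in range(1, max_scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return step  # height unchanged: we've reached the end
        last_height = new_height
    return max_scrolls  # cap hit before the page stopped growing


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver: the page grows by `chunk`
    pixels each time it is scrolled to the bottom, `loads_remaining` times."""

    def __init__(self, loads_remaining, chunk=1000):
        self.height = chunk
        self.chunk = chunk
        self.loads_remaining = loads_remaining

    def execute_script(self, script):
        if script.startswith("window.scrollTo") and self.loads_remaining > 0:
            self.height += self.chunk  # simulate another batch of ads loading
            self.loads_remaining -= 1
        if script.startswith("return"):
            return self.height
        return None
```

With `FakeDriver(3)`, the page grows three times and the fourth scroll leaves the height unchanged, so `scroll_to_bottom` returns 4. Against a real browser you would pass the Selenium driver and a non-zero `pause`, since content needs time to load between scrolls.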

Let’s take stock




