15/06/2022  •   6 min read  

How Web Scraping is Used to Extract Adidas Web Data Using Login Features with Python


This blog describes a web scraping tool that was developed for one of the most well-known sportswear manufacturers, Adidas. When you execute the application, the first thing it does is log in with the email and password supplied as input. It then scrapes data for the search terms you type in. The program can run in two modes: scraping the results directly, or scraping them after applying a sorting option. In both cases it gathers the title and price of each item.

In the first step, all of the required packages were installed, including selenium and webdriver-manager. Once the packages are installed, it is time to set up Selenium. We define a driver variable that holds the Selenium WebDriver instance, which automates everything we do later, and we choose the Chrome browser. There is no need to download the ChromeDriver binary manually: we simply call the webdriver-manager package installed earlier, which fetches the driver for us so we never have to supply its path. The install command and the setup code are shown below.
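
If the packages are not yet installed, they can be added with pip first, assuming the standard PyPI package names:

pip install selenium webdriver-manager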

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
import time


# setup selenium
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.adidas.co.id/account-login')
driver.maximize_window()
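
If you would rather run the scraper without a visible browser window, Chrome can optionally be started in headless mode. This is a small variation on the setup above, not something the original script requires:

# optional: start Chrome without opening a visible window
# (replaces the driver = webdriver.Chrome(...) line above)
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)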

After that, we build the login feature. To use it, we enter the email and password into the email and password boxes, making sure each box is cleared first. After filling the two boxes, we wait 10 seconds to avoid being identified as a robot, then press the login button and pause for three seconds once the profile page appears. Finally, we enter the search terms into the search box, press enter, and pause 15 seconds to allow all of the data to load. This is the code for the login function.

def login_query(user, password, query):
    # fill the login form
    username = driver.find_element(by=By.XPATH, value='//*[@id="email"]')
    username.clear()
    username.send_keys(user)
    passw = driver.find_element(by=By.XPATH, value='//*[@id="password"]')
    passw.clear()
    passw.send_keys(password)
    time.sleep(10)

    # click the login button
    driver.find_element(by=By.XPATH, value='//*[@id="root"]/main/div/div/div/div[1]/form/div[3]/button').click()
    time.sleep(3)

    # fill the search box and submit the query
    que = driver.find_element(By.CLASS_NAME, 'SearchField-Input')
    que.clear()
    que.send_keys(query, Keys.ENTER)
    time.sleep(15)
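
The fixed time.sleep() calls work, but they always wait the full duration even when the page is ready sooner. As an optional alternative, Selenium's explicit waits pause only until an element appears; the sketch below waits for the same li.ProductCard elements that the scrap function reads later:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 15 seconds for at least one product card to be present
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'li.ProductCard'))
)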

To retrieve all of the existing data, we construct a scraping method. It loops until the next button on the page is no longer available, and on each page it extracts the information we want, namely the title and price. The important point is that each item's price is displayed differently, so we use a try-except block: if the discount elements are present we report the discount, otherwise we fall back to the regular price, and if no price is shown at all we mark the item as sold out. At the end of the loop the program clicks the next page button and waits 15 seconds to make sure the page loads correctly. This is the code we wrote for the scrap function.

def scrap():
    while True:
        main = driver.find_elements(By.CSS_SELECTOR, 'li.ProductCard')
        for i in main:
            detail = i.find_element(By.CLASS_NAME, 'gl-product-card__details-main')
            title = detail.find_element(By.TAG_NAME, 'span').text

            # discounted items expose sale, crossed-out and badge elements;
            # fall back to the regular price element if they are missing
            try:
                pricediscount = detail.find_element(By.CLASS_NAME, 'gl-price-item--sale').text
                pricenormal = detail.find_element(By.CLASS_NAME, 'gl-price-item--crossed').text
                discount = i.find_element(By.CSS_SELECTOR, 'div.gl-badge--urgent').text.replace('-', '')
                price = f'Discount from {pricenormal} to {pricediscount}. ({discount})'
            except Exception:
                price = detail.find_element(By.TAG_NAME, 'div').text
            if price == '':
                price = 'This product is sold out'
            print(title, price)

        # click the next page button; stop when it is no longer present
        try:
            driver.find_element(By.CSS_SELECTOR, '[aria-label="Halaman berikutnya"]').send_keys(Keys.ENTER)
        except Exception:
            break

        # wait for the next page of items to load
        time.sleep(15)
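
The scrap function only prints each result. If you want to keep the data, one option is to collect (title, price) pairs in a list inside the loop and save them afterwards. The helper below is a minimal sketch of that idea; the function name save_items and the file name adidas_items.csv are my own choices, not part of the original program:

import csv

def save_items(items, path='adidas_items.csv'):
    # items is a list of (title, price) tuples collected while scraping
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['title', 'price'])
        writer.writerows(items)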

After that, we write a scrap_sorting method. This function receives a number as a parameter and uses it to choose the sort order: 1 sorts by recommendation, 2 by popularity, 3 and 4 by name (A to Z or Z to A), and 5 and 6 by price (lowest to highest or highest to lowest). Based on the parameter supplied, we click the sorting button and wait 15 seconds for all items to load. Finally, we call the scrap method defined earlier to scrape all of the data. Below is the code for the scrap_sorting function.

def scrap_sorting(sort_filter):
    sortlist = None
    if sort_filter == 1:
        sortlist = 'DESC recommended_score'
    elif sort_filter == 2:
        sortlist = 'ASC position'
    elif sort_filter == 3:
        sortlist = 'ASC name'
    elif sort_filter == 4:
        sortlist = 'DESC name'
    elif sort_filter == 5:
        sortlist = 'ASC price'
    elif sort_filter == 6:
        sortlist = 'DESC price'

    # open the sorting dropdown and locate the chosen option
    sort = driver.find_element(By.XPATH, '//*[@id="root"]/main/section/div/div/div[2]/div/div[1]/div/div/div/div[2]/div/div/div/div/button')
    chose = driver.find_element(By.CSS_SELECTOR, f'button[value="{sortlist}"]')

    sort.send_keys(Keys.ENTER)
    time.sleep(3)
    chose.click()

    # wait for the sorted items to load
    time.sleep(15)

    scrap()
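
The if/elif chain simply maps each menu number to the value attribute of the corresponding sort button. The same mapping can be written as a dictionary lookup, which is a purely stylistic alternative with identical behaviour:

SORT_OPTIONS = {
    1: 'DESC recommended_score',
    2: 'ASC position',
    3: 'ASC name',
    4: 'DESC name',
    5: 'ASC price',
    6: 'DESC price',
}

# returns None for a number outside 1-6, just like the if/elif chain
sortlist = SORT_OPTIONS.get(sort_filter)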

Finally, we write a run function that executes the program we have written. First, we create a running variable that receives input indicating whether we want to run scraping only or scraping with sorting. We then create user and password variables that receive the email and password used as arguments for the login function, and a query variable that accepts the search terms. After that, we branch on the value of the running variable.

If the value is one, the login function is called with the relevant parameters, the scrap function is called, and the browser opened by Selenium earlier is closed. If the running variable has a value of two, we first read a sorter variable to use as the argument of the scrap_sorting function, call the login function, call scrap_sorting, and then close the browser. Finally, if the value of the running variable matches neither branch, we call the run function again to repeat the process from the beginning. This is the code we wrote in the run function.

def run():
    running = int(input('please choose\n1. scraping only\n2. scraping with sorting\nchoose your number: '))
    user = input('Input Your Username: ')
    password = input('Input Your Password: ')
    query = input('Input Your Query: ')

    if running == 1:
        login_query(user=user, password=password, query=query)
        scrap()
        driver.close()
    elif running == 2:
        sorter = int(input('sort by:\n1. recommended\n2. popular\n3. name A to Z\n4. name Z to A\n5. price low to high\n6. price high to low\nEnter the number you want: '))
        login_query(user=user, password=password, query=query)
        scrap_sorting(sorter)
        driver.close()
    else:
        print('please input a correct number')
        run()
    print('all items have been scraped')
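
To actually start the program, call run() at the bottom of the script, typically behind the standard entry-point guard:

if __name__ == '__main__':
    run()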

When we ran the software for the first time, it performed just as we expected. It logs in with the email address and password we provide, then scrapes data for the search terms we type in, either directly or with the chosen sorting option, and collects each item's title and price.
