html - Python Selenium Scraping Crashes, Can I Find Elements For Part of The Web Page?


I'm trying to scrape data from a website. The website has a 'load more products' button. I'm using:

driver.find_element_by_xpath("""//*[@id="showmoreresult"]""").click() 

to hit the button, and this loops for a set number of iterations.

The problem I'm running into is that once the iterations have completed, I want to extract the text from the webpage using:

posts = driver.find_elements_by_class_name("hotproductdetails") 

However, this seems to crash Chrome, and I can get no data out. What I'd like to do is populate posts with the new products that have loaded after each iteration.

After 'load more' has been clicked, I want to grab the text from the 50 products that have just loaded, append it to my list, and continue.

I can run the line posts = driver.find_elements_by_class_name("hotproductdetails") within each iteration, but it grabs every element on the page every time, and that really slows down the process.

Is there any way of achieving this in Selenium, or am I limited by using this library?

This is the full script:

import csv
import time
from selenium import webdriver
import pandas as pd


def cexscrape():
    print('loading chrome...')
    chromepath = r"c:\users\leonk\documents\python scripts\chromedriver.exe"
    driver = webdriver.Chrome(chromepath)

    # note: url is not defined anywhere in the posted script
    driver.get(url)

    print('prepping webpage...')
    time.sleep(2)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    y = 0
    breakclause = exceptcheck = False
    # click 'load more' up to 1000 times; stop after two failed clicks in a row
    while y < 1000 and not breakclause:
        y += 1
        time.sleep(0.5)
        try:
            driver.find_element_by_xpath("""//*[@id="showmoreresult"]""").click()
            exceptcheck = False
            print('load count', y, '...')
        except Exception:
            if exceptcheck:
                breakclause = True
            else:
                exceptcheck = True
            print('load count', y, '...lag...')
            time.sleep(2)
            continue

    print('grabbing elements...')
    posts = driver.find_elements_by_class_name("hotproductdetails")
    cats = driver.find_elements_by_class_name("supercatlink")

    print('generating lists...')
    catlist = []
    postlist = []
    for cat in cats:
        catlist.append(cat.text)
    print('categories complete...')
    for post in posts:
        postlist.append(post.text)
    print('products complete...')
    return postlist, catlist


prods, cats = cexscrape()

print('extracting lists...')

cat = []
subcat = []
prodname = []
sellprice = []
buycash = []
buyvoucher = []

# each category string looks like 'category/sub category'
for c in cats:
    cat.append(c.split('/')[0])
    subcat.append(c.split('/')[1])

# each product's text is split by line: name, sell price, then buy prices
for p in prods:
    prodname.append(p.split('\n')[0])
    sellprice.append(p.split('\n')[2])
    if 'webuy' in p:
        buycash.append(p.split('\n')[4])
        buyvoucher.append(p.split('\n')[6])
    else:
        buycash.append('nan')
        buyvoucher.append('nan')

print('generating dataframe...')

df = pd.DataFrame(
        {'category' : cat,
         'sub category' : subcat,
         'product name' : prodname,
         'sell price' : sellprice,
         'cash buy price' : buycash,
         'voucher buy price' : buyvoucher})

print('writing csv...')

df.to_csv('data.csv', sep=',', encoding='utf-8')

print('completed!')

Use XPath and limit the products you get. If you get 50 products each time, use the expression below:

"(//div[@class='hotproductdetails'])[position() > {} and position() <= {}]".format((page - 1) * 50, page * 50)

This will give you 50 products every time, and you increase page for the next lot. Doing it all in one go will crash it anyway.
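
For what it's worth, here is a rough sketch of how that could slot into the loop from the question. The names batch_xpath and scrape_in_batches are made up for illustration, and the sketch assumes that the products are div elements, that the first 50 are already on the page before any click, and that every click loads one more batch of up to 50. It sticks to the old-style find_element_by_xpath calls used in the question; newer Selenium releases spell these driver.find_element(By.XPATH, ...).

```python
import time

# Class name taken from the question; assumes the products are <div> elements.
PRODUCT_XPATH = "(//div[@class='hotproductdetails'])[position() > {} and position() <= {}]"


def batch_xpath(page, batch_size=50):
    """Build the XPath for the products in batch number `page` (1-based)."""
    return PRODUCT_XPATH.format((page - 1) * batch_size, page * batch_size)


def scrape_in_batches(driver, max_pages=1000, batch_size=50, pause=0.5):
    """Collect product text one batch at a time instead of all at the end."""
    postlist = []
    for page in range(1, max_pages + 1):
        # Only query the products that belong to the current batch.
        batch = driver.find_elements_by_xpath(batch_xpath(page, batch_size))
        if not batch:
            break  # nothing new has appeared, so stop
        postlist.extend(post.text for post in batch)
        try:
            driver.find_element_by_xpath('//*[@id="showmoreresult"]').click()
        except Exception:
            break  # button missing or not clickable: assume everything is loaded
        time.sleep(pause)  # crude wait for the next batch to load
    return postlist
```

You would call it as postlist = scrape_in_batches(driver) in place of the while loop and the final find_elements_by_class_name call. The fixed sleep is the weak spot: if a batch takes longer than pause to load, the loop sees an empty batch and stops early, so a longer pause or an explicit wait may be needed.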

