python - Scrapy Crawling Speed is so Slow (6 pages/min)


I am new to Scrapy. I built a project using scrapy startproject zhanlang, but when I start the spider with scrapy crawl zhanlang -o zhanlang.csv it works very slowly: only 6 pages/min! Here is the code:

```python
from scrapy import Request
from zhanlang.items import ZhanlangItem


def after_login(self, response):
    # The site requires login; this callback runs after a successful login.
    yield Request(
        url="https://movie.douban.com/subject/26363254/comments?start=0&limit=20&sort=new_score&status=p",
        meta={'cookiejar': response.meta['cookiejar']},
        callback=self.parse,
    )

def parse(self, response):
    for comment in response.xpath('//div[@class="comment-item"]'):
        item = ZhanlangItem()  # a fresh item for each comment
        item['name'] = comment.xpath('./div[@class="avatar"]/a/@title').extract_first()
        item['text'] = comment.xpath('./div[@class="comment"]/p/text()').extract()
        item['vote'] = comment.xpath('.//span[@class="votes"]/text()').extract_first()
        yield item
    # extract_first() returns None on the last page instead of raising IndexError
    next_page = response.xpath('//a[@class="next"]/@href').extract_first()
    if next_page is not None:
        next_page_url = "https://movie.douban.com/subject/26363254/comments" + next_page
        print(next_page_url)
        yield Request(
            url=next_page_url,
            meta={'cookiejar': response.meta['cookiejar']},
            callback=self.parse,
        )
```
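Note that this pagination is strictly sequential: each new Request is only yielded after the previous page has been parsed, so no concurrency setting can speed it up. Since Douban's comment pages are addressable by the start= offset, one way to let Scrapy fetch pages in parallel is to generate the page URLs up front. This is only a sketch, assuming the page count is known or capped; the helper name comment_page_urls is mine, not from the original code:

```python
def comment_page_urls(subject_id, pages, limit=20):
    """Build offset-based comment-page URLs so Scrapy can schedule them concurrently."""
    base = "https://movie.douban.com/subject/{}/comments".format(subject_id)
    return [
        "{}?start={}&limit={}&sort=new_score&status=p".format(base, page * limit, limit)
        for page in range(pages)
    ]

# In the spider these URLs would be yielded as Requests from start_requests(),
# instead of chasing the "next" link one page at a time.
```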

Here are my settings:

```python
DOWNLOAD_DELAY = 0.5    # download delay setting; honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'zhanlang.middlewares.RandomUserAgentMiddleware': 400,
}
```
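For reference, DOWNLOAD_DELAY = 0.5 already caps each domain at roughly two requests per second regardless of the concurrency settings, and AUTOTHROTTLE (if enabled) can slow things further. A sketch of throughput-oriented settings; the values are illustrative, not tuned, and lowering the delay on a site like Douban raises the risk of getting banned:

```python
# settings.py (illustrative values only)
DOWNLOAD_DELAY = 0                 # no artificial per-request pause
CONCURRENT_REQUESTS = 32           # global cap
CONCURRENT_REQUESTS_PER_DOMAIN = 16
AUTOTHROTTLE_ENABLED = False       # make sure autothrottle is not overriding the above
COOKIES_ENABLED = True             # keep the login cookiejar working
```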

And my middlewares.py:

```python
from fake_useragent import UserAgent


class RandomUserAgentMiddleware(object):
    """Choose a random User-Agent for each request."""

    def __init__(self, crawler):
        super(RandomUserAgentMiddleware, self).__init__()
        self.ua = UserAgent()
        self.ua_type = crawler.settings.get('RANDOM_UA_TYPE', 'random')

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        def get_ua():
            return getattr(self.ua, self.ua_type)
        request.headers.setdefault('User-Agent', get_ua())
```
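One thing worth knowing here: fake_useragent's UserAgent() fetches its browser database over the network when it is constructed, so it can itself be a source of delays or startup failures. A minimal dependency-free alternative, shown only as a sketch (the middleware name and UA strings are illustrative, not from the original project), just picks from a static list:

```python
import random

# A small static pool; extend as needed. These strings are examples only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0",
]


class StaticRandomUserAgentMiddleware(object):
    """Set a random User-Agent header without any external dependency."""

    def process_request(self, request, spider):
        request.headers.setdefault('User-Agent', random.choice(USER_AGENTS))
```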

Why does it crawl so slowly? How should I increase the speed? Thanks.

