python - How to get the HTML onclick parameter using Scrapy
I want to extract the nadlanid value from this link: http://www.yad2.co.il/nadlan/sales.php?city=%e1%f0%e9%ee%e9%f0%e4+%e2%$
I used Firebug to inspect the HTML I want to extract. The nadlanid value is at: <td onclick="show_ad('2','1','/nadlan/salesdetails.php','nadlanid','1614569','644');"> בית אריה - יאיר שטרן </td>
I use the following Scrapy code to check whether Scrapy can parse the HTML above:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        # URL is truncated in the original post
        'http://www.yad2.co.il/nadlan/sales.php?city=%e1%f0%e9%ee%e9%f0%e4+%e2%$',
    ]

    def parse(self, response):
        # Save the raw response body to a file so it can be inspected
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
But there is no nadlanid in response.body.
How can I get the nadlanid value?
In this case you want to retrieve JavaScript function arguments from an HTML onclick attribute.
First, find the whole onclick text:
text = response.xpath("//td/@onclick").extract_first()
Then it's possible to use a simple regular expression pattern to find the function arguments:
>>> import re
>>> # capture everything between the parentheses of show_ad
>>> re.findall(r"show_ad\((.+?)\)", text)[0].split(',')
["'2'", "'1'", "'/nadlan/salesdetails.php'", "'nadlanid'", "'1614569'", "'644'"]
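To go from that list to the bare nadlanid value, something like the sketch below might work. It runs on the onclick string copied from the question, so it does not need Scrapy. In a spider, the string would come from the response.xpath(...) call shown earlier. The helper names parse_show_ad_args and extract_nadlanid are made up for illustration, and the code assumes the ID is always the argument that follows 'nadlanid', which the question's sample suggests but does not guarantee.

```python
import re

# Sample onclick value copied from the question; in a spider this would come
# from response.xpath("//td/@onclick").extract_first()
onclick = "show_ad('2','1','/nadlan/salesdetails.php','nadlanid','1614569','644');"


def parse_show_ad_args(onclick_text):
    """Return the show_ad(...) arguments as a list of unquoted strings."""
    match = re.search(r"show_ad\((.+?)\)", onclick_text)
    if match is None:
        return []
    # Split on commas, then strip whitespace and surrounding quotes
    return [arg.strip().strip("'\"") for arg in match.group(1).split(",")]


def extract_nadlanid(onclick_text):
    """Hypothetical helper: assumes the ID is the argument right after 'nadlanid'."""
    args = parse_show_ad_args(onclick_text)
    if "nadlanid" in args:
        index = args.index("nadlanid")
        if index + 1 < len(args):
            return args[index + 1]
    return None


nadlanid = extract_nadlanid(onclick)
print(nadlanid)  # prints 1614569
```

Splitting on commas is enough here because none of the sample arguments contain a comma. If they could, a stricter pattern that matches each quoted argument would be safer.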