python - How to get HTML OnClick parameter using Scrapy

March 15, 2012

I want to extract the nadlanid value from this link: http://www.yad2.co.il/nadlan/sales.php?city=%e1%f0%e9%ee%e9%f0%e4+%e2%$

I used Firebug to inspect the HTML I want to extract from. The nadlanid value is in this element: <td onclick="show_ad('2','1','/nadlan/salesdetails.php','nadlanid','1614569','644');"> בית אריה - יאיר שטרן </td>

I used the following Scrapy code to check whether Scrapy can parse the HTML above:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://www.yad2.co.il/nadlan/sales.php?city=%e1%f0%e9%ee%e9%f0%e4+%e2%$',
    ]

    def parse(self, response):
        # Save the raw response body to a file so it can be searched for nadlanid
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)

But there is no nadlanid in response.body.

How can I get the nadlanid value?

In this case you want to retrieve JavaScript function arguments from an HTML onclick attribute.

First, find the whole onclick text:

text = response.xpath("//td/@onclick").extract_first()

Then you can use a simple regular expression pattern to find the function arguments:

import re

# capture everything between the parentheses of show_ad(...)
re.findall(r"show_ad\((.+?)\)", text)[0].split(',')
# result:
# ["'2'", "'1'", "'/nadlan/salesdetails.php'", "'nadlanid'", "'1614569'", "'644'"]
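The arguments returned by that expression still carry their single quotes, and the nadlanid value itself is one more step away. Here is a minimal sketch of that last step, using only the standard library and the onclick string copied from the question. The helper name parse_show_ad_args is hypothetical, and the sketch assumes the ID is always the argument that follows 'nadlanid', which the single sample in the question suggests but does not guarantee.

```python
import re

# Sample onclick value copied from the <td> element in the question.
onclick = "show_ad('2','1','/nadlan/salesdetails.php','nadlanid','1614569','644');"


def parse_show_ad_args(onclick_text):
    """Hypothetical helper: return the show_ad(...) arguments without quotes."""
    match = re.search(r"show_ad\((.+?)\)", onclick_text)
    if match is None:
        # No show_ad(...) call in this attribute value.
        return []
    # Split on commas, then strip whitespace and the surrounding single quotes.
    return [arg.strip().strip("'") for arg in match.group(1).split(",")]


args = parse_show_ad_args(onclick)

# Assumption: the value right after the 'nadlanid' argument is the ID.
nadlanid = args[args.index("nadlanid") + 1] if "nadlanid" in args else None
print(nadlanid)
```

In a spider, the same helper could be applied to each string returned by response.xpath("//td/@onclick").extract(), skipping any result that comes back empty.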
