Python - Extracting a JSON response in Scrapy


How do I use Scrapy to scrape an API that returns JSON? The JSON looks like this:

    "records": [
      {
        "uri": "https://www.example.com",
        "access": {
          "update": false
        },
        "id": 17059,
        "vid": 37614,
        "name": "mylibery",
        "claim": null,
        "claimedby": null,
        "authoruid": "3",
        "lifecycle": "l",
        "companytype": "s",
        "ugcstate": 10,
        "companylogo": {
          "filename": "mylibery-logo.png",
          "filepath": "sites/default/files/imagecache/company_logo_70/mylibery-logo.png"
        }

I tried this code:

    import scrapy
    import json


    class ApiItem(scrapy.Item):
        url = scrapy.Field()
        name = scrapy.Field()


    class ExampleSpider(scrapy.Spider):
        name = 'api'
        allowed_domains = ["site.com"]
        start_urls = [l.strip() for l in open('pages.txt').readlines()]

        def parse(self, response):
            filename = response.url.split("/")[-2]
            open(filename, 'wb').write(response.body)
            jsonresponse = json.loads(response.body_as_unicode())
            item = ApiItem()
            item["url"] = jsonresponse["uri"]
            item["name"] = jsonresponse["name"]
            return item

"pages.txt" is a list of the API pages I want to scrape. I want to extract "uri" and "name" and save them to a CSV file.

But it throws an error:

    2017-08-18 13:23:02 [scrapy] ERROR: Spider error processing <GET https://www.investiere.ch/proxy/api2/v1/companies?extra%5bimagecache%5d=company_logo_70&fields=companytype,lifecycle&page=8&parameters%5binclude_skipped%5d=yes> (referer: None)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 651, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/habenn/projects/inapi/inapi/spiders/example.py", line 22, in parse
        item["url"] = jsonresponse["uri"]
    KeyError: 'uri'

From the example given, it should be this:

    item["url"] = jsonresponse["records"][0]["uri"]
    item["name"] = jsonresponse["records"][0]["name"]
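To see why the original lookup fails, here is a minimal sketch that runs outside Scrapy. It uses a trimmed copy of the sample JSON; the outer braces are an assumption, since the sample in the question is cut off.

```python
import json

# Trimmed copy of the sample from the question; the outer braces are assumed.
body = '{"records": [{"uri": "https://www.example.com", "id": 17059, "name": "mylibery"}]}'

jsonresponse = json.loads(body)

# The top level only has the "records" key, so jsonresponse["uri"] raises KeyError.
top_level_keys = list(jsonresponse.keys())

# Each record is a dict inside the "records" list.
first = jsonresponse["records"][0]
print(top_level_keys, first["uri"], first["name"])
```

The fields you want live one level down, inside each element of the "records" list.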

Edit:

To get all the URIs and names from the response, use this:

    def parse(self, response):
        ...
        for record in jsonresponse["records"]:
            item = ApiItem()
            item["url"] = record["uri"]
            item["name"] = record["name"]
            yield item

Note in particular that return is replaced with yield.
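The effect of yield can be sketched without Scrapy. Here `extract_records` is a hypothetical helper, not part of Scrapy, and the second record is made up for illustration.

```python
import json


def extract_records(body):
    """Yield one dict per entry in the top-level "records" list."""
    data = json.loads(body)
    for record in data["records"]:
        # yield hands back one item and then continues the loop;
        # a return here would stop after the first record.
        yield {"url": record["uri"], "name": record["name"]}


# Made-up two-record body in the same shape as the question's sample.
body = json.dumps({
    "records": [
        {"uri": "https://www.example.com", "name": "mylibery"},
        {"uri": "https://www.example.org", "name": "other"},
    ]
})

items = list(extract_records(body))
print(items)
```

Once parse yields one item per record, Scrapy's feed export should be able to write them to CSV, for example with `scrapy crawl api -o items.csv` (the spider name `api` is taken from the question).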

