Downloading a file from the internet with Python
I'm trying to retrieve CSV data from a website through a link. When downloaded manually, synop.201708.csv.gz is in fact a CSV wrongly named .gz, and weighs 2233 KB. When running this code:
```python
import urllib

file_date = '201708'
file_url = "https://donneespubliques.meteofrance.fr/donnees_libres/txt/synop/archive/synop.{}.csv.gz".format(file_date)
output_file_name = "{}.csv.gz".format(file_date)
print "downloading {} {}".format(file_url, output_file_name)
urllib.urlretrieve(file_url, output_file_name)
```
I'm getting a corrupted ~361 KB file. Any ideas why?
What seems to be happening is that the Météo-France site is misusing the Content-Encoding header. The website reports serving a gzip file (Content-Type: application/x-gzip), encoding it in gzip format for transfer (Content-Encoding: x-gzip), and declaring the page an attachment that should be saved under its normal name (Content-Disposition: attachment).
In a vacuum, that would make sense (to a degree; compressing an already-compressed file is useless): the server serves a gzip file and compresses it again for transport. Upon receipt, the browser undoes the transport compression and saves the original gzip file. Here, though, the client decompresses the stream, and since the payload wasn't actually compressed a second time, things don't work as expected: the browser strips the one layer of compression and saves a plain CSV under a .gz name, while urllib, which doesn't decode Content-Encoding at all, saves the raw gzip bytes.
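One way to work around the mismatched header is to fetch the raw bytes yourself and decompress them only if they actually look like gzip. Here's a minimal Python 3 sketch of that idea (the question's code is Python 2; `maybe_gunzip` and `download_csv` are hypothetical helper names, not part of any library):

```python
import gzip
import urllib.request


def maybe_gunzip(data):
    """Return decompressed bytes if `data` is a gzip stream
    (it starts with the magic bytes 0x1f 0x8b), else return it unchanged."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


def download_csv(url, output_path):
    # urllib does not decode Content-Encoding, so a misbehaving server
    # leaves us with the compressed payload; undo it manually if needed.
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    with open(output_path, "wb") as f:
        f.write(maybe_gunzip(raw))
```

Because the check keys on the gzip magic bytes rather than on the headers, the same code keeps working whether the server misreports Content-Encoding or is one day fixed to send a plain CSV.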