Facing issue with "wget" in Python
I am a novice in Python. I am facing an issue with "wget" as well as with "urllib.urlretrieve(str(myurl),tail)".
When I run the script, it downloads the files, but the filenames end with "?".
My complete code:
    import os
    import wget
    import urllib
    import subprocess

    with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
        results = set()
        for line in infile:
            if ' 200 ' in line:
                tokens = line.split()
                results.add(tokens[6])  # 7th token
        for result in sorted(results):
            print >>outfile, result

    with open('/tmp/reddy_log.txt') as infile:
        results = set()
        for line in infile:
            head, tail = os.path.split(line)
            print tail
            myurl = "http://data.xyz.com" + str(line)
            print myurl
            wget.download(str(myurl))
            # urllib.urlretrieve(str(myurl),tail)
Output:

    # python last.py
    0011400026_recap.xml
    http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml
    latest_1.xml
    http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml
    currenttime.js

Listing the files:

    # ls
    0011400026_recap.xml?  currenttime.js?  latest_1.xml?  today.xml?
A possible explanation of the behaviour you experience is that you do not sanitize your input line:
    with open('/tmp/reddy_log.txt') as infile:
        ...
        for line in infile:
            ...
            myurl = "http://data.xyz.com" + str(line)
            wget.download(str(myurl))
When you iterate on a file object (for line in infile:), each string you get is terminated by a newline ('\n') character. If you do not remove the newline before using line, well, the newline character is still there in whatever your use of line produces…
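A minimal sketch of the fix, assuming the file holds one URL path per line as in the question: strip the newline before building the URL and the local filename. The helper name build_target and the sample path are made up for illustration; only the base URL comes from the question.

```python
import os

BASE_URL = "http://data.xyz.com"  # base URL taken from the question


def build_target(line, base_url=BASE_URL):
    """Return (url, filename) for one line read from the file.

    Hypothetical helper for illustration, not part of wget or urllib.
    """
    # strip() drops the trailing '\n' (and any other surrounding whitespace)
    path = line.strip()
    url = base_url + path
    filename = os.path.split(path)[1]
    return url, filename


# A line as it comes out of `for line in infile:`, newline included
sample_line = "/feeds/mobile/iphone/article/league/news/latest_1.xml\n"
url, filename = build_target(sample_line)
print(repr(url))
print(repr(filename))
```

Inside the question's loop, the cleaned url could then be handed to wget.download(url) in place of the unstripped string.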
As an illustration of the concept, have a look at the transcript of a test I've done:
    08:28 $ cat > a_file
    a
    b
    c
    08:29 $ cat > test.py
    data = open('a_file')
    for line in data:
        new_file = open(line, 'w')
        new_file.close()
    08:31 $ ls
    a_file  test.py
    08:31 $ python test.py
    08:31 $ ls
    a?  a_file  b?  c?  test.py
    08:31 $ ls -b
    a\n  a_file  b\n  c\n  test.py
    08:31 $
As you can see, I read the lines from a file and create files using each line as the filename and, guess what, the filenames as listed by ls have a ? at the end. But we can do better than guessing: it's explained in the fine manual page of ls
    -b, --escape    print C-style escapes for nongraphic characters
And, as you can see in the output of ls -b, the filenames are not terminated by a question mark (it's just a placeholder used by default by the ls program) but by a newline character.
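The same check can be made from Python itself, since repr() shows non-printing characters as escapes much like ls -b does. This is only a sketch: it assumes a POSIX filesystem (where a newline is a legal filename character) and works in a throwaway temporary directory rather than the directory from the transcript.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()  # scratch directory, removed at the end
try:
    for line in ["a\n", "b\n", "c\n"]:
        # Same mistake as in test.py: the newline ends up in the filename
        open(os.path.join(workdir, line), "w").close()

    names = sorted(os.listdir(workdir))
    for name in names:
        # repr() makes the trailing '\n' visible
        print(repr(name))
finally:
    shutil.rmtree(workdir)
```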
While I'm at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.
A nice feature of Python is the presence of generator expressions; if you want, you can write your code as follows
    import wget

    # you matched on '200' on the whole line, I assume that
    # you want to match a specific column, the 'error_column',
    # that I symbolically load from an external resource
    from my_constants import error_column, payload_column

    # here is a sequence of generator expressions, each one relying
    # on the previous one

    # 1. the lines in the file, stripped of the white space
    #    on the right (the newline is considered white space)
    #    === not strictly necessary, but convenient because
    #    === below we want to test for non-empty lines
    lines = (line.rstrip() for line in open('whatever.csv'))

    # 2. the lines are converted to a list of 'tokens'
    all_tokens = (line.split() for line in lines if line)

    # 3. for each 'tokens' in the 'all_tokens' generator expression,
    #    we check for the code '200' and possibly generate a new target
    targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')

    # eventually, use the 'targets' generator to proceed with the downloads
    for target in targets:
        wget.download(target)
Don't be fooled by the amount of comments; without the comments the code is just
    import wget
    from my_constants import error_column, payload_column

    lines = (line.rstrip() for line in open('whatever.csv'))
    all_tokens = (line.split() for line in lines if line)
    targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')

    for target in targets:
        wget.download(target)
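To try the pipeline without a network connection or the my_constants module, here is a sketch that runs the same three generator expressions over a few made-up log lines and collects the URLs instead of downloading them. The column positions are assumptions: 6 matches tokens[6] in the question, while 8 for the status code presumes the common access-log layout. The sample lines are invented, and the base URL is prepended as in the question.

```python
# Assumed column positions (hypothetical stand-ins for my_constants)
PAYLOAD_COLUMN = 6  # from the question's tokens[6]
ERROR_COLUMN = 8    # assumption: status code position in a common access log

# Made-up sample lines standing in for the real log file
sample_log = [
    '10.0.0.1 - - [01/Jan/2015:00:00:01 +0000] "GET /feeds/latest_1.xml HTTP/1.1" 200 512\n',
    '\n',
    '10.0.0.2 - - [01/Jan/2015:00:00:02 +0000] "GET /feeds/missing.xml HTTP/1.1" 404 0\n',
    '10.0.0.3 - - [01/Jan/2015:00:00:03 +0000] "GET /js/currenttime.js HTTP/1.1" 200 128\n',
]

# The same chain as above; each generator pulls from the previous one
lines = (line.rstrip() for line in sample_log)
all_tokens = (line.split() for line in lines if line)
targets = (tokens[PAYLOAD_COLUMN] for tokens in all_tokens
           if tokens[ERROR_COLUMN] == '200')

# Nothing has been processed yet; the work happens as the loop pulls items
collected = []
for target in targets:
    collected.append("http://data.xyz.com" + target)

print(collected)
```

Because every stage is lazy, no intermediate list or temporary file is built; each line flows through all three stages before the next one is read.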