Facing an issue with "wget" in Python


I am a novice in Python. I am facing an issue with "wget" and also with "urllib.urlretrieve(str(myurl), tail)".

When I run the script, it downloads the files, but the filenames end with "?".

My complete code:

    import os
    import wget
    import urllib
    import subprocess

    with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
        results = set()
        for line in infile:
            if ' 200 ' in line:
                tokens = line.split()
                results.add(tokens[6])  # 7th token
        for result in sorted(results):
            print >>outfile, result

    with open('/tmp/reddy_log.txt') as infile:
        results = set()
        for line in infile:
            head, tail = os.path.split(line)
            print tail
            myurl = "http://data.xyz.com" + str(line)
            print myurl
            wget.download(str(myurl))
            # urllib.urlretrieve(str(myurl), tail)

Output:

    # python last.py
    0011400026_recap.xml

    http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml

    latest_1.xml

    http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml

    currenttime.js

Listing the files:

    # ls
    0011400026_recap.xml?  currenttime.js?  latest_1.xml?  today.xml?

A possible explanation of the behaviour you experience is that you do not sanitize your input line:

    with open('/tmp/reddy_log.txt') as infile:
        ...
        for line in infile:
            ...
            myurl = "http://data.xyz.com" + str(line)
            wget.download(str(myurl))

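When you iterate over a file object (for line in infile:), each string you get is terminated by a newline ('\n') character. If you do not remove the newline before using line, then, oh well, the newline character is still there in whatever you produce from line: here, both the URL and the name of the downloaded file.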
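A minimal sketch of this point, written for Python 3 and using an in-memory `io.StringIO` object as a stand-in for the real file (the two sample paths are made up for illustration):

```python
import io
import os

# Stand-in for the file of paths; the two sample lines are hypothetical.
infile = io.StringIO("/feeds/a/latest_1.xml\n/feeds/b/today.xml\n")

raw_names = []
clean_names = []
for line in infile:
    # Iterating over a file object yields each line with its trailing '\n'.
    raw_names.append(os.path.split(line)[1])
    # rstrip('\n') drops the newline before the line is used any further.
    clean_names.append(os.path.split(line.rstrip('\n'))[1])

print(repr(raw_names[0]))    # the newline is still part of the name
print(repr(clean_names[0]))  # the newline is gone
```

Stripping the line once, at the top of the loop, is usually enough to keep the newline out of both the URL and the local filename.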

As an illustration of this concept, have a look at the transcript of a test I've done:

    08:28 $ cat > a_file
    a
    b
    c
    08:29 $ cat > test.py
    data = open('a_file')
    for line in data:
        new_file = open(line, 'w')
        new_file.close()

    08:31 $ ls
    a_file  test.py
    08:31 $ python test.py
    08:31 $ ls
    a?  a_file  b?  c?  test.py
    08:31 $ ls -b
    a\n  a_file  b\n  c\n  test.py
    08:31 $

As you can see, I read the lines from a file and create files using each line as the filename and, guess what, the filenames listed by ls have a ? at the end. But we can do better; it's explained in the fine manual page of ls:

    -b, --escape          print C-style escapes for nongraphic characters

And, as you can see in the output of ls -b, the filenames are not terminated by a question mark (that is just a placeholder that the ls program uses by default) but by a newline character.
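The same check can be made from Python itself. The following is a small sketch for Python 3, assuming a POSIX filesystem that accepts a newline inside a filename; it works in a throwaway temporary directory so nothing is left behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a file whose name ends with a newline, as the unsanitized loop does.
    open(os.path.join(tmpdir, "b\n"), "w").close()
    names = os.listdir(tmpdir)

# repr() shows the escape sequence, much like `ls -b` does.
print([repr(name) for name in names])
```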

While I'm at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.

A nice feature of Python is the presence of generator expressions; if you want, you can write your code as follows:

    import wget

    # you matched on '200' on the whole line, I assume that you
    # want to match a specific column, 'error_column', that I
    # symbolically load from an external resource
    from my_constants import error_column, payload_column

    # here is a sequence of generator expressions, each one relying
    # on the previous one

    # 1. the lines in the file, stripped of the white space
    #    on the right (the newline is considered white space)
    #    === not strictly necessary, but convenient because
    #    === below we want to test for non-empty lines
    lines = (line.rstrip() for line in open('whatever.csv'))

    # 2. the lines are converted to a list of 'tokens'
    all_tokens = (line.split() for line in lines if line)

    # 3. for each 'tokens' in the 'all_tokens' generator expression, we
    #    check for the code '200' and possibly generate a new target
    targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column] == '200')

    # eventually, use the 'targets' generator to proceed with the downloads
    for target in targets:
        wget.download(target)

Don't be fooled by the amount of comments; without the comments the code is just

    import wget
    from my_constants import error_column, payload_column

    lines = (line.rstrip() for line in open('whatever.csv'))
    all_tokens = (line.split() for line in lines if line)
    targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column] == '200')

    for target in targets:
        wget.download(target)
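For completeness, here is a hedged sketch of how the whole task might look in Python 3 without the temporary file. It is not the code from the question. It assumes that the access log is in the common log format, so the request path is the 7th token (as in the question) and the status code is the 9th. The names `iter_downloads`, `download_all`, `STATUS_COLUMN` and `PATH_COLUMN` are made up for this example, the sample log lines are invented, and the standard library's `urllib.request.urlretrieve` is swapped in for `wget.download`.

```python
import io
import posixpath
from urllib.request import urlretrieve

BASE_URL = "http://data.xyz.com"  # host taken from the question
PATH_COLUMN = 6                   # 7th token, as in the question
STATUS_COLUMN = 8                 # assumption: common log format


def iter_downloads(lines, base_url=BASE_URL):
    """Yield (url, local_name) pairs for the unique paths served with status 200."""
    stripped = (line.strip() for line in lines)           # drops the newline
    all_tokens = (line.split() for line in stripped if line)
    paths = {tokens[PATH_COLUMN] for tokens in all_tokens
             if len(tokens) > STATUS_COLUMN and tokens[STATUS_COLUMN] == "200"}
    for path in sorted(paths):
        yield base_url + path, posixpath.basename(path)


def download_all(log_path):
    """Download every successfully served path found in the log file."""
    with open(log_path) as infile:
        for url, local_name in iter_downloads(infile):
            urlretrieve(url, local_name)


# Invented sample lines, used only to show what iter_downloads produces.
sample_log = io.StringIO(
    '10.0.0.1 - - [01/Jan/2015:00:00:01 +0000] "GET /feeds/news/latest_1.xml HTTP/1.1" 200 512\n'
    '10.0.0.2 - - [01/Jan/2015:00:00:02 +0000] "GET /feeds/news/missing.xml HTTP/1.1" 404 0\n'
    '\n'
    '10.0.0.3 - - [01/Jan/2015:00:00:03 +0000] "GET /feeds/news/latest_1.xml HTTP/1.1" 200 512\n'
)
pairs = list(iter_downloads(sample_log))
print(pairs)
```

Keeping the URL building separate from the download makes it possible to inspect the pairs before anything touches the network; calling `download_all('/var/log/na/na.access.log')` would then perform the actual downloads.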
