python 3.x - Replace text tags inside a txt file in Python 3


I'm trying to make a proxy scraper. Here is the code:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
from contextlib import redirect_stdout

meh = []
pathf = '/home/user/tests.txt'

# Send a browser-like User-Agent with the request
url = Request('https://www.path.to/table', headers={'User-Agent': 'Mozilla/5.0'})
page_html = urlopen(url).read()
page_soup = soup(page_html, features="xml")  # the "xml" parser requires lxml to be installed

final = page_soup.tbody
meh.append(final)

# Redirect print() output into the file
with open(pathf, 'w') as f:
    with redirect_stdout(f):
        print(meh[0].text.strip())

Now I want the text to show up in a more readable way, because at the moment the output looks like this:

12.183.20.3615893usunited statessocks5anonymousyes11 seconds ago220.133.97.7445657twtaiwansocks5anonymousyes11 seconds ago

How can I turn this text into a more readable file? Something like:

12.183.20.36 15893 united states socks5 anonymous yes 11 seconds ago (new line) ...

Here is the actual output without '.text.strip()', formatted after a trip through jsbeautifier, if it helps: https://ghostbin.com/paste/g56qe

You can extract the td elements as a list instead of extracting the complete table body:

final_list = page_soup.find_all('td')

and then build a list of the text nodes:

list_of_text_nodes = [td.text.strip() for td in final_list]

output:

[u'182.235.38.81', u'40748', u'tw', u'taiwan', u'socks5', u'anonymous'...] 

or join the text nodes into a single string:

complete_text = " ".join([i.text.strip() for i in final_list])

output:

'182.235.38.81 40748 tw taiwan socks5 anonymous yes 14 seconds ago ...' 
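
To get one proxy per line, as the question asks, the flat list of cell texts can be grouped into rows. The following is only a minimal sketch: the helper name rows_from_cells is hypothetical, and the width of 8 cells per row is an assumption inferred from the sample output (IP, port, country code, country, protocol, anonymity, a yes/no flag, last checked). The sample list stands in for list_of_text_nodes.

```python
from typing import List

# Assumption: every table row has 8 cells, inferred from the sample output.
CELLS_PER_ROW = 8


def rows_from_cells(cells: List[str], width: int = CELLS_PER_ROW) -> List[str]:
    """Group a flat list of cell texts into space-separated lines, one per row."""
    return [" ".join(cells[i:i + width]) for i in range(0, len(cells), width)]


# Stand-in for list_of_text_nodes, using the values shown in the question.
sample_cells = [
    "12.183.20.36", "15893", "us", "united states",
    "socks5", "anonymous", "yes", "11 seconds ago",
    "220.133.97.74", "45657", "tw", "taiwan",
    "socks5", "anonymous", "yes", "11 seconds ago",
]

lines = rows_from_cells(sample_cells)
print("\n".join(lines))
```

Writing "\n".join(lines) to the file opened at pathf would then give one proxy per line. If the real table has a different number of columns, the width needs adjusting; iterating over the tr elements and joining the td texts of each row would avoid the hard-coded width altogether.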
