python 3.x - Replace text tags inside txt file python3
I'm trying to make a proxy scraper. Code:
import bs4
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import lxml
from contextlib import redirect_stdout

meh = []
pathf = '/home/user/tests.txt'
url = Request('https://www.path.to/table', headers={'User-Agent': 'Mozilla/5.0'})
page_html = urlopen(url).read()
page_soup = soup(page_html, features="xml")
final = page_soup.tbody
meh.append(final)
with open(pathf, 'w') as f:
    with redirect_stdout(f):
        print(meh[0].text.strip())
Now I want the text to show in a more readable way, because right now I get this:
12.183.20.3615893usunited statessocks5anonymousyes11 seconds ago220.133.97.7445657twtaiwansocks5anonymousyes11 seconds ago
How can I turn the text into a more readable file? Like:
12.183.20.36 15893 united states socks5 anonymous yes 11 seconds ago (new line) ...
Here is the actual output without '.text.strip()', formatted after a trip through jsbeautifier, if it helps: https://ghostbin.com/paste/g56qe
You can extract the td elements as a list instead of extracting the complete table body:
final_list = page_soup.find_all('td')
and then build a list of the text nodes:
list_of_text_nodes = [td.text.strip() for td in final_list]
Output:
[u'182.235.38.81', u'40748', u'tw', u'taiwan', u'socks5', u'anonymous'...]
or join the text nodes into a single string:
complete_text = " ".join([i.text.strip() for i in final_list])
Output:
'182.235.38.81 40748 tw taiwan socks5 anonymous yes 14 seconds ago ...'
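To get one proxy per line, as the question asks, the flat list of cell texts can be split into rows before joining. The following is only a minimal sketch: the helper name cells_to_lines is hypothetical, and the column count of 8 is an assumption inferred from the sample output in the question (IP, port, country code, country, type, anonymity, yes/no flag, last checked). The sample cells stand in for list_of_text_nodes.

```python
# Sample cells as they would come out of list_of_text_nodes
# (values taken from the output shown in the question).
cells = [
    '12.183.20.36', '15893', 'us', 'united states',
    'socks5', 'anonymous', 'yes', '11 seconds ago',
    '220.133.97.74', '45657', 'tw', 'taiwan',
    'socks5', 'anonymous', 'yes', '11 seconds ago',
]

COLUMNS = 8  # assumption: every table row holds 8 td cells


def cells_to_lines(cells, columns=COLUMNS):
    """Group a flat list of cell texts into rows and join each row with spaces."""
    rows = [cells[i:i + columns] for i in range(0, len(cells), columns)]
    return [' '.join(row) for row in rows]


lines = cells_to_lines(cells)
print('\n'.join(lines))
```

The joined text can then be written with f.write('\n'.join(lines)) inside the existing with open(pathf, 'w') block, which avoids the need for redirect_stdout. If the real table has a different number of columns, iterating over the tr elements and joining the td cells of each row would be more robust than a fixed column count.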