python 3.x - Replace text tags inside txt file python3

February 15, 2015

I'm trying to make a proxy scraper. This is my code:

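import bs4
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import lxml  # parser backend required by features="xml"
from contextlib import redirect_stdout

meh = []
pathf = '/home/user/tests.txt'

# Send a browser-like User-Agent header with the request
url = Request('https://www.path.to/table', headers={'User-Agent': 'Mozilla/5.0'})
page_html = urlopen(url).read()
page_soup = soup(page_html, features="xml")

# Grab the table body and keep it in the list
final = page_soup.tbody
meh.append(final)

# Write the table text to the file by redirecting print()
with open(pathf, 'w') as f:
    with redirect_stdout(f):
        print(meh[0].text.strip())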

Now I want the text to be shown in a more readable way, because the output currently looks like this:

12.183.20.3615893usunited statessocks5anonymousyes11 seconds ago220.133.97.7445657twtaiwansocks5anonymousyes11 seconds ago

How can I turn this text into a more readable file? Something like:

12.183.20.36 15893 united states socks5 anonymous yes 11 seconds ago (new line) ...

Here is the actual output without '.text.strip()', formatted with jsbeautifier, in case it helps: https://ghostbin.com/paste/g56qe

You can extract a list of the td elements instead of extracting the complete table body:

final_list = page_soup.find_all('td')

and then build a list of their text nodes:

list_of_text_nodes = [td.text.strip() for td in final_list]

Output (shown as Python 2 prints it; Python 3 omits the u prefix):

[u'182.235.38.81', u'40748', u'tw', u'taiwan', u'socks5', u'anonymous'...]

or join the text nodes into a single string:

complete_text = " ".join([i.text.strip() for i in final_list])

output:

'182.235.38.81 40748 tw taiwan socks5 anonymous yes 14 seconds ago ...'
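The question asks for one proxy per line, so the flat list of cell texts can also be split into rows. Below is a minimal sketch, assuming every table row has eight cells, which is only inferred from the sample output. The helper name cells_to_lines and the constant COLUMNS_PER_ROW are hypothetical and not part of the original answer. The sample data is copied from the question. In the real script the input would be list_of_text_nodes.

```python
# Hypothetical helper: group a flat list of cell texts into one line per proxy.
# COLUMNS_PER_ROW = 8 is an assumption read off the sample output in the
# question; adjust it if the real table has a different number of columns.
COLUMNS_PER_ROW = 8


def cells_to_lines(cells, columns=COLUMNS_PER_ROW):
    """Join every `columns` consecutive cell texts into one space-separated line."""
    return [
        " ".join(cells[start:start + columns])
        for start in range(0, len(cells), columns)
    ]


# Sample data taken from the question; in the real script this would be
# list_of_text_nodes built from the td elements.
sample_cells = [
    "12.183.20.36", "15893", "us", "united states",
    "socks5", "anonymous", "yes", "11 seconds ago",
    "220.133.97.74", "45657", "tw", "taiwan",
    "socks5", "anonymous", "yes", "11 seconds ago",
]

lines = cells_to_lines(sample_cells)
print("\n".join(lines))
```

The joined string can then be written to the file with f.write("\n".join(lines)) inside the existing with open(pathf, 'w') as f: block. If the column count is not fixed, looping over page_soup.find_all('tr') and joining the td texts of each row avoids hard-coding the number of columns.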
