Text file record count using Pool class in Python

February 15, 2014

I have a program that lists and reads all the files in a directory and counts the total number of records present in the files concurrently.

When I run the code below, I get a list of worker thread names with counts coming in chunks, because the counting of records from multiple files is going on in parallel.

    import multiprocessing as mp
    import time
    import os

    path = '/home/vaibhav/desktop/input_python'

    def process_line(f):
        print(mp.current_process())
        #print("process id = " , os.getpid(f))
        print(sum(1 for line in f))

    for filename in os.listdir(path):
        print(filename)

        if __name__ == "__main__":

            with open('/home/vaibhav/desktop/input_python/'+ filename, "r+") as source_file:
                # chunk the work into batches

                p = mp.Pool()
                results = p.map(process_line, source_file)

    start_time = time.time()
    print("my program took", time.time() - start_time, "to run")

Current output:

    <ForkProcess(ForkPoolWorker-54, started daemon)>
    73
    <ForkProcess(ForkPoolWorker-55, started daemon)>
    <ForkProcess(ForkPoolWorker-56, started daemon)>
    <ForkProcess(ForkPoolWorker-53, started daemon)>
    73
    1
    <ForkProcess(ForkPoolWorker-53, started daemon)>
    79
    <ForkProcess(ForkPoolWorker-54, started daemon)>
    <ForkProcess(ForkPoolWorker-56, started daemon)>
    <ForkProcess(ForkPoolWorker-55, started daemon)>
    79
    77
    77

Is there a way around this so that I can get the total record count for each file, like this?

    file1.txt  total_recordcount
    ...
    fileN.txt  total_recordcount

Update: I got a solution and have pasted the answer in the comments section.

Counting lines in a text file should not be CPU-bound, so it is not a good candidate for threading. You might want to use a thread pool for processing multiple independent files, but for a single file, here is a way to count lines that should be very fast:

    import pandas as pd
    data = pd.read_table(source_file, dtype='S1', header=None, usecols=[0])
    count = len(data)

What we do is parse only the first character (S1) of each line into a DataFrame, and then check its length. The parser is implemented in C, so no slow Python loop is required. This should provide close to the best possible speed, limited only by your disk subsystem.
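If pandas is not available, the same idea of avoiding a per-line Python loop can be sketched with only the standard library. This is an alternative illustration, not the approach shown above: the function name `count_lines` and the 1 MiB chunk size are assumptions made for the example. It reads the file in binary blocks and lets `bytes.count` (implemented in C) count the newline characters.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count newline characters in the file at `path` (hypothetical helper)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            # Read one fixed-size block; an empty result means end of file.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # bytes.count runs in C, so there is no Python-level loop per line.
            count += chunk.count(b"\n")
    # Note: a final line without a trailing newline is not counted.
    return count
```

Calling `count_lines(source_file)` with a file path returns one integer for the whole file.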

Counting each file as a whole sidesteps your original problem completely, because you now get a single count per file.
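For several independent files, the thread pool mentioned earlier could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper names `count_records` and `count_all` are invented here, one record is taken to mean one line, and `ThreadPool` from `multiprocessing.pool` is used because it offers the same `map` interface as `Pool`. The key difference from the code in the question is that each worker receives a whole file path rather than a single line, so every file produces exactly one total.

```python
import os
import sys
from multiprocessing.pool import ThreadPool


def count_records(path):
    """Return (file name, line count) for one file (hypothetical helper)."""
    with open(path, "rb") as f:
        return os.path.basename(path), sum(1 for _ in f)


def count_all(directory, workers=4):
    """Count lines in every regular file of `directory`, one task per file."""
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    # map() hands each worker a complete path and keeps the input order,
    # so the results come back as one (name, total) pair per file.
    with ThreadPool(workers) as pool:
        return pool.map(count_records, paths)


if __name__ == "__main__":
    # Directories are taken from the command line; none given means no output.
    for directory in sys.argv[1:]:
        for name, total in count_all(directory):
            print(name, total)
```

Run with the input directory as an argument, this prints one `name total` line per file, which matches the layout asked for in the question.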
