python - ParserError with pandas read_csv


I'm trying to read a txt file that has a different number of columns per row. Here's the beginning of the file:

    60381 6
    1 0.270 0.30 0.30 0.70 0.70
    4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988
    2 0.078 0.30 0.30 0.70 0.70
    5.387 5.312 5.338 4.463 4.675 4.275 4.238 3.562 3.175 3.925 4.950 4.762
    6 0.241 0.30 0.60 0.70 0.40
    3.700 3.200 2.738 2.325 1.250 0.975 1.175 1.950 2.488 3.613 3.987 3.950
    7 0.357 0.30 0.60 0.70 0.40
    1.212 1.125 1.050 0.950 0.663 0.488 0.425 0.512 0.637 0.900 1.112 1.188
    8 0.031 0.30 0.70 0.70 0.30
    0.225 0.213 0.200 0.175 0.200 0.213 0.375 0.887 0.975 0.512 0.262 0.262
    10 0.022 0.30 0.80 0.70 0.20
    0.712 0.700 0.738 0.550 0.513 0.688 0.613 0.600 0.850 0.812 0.800 0.775
    60382 5
    6 0.197 0.30 0.60 0.70 0.40
    3.700 3.200 2.738 2.325 1.250 0.975 1.175 1.950 2.488 3.613 3.987 3.950
    7 0.413 0.30 0.60 0.70 0.40
    1.212 1.125 1.050 0.950 0.663 0.488 0.425 0.512 0.637 0.900 1.112 1.188
    8 0.016 0.30 0.70 0.70 0.30
    0.225 0.213 0.200 0.175 0.200 0.213 0.375 0.887 0.975 0.512 0.262 0.262
    10 0.111 0.30 0.80 0.70 0.20
    0.712 0.700 0.738 0.550 0.513 0.688 0.613 0.600 0.850 0.812 0.800 0.775
    11 0.263 0.30 0.50 0.70 0.50
    1.812 1.388 1.087 0.825 0.538 0.400 0.338 0.400 0.500 0.925 0.962 1.100

I've tried using pandas read_csv to read it:

    import pandas as pd
    data = pd.read_csv('./myfile.txt', header=None, sep='\s')

which gives the following error:

    ParserError: Expected 6 fields in line 3, saw 12. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

The file doesn't have a multi-char delimiter or quotation marks. I've also tried a solution found in a forum, which suggested using:

    data = pd.read_csv(open('./myfile.txt', 'r'), header=None, encoding='utf-8', engine='c')

Although this solves the error above, the result I'm presented with does not use the space as the column delimiter, and the output has only 1 column:

data output

How should I read the file in order to get a column for each value? I don't mind if there are NaN values to fill the rest.

If you've managed to get the data into a single column, you can use Series.str.split() to work around the issue.

Here is an example using the sample data provided (you can use a string or a regex delimiter in split()):

    df[0].str.split(' ', expand=True)

           0      1      2      3      4      5      6      7      8      9   \
    0  0.270   0.30   0.30   0.70   0.70   None   None   None   None   None
    1  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988
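As a rough sketch of that workaround, the snippet below builds a single-column frame from the first lines of the sample data, splits it, and converts the resulting strings to numbers. The names `raw` and `wide` are made up for illustration. Calling split() with no pattern splits on runs of whitespace, which may be more forgiving than ' ' if the file contains double spaces or trailing blanks.

```python
import pandas as pd

# First lines of the sample data, one raw line per row (a hypothetical
# stand-in for the single-column result described above).
raw = pd.DataFrame({0: [
    "60381 6",
    "1 0.270 0.30 0.30 0.70 0.70",
    "4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988",
]})

# No pattern: split on any run of whitespace; expand=True yields one column per token.
wide = raw[0].str.split(expand=True)

# The tokens are still strings; convert each column, missing cells become NaN.
wide = wide.apply(pd.to_numeric, errors="coerce")

print(wide.shape)
```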

If you do this, you might as well create the DataFrame with pd.DataFrame(open(...).readlines()) or something like that, since you don't benefit at all from read_csv(), and the file isn't a standard CSV file.

    # f is a StringIO of the sample data, to simulate a file
    df = pd.DataFrame(line.strip().split(' ') for line in f)

            0      1      2      3      4      5      6      7      8      9   \
    0   60381      6   None   None   None   None   None   None   None   None
    1       1  0.270   0.30   0.30   0.70   0.70   None   None   None   None
    2   4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988
    3       2  0.078   0.30   0.30   0.70   0.70   None   None   None   None
    4   5.387  5.312  5.338  4.463  4.675  4.275  4.238  3.562  3.175  3.925
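If numeric values are wanted rather than strings, the conversion can happen while the lines are read. This is a minimal sketch, assuming every token in the file parses as a float; `f` is again a StringIO standing in for the real file, and `rows` is a name chosen for illustration.

```python
import io
import pandas as pd

# Stand-in for the real file: the first lines of the sample data.
f = io.StringIO(
    "60381 6\n"
    "1 0.270 0.30 0.30 0.70 0.70\n"
    "4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988\n"
)

# split() with no argument ignores repeated and trailing whitespace;
# blank lines are skipped.
rows = [[float(tok) for tok in line.split()] for line in f if line.strip()]

# Rows of unequal length are padded with NaN on the right.
df = pd.DataFrame(rows)

print(df.shape)
```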

Of course, you can also fix the input file by making sure every line contains the same number of columns, which would solve the ParserError issue.
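Another option that avoids editing the file is to tell read_csv up front how many columns to expect through `names`, so that shorter rows are padded with NaN. This is a sketch and not part of the answer above: the `read_ragged` helper is hypothetical, and it assumes the file is whitespace-delimited and small enough to scan once for its widest row.

```python
import os
import tempfile
import pandas as pd

def read_ragged(path):
    """Read a whitespace-delimited file whose rows differ in length."""
    # First pass: find the widest row so that every column gets a name.
    with open(path) as fh:
        n_cols = max(len(line.split()) for line in fh)
    # Second pass: with explicit names, rows with fewer fields are padded with NaN.
    return pd.read_csv(path, header=None, sep=r"\s+", names=list(range(n_cols)))

# Demo on a temporary file holding the first lines of the sample data.
sample = (
    "60381 6\n"
    "1 0.270 0.30 0.30 0.70 0.70\n"
    "4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
data = read_ragged(tmp.name)
os.remove(tmp.name)

print(data.shape)
```

Note the raw string `r"\s+"`, which matches one or more whitespace characters; the single `'\s'` from the question would treat each space as its own separator.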

