Python - ParserError with pandas read_csv
I'm trying to read a .txt file that has a different number of columns per row. Here's the beginning of the file:
    60381 6
    1 0.270 0.30 0.30 0.70 0.70
    4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988
    2 0.078 0.30 0.30 0.70 0.70
    5.387 5.312 5.338 4.463 4.675 4.275 4.238 3.562 3.175 3.925 4.950 4.762
    6 0.241 0.30 0.60 0.70 0.40
    3.700 3.200 2.738 2.325 1.250 0.975 1.175 1.950 2.488 3.613 3.987 3.950
    7 0.357 0.30 0.60 0.70 0.40
    1.212 1.125 1.050 0.950 0.663 0.488 0.425 0.512 0.637 0.900 1.112 1.188
    8 0.031 0.30 0.70 0.70 0.30
    0.225 0.213 0.200 0.175 0.200 0.213 0.375 0.887 0.975 0.512 0.262 0.262
    10 0.022 0.30 0.80 0.70 0.20
    0.712 0.700 0.738 0.550 0.513 0.688 0.613 0.600 0.850 0.812 0.800 0.775
    60382 5
    6 0.197 0.30 0.60 0.70 0.40
    3.700 3.200 2.738 2.325 1.250 0.975 1.175 1.950 2.488 3.613 3.987 3.950
    7 0.413 0.30 0.60 0.70 0.40
    1.212 1.125 1.050 0.950 0.663 0.488 0.425 0.512 0.637 0.900 1.112 1.188
    8 0.016 0.30 0.70 0.70 0.30
    0.225 0.213 0.200 0.175 0.200 0.213 0.375 0.887 0.975 0.512 0.262 0.262
    10 0.111 0.30 0.80 0.70 0.20
    0.712 0.700 0.738 0.550 0.513 0.688 0.613 0.600 0.850 0.812 0.800 0.775
    11 0.263 0.30 0.50 0.70 0.50
    1.812 1.388 1.087 0.825 0.538 0.400 0.338 0.400 0.500 0.925 0.962 1.100

I've tried using pandas read_csv to read it:
    import pandas as pd
    data = pd.read_csv('./myfile.txt', header=None, sep='\s')

which gives the following error:
    ParserError: Expected 6 fields in line 3, saw 12. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

But the file doesn't have a multi-char delimiter or any quotation marks. I've tried a solution I found in a forum, which suggested using:
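The hint about quotes and multi-char delimiters is somewhat misleading here. According to the pandas docs, a `sep` longer than one character (other than `'\s+'`) is interpreted as a regular expression and forces the Python parsing engine, which is where that generic hint comes from. The underlying problem is that the rows have different lengths. A minimal sketch, using `io.StringIO` with the first three lines of the sample as a stand-in for the file, suggests the error persists even with `sep=r'\s+'` on the C engine:

```python
import io

import pandas as pd

# Stand-in for ./myfile.txt: the first three lines of the sample (2, 6 and 12 fields).
sample = io.StringIO(
    "60381 6\n"
    "1 0.270 0.30 0.30 0.70 0.70\n"
    "4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988\n"
)

try:
    # r"\s+" is special-cased by pandas, so the fast C engine can be used.
    pd.read_csv(sample, header=None, sep=r"\s+")
    error_message = None
except pd.errors.ParserError as exc:
    # Still fails: the delimiter is fine, but the rows have different lengths.
    error_message = str(exc)

print(error_message)
```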
    data = pd.read_csv(open('./myfile.txt', 'r'), header=None, encoding='utf-8', engine='c')

Although this solves the error above, the result doesn't use the space as the column delimiter, and the output has only one column.
How should I read the file so that each value gets its own column? I don't mind if NaN values fill the rest.
If you've managed to get the data into a single column, you can use Series.str.split() to work around the issue.
Here is an example with the sample data you provided (you can use a string or a regex delimiter in split()):
    df[0].str.split(' ', expand=True)

           0      1      2      3      4      5      6      7      8      9  \
    0  0.270   0.30   0.30   0.70   0.70   None   None   None   None   None
    1  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988

If you do this, you might as well create the DataFrame with pd.DataFrame(open(...).readlines()) or something similar, since you don't gain anything from read_csv() here, and the file isn't a standard CSV file anyway.
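A self-contained sketch of the split workaround, assuming the single-column DataFrame comes from read_csv with its default `sep=','` (the file has no commas, so each whole line becomes one field, which matches the one-column result in the question). `io.StringIO` stands in for the real file, and the final `pd.to_numeric` step is an addition, not part of the answer above:

```python
import io

import pandas as pd

# Stand-in for ./myfile.txt: the first three lines of the sample data.
text = (
    "60381 6\n"
    "1 0.270 0.30 0.30 0.70 0.70\n"
    "4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988\n"
)

# Default sep=',' and no commas in the file: every line lands in column 0.
single = pd.read_csv(io.StringIO(text), header=None)

# With no pattern, str.split() splits on runs of whitespace; expand=True
# returns a DataFrame and pads shorter rows with missing values.
df = single[0].str.split(expand=True)

# Convert the strings to numbers; the padded cells become NaN.
df = df.apply(pd.to_numeric)
print(df)
```

Calling split() without a pattern avoids empty fields if a line ever contains two consecutive spaces, which split(' ') would turn into empty strings.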
    # f is a StringIO of the sample data, used to simulate the file
    df = pd.DataFrame(line.strip().split(' ') for line in f)

           0      1      2      3      4      5      6      7      8      9  \
    0  60381      6   None   None   None   None   None   None   None   None
    1      1  0.270   0.30   0.30   0.70   0.70   None   None   None   None
    2  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988  4.988
    3      2  0.078   0.30   0.30   0.70   0.70   None   None   None   None
    4  5.387  5.312  5.338  4.463  4.675  4.275  4.238  3.562  3.175  3.925

Of course, you can also fix the input file by making sure every line contains the same number of columns, which would solve the ParserError.
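An alternative that avoids editing the file, not taken from the answer above, is to tell read_csv up front how many columns to expect. When `names` lists more columns than a row contains, pandas fills the missing fields with NaN. This is a minimal sketch assuming no line has more than 12 values (true for the sample shown); `MAX_COLS` is a hypothetical name, and a wider line would still cause problems, so it must cover the widest row:

```python
import io

import pandas as pd

# Stand-in for ./myfile.txt: the first three lines of the sample data.
text = (
    "60381 6\n"
    "1 0.270 0.30 0.30 0.70 0.70\n"
    "4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988 4.988\n"
)

# Assumption: no line in the file has more than 12 values.
MAX_COLS = 12

df = pd.read_csv(
    io.StringIO(text),
    header=None,
    sep=r"\s+",                    # any run of whitespace is one delimiter
    names=list(range(MAX_COLS)),   # declare every column; short rows get NaN
)
print(df)
```

For the real file, replace the `io.StringIO(text)` argument with `'./myfile.txt'`.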
