python - Delete lines in a dataframe based on lookup in another dataframe
I work with 2 dataframes. I want to remove lines in the first dataframe based on a match in the other one.
In df1 I have 2 columns (called type1 and type2) plus a flag. I want to delete the lines where flag = True and the type1 & type2 combination matches a combination in df2.
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=["type1", "type2"])
df1["flag"] = np.random.randint(0, 10, size=(100)) > 6
df1.head()

   type1  type2   flag
0      8      5  False
1      1      6  False
2      9      2  False
3      0      9   True
4      2      9  False

df2 = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=["type1", "type2"])
df2.head()

   type1  type2
0      0      9
1      7      8
2      5      1
3      3      3
4      3      2
For example, here the line in df1 with index=3 should be deleted, because flag=True and (0,9) exists in df2.
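Because the sample frames come from np.random, every run produces different rows. A minimal sketch, assuming small fixed frames built only from the head() rows shown above, spells out the deletion rule directly: collect the (type1, type2) pairs of df2 in a Python set and drop a df1 row only when its flag is True and its pair is in that set.

```python
import pandas as pd

# Fixed data taken from the head() output above (not the full random frames)
df1 = pd.DataFrame({"type1": [8, 1, 9, 0, 2],
                    "type2": [5, 6, 2, 9, 9],
                    "flag":  [False, False, False, True, False]})
df2 = pd.DataFrame({"type1": [0, 7, 5, 3, 3],
                    "type2": [9, 8, 1, 3, 2]})

# Every (type1, type2) combination that occurs in df2
pairs = set(zip(df2["type1"], df2["type2"]))

# True for rows to delete: flag is set AND the pair exists in df2
to_delete = [bool(flag) and (t1, t2) in pairs
             for t1, t2, flag in zip(df1["type1"], df1["type2"], df1["flag"])]

# Keep everything else; index 3 (0, 9, True) is the only row dropped
result = df1[~pd.Series(to_delete, index=df1.index, dtype=bool)]
print(result)
```

This row-by-row version is easy to read but slow on large frames; the vectorized merge below scales better.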
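Use merge with the parameter indicator=True to combine both DataFrames into one df, then filter by boolean indexing: keep the rows that are only in df1 (left_only) or have False in flag, so only the rows with both and True are deleted.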
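# the 'on' parameter can be omitted if the matching columns have the same names in both DataFrames
# drop_duplicates() stops the merge from repeating a df1 row once per duplicate pair in df2,
# so df3 keeps exactly one row per df1 row
df3 = pd.merge(df1, df2.drop_duplicates(), how='left', indicator=True)
# with explicitly named matching columns:
# df3 = pd.merge(df1, df2.drop_duplicates(), how='left', indicator=True, on=['type1', 'type2'])
print(df3)

   type1  type2   flag     _merge
0      8      5  False  left_only
1      1      6  False  left_only
2      9      2  False  left_only
3      0      9   True       both
4      2      9  False  left_only

# delete only rows that are in both frames AND flagged True
df3 = df3.loc[(df3['_merge'] == 'left_only') | (~df3['flag']), ['type1', 'type2']]
print(df3)

   type1  type2
0      8      5
1      1      6
2      9      2
4      2      9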
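A self-contained sketch of the merge approach, assuming small hypothetical frames instead of the random ones. The pair (0, 9) appears twice in df2 on purpose, to show why deduplicating df2 before the left merge matters, and df1 includes a flagged row with no match and an unflagged row with a match, both of which must be kept.

```python
import pandas as pd

# Small fixed frames (hypothetical data); (0, 9) appears twice in df2 on purpose
df1 = pd.DataFrame({"type1": [8, 0, 3, 4],
                    "type2": [5, 9, 3, 4],
                    "flag":  [False, True, False, True]})
df2 = pd.DataFrame({"type1": [0, 0, 3, 7],
                    "type2": [9, 9, 3, 8]})

# Without deduplication the left merge repeats df1's (0, 9) row once per match
print(len(pd.merge(df1, df2, how="left")))  # 5 rows instead of 4

# Deduplicate df2 first so the merge result has exactly one row per df1 row
merged = pd.merge(df1, df2.drop_duplicates(), how="left",
                  on=["type1", "type2"], indicator=True)

# Delete only rows that are flagged AND found in df2 ("both");
# keep left_only rows and unflagged rows
keep = (merged["_merge"] == "left_only") | (~merged["flag"])

# A left merge preserves the order of df1's rows, so the mask can be
# applied positionally even if df1 had a non-default index
result = df1[keep.to_numpy()]
print(result)
```

Here row 1 (0, 9, True) is the only one removed: row 2 matches df2 but is unflagged, and row 3 is flagged but has no match.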
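It is also possible to create a mask from the merged df3 (before the column selection above) and use it to filter df1 directly, which is handy if there are many columns: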
# df3 here is the merged DataFrame that still has the _merge column
m = (df3['_merge'] == 'left_only') | (~df3['flag'])
df1 = df1[m]
print(df1)

   type1  type2   flag
0      8      5  False
1      1      6  False
2      9      2  False
4      2      9  False
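The same mask can also be built without a merge. This is a swapped-in technique, not the one from the answer above: a minimal sketch, assuming the same hypothetical fixed frames, that uses pd.MultiIndex.from_frame and MultiIndex.isin to test each (type1, type2) pair of df1 against df2. Because nothing is joined, duplicates in df2 cannot multiply rows, and the resulting boolean array lines up with df1 row by row.

```python
import pandas as pd

# Hypothetical fixed frames; (0, 9) is duplicated in df2 on purpose
df1 = pd.DataFrame({"type1": [8, 0, 3, 4],
                    "type2": [5, 9, 3, 4],
                    "flag":  [False, True, False, True]})
df2 = pd.DataFrame({"type1": [0, 0, 3, 7],
                    "type2": [9, 9, 3, 8]})

cols = ["type1", "type2"]

# One plain (type1, type2) tuple per row of df2; duplicates are harmless for isin
df2_pairs = list(df2[cols].itertuples(index=False, name=None))

# NumPy boolean array: does each df1 row's pair occur in df2?
in_df2 = pd.MultiIndex.from_frame(df1[cols]).isin(df2_pairs)

# Drop rows that are flagged and matched; keep everything else
result = df1[~(df1["flag"].to_numpy() & in_df2)]
print(result)
```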