apache spark sql - SparkSQL: same query returns different results
I encountered a weird problem. I wanted to take data from a DataFrame, insert it into a permanent Hive table, and index it into Elasticsearch. The query was a simple select * from result,
then a loop through each row to insert it into ES. The Hive side was a simple insert into <hive_table> select * from result.
But I got different results. To check, I created 3 different temporary tables like this:
    spark.sql("select * from qtycontribution")
        .join(getrevenuecontribution(spark, table2), "item")
        .join(finaluniqueitem(spark), "item")
        .registerTempTable("hola");
    spark.sql("select * from qtycontribution")
        .join(getrevenuecontribution(spark, table2), "item")
        .join(finaluniqueitem(spark), "item")
        .registerTempTable("hola1");
    spark.sql("select * from qtycontribution")
        .join(getrevenuecontribution(spark, table2), "item")
        .join(finaluniqueitem(spark), "item")
        .registerTempTable("hola2");
Each query is the same, but the resulting tables are different. Then:
    Dataset<Row> dframe1 = spark.sql("select * from hola");
    Row[] row1 = (Row[]) dframe1.collect();
    int q = 1;
    for (Row s : row1) {
        System.out.println(s.get(0) + " =======df1======= " + q++);
    }
    Dataset<Row> dframe2 = spark.sql("select * from hola1");
    Row[] row2 = (Row[]) dframe2.collect();
    int w = 1;
    for (Row s : row2) {
        System.out.println(s.get(0) + " =======df2======= " + w++);
    }
    Dataset<Row> dframe3 = spark.sql("select * from hola2");
    Row[] row3 = (Row[]) dframe3.collect();
    int e = 1;
    // This loop iterates row2, not row3, so the df3 output repeats the hola1 rows;
    // iterate row3 to actually inspect hola2.
    for (Row s : row2) {
        System.out.println(s.get(0) + " =======df3======= " + e++);
    }
And the result is this:
    bm8942 =======df1======= 1723
    bm8942 =======df2======= 1733
    bm8942 =======df3======= 1733
And for ES I did this:
    Dataset<Row> dframe = spark.sql("select * from hola1");
    Row[] row = (Row[]) dframe.collect();
    int i = 1;
    for (Row r : row) {
        bulkrequest.add(client.prepareIndex("twitter1234", "use1", String.valueOf(i))
            .setSource(jsonBuilder()
                .startObject()
                .field("item", r.get(0))
                .field("qty_contrib", r.get(1))
                .field("division", r.get(2))
                .field("rev_contrib", r.get(3))
                .field("bp", r.get(4))
                .endObject()
            )
        );
        System.out.println(i++ + " ==== " + r.get(0));
    }
And I got:
1534 ==== bm8942
What's happening?
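One possible reading, offered as an assumption rather than a diagnosis: the number printed next to bm8942 is only the position of that row in the collected array, and Spark does not guarantee row order for a query without an ORDER BY. If that is what is going on here, the three tables could hold the same rows in a different order. The plain-Java sketch below uses made-up item codes and a hypothetical class name (RowOrderCheck) to show how the position of one row can change between runs while the row sets are still identical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RowOrderCheck {

    // 1-based position of the first row equal to `item`, or -1 if it is absent.
    static int positionOf(List<String> rows, String item) {
        int idx = rows.indexOf(item);
        return idx < 0 ? -1 : idx + 1;
    }

    // True when both lists hold the same rows with the same multiplicities,
    // regardless of the order in which they were returned.
    static boolean sameRowsIgnoringOrder(List<String> a, List<String> b) {
        List<String> sortedA = new ArrayList<>(a);
        List<String> sortedB = new ArrayList<>(b);
        Collections.sort(sortedA);
        Collections.sort(sortedB);
        return sortedA.equals(sortedB);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the first column of two collect() results.
        List<String> run1 = Arrays.asList("aa1001", "bm8942", "cc2002");
        List<String> run2 = Arrays.asList("cc2002", "aa1001", "bm8942");

        System.out.println(positionOf(run1, "bm8942"));        // prints 2
        System.out.println(positionOf(run2, "bm8942"));        // prints 3
        System.out.println(sameRowsIgnoringOrder(run1, run2)); // prints true
    }
}
```

On the Spark side, the equivalent checks would be to compare dframe1.count() with dframe2.count(), to look at dframe1.except(dframe2).count(), or to add an explicit ORDER BY on item before collecting. If the counts or the except result differ, the rows themselves differ and ordering is not the explanation.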