apache spark - Unbagging a dataset in PySpark


I have a dataset that looks like this:

(34521658, 0001-01-01, 2500-01-01, 2 , a, y, 15, p, a, 4776, 4776, 4776, {(11, p, a, 4776,4766, 4776), (12, p, a, 4776,4766, 4776), (13, p, a, 4776,4766, 4776)}) 

and I want to un-bag it to get these rows:

(34521658, 0001-01-01, 2500-01-01, 2, a, y, 15, p, a, 4776, 4776, 4776, 11, p, a, 4776, 4766, 4776)
(34521658, 0001-01-01, 2500-01-01, 2, a, y, 15, p, a, 4776, 4776, 4776, 12, p, a, 4776, 4766, 4776)
(34521658, 0001-01-01, 2500-01-01, 2, a, y, 15, p, a, 4776, 4776, 4776, 13, p, a, 4776, 4766, 4776)
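To make the target shape concrete, here is a plain-Python sketch (no Spark involved) of what un-bagging does to one record: every tuple in the bag is appended to a copy of the record's other fields, so a bag of n tuples yields n rows. It assumes the bag is the last field and is held as a list of tuples; the helper name unbag is hypothetical.

```python
def unbag(record):
    """Hypothetical helper: split a record whose last field is a bag into flat rows.

    Each output row is the record's scalar fields followed by the fields of one
    tuple from the bag, so a bag of n tuples yields n rows.
    """
    *scalars, bag = record
    return [tuple(scalars) + tuple(item) for item in bag]


record = (
    34521658, "0001-01-01", "2500-01-01", 2, "a", "y", 15, "p", "a", 4776, 4776, 4776,
    # The bag, held here as a list of tuples so the order is preserved.
    [
        (11, "p", "a", 4776, 4766, 4776),
        (12, "p", "a", 4776, 4766, 4776),
        (13, "p", "a", 4776, 4766, 4776),
    ],
)

rows = unbag(record)
for row in rows:
    print(row)
```

Each printed row has the 12 outer fields followed by the 6 fields of one bag element, which is the layout asked for above.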

How can I do this in PySpark?

As suggested in the comments, flatMap or explode can be used. Here is how you can do it with the explode SQL function (as its name says, explode expands an array or map column into multiple rows, one per element). For the sake of simplicity I keep only the meaningful columns. Assuming the first column is id and the column you want to explode is named bag, the initial dataset looks like this:

+--------+--------------------+
|      id|                 bag|
+--------+--------------------+
|34521658|[[11,p,a,4776,476...|
+--------+--------------------+

The schema of the dataset is:

scala> df.printSchema
root
 |-- id: integer (nullable = true)
 |-- bag: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: integer (nullable = true)
 |    |    |-- _2: string (nullable = true)
 |    |    |-- _3: string (nullable = true)
 |    |    |-- _4: integer (nullable = true)
 |    |    |-- _5: integer (nullable = true)
 |    |    |-- _6: integer (nullable = true)

Note that the bag column is an array of struct elements. You can apply the explode function to this column like this:

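import org.apache.spark.sql.functions.explode  // $"bag" also needs spark.implicits._, which spark-shell imports automatically
df.withColumn("bag", explode($"bag"))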

The resulting Dataset/DataFrame is:

+--------+--------------------+
|      id|                 bag|
+--------+--------------------+
|34521658|[11,p,a,4776,4766...|
|34521658|[12,p,a,4776,4766...|
|34521658|[13,p,a,4776,4766...|
+--------+--------------------+

Hope this helps.

