Apache Spark - Unbagging a dataset in PySpark
I have a dataset that looks like this:
(34521658, 0001-01-01, 2500-01-01, 2 , a, y, 15, p, a, 4776, 4776, 4776, {(11, p, a, 4776,4766, 4776), (12, p, a, 4776,4766, 4776), (13, p, a, 4776,4766, 4776)})
and I want to un-bag it to produce:
(34521658, 0001-01-01, 2500-01-01, 2, a, y, 15, p, a, 4776, 4776, 4776, 11, p, a, 4776, 4766, 4776)
(34521658, 0001-01-01, 2500-01-01, 2, a, y, 15, p, a, 4776, 4776, 4776, 12, p, a, 4776, 4766, 4776)
(34521658, 0001-01-01, 2500-01-01, 2, a, y, 15, p, a, 4776, 4776, 4776, 13, p, a, 4776, 4766, 4776)
How can I do this in PySpark?
As suggested in the comments, either flatMap or explode can be used. Here is how it can be done with the explode SQL function (explode, as its name says, expands an array or map column into multiple rows). I keep only the meaningful columns for the sake of simplifying the approach. Assuming the first column is id and the column you want to explode is named bag, this is the initial dataset:
+--------+--------------------+
|      id|                 bag|
+--------+--------------------+
|34521658|[[11,p,a,4776,476...|
+--------+--------------------+
The schema of the dataset is:
scala> df.printSchema
root
 |-- id: integer (nullable = true)
 |-- bag: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: integer (nullable = true)
 |    |    |-- _2: string (nullable = true)
 |    |    |-- _3: string (nullable = true)
 |    |    |-- _4: integer (nullable = true)
 |    |    |-- _5: integer (nullable = true)
 |    |    |-- _6: integer (nullable = true)
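Since the question is about PySpark, a DataFrame with the same shape (an id column plus a bag array of structs) could be built there roughly as follows. This is a minimal sketch; the field names (_1 to _6) and the sample values simply mirror the schema above:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# One row: an id plus an array of structs, matching the schema shown above.
df = spark.createDataFrame([
    Row(id=34521658,
        bag=[Row(_1=11, _2="p", _3="a", _4=4776, _5=4766, _6=4776),
             Row(_1=12, _2="p", _3="a", _4=4776, _5=4766, _6=4776),
             Row(_1=13, _2="p", _3="a", _4=4776, _5=4766, _6=4776)])
])
df.printSchema()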
Note that the bag column is an array of elements. On this column you can apply the explode function like this:
df.withColumn("bag", explode($"bag"))
The resulting Dataset/DataFrame is:
+--------+--------------------+
|      id|                 bag|
+--------+--------------------+
|34521658|[11,p,a,4776,4766...|
|34521658|[12,p,a,4776,4766...|
|34521658|[13,p,a,4776,4766...|
+--------+--------------------+
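In PySpark the equivalent call uses explode from pyspark.sql.functions. A sketch, assuming the df built above:

from pyspark.sql.functions import explode

# One output row per element of the bag array; the id value is repeated.
exploded = df.withColumn("bag", explode(df["bag"]))
exploded.show()

# Optionally flatten the struct fields into top-level columns,
# which gives rows like (34521658, 11, p, a, 4776, 4766, 4776):
exploded.select("id", "bag.*").show()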
Hope this helps.