Split a dataset into training and test in one line in Scala Spark
It's not an important point, I know, but I'd like to know if I can save two lines of code.
I have a Dataset inputData that I want to split into two parts. I'm using the randomSplit method of the Dataset class. However, I'm forced to use three lines of code to do this:
val sets = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
val training = sets(0)
val test = sets(1)
Ideally, I'd like:
val (training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
but this code does not compile, due to the error:
Error:(146, 13) constructor cannot be instantiated to expected type; found : (T1, T2) required: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]]
Is it possible to achieve what I want?
Pattern match the array:
val Array(training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
or, longer (but still a single expression):
val (training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18) match {
  case Array(training, test) => (training, test)
}
Please remember that this cannot be validated by the compiler and can fail at runtime with a MatchError.
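To see both the convenience and the runtime risk without needing a Spark cluster, here is a minimal sketch using a plain Array in place of randomSplit's result (the extractor behaves the same either way; the element values here are made up for illustration):

```scala
object ArraySplitDemo extends App {
  // Stand-in for randomSplit's Array result: exactly two elements.
  val parts: Array[Int] = Array(70, 30)

  // The Array(...) extractor binds both names in one line.
  val Array(training, test) = parts
  println(s"training=$training, test=$test")

  // The pattern demands exactly two elements; a three-element array
  // is not checked at compile time and throws scala.MatchError at runtime.
  try {
    val Array(a, b) = Array(1, 2, 3)
    println(s"a=$a, b=$b") // never reached
  } catch {
    case e: MatchError => println(s"caught MatchError on: ${e.getMessage}")
  }
}
```

Since randomSplit is documented to return as many Datasets as there are weights, the two-element pattern is safe here, but the compiler cannot verify that for you.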