Split a dataset into training and test in one line in Scala Spark -


That's not an important point, I know, but I'd like to know if I can save 2 lines of code.

I have a dataset inputData that I want to split into 2 parts. I'm using the randomSplit method of the Dataset class. However, I'm forced to use 3 lines of code to do this:

    val sets = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
    val training = sets(0)
    val test = sets(1)

Ideally, I would like:

    val (training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18)

but this code does not compile, due to the error:

    Error:(146, 13) constructor cannot be instantiated to expected type;
     found   : (T1, T2)
     required: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]]

Is it possible to achieve what I want?

Pattern match on the array:

    val Array(training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18)

or a longer version (but still a single expression):

    val (training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18) match {
      case Array(training, test) => (training, test)
    }

Please remember that this cannot be validated by the compiler and can fail at runtime with a MatchError (randomSplit always returns an array with one element per weight, so in practice the two-element pattern matches, but the compiler cannot know that).
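The Array extractor used above is plain Scala pattern matching and works on any Array, not just Spark's. A minimal Spark-free sketch (fakeSplit is a hypothetical stand-in for Dataset.randomSplit, which similarly returns an Array of parts):

```scala
// Stand-in for Dataset.randomSplit: splits a sequence roughly 70/30
// and returns the parts in an Array, just like randomSplit does.
def fakeSplit(data: Seq[Int]): Array[Seq[Int]] = {
  val cut = (data.length * 0.7).toInt
  Array(data.take(cut), data.drop(cut))
}

// The Array extractor destructures the two parts in one line.
// This throws a MatchError at runtime if the array size differs.
val Array(training, test) = fakeSplit(1 to 10)

println(s"training=${training.length}, test=${test.length}")
```

The same one-liner applies verbatim to `inputData.randomSplit(Array[Double](0.7, 0.3), 18)`; only the element type changes.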

