Split a dataset into training and test in one line in Scala Spark
It's not an important point, I know, but I'd like to know if I can save two lines of code.
I have a Dataset inputData that I want to split into two parts. I'm using the randomSplit method of the Dataset class. However, I'm forced to use three lines of code to do this:
val sets = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
val training = sets(0)
val test = sets(1)
Ideally, I'd like:
val (training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
but this code does not compile, due to the error:
Error:(146, 13) constructor cannot be instantiated to expected type; found : (T1, T2) required: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]]
Is it possible to achieve what I want?
Pattern match the array:
val Array(training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18)
or, longer (but still a single expression):
val (training, test) = inputData.randomSplit(Array[Double](0.7, 0.3), 18) match {
  case Array(training, test) => (training, test)
}
Please remember that this cannot be validated by the compiler and can fail at runtime with a MatchError.
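To see both the convenience and the runtime risk without needing a Spark cluster, here is a minimal sketch using a plain Array in place of randomSplit's result (the extractor behaves the same either way; the element values here are made up for illustration):

```scala
object ArraySplitDemo extends App {
  // Stand-in for randomSplit's Array result: exactly two elements.
  val parts: Array[Int] = Array(70, 30)

  // The Array(...) extractor binds both names in one line.
  val Array(training, test) = parts
  println(s"training=$training, test=$test")

  // The pattern demands exactly two elements; a three-element array
  // is not checked at compile time and throws scala.MatchError at runtime.
  try {
    val Array(a, b) = Array(1, 2, 3)
    println(s"a=$a, b=$b") // never reached
  } catch {
    case e: MatchError => println(s"caught MatchError on: ${e.getMessage}")
  }
}
```

Since randomSplit is documented to return as many Datasets as there are weights, the two-element pattern is safe here, but the compiler cannot verify that for you.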