Scala - Spark: Running a Batch Job at 15-Minute Intervals

I am using Scala and have tried Spark Streaming, but if the streaming job happens to be down for more than 15 minutes, data is lost.

So I want to know: how can I manually keep checkpoints in a batch job?

The input data directories look like the following:

data --> 20170818 --> (timestamp) --> (many .json files)

The data is uploaded every 5 minutes.

Thanks!

You may use the readStream feature in Structured Streaming to monitor the directory and pick up new files. Spark automatically handles checkpointing and file tracking for you.

val ds = spark.readStream
  .format("text")
  .option("maxFilesPerTrigger", 1)
  .load(logDirectory)
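To make the checkpointing part concrete, here is a minimal sketch of how the stream above might be started so that progress survives a crash. It assumes Spark 2.2 or later; the paths, the glob pattern, the Parquet sink, and the object name are all placeholders made up for illustration, not something from the question. The relevant pieces are the `checkpointLocation` option, where Spark records which files have already been processed, and the 15-minute processing-time trigger.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object DirectoryStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("directory-stream-sketch")
      .getOrCreate()

    // Hypothetical locations: adjust to the real layout.
    // The glob is assumed to match data/<date>/<timestamp>/ directories.
    val logDirectory  = "data/*/*"
    val checkpointDir = "/tmp/checkpoints/directory-stream"
    val outputDir     = "/tmp/output/directory-stream"

    // File source: each line of every new file becomes a row with a single `value` column.
    val ds = spark.readStream
      .format("text")
      .option("maxFilesPerTrigger", 1)
      .load(logDirectory)

    // The checkpoint directory stores the list of files already processed,
    // so a restarted query continues with the files it has not seen yet.
    val query = ds.writeStream
      .format("parquet")
      .option("path", outputDir)
      .option("checkpointLocation", checkpointDir)
      .trigger(Trigger.ProcessingTime("15 minutes"))
      .start()

    query.awaitTermination()
  }
}
```

If the job is down for longer than 15 minutes, restarting it with the same checkpoint directory should pick up the files that arrived in the meantime, since the file source tracks processed files rather than wall-clock time. With `maxFilesPerTrigger` set to 1 and a 15-minute trigger, only one file is read per interval, so you would probably want to raise or drop that option for data arriving every 5 minutes.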

How Y