scala - Spark 2.2.0 - unable to read recursively into directory structure


problem summary: Unable to read nested subdirectories from a Spark program, despite setting the required Hadoop configuration (see the attempts below). The error is pasted below.

Any help is appreciated.

version: Spark 2.2.0

input directory layout:

/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939225073/part-00000-3a44cd00-e895-4a01-9ab9-946064b739d4-c000.parquet
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939234036/part-00000-cbd47353-0590-4cc1-b10d-c18886df1c25-c000.parquet

...

input directory parameter passed:

/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/*/*

attempted (1):

Set the parameter in code:

val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()
// recursive glob support & log level
import sparkSession.implicits._
sparkSession.sparkContext.hadoopConfiguration.setBoolean("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", true)

The configuration did not show up in the Spark UI.

attempted (2):

Passed the config on the CLI via spark-submit, and also set it in code (see below).

spark-submit --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \... 

I now see the configuration in the Spark UI, but get the same error; Spark still cannot traverse the directory structure.

code:

// spark session
val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()

// recursive glob support
val conf = new SparkConf()
val cliRecursiveGlobConf = conf.get("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive")
import sparkSession.implicits._
sparkSession.sparkContext.hadoopConfiguration.set("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", cliRecursiveGlobConf)

error & overall output:

full error @ - https://gist.github.com/airawat/77fbdb821410a5a87dfd29ffaf60fdf9

17/08/18 15:59:29 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.io.FileNotFoundException: File /user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=*/* does not exist.
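Two things may be worth checking, shown as a minimal sketch rather than a confirmed fix. First, Spark strips the "spark.hadoop." prefix when it copies entries from the SparkConf into the Hadoop configuration, so a key set directly on hadoopConfiguration would normally be written without that prefix. Attempt (2) already propagates the un-prefixed key via spark-submit, so this alone is unlikely to resolve the error. Second, because the subdirectories follow the batch_id=<value> naming, the Parquet reader can usually discover them as partitions when given the base directory instead of a glob. The object name RecursiveReadSketch, the helper enableRecursiveInput, and passing the base path as args(0) are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object RecursiveReadSketch {
  // Hadoop-side key: no "spark.hadoop." prefix when set directly on the Hadoop Configuration.
  val RecursiveKey = "mapreduce.input.fileinputformat.input.dir.recursive"

  // Hypothetical helper: sets the un-prefixed key on the session's Hadoop configuration.
  def enableRecursiveInput(spark: SparkSession): Unit =
    spark.sparkContext.hadoopConfiguration.setBoolean(RecursiveKey, true)

  def main(args: Array[String]): Unit = {
    // The master is expected to come from spark-submit (e.g. --master yarn).
    val spark = SparkSession.builder().getOrCreate()
    enableRecursiveInput(spark)

    // Assumption: args(0) is the base directory, e.g. .../flat-view-format,
    // without the trailing /*/* glob; batch_id is then read as a partition column.
    val df = spark.read.parquet(args(0))
    df.printSchema()

    spark.stop()
  }
}
```

If the read succeeds, the printed schema should include a batch_id column in addition to the columns stored in the Parquet files.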

