Scala - Spark 2.2.0 - unable to read recursively into directory structure
Problem summary: I am unable to read nested subdirectories using a Spark program, despite setting the required Hadoop configuration (see the attempts below). The error is pasted below.
Any help is appreciated.
Version: Spark 2.2.0
Input directory layout:
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939225073/part-00000-3a44cd00-e895-4a01-9ab9-946064b739d4-c000.parquet
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939234036/part-00000-cbd47353-0590-4cc1-b10d-c18886df1c25-c000.parquet
...
Input directory parameter passed:
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/*/*
Attempted (1):
Set the parameter in code:
    val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()
    // recursive glob support & loglevel
    import sparkSession.implicits._
    sparkSession.sparkContext.hadoopConfiguration.setBoolean("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", true)
Result: I did not see the configuration in place in the Spark UI.
Attempted (2):
Passed the config on the CLI via spark-submit, and also set it in code (see below).
    spark-submit --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
    ...
Result: I see the configuration in the Spark UI, but get the same error; the directory structure cannot be traversed.
Code:
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Spark session
    val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()
    // recursive glob support
    val conf = new SparkConf()
    val cliRecursiveGlobConf = conf.get("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive")
    import sparkSession.implicits._
    sparkSession.sparkContext.hadoopConfiguration.set("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", cliRecursiveGlobConf)
Error & overall output:
Full error at https://gist.github.com/airawat/77fbdb821410a5a87dfd29ffaf60fdf9
    17/08/18 15:59:29 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
    Exception in thread "main" java.io.FileNotFoundException: File /user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=*/* does not exist.
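For comparison with the second attempt above, here is a minimal sketch of one variation that may be worth trying. It relies on the documented Spark convention that a `spark.hadoop.abc` property is handed to Hadoop as `abc`, so a value written directly into `hadoopConfiguration` would normally use the unprefixed key. It also reads the flag with `getBoolean` and a default, because `SparkConf.get` throws if the key is absent. The object name `RecursiveReadSketch` and the constants `SparkKey` and `HadoopKey` are hypothetical names introduced for illustration, and nothing in the post establishes that this change removes the FileNotFoundException.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; a sketch under the assumptions stated above.
object RecursiveReadSketch {
  // Name used on the Spark side (for example with spark-submit --conf).
  val SparkKey = "spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive"
  // Name Hadoop itself looks up: the same key without the "spark.hadoop." prefix.
  val HadoopKey: String = SparkKey.stripPrefix("spark.hadoop.")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("yarn").getOrCreate()

    // Read the flag from the running context's conf; fall back to false if it was not passed.
    val recursive = spark.sparkContext.getConf.getBoolean(SparkKey, false)

    // Write the value into the Hadoop configuration under the unprefixed key.
    spark.sparkContext.hadoopConfiguration.setBoolean(HadoopKey, recursive)

    // Print what the Hadoop configuration now holds, to compare with the Spark UI.
    println(s"$HadoopKey = " + spark.sparkContext.hadoopConfiguration.get(HadoopKey))

    spark.stop()
  }
}
```

Printing the value from `hadoopConfiguration` is only a way to check which key Hadoop actually sees; how the glob in the input path is expanded is a separate matter that this sketch does not address.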