scala - How to use S3 with Apache Spark 2.2 in the Spark shell
I'm trying to load data from an Amazon AWS S3 bucket while in the Spark shell.
I have consulted the following resources:
Parsing files from Amazon S3 with Apache Spark
How to access s3a:// files from Apache Spark?
I have downloaded and unzipped Apache Spark 2.2.0. In conf/spark-defaults.conf I have the following (note: I have replaced access-key and secret-key with placeholders):
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key=access-key
spark.hadoop.fs.s3a.secret.key=secret-key

I have downloaded hadoop-aws-2.8.1.jar and aws-java-sdk-1.11.179.jar from MvnRepository, and placed them in the jars/ directory.
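(As far as I understand, the same credentials can also be set at runtime from inside the shell instead of in spark-defaults.conf; a minimal sketch, with the same placeholder key strings:)

sc.hadoopConfiguration.set("fs.s3a.access.key", "access-key")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "secret-key")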
I start the Spark shell:

bin/spark-shell --jars jars/hadoop-aws-2.8.1.jar,jars/aws-java-sdk-1.11.179.jar

In the shell, here is how I try to load data from the S3 bucket:
val p = spark.read.textFile("s3a://sparkcookbook/person")

And here is the error that results:
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/GlobalStorageStatistics$StorageStatisticsProvider
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)

When I instead try to start the Spark shell as follows:
bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.8.1

then I get two errors: one when the interpreter starts, and another when I try to load the data. Here is the first:
:: problems summary ::
:::: ERRORS
    unknown resolver null
    unknown resolver null
    unknown resolver null
    unknown resolver null
    unknown resolver null
    unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

And here is the second:
val p = spark.read.textFile("s3a://sparkcookbook/person")

java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:195)
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:139)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:506)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:542)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:515)

Could someone suggest how to get this working? Thanks.
If you are using Apache Spark 2.2.0, then you should use hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar. The prebuilt Spark 2.2.0 binaries bundle Hadoop 2.7.x classes, so the hadoop-aws module has to match that Hadoop version, and hadoop-aws 2.7.3 was built against aws-java-sdk 1.7.4; mixing in the 2.8.1 and 1.11.x jars is what produces the NoClassDefFoundError and IllegalAccessError you are seeing.
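You can confirm which Hadoop version your Spark build actually bundles from inside the shell (a quick diagnostic sketch; org.apache.hadoop.util.VersionInfo ships with hadoop-common, so it is already on the classpath; the output shown is just an example and may differ for your build):

scala> org.apache.hadoop.util.VersionInfo.getVersion
res0: String = 2.7.3

The hadoop-aws jar you add must line up with this version.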
Then start the shell with the matching jars:

$ spark-shell --jars jars/hadoop-aws-2.7.3.jar,jars/aws-java-sdk-1.7.4.jar

After that, when you try to load data from the S3 bucket in the shell, you will be able to do so.
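For example (a sketch using the bucket path from the question; it assumes the s3a credentials in your spark-defaults.conf are valid):

scala> val p = spark.read.textFile("s3a://sparkcookbook/person")
scala> p.show(3)

Alternatively, starting the shell with --packages org.apache.hadoop:hadoop-aws:2.7.3 should pull in the matching aws-java-sdk 1.7.4 transitively, since the published hadoop-aws 2.7.3 POM declares that dependency.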