scala - How to use S3 with Apache Spark 2.2 in the Spark shell
I'm trying to load data from an Amazon AWS S3 bucket while in the Spark shell.
I have consulted the following resources:
Parsing files from Amazon S3 with Apache Spark
How to access s3a:// files from Apache Spark?
I have downloaded and unzipped Apache Spark 2.2.0. In conf/spark-defaults.conf I have the following (note that I have replaced the actual values with access-key and secret-key):
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key=access-key
spark.hadoop.fs.s3a.secret.key=secret-key
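If the spark-defaults settings are not being picked up, the same credentials can also be set from inside the shell on the Hadoop configuration (a minimal sketch; access-key and secret-key are the same placeholders as above):

scala> // set S3A credentials at runtime on the SparkContext's Hadoop configuration
scala> sc.hadoopConfiguration.set("fs.s3a.access.key", "access-key")
scala> sc.hadoopConfiguration.set("fs.s3a.secret.key", "secret-key")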
I have downloaded hadoop-aws-2.8.1.jar and aws-java-sdk-1.11.179.jar from MvnRepository and placed them in the jars/ directory. I then start the Spark shell:
bin/spark-shell --jars jars/hadoop-aws-2.8.1.jar,jars/aws-java-sdk-1.11.179.jar
In the shell, here is how I try to load data from the S3 bucket:
val p = spark.read.textFile("s3a://sparkcookbook/person")
And here is the error that results:
java.lang.NoClassDefFoundError: org/apache/hadoop/fs/GlobalStorageStatistics$StorageStatisticsProvider
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
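The missing class, GlobalStorageStatistics, was only added in Hadoop 2.8, which suggests that hadoop-aws-2.8.1.jar is being mixed with the older Hadoop classes that ship inside Spark 2.2.0. As a quick sanity check (a sketch; the version string shown is an assumption about the default Spark 2.2.0 build), you can ask Spark which Hadoop version it was built against:

scala> // prints the Hadoop version bundled with this Spark distribution
scala> org.apache.hadoop.util.VersionInfo.getVersion
res0: String = 2.7.3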
When I instead try to start the Spark shell as follows:
bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.8.1
then I get two errors: one when the interpreter starts, and another when I try to load the data. Here is the first:
:: problems summary ::
:::: ERRORS
    unknown resolver null
    unknown resolver null
    unknown resolver null
    unknown resolver null
    unknown resolver null
    unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
And here is the second:
scala> val p = spark.read.textFile("s3a://sparkcookbook/person")
java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:195)
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
  at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:139)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
  at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:506)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:542)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:515)
Could someone suggest how to get this working? Thanks.
If you are using Apache Spark 2.2.0, you should use hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar. Spark 2.2.0 is built against Hadoop 2.7.x, and mixing hadoop-aws 2.8.1 with Spark's bundled Hadoop 2.7 classes is what produces both the NoClassDefFoundError and the IllegalAccessError above.
$ spark-shell --jars jars/hadoop-aws-2.7.3.jar,jars/aws-java-sdk-1.7.4.jar
After that, when you try to load data from the S3 bucket in the shell, you will be able to do so.
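For example (a sketch: the path is the bucket from the question, and the echoed type is what the Spark 2.2 shell typically prints for textFile):

scala> val p = spark.read.textFile("s3a://sparkcookbook/person")
p: org.apache.spark.sql.Dataset[String] = [value: string]

Alternatively, instead of downloading the jars by hand, you can let --packages resolve the matching versions; hadoop-aws 2.7.3 declares aws-java-sdk 1.7.4 as a transitive dependency, so one coordinate should be enough:

$ spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3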