What is the correct way to know where a file is located on an HDFS cluster?
I need to develop my own job executor (it is not homework) that leverages datanode locality.
I have a Hadoop 2.7.1 cluster with 2 datanodes
(see http://jugsi.blogspot.it/2017/08/configuring-hadoop-271-on-windows-w-ssh.html).
My code:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //conf.set("fs.default.name", "hdfs://localhost:9000");
    System.out.println(FileSystem.get(conf));
    check(conf, "hdfs://localhost:19000/license.txt");
    check(conf, "hdfs://localhost:19000/test.txt");
    check(conf, "hdfs://localhost:19000/doesnexist.txt");
}

public static void check(Configuration conf, String path) throws Exception {
    try {
        URI uri = URI.create(path);
        System.out.println(path);
        FileSystem file = FileSystem.get(uri, conf);
        Path p = new Path(uri);
        System.out.println(file);
        // 128L * 1024 * 1024 * 1024: long literal, otherwise the int expression overflows
        BlockLocation[] locations = file.getFileBlockLocations(p, 0, 128L * 1024 * 1024 * 1024);
        for (BlockLocation blockLocation : locations) {
            System.out.println(blockLocation);
        }
        FSDataInputStream in = file.open(p);
        byte[] buffer = new byte[50];
        for (int i = 0; i < 10; i++) { // read only the first ~500 bytes of the file
            int rsz = in.read(buffer, 0, buffer.length);
            if (rsz < 0) break;
            System.out.print(new String(buffer, 0, rsz));
        }
        System.out.println("\n");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
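(As a side note, a slightly cleaner way to get the same block information, assuming the same cluster as above, is to ask for the file's FileStatus and use its real length instead of a hard-coded 128 GB range. The method name printLocations below is just illustrative; besides the imports above it needs org.apache.hadoop.fs.FileStatus and java.util.Arrays.)

public static void printLocations(Configuration conf, String path) throws Exception {
    Path p = new Path(path);
    FileSystem fs = FileSystem.get(p.toUri(), conf);
    // getFileStatus throws FileNotFoundException if the path does not exist
    FileStatus status = fs.getFileStatus(p);
    // Ask for the locations of every block, from offset 0 to the actual file length
    BlockLocation[] locations = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : locations) {
        // getHosts() returns the datanode hostnames holding a replica of this block
        System.out.println(block.getOffset() + "," + block.getLength()
                + "," + Arrays.toString(block.getHosts()));
    }
}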
Replication is set to 1, and:
on the master:
hadoop fs -put license.txt /
on the slave:
hadoop fs -put test.txt /
and it works:
hdfs://localhost:19000/license.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,15429,master
and
hdfs://localhost:19000/test.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,4,slave
But this seems like a workaround to me. How do Spark or YARN ask for a file's location in order to place a job close to (on) the datanode?
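(From what I understand, MapReduce's FileInputFormat and Spark's HadoopRDD use essentially the same block-location API: they list the input files together with their block locations, wrap each block range into a split, and hand the block's host list to the scheduler as a locality hint. A rough, illustrative sketch of that pattern, with a made-up class name and the same hdfs://localhost:19000 endpoint assumed, could look like this:)

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class LocalityHints {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path root = new Path("hdfs://localhost:19000/");
        FileSystem fs = FileSystem.get(root.toUri(), conf);
        // listLocatedStatus returns each file together with its block locations,
        // so no extra round trip per file is needed
        RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(root);
        while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            for (BlockLocation block : status.getBlockLocations()) {
                // A locality-aware scheduler would try to run the task for this
                // block on one of these hosts
                System.out.println(status.getPath() + " offset=" + block.getOffset()
                        + " hosts=" + Arrays.toString(block.getHosts()));
            }
        }
    }
}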