java - What is the correct way to know where a file is located on an HDFS cluster?


I need to develop my own job executor (it is not homework) that leverages datanode locality.

I have a Hadoop 2.7.1 cluster with 2 datanodes.

(Cluster screenshot omitted; the setup is described at http://jugsi.blogspot.it/2017/08/configuring-hadoop-271-on-windows-w-ssh.html.)

My code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLocationCheck {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        //conf.set("fs.default.name", "hdfs://localhost:9000");
        System.out.println(FileSystem.get(conf));

        check(conf, "hdfs://localhost:19000/license.txt");
        check(conf, "hdfs://localhost:19000/test.txt");
        check(conf, "hdfs://localhost:19000/doesnexist.txt");
    }

    public static void check(Configuration conf, String path) throws Exception {
        try {
            URI uri = URI.create(path);
            System.out.println(path);
            FileSystem fs = FileSystem.get(uri, conf);
            Path p = new Path(uri);
            System.out.println(fs);

            // 128L keeps the length as a long (128 GB); 128*1024*1024*1024 would overflow int
            BlockLocation[] locations = fs.getFileBlockLocations(p, 0, 128L * 1024 * 1024 * 1024);
            for (BlockLocation blockLocation : locations) {
                System.out.println(blockLocation);
            }

            FSDataInputStream in = fs.open(p);
            byte[] buffer = new byte[50];
            for (int i = 0; i < 10; i++) { // print at most the first 500 bytes of the file
                int rsz = in.read(buffer, 0, buffer.length);
                if (rsz < 0)
                    break;
                System.out.print(new String(buffer, 0, rsz));
            }
            in.close();

            System.out.println("\n");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
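As a side note (not part of the question's code): the length argument to getFileBlockLocations can be taken from the file's FileStatus instead of a hard-coded 128 GB, and the hosts can be read from each BlockLocation directly rather than relying on its toString(). A minimal sketch, where the class and method names are my own:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationDump {

    // Hypothetical helper: list the datanodes holding each block of a file.
    public static void printBlockHosts(Configuration conf, String path) throws Exception {
        URI uri = URI.create(path);
        FileSystem fs = FileSystem.get(uri, conf);
        Path p = new Path(uri);

        FileStatus status = fs.getFileStatus(p);   // throws FileNotFoundException if the path is missing
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println(block.getOffset() + "," + block.getLength()
                    + "," + String.join(",", block.getHosts()));
        }
    }
}

With replication 1, each block should report exactly one host, which matches the master/slave output shown below.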

Replication is set to 1.

On the master:

hadoop fs -put license.txt / 

On the slave:

hadoop fs -put test.txt / 

and it works:

hdfs://localhost:19000/license.txt DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]] 0,15429,master

and

hdfs://localhost:19000/test.txt DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]] 0,4,slave

But this seems like a workaround to me. How do Spark or YARN ask for a file's block locations in order to place a job close to (on) the datanode that holds the data?
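For what it is worth, my understanding (not confirmed in the question itself) is that MapReduce's FileInputFormat does essentially the same thing when computing input splits: it calls getFileBlockLocations for each input file and attaches the block's hosts to every split as locality hints, which the scheduler (YARN, or Spark via HadoopRDD and InputSplit.getLocations()) then tries to honour. A rough sketch of that pattern, with all class and field names below being my own:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalitySketch {

    // Hypothetical split descriptor: a byte range plus the hosts that store it.
    static class Split {
        final Path path;
        final long offset;
        final long length;
        final String[] hosts;   // locality hints for the scheduler

        Split(Path path, long offset, long length, String[] hosts) {
            this.path = path;
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // One split per HDFS block, each carrying the datanodes that hold that block.
    static List<Split> splitsFor(FileSystem fs, Path file) throws Exception {
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        List<Split> splits = new ArrayList<>();
        for (BlockLocation block : blocks) {
            splits.add(new Split(file, block.getOffset(), block.getLength(), block.getHosts()));
        }
        return splits;
    }
}

A custom job executor could then prefer to run the task for each split on one of its hosts, falling back to any node when none of them has free capacity.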

