java - What is the correct way to know where a file is located on an HDFS cluster?

I need to develop my own job executor (it is not homework) that leverages datanode locality.

I have a Hadoop 2.7.1 cluster with 2 datanodes.


My code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The class name is arbitrary; it only wraps the two methods.
public class HdfsLocationCheck {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        //conf.set("", "hdfs://localhost:9000");
        System.out.println(FileSystem.get(conf));

        check(conf, "hdfs://localhost:19000/license.txt");
        check(conf, "hdfs://localhost:19000/test.txt");
        check(conf, "hdfs://localhost:19000/doesnexist.txt");
    }

    public static void check(Configuration conf, String path) throws Exception {
        try {
            URI uri = URI.create(path);
            System.out.println(path);
            FileSystem file = FileSystem.get(uri, conf);
            Path p = new Path(uri);
            System.out.println(file);

            // 128L forces long arithmetic; an all-int product would overflow to 0.
            BlockLocation[] locations = file.getFileBlockLocations(p, 0, 128L * 1024 * 1024 * 1024);
            for (BlockLocation blockLocation : locations) {
                System.out.println(blockLocation);
            }

            // Print only the beginning of the file (at most 10 reads of 50 bytes).
            try (FSDataInputStream in = file.open(p)) {
                byte[] buffer = new byte[50];
                for (int i = 0; i < 10; i++) {
                    int rsz = in.read(buffer, 0, buffer.length);
                    if (rsz < 0)
                        break;
                    System.out.print(new String(buffer, 0, rsz));
                }
            }

            System.out.println("\n");
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
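A note on the length argument passed to getFileBlockLocations: in Java, a product of int literals such as 128*1024*1024*1024 is computed in 32-bit arithmetic and wraps around before it is widened to long, so at least one operand has to be a long literal. A minimal sketch of the difference (the class and method names are made up for this illustration):

```java
// Illustration only: LengthOverflowDemo and its methods are hypothetical names.
class LengthOverflowDemo {

    // All operands are int, so the product 2^37 wraps around in 32 bits.
    static long intProduct() {
        return 128 * 1024 * 1024 * 1024;
    }

    // The leading long literal makes the whole product 64-bit.
    static long longProduct() {
        return 128L * 1024 * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(intProduct());  // prints 0
        System.out.println(longProduct()); // prints 137438953472
    }
}
```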

The replication factor is 1, and I uploaded one file from each node.

On master:

hadoop fs -put license.txt / 

On slave:

hadoop fs -put test.txt / 

This works and prints:

hdfs://localhost:19000/license.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,15429,master

hdfs://localhost:19000/test.txt
DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1132409281_1, ugi=212442540 (auth:SIMPLE)]]
0,4,slave

But this seems like a workaround to me. How do Spark or YARN ask for a file's location in order to place a job close to (or on) the datanode that holds it?
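For illustration, the locality decision itself can be sketched without Hadoop: given each block's length and hosts (the values that BlockLocation.getLength() and BlockLocation.getHosts() return), choose the host that stores the most bytes of the file. LocalityPicker and Block below are hypothetical names, and this is only a sketch of the idea under that assumption, not how Spark or YARN is implemented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
class LocalityPicker {

    // Minimal stand-in for the length and hosts that a BlockLocation exposes.
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Returns the host holding the most bytes, or null if there are no blocks.
    // Ties are broken arbitrarily.
    static String preferredHost(List<Block> blocks) {
        // Sum the bytes stored on each host across all blocks.
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block block : blocks) {
            for (String host : block.hosts) {
                bytesPerHost.merge(host, block.length, Long::sum);
            }
        }
        // Pick the host with the largest total.
        String best = null;
        long bestBytes = -1;
        for (Map.Entry<String, Long> entry : bytesPerHost.entrySet()) {
            if (entry.getValue() > bestBytes) {
                best = entry.getKey();
                bestBytes = entry.getValue();
            }
        }
        return best;
    }
}
```

With the two files from the output above, a single block of 15429 bytes on master yields master and a single block of 4 bytes on slave yields slave; a custom executor would then try to start the task on that host.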

