amazon web services - Nutch crawler not scaling for large URL lists


I am trying to set up a Nutch crawler on an Amazon EMR cluster with 2 master nodes, which is scalable. My seed URL list has 10,000 URLs, and the crawler gets stuck in the fetch phase of the MapReduce job at around 90 percent. It ran fine with 5,000 URLs. Is there some configuration I might be missing?

Go to the MapReduce UI and check the logs for the fetch phase. They should contain a clue about what went wrong.
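If the fetch-phase logs are too long to read by eye, a small script can help narrow them down. The following is only a rough sketch, assuming you have saved a task log from the UI to a local text file. The pattern names and regular expressions are hypothetical starting points, not anything Nutch or EMR defines, so adjust them to whatever your logs actually contain.

```python
import re
import sys
from collections import Counter

# Hypothetical patterns to look for; tune these to your own logs.
SUSPECT_PATTERNS = {
    "exception": re.compile(r"exception", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "error": re.compile(r"\bERROR\b"),
}


def summarize_log(lines):
    """Count how many lines match each suspect pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def main(path):
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        counts = summarize_log(handle)
    for label, n in counts.most_common():
        print(f"{label}: {n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py <path-to-saved-task-log>
    main(sys.argv[1])
```

A high count for one pattern only tells you where to start reading. The lines themselves are what explain why the job stalls near 90 percent.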

