Amazon Web Services - Nutch crawler not scaling for a large number of URLs


I am trying to set up a Nutch crawler on an Amazon EMR cluster with 2 master nodes, which should be scalable. My seed URL list has 10,000 URLs, and the crawler gets stuck in the fetch phase of the MapReduce job at around 90 percent. It ran fine with 5,000 URLs. Is there a configuration I might be missing?

Go to the MapReduce UI and check the logs for the fetch phase. They should contain a clue about what went wrong.
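If the fetch logs are long, it may help to summarize them before reading line by line. The following is a minimal sketch, not part of Nutch or Hadoop: it assumes the logs have been saved as plain text and simply counts Java exception class names that appear in them. The function name `count_exceptions` and the regular expression are illustrative choices, and the script makes no assumption about the log line format beyond exception names appearing as text.

```python
import re
from collections import Counter

# Matches Java class names that end in "Exception" or "Error" and have at
# least one leading character, with an optional package prefix, e.g.
# java.net.SocketTimeoutException or OutOfMemoryError. The log level
# "ERROR" is not matched because the pattern is case-sensitive.
EXCEPTION_RE = re.compile(
    r"\b(?:[A-Za-z_][\w$]*\.)*[A-Z]\w*(?:Exception|Error)\b"
)


def count_exceptions(lines):
    """Return a Counter mapping each exception class name to its frequency."""
    counts = Counter()
    for line in lines:
        for name in EXCEPTION_RE.findall(line):
            counts[name] += 1
    return counts
```

One way to use it, assuming the task logs were dumped to a file (for example with `yarn logs -applicationId <application ID>`, which requires log aggregation to be enabled on the cluster), is to pass the open file to `count_exceptions` and print `most_common(10)`. A single exception type that dominates the counts is usually a good place to start looking.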

