amazon web services - Nutch crawler not scaling for a large number of URLs


I am trying to set up a Nutch crawler on an Amazon EMR cluster with 2 master nodes, which should be scalable. My seed URL list has 10000 URLs, and the crawler gets stuck in the fetch phase of the MapReduce job at around 90 percent. It ran fine with 5000 URLs. Is there any configuration I might be missing?

Go to the MapReduce UI and check the logs for the fetch phase. They should contain a clue about what went wrong.
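Once the fetch logs are downloaded, a small script can help spot whether a few slow or failing hosts are holding the job back. The following is a minimal sketch, not a definitive tool: it assumes failed fetches are logged on lines of the form `fetch of <url> failed with: <reason>`, and the function name `summarize_fetch_failures` is made up for illustration. Adjust the pattern if your logs look different.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed log line shape; change this pattern to match your actual fetch logs.
FAILED_FETCH = re.compile(r"fetch of (\S+) failed with: (.*)")


def summarize_fetch_failures(lines):
    """Count failed fetches per host and per failure reason.

    `lines` is any iterable of log lines (a list or an open file).
    Returns two Counters: (failures by host, failures by reason).
    """
    by_host = Counter()
    by_reason = Counter()
    for line in lines:
        match = FAILED_FETCH.search(line)
        if not match:
            continue  # not a failed-fetch line
        url = match.group(1)
        reason = match.group(2).strip()
        by_host[urlparse(url).netloc] += 1
        by_reason[reason] += 1
    return by_host, by_reason
```

To use it, open a downloaded log file, pass the file object to `summarize_fetch_failures`, and print `by_host.most_common(10)`. If most failures are timeouts concentrated on a handful of hosts, that would suggest the last 10 percent of the job is waiting on those hosts rather than on cluster capacity.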

