amazon web services - Nutch crawler not scaling for large URL lists
I am trying to set up a Nutch crawler on an Amazon EMR cluster with 2 master nodes, scalable. When the seed URL list has 10000 URLs, the crawler gets stuck in the fetch phase of the MapReduce job at around 90 percent. It ran fine with 5000 URLs. Is there any configuration I might be missing?
Go to the MapReduce UI and check the logs for the fetch phase. They should contain a clue as to what went wrong.
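If the fetch logs are too long to read through in the UI, a small filter can help narrow them down. The following is only a minimal sketch: the keyword list is an assumption about what a stalled fetch might log, not something taken from Nutch itself, and the script name `scan_fetch_logs.py` is hypothetical.

```python
import re
import sys

# Assumption: these keywords are a guess at what a stuck fetch task might log.
# Adjust the pattern to whatever the fetch-phase logs actually contain.
SUSPECT = re.compile(r"error|exception|timed? ?out|killed", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines that match SUSPECT."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


if __name__ == "__main__":
    # Read the log text from standard input and print only the matching lines.
    for number, line in suspicious_lines(sys.stdin.read()):
        print(f"{number}: {line}")
```

Assuming log aggregation is enabled on the cluster, the logs could be piped in with something like `yarn logs -applicationId <application id> | python scan_fetch_logs.py`, where the application id is the one shown for the fetch job in the MapReduce UI.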