performance - How do I efficiently find the number of tweets & retweets in a time span using R? (twitteR package)
I want to find out the number of tweets, favourites, and retweets (cumulative is enough) of UK general election candidates from several parties (>2000 candidates) in the 2 months before the election. So far I have tried to make a loop using twitteR's userTimeline and (inside the loop, because I don't know how to save it otherwise) saving the number of tweets, retweets, and favourites.
current is a list of Twitter usernames. I'm a programming newbie, please don't hate:
    library(twitteR)
    tweetsy.2017 <- function(x){
      # download up to 3200 tweets for one user, retweets and replies included
      one = userTimeline(x, n = 3200, includeRts = TRUE, excludeReplies = FALSE)
      onedf = twListToDF(one)
      # keep only tweets from the campaign period
      oneperiod = subset(onedf, created >= as.POSIXct('2017-04-18 00:00:00') & created <= as.POSIXct('2017-06-08 23:59:00')) #61 days
      # original tweets only (computed, but not used in the counts below)
      oneperiod2 = oneperiod[oneperiod$isRetweet == FALSE,]
      ro = nrow(oneperiod)
      f = sum(oneperiod$favoriteCount)
      re = sum(oneperiod$retweetCount)
      output = list(ro, f, re)
      return(output)
      #Sys.sleep(100)
    }
    tweets.2017 = lapply(current, tweetsy.2017)
My problem is that it takes very long and gives no intermediate data. Also, it seems inefficient to download all the tweets just to get the number of them. Oh, and I put the sleep in there in case I reach the API limit, but it seems the code is too slow to reach it anyway.
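One way to make the counting step easier to check is to separate it from the download, so it can be run on any data frame with the columns that twListToDF returns. The sketch below is only an illustration: count_in_window, its tz argument, and the toy data are made up for this example, and it assumes the created column is a POSIXct in UTC.

```r
# Summarise a timeline data frame that has the columns produced by twListToDF
# (created, favoriteCount, retweetCount, isRetweet).
# count_in_window is a hypothetical helper, not part of twitteR.
count_in_window <- function(df, start, end, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end   <- as.POSIXct(end, tz = tz)
  in_window <- df[df$created >= start & df$created <= end, , drop = FALSE]
  data.frame(
    tweets     = nrow(in_window),            # everything in the window
    originals  = sum(!in_window$isRetweet),  # retweets excluded
    favourites = sum(in_window$favoriteCount),
    retweets   = sum(in_window$retweetCount)
  )
}

# Made-up rows, purely for illustration
toy <- data.frame(
  created = as.POSIXct(c("2017-04-01 12:00:00",
                         "2017-05-01 12:00:00",
                         "2017-06-01 12:00:00"), tz = "UTC"),
  favoriteCount = c(5, 2, 3),
  retweetCount  = c(1, 4, 0),
  isRetweet     = c(FALSE, TRUE, FALSE)
)

toy_counts <- count_in_window(toy, "2017-04-18 00:00:00", "2017-06-08 23:59:00")
print(toy_counts)
```

Returning a one-row data frame instead of an unnamed list means the per-candidate results could later be stacked with do.call(rbind, ...).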
Does anyone have a better idea? I have tried mclapply and parLapply but haven't managed to get them running.
I wrapped it in a loop, so I can have intermediate results. It works fine now!
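For reference, a minimal parLapply call looks roughly like the sketch below. fetch_counts and window_start are hypothetical stand-ins (the real function would be tweetsy.2017), and no Twitter request is made. A common reason parLapply fails is that each worker is a separate R session, so packages and global objects have to be loaded or exported there first; presumably the twitteR authentication would have to be set up on each worker as well. Running requests in parallel also does nothing to raise the API rate limit.

```r
library(parallel)

# Hypothetical global setting used by the worker function
window_start <- "2017-04-18"

# Hypothetical stand-in for tweetsy.2017; it makes no network request
fetch_counts <- function(user) {
  list(user = user, since = window_start, tweets = nchar(user))
}

users <- c("alice", "bob", "carol")  # made-up usernames

cl <- makeCluster(2)                 # two worker R sessions
# Workers start empty: export the global objects the function relies on
clusterExport(cl, "window_start")
# With the real function, the package would be loaded on each worker too:
# clusterEvalQ(cl, library(twitteR))
par_results <- parLapply(cl, users, fetch_counts)
stopCluster(cl)

print(length(par_results))
```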
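    # twitter_data: data frame with the usernames in its first column
    ab <- list()  # collects one result per candidate
    for(i in seq_len(nrow(twitter_data))){
      print(paste("row number ", i, " of ", nrow(twitter_data)))
      id <- twitter_data[i, 1]
      print(as.vector(id))
      ab[[i]] <- tweetsy.2017(id)
      print("Process sleeps for a few seconds due to Twitter API security issues, then continues")
      Sys.sleep(9)
    }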
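With more than 2000 accounts, a single failing request (for example a protected or deleted account) would stop a loop like this and lose everything collected so far. A possible refinement, sketched here with a hypothetical fetch_counts stand-in and made-up usernames, is to catch errors per user and write the results to disk after every iteration:

```r
# Hypothetical stand-in for tweetsy.2017; fails for one made-up user
fetch_counts <- function(user) {
  if (user == "protected_user") stop("not authorised")
  list(tweets = nchar(user), favourites = 0, retweets = 0)
}

users <- c("alice", "protected_user", "carol")
checkpoint <- tempfile(fileext = ".rds")  # use a real path in practice

results <- vector("list", length(users))
names(results) <- users

for (i in seq_along(users)) {
  # On error, store NA for this user and carry on with the next one
  results[[i]] <- tryCatch(fetch_counts(users[i]), error = function(e) NA)
  # Save everything collected so far, so an interrupted run loses nothing
  saveRDS(results, checkpoint)
  # Sys.sleep(9) would go here when calling the real API
}

restored <- readRDS(checkpoint)
print(names(restored))
```

After a crash, readRDS on the checkpoint file gives back the partial results, and the loop can be restarted from the first user that is still missing.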