Web Scraping using rvest in R
I have been trying to scrape information from a URL in R using the rvest package:
url <-'https://eprocure.gov.in/cppp/tendersfullview/id%3dnde4mty4ma%3d%3d/zmvhyzk5nwvimwm1ntdmzgmxywyzn2jkytu1ymq5nzu%3d/mtuwmjk3mtg4nq%3d%3d'
but I am not able to correctly identify the XPath, even after using a selector plugin.
The code I am using to fetch the first table is as follows:
detail_data <- read_html(url)
detail_data_raw <- html_nodes(detail_data, xpath='//*[@id="edit-t- fullview"]/table[2]/tbody/tr[2]/td/table')
detail_data_fine <- html_table(detail_data_raw)
When I try the above code, detail_data_raw results in {xml_nodeset (0)}, and consequently detail_data_fine is an empty list().
The information I am interested in scraping is under these headers:
organisation details
tender details
critical dates
work details
tender inviting authority details
Any suggestions or ideas on where I am going wrong, and how to rectify it, are welcome.
Your example URL isn't working for anyone, but if you're looking for the data for a particular tender, then:
library(rvest)
library(stringi)
library(tidyverse)

pg <- read_html("https://eprocure.gov.in/mmp/tendersfullview/id%3d2262207")

# The first cell of each row holds the field label; turn it into a column name
html_nodes(pg, xpath=".//table[@class='viewtablebg']/tr/td[1]") %>%
  html_text(trim=TRUE) %>%
  stri_replace_last_regex(" +:$", "") %>%
  stri_replace_all_fixed(" ", "_") %>%
  stri_trans_tolower() -> tenders_cols

# The second cell of each row holds the value; build a one-row data frame
html_nodes(pg, xpath=".//table[@class='viewtablebg']/tr/td[2]") %>%
  html_text(trim=TRUE) %>%
  as.list() %>%
  set_names(tenders_cols) %>%
  flatten_df() %>%
  glimpse()
## observations: 1
## variables: 15
## $ organisation_name            <chr> "delhi jal board"
## $ organisation_type            <chr> "state govt. , ut"
## $ tender_reference_number      <chr> "short nit. no.20 (item no.1) ee ...
## $ tender_title                 <chr> "short nit. no.20 (item no.1)"
## $ product_category             <chr> "civil works"
## $ tender_fee                   <chr> "rs.500"
## $ tender_type                  <chr> "open/advertised"
## $ epublished_date              <chr> "18-aug-2017 05:15 pm"
## $ document_download_start_date <chr> "18-aug-2017 05:15 pm"
## $ bid_submission_start_date    <chr> "18-aug-2017 05:15 pm"
## $ work_description             <chr> "replacement of settled deep sewe...
## $ pre_qualification            <chr> "please refer tender documents."
## $ tender_document              <chr> "https://govtprocurement.delhi.go...
## $ name                         <chr> "executive engineer (north)-ii"
## $ address                      <chr> "executive engineer (north)-ii\r\...
It seems to work fine without installing Python or using Selenium.
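As for why the XPath in the question came back empty: one likely cause is the tbody step. This is only a guess, since the original URL can no longer be checked. Browsers add tbody elements to tables when they build the page, so selector plugins report them, but they are usually absent from the raw HTML that rvest parses. The sketch below uses a small made-up HTML fragment (not the real tender page; the rows and values are hypothetical) to show that effect, along with a base-R version of the label/value extraction used above.

```r
library(rvest)

# Hypothetical fragment standing in for the tender page (not the real markup).
html <- paste0(
  "<table class='viewtablebg'>",
  "<tr><td>Organisation Name :</td><td>Delhi Jal Board</td></tr>",
  "<tr><td>Tender Fee :</td><td>Rs.500</td></tr>",
  "</table>"
)
pg <- read_html(html)

# An XPath copied from the browser often includes tbody, which this HTML lacks.
with_tbody <- html_nodes(pg, xpath = "//table[@class='viewtablebg']/tbody/tr/td[1]")
without_tbody <- html_nodes(pg, xpath = "//table[@class='viewtablebg']/tr/td[1]")

length(with_tbody)     # 0: nothing matches, as in the question
length(without_tbody)  # 2: one label cell per row

# First column: labels, cleaned up into column names.
labels <- html_text(without_tbody, trim = TRUE)
labels <- tolower(gsub(" ", "_", sub("\\s*:$", "", labels)))

# Second column: the matching values.
values <- html_text(
  html_nodes(pg, xpath = "//table[@class='viewtablebg']/tr/td[2]"),
  trim = TRUE
)

# One-row data frame with one column per label.
tender <- as.data.frame(as.list(setNames(values, labels)), stringsAsFactors = FALSE)
```

If dropping tbody from your own XPath still returns nothing, the table may be filled in by JavaScript after the page loads, in which case this approach alone would not be enough.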