Web Scraping using rvest in R


I have been trying to scrape information from a URL in R using the rvest package:

url <-'https://eprocure.gov.in/cppp/tendersfullview/id%3dnde4mty4ma%3d%3d/zmvhyzk5nwvimwm1ntdmzgmxywyzn2jkytu1ymq5nzu%3d/mtuwmjk3mtg4nq%3d%3d' 

but I am not able to correctly identify the XPath, even after using a selector plugin.

The code I am using to fetch the first table is as follows:

detail_data <- read_html(url)
detail_data_raw <- html_nodes(detail_data, xpath = '//*[@id="edit-t-fullview"]/table[2]/tbody/tr[2]/td/table')
detail_data_fine <- html_table(detail_data_raw)

When I try the above code, detail_data_raw results in {xml_nodeset (0)}, and consequently detail_data_fine is an empty list().

The information I am interested in scraping is under the headers:

Organisation Details

Tender Details

Critical Dates

Work Details

Tender Inviting Authority Details

Any help or ideas on what is going wrong, and how to rectify it, are welcome.

Your example URL isn't working for anyone, but if you're looking to get the data for a particular tender, then:

library(rvest)
library(stringi)
library(tidyverse)

pg <- read_html("https://eprocure.gov.in/mmp/tendersfullview/id%3d2262207")

# First column of each row holds the field label; turn it into a snake_case name
html_nodes(pg, xpath=".//table[@class='viewtablebg']/tr/td[1]") %>%
  html_text(trim=TRUE) %>%
  stri_replace_last_regex(" +:$", "") %>%   # drop the trailing " :"
  stri_replace_all_fixed(" ", "_") %>%
  stri_trans_tolower() -> tenders_cols

# Second column holds the value; name each value by its label and build a one-row data frame
html_nodes(pg, xpath=".//table[@class='viewtablebg']/tr/td[2]") %>%
  html_text(trim=TRUE) %>%
  as.list() %>%
  set_names(tenders_cols) %>%
  flatten_df() %>%
  glimpse()
## observations: 1
## variables: 15
## $ organisation_name            <chr> "delhi jal board"
## $ organisation_type            <chr> "state govt. , ut"
## $ tender_reference_number      <chr> "short nit. no.20 (item no.1) ee ...
## $ tender_title                 <chr> "short nit. no.20 (item no.1)"
## $ product_category             <chr> "civil works"
## $ tender_fee                   <chr> "rs.500"
## $ tender_type                  <chr> "open/advertised"
## $ epublished_date              <chr> "18-aug-2017 05:15 pm"
## $ document_download_start_date <chr> "18-aug-2017 05:15 pm"
## $ bid_submission_start_date    <chr> "18-aug-2017 05:15 pm"
## $ work_description             <chr> "replacement of settled deep sewe...
## $ pre_qualification            <chr> "please refer tender documents."
## $ tender_document              <chr> "https://govtprocurement.delhi.go...
## $ name                         <chr> "executive engineer (north)-ii"
## $ address                      <chr> "executive engineer (north)-ii\r\...

This seems to work fine without installing Python or using Selenium.
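As for why the XPath in the question comes back empty: one plausible cause (an assumption, since the original page can no longer be inspected) is the tbody step. Browsers add a tbody element to the DOM they display even when the served HTML has none, so an XPath copied from a browser or selector plugin may not match what read_html() parses. The sketch below reproduces this offline on a small inline table. The HTML, the "demo" id and the two rows are hypothetical stand-ins, not the real tender page.

```r
library(rvest)
library(stringi)

# Hypothetical stand-in for the tender page; the raw markup has no <tbody>.
html <- '<div id="demo"><table class="viewtablebg">
  <tr><td>Organisation Name :</td><td>Delhi Jal Board</td></tr>
  <tr><td>Tender Fee :</td><td>Rs.500</td></tr>
</table></div>'
doc <- read_html(html)

# XPath as a browser would report it (with tbody): finds no rows in the parsed source.
with_tbody <- html_nodes(doc, xpath = '//*[@id="demo"]/table/tbody/tr')

# The same path with the tbody step removed finds both rows.
without_tbody <- html_nodes(doc, xpath = '//*[@id="demo"]/table/tr')

length(with_tbody)
length(without_tbody)

# Same label/value pairing as in the answer, applied to the stand-in table.
labels <- html_nodes(doc, xpath = ".//table[@class='viewtablebg']/tr/td[1]") %>%
  html_text(trim = TRUE) %>%
  stri_replace_last_regex(" +:$", "") %>%
  stri_replace_all_fixed(" ", "_") %>%
  stri_trans_tolower()

values <- html_nodes(doc, xpath = ".//table[@class='viewtablebg']/tr/td[2]") %>%
  html_text(trim = TRUE)

# Named list: one element per field, keyed by the cleaned label.
tender <- setNames(as.list(values), labels)
str(tender)
```

If dropping tbody from a copied XPath still returns nothing, it is worth checking the page source (not the browser's element inspector) to see which ids and classes are actually present in the HTML that gets served.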

