Rcrawler::Crawling

Published by onesixx on

Rcrawler can do both crawling and scraping.

https://www.pluralsight.com/guides/web-crawling-in-r
Building a Quant Investment Portfolio with R – Chapter 4: Understanding Crawling

Crawling a specific website

Finding URLs using CSS or XPath

full script with comment

XPath

//*[@id="contentarea_left"]/ul/li[1]/dl/dd[1]/a ==> //*[@id="contentarea_left"]/ul/li/dl/dd/a
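Dropping the positional indices ([1]) is what turns a single-node XPath copied from the browser into one that matches every link in the list. A minimal sketch of the difference, assuming the xml2 package; the HTML fragment and its hrefs are made up for illustration and are not the real page:

```r
library(xml2)

# Hypothetical page fragment shaped like the XPath above (not the real site).
html <- '
<div id="contentarea_left">
  <ul>
    <li><dl><dd><a href="/news/1">first</a></dd><dd><a href="/press/1">press</a></dd></dl></li>
    <li><dl><dd><a href="/news/2">second</a></dd></dl></li>
    <li><dl><dd><a href="/news/3">third</a></dd></dl></li>
  </ul>
</div>'
doc <- read_html(html)

# With positional indices: only the link in the first <dd> of the first <li>.
one <- xml_find_all(doc, '//*[@id="contentarea_left"]/ul/li[1]/dl/dd[1]/a')

# Without indices: every <a> under any <li>/<dl>/<dd> of the list.
all_links <- xml_find_all(doc, '//*[@id="contentarea_left"]/ul/li/dl/dd/a')

xml_attr(one, "href")        # "/news/1"
xml_attr(all_links, "href")  # "/news/1" "/press/1" "/news/2" "/news/3"
```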

Playing it safe with a local HTTP server
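One way to practise without touching a real site is to serve a few static pages from your own machine and crawl those. A rough sketch, assuming the servr package is installed; the page contents, the directory name, and port 4321 are all made up here:

```r
# Build a tiny two-page site in a temporary directory (hypothetical content).
site_dir <- file.path(tempdir(), "practice_site")
dir.create(site_dir, showWarnings = FALSE)
writeLines('<html><body><a href="page2.html">next</a></body></html>',
           file.path(site_dir, "index.html"))
writeLines('<html><body><p>Second page</p></body></html>',
           file.path(site_dir, "page2.html"))

# Serve it in the background if servr is available (port 4321 is arbitrary).
if (requireNamespace("servr", quietly = TRUE)) {
  servr::httd(site_dir, port = 4321, daemon = TRUE, browser = FALSE)
  # The crawler can now be pointed at the local copy, e.g.
  # Rcrawler("http://127.0.0.1:4321/", no_cores = 1, MaxDepth = 1)
  servr::daemon_stop()  # shut the background server down when finished
}
```

Mistakes in filters or depth settings then only hit localhost, not someone else's server.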

terms of service

terms of service crawling facebook
https://about.fb.com/news/2020/10/taking-legal-action-against-data-scraping/

robots.txt

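Before crawling a website, check its 'robots.txt' from the site's main page.
In particular, if you intend to use the collected data for business purposes,
be sure to get a legal review first.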
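The robots.txt check can be roughed out in base R. This is a simplified sketch, not a full parser: it only collects the Disallow rules that apply to all user agents ("*") and tests a path by prefix match. The function names and the sample robots.txt content are made up for illustration.

```r
# Simplified robots.txt check: wildcards, Allow rules and agent-specific
# groups are ignored in this sketch.
disallowed_paths <- function(robots_lines) {
  lines <- trimws(sub("#.*$", "", robots_lines))  # drop comments
  lines <- lines[nzchar(lines)]                   # drop empty lines
  rules <- character(0)
  applies <- FALSE
  for (ln in lines) {
    key <- tolower(trimws(sub(":.*$", "", ln)))   # text before the first colon
    val <- trimws(sub("^[^:]*:", "", ln))         # text after the first colon
    if (key == "user-agent") {
      applies <- (val == "*")
    } else if (key == "disallow" && applies && nzchar(val)) {
      rules <- c(rules, val)
    }
  }
  rules
}

is_allowed <- function(path, rules) {
  !any(startsWith(path, rules))
}

# Hypothetical robots.txt content; for a real site the lines would come
# from readLines() on the site's /robots.txt URL.
robots <- c(
  "User-agent: *",
  "Disallow: /private/",
  "Disallow: /search",
  "",
  "User-agent: BadBot",
  "Disallow: /"
)
rules <- disallowed_paths(robots)
is_allowed("/news/today.html", rules)  # TRUE
is_allowed("/private/report", rules)   # FALSE
```

Rcrawler's documentation also lists an Obeyrobots argument for this purpose; check the README linked below for its exact behavior.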

https://warm-uk.tistory.com/39 [Data Crawling] Crawling Naver search data with the OPEN API: understanding it

Crawl Gently

Rcrawler(url, no_cores = 1, RequestsDelay = 2, MaxDepth = 1)

  • Limit the number of processes: no_cores = 1
  • Go slowly (delay between requests): RequestsDelay = 2
  • Limit depth: MaxDepth = 1
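To make the delay and depth settings concrete, here is a toy sketch of a depth-limited, delayed crawl over an in-memory link graph. It is not how Rcrawler is implemented, and Rcrawler may count depth differently. The site list, the crawl_gently function, and the 0.1 second delay are made up for illustration.

```r
# Hypothetical in-memory "site": each page maps to the pages it links to.
site <- list(
  "/"    = c("/a", "/b"),
  "/a"   = c("/a/1"),
  "/b"   = c("/"),
  "/a/1" = character(0)
)

crawl_gently <- function(start, links, max_depth = 1, delay = 2) {
  visited <- character(0)
  frontier <- start
  for (depth in 0:max_depth) {
    next_frontier <- character(0)
    for (page in frontier) {
      if (page %in% visited) next
      if (length(visited) > 0) Sys.sleep(delay)  # pause between requests
      visited <- c(visited, page)                # stand-in for a real fetch
      next_frontier <- c(next_frontier, links[[page]])
    }
    # Only unseen pages move on to the next depth level.
    frontier <- setdiff(unique(next_frontier), visited)
    if (length(frontier) == 0) break
  }
  visited
}

pages <- crawl_gently("/", site, max_depth = 1, delay = 0.1)
pages  # "/" "/a" "/b"
```

With max_depth = 1 the page "/a/1" is never requested, since it sits two links away from the start page.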

https://rdrr.io/github/salimk/Rcrawler/f/README.md

view(INDEX)

Categories: quant
