
Mogakko Python Crawling, Day 15

thals0 2022. 4. 8. 17:10

#15. Dynamic Crawling ⑤

📌 New Members Board

Why use XPath

1. You can crawl without depending on id or class attributes.

2. Patterns among HTML elements are easy to spot.

 

✅ XPath patterns

XPath of the topmost post

/html/body/div[1]/div/div[4]/table/tbody/tr[1]/td[1]/div[3]/div/a

XPath of the post right below it

/html/body/div[1]/div/div[4]/table/tbody/tr[2]/td[1]/div[3]/div/a

XPath of the third post

/html/body/div[1]/div/div[4]/table/tbody/tr[3]/td[1]/div[3]/div/a



Notice that the index of the tr tag in the middle of the path increases by one for each post.

-> This makes the pattern very easy to identify.
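
To see the tr[i] pattern in isolation, here is a small offline sketch using Python's standard xml.etree.ElementTree, which supports a limited subset of XPath including positional predicates such as tr[2]. The markup is a made-up, simplified stand-in for the board table (not the real Naver Cafe page), and row_xpath is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: a simplified, invented board with one <tr> per post.
BOARD = """
<html><body><table><tbody>
  <tr><td><a>first post</a></td></tr>
  <tr><td><a>second post</a></td></tr>
  <tr><td><a>third post</a></td></tr>
</tbody></table></body></html>
""".strip()


def row_xpath(i):
    # Only the tr index changes from one post to the next.
    return "./body/table/tbody/tr[" + str(i) + "]/td/a"


root = ET.fromstring(BOARD)  # root is the <html> element
titles = [root.find(row_xpath(i)).text for i in range(1, 4)]
print(titles)  # ['first post', 'second post', 'third post']
```

Because only one number varies, a loop over that number is enough to visit every row.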

 

I implemented code that fetches all the data on one page.

 

✅ Python implementation

Today we use the posts' XPath pattern

and insert a "go back" step into the process so far.



As a result, the flow repeats in this order:

log in - move to the board - open a post - extract the content - go back - open a post - ...
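
The repeating part of that flow can be sketched as a single function. This is only a sketch under assumptions: crawl_posts and post_xpaths are hypothetical names, driver is assumed to be an already logged-in Selenium WebDriver showing the board, and the strings "xpath" and "css selector" are, as far as I know, the values behind Selenium's By.XPATH and By.CSS_SELECTOR. The waits between steps are left out for brevity.

```python
def crawl_posts(driver, post_xpaths, frame="cafe_main",
                content_css="div.se-component-content"):
    """Open each post, read its text, then go back to the board."""
    contents = []
    for xpath in post_xpaths:
        # Enter the frame that holds the board before searching in it.
        driver.switch_to.frame(frame)
        # Open the post.
        driver.find_element("xpath", xpath).click()
        # Extract the content of the post.
        contents.append(driver.find_element("css selector", content_css).text)
        # Return to the board so the next post can be opened.
        driver.back()
    return contents
```

Calling it with the XPaths of rows 1 through 15 would return the post bodies as a list instead of printing them one by one.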

 

2️⃣ Going back

driver.back()

 

3️⃣ Implementing a loop with the XPath pattern

 

Using the XPath pattern,

let's fetch the content of every post up to the 15th.

# Selenium drives a real Chrome browser; webdriver_manager fetches a matching ChromeDriver.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import time

# Install (or reuse a cached) ChromeDriver and start the browser.
chrome_driver = ChromeDriverManager().install()
service = Service(chrome_driver)
driver = webdriver.Chrome(service=service)

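# Open the Naver login page and give it time to load.
login_url = "https://nid.naver.com/nidlogin.login"
driver.get(login_url)

time.sleep(2)

# Placeholders: replace with your own Naver ID and password.
my_id = "ID"
my_pw = "PW"

# Fill in the fields through JavaScript. Passing the values as script arguments
# (instead of concatenating them) keeps quotes in a password from breaking the script.
driver.execute_script("document.getElementsByName('id')[0].value = arguments[0]", my_id)
driver.execute_script("document.getElementsByName('pw')[0].value = arguments[0]", my_pw)

time.sleep(1)

# Click the login button.
button = driver.find_element(By.ID, "log.login")
button.click()
time.sleep(1)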

# Go to the cafe's main page.
comu_url = "https://cafe.naver.com/codeuniv"
driver.get(comu_url)
time.sleep(1)

# Open the board by clicking its menu link.
menu = driver.find_element(By.ID, "menuLink90")
menu.click()
time.sleep(1)

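# Visit posts 1 through 15; only the tr index in the XPath changes.
for i in range(1, 16):
    xpath = (
        "/html/body/div[1]/div/div[4]/table/tbody/tr[" + str(i) + "]/td[1]/div[3]/div/a"
    )

    # Switch Selenium's search scope to the cafe_main frame that holds the board.
    driver.switch_to.frame("cafe_main")
    time.sleep(1)

    # Open the i-th post.
    writing = driver.find_element(By.XPATH, xpath)
    writing.click()
    time.sleep(1)

    # Extract and print the text of the post.
    content = driver.find_element(By.CSS_SELECTOR, "div.se-component-content").text
    print(content)

    # Go back to the board before opening the next post.
    driver.back()
    time.sleep(1)

# Close the browser window.
driver.close()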

 

One thing to watch out for:

Naver Cafe is built with frames,

so elements cannot be found with the basic crawling approach.

That is why switch_to.frame is called inside the loop: every time we return to the board,

Selenium's search scope has to be switched to cafe_main again.

 

⭐Summary⭐

✔ Going back

✔ Identifying XPath patterns

 
