[chatgpt] Asking ChatGPT to Generate Web Scraping Code


고잉킴 2023. 10. 19. 16:29

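Asking ChatGPT to write Python code is very simple.

How precisely you communicate your requirements is, of course, a different matter.

Earlier, I wrote Python code that fetches the daily prices of a specific stock from Naver Finance.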

https://gniogolb.tistory.com/13

[python] Trying out web crawling on Naver Finance


Here is how ChatGPT wrote the same thing.

Reference: https://oxylabs.io/blog/chatgpt-web-scraping

How to Use ChatGPT for Web Scraping in 2023


Asking ChatGPT to generate web scraping code

1. Click Try ChatGPT on openai.com and make the request

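My request was as follows.

"Write web scraper code. The target is Naver Finance, and the reason for the request is that I want to scrape all of the data on the target page."

The requirements have to be precise because, as noted in the earlier post, Naver Finance requires a specific header value on every request.

At first I left that part out of the request, and the generated code did not work.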
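The header requirement can be looked at in isolation before running a full scraper. The following is a minimal sketch that only builds the request and inspects it, without sending anything. It uses the standard library's `urllib.request` instead of `requests` so that it runs with no extra packages; `build_request` is a hypothetical helper name, and `111111` is a placeholder stock code.

```python
from urllib.request import Request


def build_request(code: str, page: int) -> Request:
    """Build (but do not send) a request for one daily-price page."""
    url = f"https://finance.naver.com/item/sise_day.naver?code={code}&page={page}"
    # The User-Agent header is the part that was missing from the first prompt.
    return Request(url, headers={"User-Agent": "Mozilla/5.0"})


req = build_request("111111", 1)  # placeholder stock code
print(req.full_url)
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))
```

Checking the prepared request this way makes it easy to confirm that the header is actually attached before blaming the parsing code.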

Once I added it, ChatGPT generated the code below.

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Base URL and code for the target page
base_url = "https://finance.naver.com/item/sise_day.naver?code=111111"
page_num = 1  # Initial page number
data_list = []  # List to store scraped data

# Headers with User-Agent information
headers = {'User-Agent': 'Mozilla/5.0'}

while True:
    # Construct the URL for the current page
    url = f"{base_url}&page={page_num}"
    
    # Send a GET request to the URL with headers
    response = requests.get(url, headers=headers)
    
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.content, "html.parser")
    
    # Find the table containing the desired data
    table = soup.find("table", {"class": "type2"})
    
    # Check if the table exists on the current page
    if table:
        # Extract data from the table rows
        rows = table.find_all("tr")[1:]  # Exclude the header row
        
        for row in rows:
            columns = row.find_all("td")
            if len(columns) >= 2:  # Ensure there are at least 2 columns (Date and Closing Price)
                date = columns[0].text.strip()
                closing_price = columns[1].text.strip()

                # Add the data to the list
                data_list.append([date, closing_price])
    else:
        # If the table is not found, exit the loop
        break
    
    # Move to the next page
    page_num += 1

# Create a DataFrame from the scraped data
df = pd.DataFrame(data_list, columns=["Date", "Closing Price"])

# Print the scraped data or perform further processing
print(df)
# Optionally, you can save the data to a CSV file
# df.to_csv("scraped_data.csv", index=False)

It works very well. My own code fetched the last page number first and then ran a for loop over that range,

whereas ChatGPT used a while loop.
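The difference between the two loop styles can be sketched without touching the network. In the sketch below, `fetch_page`, `scrape_with_for`, `scrape_with_while`, `fake_fetch`, and the `max_pages` cap are all hypothetical names introduced for illustration. The fake site repeats its last page for out-of-range page numbers; that behavior is an assumption for the demo, not something verified against Naver Finance. The while-loop variant also adds two guards that the generated code does not have (stop on an empty or repeated page, and a page cap), since a while loop that only checks whether the table exists could keep running if the site never returns a page without one.

```python
from typing import Callable, List

Row = List[str]


def scrape_with_for(fetch_page: Callable[[int], List[Row]], last_page: int) -> List[Row]:
    """For-loop variant: the last page number is known up front."""
    rows: List[Row] = []
    for page in range(1, last_page + 1):
        rows.extend(fetch_page(page))
    return rows


def scrape_with_while(fetch_page: Callable[[int], List[Row]], max_pages: int = 1000) -> List[Row]:
    """While-loop variant: keep going until a page yields nothing new."""
    rows: List[Row] = []
    previous = None
    page = 1
    while page <= max_pages:  # hard cap as a safety net
        current = fetch_page(page)
        # Stop on an empty page or on a page identical to the previous one.
        if not current or current == previous:
            break
        rows.extend(current)
        previous = current
        page += 1
    return rows


# Stand-in for real HTTP fetching: two pages of [date, closing price] rows.
FAKE_SITE = {
    1: [["2023.10.19", "1,000"]],
    2: [["2023.10.18", "990"]],
}


def fake_fetch(page: int) -> List[Row]:
    # Assumed behavior: out-of-range pages repeat the last page.
    return FAKE_SITE.get(page, FAKE_SITE[max(FAKE_SITE)])


for_rows = scrape_with_for(fake_fetch, last_page=2)
while_rows = scrape_with_while(fake_fetch)
print(for_rows == while_rows)
```

The for loop needs one extra step to learn the last page number, while the while loop needs a reliable stop condition; which one is simpler depends on how the target site handles page numbers past the end.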

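2. Summary

Whenever I use ChatGPT I worry about hallucination (the generation of incorrect or false information that is not grounded in the given data or context; source: https://www.sedaily.com/NewsView/29QQ49U8UC), but it does seem to be quite accurate when writing code.

Of course, code like this still has to be verified and checked, but ChatGPT really is amazing!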
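One way to do that verification is to run the row-extraction rule against a small, fixed HTML snippet instead of the live site. The sketch below is an illustration under stated assumptions: `TableRowParser`, `extract_date_and_close`, and `SAMPLE_HTML` are hypothetical names, the sample table is made up rather than copied from Naver Finance, and the standard library's `html.parser` stands in for BeautifulSoup so the check runs without extra packages. It applies the same rule as the generated code (take the first two cells of any row with at least two cells) and additionally skips rows whose date cell is empty.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_date_and_close(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    # Keep rows with at least two cells and a non-empty date cell.
    return [[r[0], r[1]] for r in parser.rows if len(r) >= 2 and r[0]]


# Made-up sample: a header row, two data rows, a spacer row, and a blank row.
SAMPLE_HTML = """
<table class="type2">
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2023.10.19</td><td>1,000</td><td>10</td></tr>
  <tr><td colspan="7"></td></tr>
  <tr><td></td><td></td></tr>
  <tr><td>2023.10.18</td><td>990</td><td>5</td></tr>
</table>
"""

result = extract_date_and_close(SAMPLE_HTML)
print(result)
```

A check like this catches parsing mistakes such as header or spacer rows leaking into the data, independently of whether the site is reachable.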

Once again, I am reminded of how important prompt engineering is for conveying requirements precisely.

Thank you.
