Web Scraping Case Study: Price Tracker for Online Stores

Requirements Analysis

Background:
A retail business wants to monitor competitor product prices across three marketplaces (Tokopedia, Shopee, Bukalapak) in order to:

  • Compare prices of similar products
  • Detect discounts and promotions
  • Shape its pricing strategy

Data Needed:

  • Product name
  • Regular price and discounted price
  • Rating and reviews
  • Available stock
  • Shop name
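
One way to pin these fields down before writing any scraper is a small record type. The sketch below is an illustration, not part of the original pipeline: the class name `ProductRecord`, its field names, and the `discount_pct` helper are hypothetical, and the prices in the example are made-up sample values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductRecord:
    """One scraped listing; field names are illustrative, not a fixed schema."""
    platform: str
    name: str
    shop: str
    normal_price: int                      # in rupiah
    discount_price: Optional[int] = None   # None when no promotion is shown
    rating: Optional[float] = None
    review_count: Optional[int] = None
    stock: Optional[int] = None

    def discount_pct(self) -> float:
        """Return the discount as a percentage of the normal price (0.0 if none)."""
        if self.discount_price is None or self.normal_price <= 0:
            return 0.0
        saved = self.normal_price - self.discount_price
        return round(100 * saved / self.normal_price, 1)


# Sample values for illustration only
record = ProductRecord(
    platform="Tokopedia",
    name="Xiaomi Redmi Note 12 6/128",
    shop="Xiaomi Official",
    normal_price=2_499_000,
    discount_price=2_249_100,
)
print(record.discount_pct())
```

Keeping the regular and discounted price as separate integer fields makes discount detection a simple comparison later on.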

Technical Solution

Solution Architecture:

mermaid
flowchart TD
    A[Target Website] --> B[Scraper Tokopedia]
    A --> C[Scraper Shopee]
    A --> D[Scraper Bukalapak]
    B --> E[Database]
    C --> E
    D --> E
    E --> F[Data Analysis]
    F --> G[Dashboard]
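
The fan-out and merge in the diagram can be sketched as a small dispatcher that runs one scraper per marketplace and combines the results. Everything here is a hypothetical stand-in: the `SCRAPERS` registry, the `collect` function, and the two stub scrapers exist only to make the sketch runnable, and real scrapers would replace the stubs.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def fake_tokopedia(keyword: str) -> List[Record]:
    # Stub standing in for a real Tokopedia scraper.
    return [{"platform": "Tokopedia", "produk": f"{keyword} A"}]


def fake_shopee(keyword: str) -> List[Record]:
    # Stub that fails, to show that one broken source does not stop the others.
    raise RuntimeError("blocked")


# Registry of scrapers, one entry per marketplace (hypothetical name)
SCRAPERS: Dict[str, Callable[[str], List[Record]]] = {
    "Tokopedia": fake_tokopedia,
    "Shopee": fake_shopee,
}


def collect(keyword: str) -> List[Record]:
    """Run every registered scraper and merge the results into one list."""
    rows: List[Record] = []
    for platform, scraper in SCRAPERS.items():
        try:
            rows.extend(scraper(keyword))
        except Exception as exc:
            print(f"{platform} failed: {exc}")
    return rows


rows = collect("redmi")
```

Isolating each scraper behind the same call signature means a marketplace can be added or removed by editing the registry alone.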

Technologies:

  • BeautifulSoup for HTML parsing
  • Selenium for rendering JavaScript
  • Proxy rotator to avoid getting blocked
  • PostgreSQL for data storage
  • Airflow for scheduling
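
As a rough idea of what a proxy rotator does, the sketch below cycles through a list of proxies and picks a random User-Agent for each request. The proxy addresses, the second User-Agent string, and the helper name `next_request_options` are placeholders chosen for illustration; no request is actually sent.

```python
import itertools
import random

# Placeholder values: substitute real User-Agent strings and proxy endpoints.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]
PROXIES = ["http://proxy1.example.com:8080", "http://proxy2.example.com:8080"]

_proxy_cycle = itertools.cycle(PROXIES)


def next_request_options() -> dict:
    """Return keyword arguments for requests.get with a rotated proxy and User-Agent."""
    proxy = next(_proxy_cycle)
    return {
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


first = next_request_options()
second = next_request_options()
```

The returned dictionary can be unpacked straight into a call such as `requests.get(url, **next_request_options())`.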

Code Implementation

a. Tokopedia Scraper (BeautifulSoup)

python
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_tokopedia(keyword):
    """Scrape the first five search results for `keyword` from Tokopedia."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }

    params = {
        'q': keyword,
        'page': 1
    }

    products = []

    try:
        response = requests.get(
            'https://www.tokopedia.com/search',
            params=params,
            headers=headers,
            timeout=10  # avoid hanging forever on a stalled connection
        )
        response.raise_for_status()  # treat HTTP 4xx/5xx responses as errors
        soup = BeautifulSoup(response.text, 'html.parser')

        # The data-testid selectors depend on the site's markup and may change
        items = soup.select('[data-testid="master-product-card"]')

        for item in items[:5]:  # Take the first 5 products
            name = item.select_one('[data-testid="linkProductName"]')
            price = item.select_one('[data-testid="linkProductPrice"]')
            shop = item.select_one('[data-testid="linkShopName"]')
            if not (name and price and shop):
                continue  # skip cards that are missing a field

            products.append({
                'platform': 'Tokopedia',
                'produk': name.get_text(strip=True),
                'harga': price.get_text(strip=True).replace('Rp', '').strip(),
                'toko': shop.get_text(strip=True)
            })

    except Exception as e:
        print(f"Error: {e}")

    return pd.DataFrame(products)

# Example usage
df_tokped = scrape_tokopedia('xiaomi redmi note 12')
print(df_tokped.head())
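
The scraper above makes a single request and gives up on the first error. A common complement is to retry with exponential backoff. The helper below is a minimal sketch of that idea; the name `with_retries` and its parameters are assumptions made for this example, and the `flaky` function only simulates a request that fails twice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)
    raise RuntimeError("attempts must be at least 1")


# Demo with a function that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retries(flaky, attempts=3, base_delay=0.01)
```

Retrying only helps if the wrapped function lets its exception propagate; a function that catches and prints every error would look successful to the helper.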

b. Shopee Scraper (Selenium)

python
import time
from urllib.parse import quote

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def scrape_shopee(keyword):
    """Scrape the first five search results for `keyword` from Shopee."""
    chrome_options = Options()
    chrome_options.add_argument('--headless')

    driver = webdriver.Chrome(options=chrome_options)
    products = []

    try:
        # URL-encode the keyword so spaces and symbols survive in the query string
        driver.get(f"https://shopee.co.id/search?keyword={quote(keyword)}")

        # Scroll to the bottom so lazily loaded items get rendered
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)

        items = driver.find_elements(By.CSS_SELECTOR, '.shopee-search-item-result__item')

        for item in items[:5]:
            # These class names are generated by the site and may change
            name = item.find_element(By.CSS_SELECTOR, '.Cve6sh').text
            price = item.find_element(By.CSS_SELECTOR, '.ZEgDH9').text
            shop = item.find_element(By.CSS_SELECTOR, '._6HeM6Z').text

            products.append({
                'platform': 'Shopee',
                'produk': name,
                'harga': price.replace('Rp', '').replace('.', '').strip(),
                'toko': shop
            })

    finally:
        driver.quit()  # always close the browser, even after an error

    return pd.DataFrame(products)

# Example usage
df_shopee = scrape_shopee('xiaomi redmi note 12')
print(df_shopee.head())

c. Saving to the Database

python
import pandas as pd
import psycopg2  # driver that SQLAlchemy uses for postgresql:// URLs
from sqlalchemy import create_engine

def save_to_db(df):
    """Append the scraped rows to the product_prices table."""
    engine = create_engine('postgresql://user:password@localhost:5432/price_tracker')

    try:
        df.to_sql(
            'product_prices',
            engine,
            if_exists='append',
            index=False
        )
        print("Data saved successfully!")
    except Exception as e:
        print(f"Error saving data: {e}")

# Combine data from all platforms
df_all = pd.concat([df_tokped, df_shopee], ignore_index=True)
save_to_db(df_all)
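
Appending rows is only useful for trend analysis if each row records when it was scraped. The sketch below shows one possible table layout with a `scraped_at` column. It is an assumption layered on top of the article: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server, `save_snapshot` is a hypothetical helper, and the second price is a made-up sample value.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute(
    """
    CREATE TABLE product_prices (
        platform   TEXT NOT NULL,
        produk     TEXT NOT NULL,
        harga      INTEGER NOT NULL,
        toko       TEXT,
        scraped_at TEXT NOT NULL
    )
    """
)


def save_snapshot(rows, scraped_at=None):
    """Append rows, stamping each with the time of the scrape (UTC, ISO 8601)."""
    stamp = scraped_at or datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO product_prices VALUES (?, ?, ?, ?, ?)",
        [(r["platform"], r["produk"], r["harga"], r["toko"], stamp) for r in rows],
    )
    conn.commit()


row = {"platform": "Tokopedia", "produk": "Xiaomi Redmi Note 12 6/128", "toko": "Xiaomi Official"}
save_snapshot([{**row, "harga": 2_499_000}], scraped_at="2023-01-01T00:00:00+00:00")
save_snapshot([{**row, "harga": 2_399_000}], scraped_at="2023-01-02T00:00:00+00:00")

# Price history of one product, oldest first
history = conn.execute(
    "SELECT scraped_at, harga FROM product_prices WHERE produk = ? ORDER BY scraped_at",
    (row["produk"],),
).fetchall()
```

With pandas, the same effect can be had by adding a timestamp column to the DataFrame before calling `to_sql`.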

Program Output

Sample DataFrame:

| platform  | produk                     | harga     | toko            |
|-----------|----------------------------|-----------|-----------------|
| Tokopedia | Xiaomi Redmi Note 12 6/128 | 2,499,000 | Xiaomi Official |
| Shopee    | Redmi Note 12 Pro 5G       | 3,199,000 | Mi Store        |
| Bukalapak | Redmi Note 12 4/64         | 2,100,000 | Gadget Store    |

Price Visualization:

python
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
# Strip every non-digit character, then convert the price to an integer
df_all['harga'] = df_all['harga'].str.replace('[^0-9]', '', regex=True).astype(int)
df_all = df_all.sort_values('harga')

plt.barh(
    df_all['produk'].str[:20] + '...',  # Truncate the text so it fits
    df_all['harga'],
    color=['skyblue' if 'Tokopedia' in x else 'orange' for x in df_all['platform']]
)
plt.title('Product Price Comparison')
plt.xlabel('Price (Rp)')
plt.grid(axis='x')
plt.show()

Challenges and Solutions

| Challenge                 | Implemented Solution                 |
|---------------------------|--------------------------------------|
| Anti-bot detection        | Rotating User-Agent + proxy          |
| Dynamic content           | Selenium + explicit wait             |
| Captcha                   | Captcha-solving service (2Captcha)   |
| Different HTML structures | Separate parser for each marketplace |
| Inconsistent data         | Data cleaning with Pandas            |
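
Cleaning inconsistent data here mostly means bringing price strings into one numeric form: the Tokopedia scraper keeps the thousands separators (for example `2.499.000`) while the Shopee scraper strips them. A minimal sketch of a shared normalizer follows; the function name `parse_rupiah` is hypothetical, and it assumes whole-rupiah prices with no decimal part and no price ranges.

```python
import re


def parse_rupiah(text: str) -> int:
    """Convert a price string such as 'Rp2.499.000' or '2,499,000' to an integer."""
    digits = re.sub(r"[^0-9]", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


prices = [parse_rupiah(s) for s in ("Rp2.499.000", "2,499,000", " Rp 3.199.000 ")]
```

A string containing a range such as `Rp1.000 - Rp2.000` would be merged into one number by this function, so ranges need to be split before parsing.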

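Example: Handling Dynamic Content

python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `driver` is an existing WebDriver instance, as created in the Shopee scraper.
# Wait up to 10 seconds for the element to appear; raises TimeoutException otherwise.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".product-item"))
)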

Further Development

a. Scheduling with Airflow

python
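from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

default_args = {
    'owner': 'data_team',
    'start_date': datetime(2023, 1, 1)
}

dag = DAG(
    'price_monitoring',
    default_args=default_args,
    schedule_interval='@daily',  # newer Airflow releases name this parameter `schedule`
    catchup=False  # do not backfill a run for every day since start_date
)

def run_scrapers():
    # scrape_tokopedia, scrape_shopee and save_to_db are the functions defined earlier
    df1 = scrape_tokopedia('xiaomi')
    df2 = scrape_shopee('xiaomi')
    save_to_db(pd.concat([df1, df2], ignore_index=True))

task = PythonOperator(
    task_id='scrape_marketplaces',
    python_callable=run_scrapers,
    dag=dag
)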
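
Scrapes often fail for transient reasons such as a timeout or a temporary block, so it can help to let Airflow re-run a failed task. `retries` and `retry_delay` are standard task arguments that can be set through `default_args`; the specific values below (3 retries, 10 minutes apart) are assumptions picked for illustration.

```python
from datetime import datetime, timedelta

default_args = {
    "owner": "data_team",
    "start_date": datetime(2023, 1, 1),
    "retries": 3,                           # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=10),   # wait between attempts
}
```

For retries to trigger, the task has to fail, which means `run_scrapers` should raise on error rather than only printing it.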

b. Telegram Alert System

python
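import telegram  # python-telegram-bot

# Assumes python-telegram-bot v13.x, where send_message is synchronous;
# from v20 on it is a coroutine and must be awaited.
bot = telegram.Bot(token='YOUR_TOKEN')

def send_alert(product, price, threshold):
    """Send a plain-text price alert to the configured chat."""
    message = (
        "⚠️ PRICE ALERT ⚠️\n"
        f"Product: {product}\n"
        f"Price dropped to: Rp {price:,}\n"
        f"Below threshold: Rp {threshold:,}"
    )
    bot.send_message(chat_id='GROUP_ID', text=message)

# Example alert trigger (sample values)
product_name = 'Xiaomi Redmi Note 12 6/128'
current_price = 2_499_000
threshold_price = 2_600_000

if current_price < threshold_price:
    send_alert(product_name, current_price, threshold_price)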
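
With more than one product, the threshold check is easier to manage as a lookup. The sketch below is one way to do it: the function name `find_price_drops` and the threshold values are hypothetical, and each returned product would then be passed to the alert function.

```python
from typing import Dict, List


def find_price_drops(prices: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return products whose current price is below their configured threshold."""
    return sorted(
        product
        for product, price in prices.items()
        if product in thresholds and price < thresholds[product]
    )


# Sample prices and thresholds for illustration only
current = {"Redmi Note 12 4/64": 2_100_000, "Redmi Note 12 Pro 5G": 3_199_000}
limits = {"Redmi Note 12 4/64": 2_200_000, "Redmi Note 12 Pro 5G": 3_000_000}
drops = find_price_drops(current, limits)
```

Products without a configured threshold are skipped rather than treated as alerts.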

Conclusion

This case study demonstrates:

  1. A multi-platform scraper implementation
  2. Techniques for handling different kinds of websites
  3. Structured data storage
  4. Price comparison visualization
  5. An architecture for an automated price monitoring system

Key Points:

  • Scrape responsibly
  • Implement robust error handling
  • Store historical data for trend analysis
  • Automate the process for efficiency
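
As one concrete piece of responsible scraping, a scraper can check a site's robots.txt before fetching a page and honor any crawl delay it declares. The sketch below uses the standard library's `urllib.robotparser`. The rules, the bot name `price-tracker-bot`, and the `example.com` URLs are made up so the code runs offline; a real scraper would load the marketplace's own robots.txt with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Example rules, parsed from text so the sketch runs without network access.
rules = """
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a given user agent may fetch a given URL
allowed = parser.can_fetch("price-tracker-bot", "https://example.com/search?q=xiaomi")
blocked = parser.can_fetch("price-tracker-bot", "https://example.com/checkout/cart")

# Seconds to wait between requests, or None if the file does not say
delay = parser.crawl_delay("price-tracker-bot")
```

Sleeping for `delay` seconds between requests keeps the load on the target site predictable.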

With this system, a business can:

  • Monitor competitor price movements automatically on a regular schedule (daily in the Airflow example)
  • Detect discount and promotion patterns
  • Set a competitive pricing strategy
