# newsletter-events-update-event

> Manually refresh specific event pages to update event data

- Author: Aniket Panjwani
- Repository: aniketpanjwani/local_media_tools
- Version: 20251226012256
- Stars: 39
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/aniketpanjwani/local_media_tools
- Web: https://mule.run/skillshub/@@aniketpanjwani/local_media_tools~newsletter-events-update-event:20251226012256

---

---
name: newsletter-events-update-event
description: Manually refresh specific event pages to update event data
---

## Purpose

Re-scrape specific event page URLs to update event data in the database. Use this when event details have changed (date, time, venue, etc.) or to verify event information.

## When to Use

- Event details have changed and need updating
- You want to verify/refresh event data from the source
- A page was scraped but events weren't extracted correctly

## URL Handling

- URLs are normalized before lookup (trailing slashes, tracking params stripped)
- The normalized URL must match what's in the `scraped_pages` table
- If URL not found in database, it will be scraped as new

## Multi-Event Pages

Some pages contain multiple events (e.g., weekly schedules). When updating:

- ALL events with matching `source_url` are updated
- New events on the page are created
- Events no longer on the page are NOT deleted (they may have passed)

Which event page(s) do you want to refresh?

**Examples:**

- `https://hvmag.com/events/jazz-night-jan-20`
- `https://example.com/event/123 https://example.com/event/456`

Provide the URL(s):

## Step 1: Parse URLs

Extract all URLs from user input.
```python
import re

url_pattern = r'https?://[^\s<>"\']+'
urls = re.findall(url_pattern, user_input)

if not urls:
    print("ERROR: No valid URLs found in input")
    # STOP HERE
```

## Step 2: Normalize and Lookup

```python
from pathlib import Path

from scripts.url_utils import normalize_url
from schemas.sqlite_storage import SqliteStorage

db_path = Path.home() / ".config" / "local-media-tools" / "data" / "events.db"
storage = SqliteStorage(db_path)

# Normalize URLs
url_map = {normalize_url(url): url for url in urls}

# Existing scraped_pages records keyed by normalized URL (reused in Step 5)
page_records = {}

# Check which URLs exist in scraped_pages
for normalized_url, original_url in url_map.items():
    # Find the source_name for this URL (query scraped_pages)
    with storage._connection() as conn:
        row = conn.execute(
            "SELECT source_name, url, scraped_at FROM scraped_pages WHERE url = ?",
            (normalized_url,),
        ).fetchone()

    if row:
        page_record = dict(row)
        page_records[normalized_url] = page_record
        print(f"ℹ Found: {original_url}")
        print(f" Source: {page_record['source_name']}")
        print(f" Last scraped: {page_record['scraped_at']}")
    else:
        print(f"⚠ Not found in database: {original_url}")
        print(" Will be scraped as new URL")
```

## Step 3: Re-scrape Pages

```python
from scripts.scrape_firecrawl import FirecrawlClient, FirecrawlError

client = FirecrawlClient()
scraped_pages = []

for normalized_url, original_url in url_map.items():
    try:
        page = client.scrape_url(original_url)
        scraped_pages.append({
            "normalized_url": normalized_url,
            "original_url": original_url,
            "markdown": page.get("markdown", ""),
            "title": page.get("title", ""),
        })
        print(f"✓ Scraped: {original_url}")
    except FirecrawlError as e:
        print(f"✗ Failed to scrape {original_url}: {e}")
```

## Step 4: Extract Events (Claude)

For each scraped page, analyze the markdown and extract events.

**For each page:**

1. Read the markdown content
2. Extract event details (title, date, time, venue, description, price, etc.)
3. Create Event objects with `source_url` set to the original URL

```python
from schemas.event import Event, Venue, EventSource

# Initialize once, before looping over pages: original URL -> list of Event
events_by_url = {}

event = Event(
    title=extracted_title,
    venue=Venue(name=venue_name, address=venue_address),
    event_date=parsed_date,
    start_time=parsed_time,
    source=EventSource.WEB_AGGREGATOR,
    source_url=page["original_url"],
    description=description,
    price=price,
    ticket_url=ticket_url,
    confidence=0.9,  # Higher confidence for manual refresh
    needs_review=False,  # User explicitly requested this update
)

# Group events by source page so Step 5 can look them up
events_by_url.setdefault(page["original_url"], []).append(event)
```

## Step 5: Update Database

For each page, update or create events, then update the `scraped_pages` record.

```python
from urllib.parse import urlparse

from schemas.event import EventCollection

for page in scraped_pages:
    events_from_page = events_by_url.get(page["original_url"], [])

    # 1. Save/update events
    if events_from_page:
        collection = EventCollection(events=events_from_page)
        result = storage.save(collection)
        print(f" → {len(events_from_page)} events: {result.saved} new, {result.updated} updated")
    else:
        print(" → No events extracted from page")

    # 2. Update scraped_pages record
    # Determine source_name (from existing record or ask user).
    # Look up the record for THIS page rather than reusing the last one from Step 2.
    page_record = page_records.get(page["normalized_url"])
    if page_record:
        source_name = page_record["source_name"]
    else:
        # For new URLs, try to infer source from config or use domain
        source_name = urlparse(page["original_url"]).netloc

    storage.save_scraped_page(
        source_name=source_name,
        url=page["normalized_url"],
        events_count=len(events_from_page),
    )

print("\n✓ Update complete")
```

## Step 6: Report Results

Display summary of what was updated:

| URL | Events Found | New | Updated |
|-----|--------------|-----|---------|
| hvmag.com/events/jazz | 1 | 0 | 1 |
| example.com/schedule | 5 | 2 | 3 |

- [ ] All URLs parsed from input
- [ ] URLs normalized for consistent lookup
- [ ] Pages re-scraped via Firecrawl
- [ ] Events extracted from markdown
- [ ] Events saved BEFORE URL marked as scraped
- [ ] scraped_pages records updated with new timestamp
- [ ] Summary shown to user
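
The lookup in Step 2 only succeeds when the normalized URL matches the stored one exactly, so it helps to see what the normalization described under URL Handling (trailing slashes and tracking params stripped) might look like. The following is a minimal sketch using only the standard library. It is not the project's `scripts.url_utils.normalize_url`: the function name `normalize_url_sketch` and the list of tracking parameters are assumptions made for illustration, and the real implementation may apply different rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the project's actual list may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def normalize_url_sketch(url: str) -> str:
    """Hypothetical stand-in for normalize_url, for illustration only."""
    parts = urlsplit(url.strip())

    # Keep only query parameters that are not recognized as tracking params.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]

    # Strip trailing slashes from the path.
    path = parts.path.rstrip("/")

    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(normalize_url_sketch("https://example.com/event/123/?utm_source=news&id=7"))
```

Under these assumptions, two links to the same page that differ only in a trailing slash or a `utm_*` parameter normalize to the same string, which is what lets the `scraped_pages` lookup find the existing record instead of treating the page as new.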