# data-processing-skill

> Processes user behavior events and store catalog data to generate behavioral summaries and product/discount embeddings in a vector database.

- Author: Macbook Air - Unifynd
- Repository: mansiUnifynd/moonshot-skills-only
- Version: 20260127164105
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/mansiUnifynd/moonshot-skills-only
- Web: https://mule.run/skillshub/@@mansiUnifynd/moonshot-skills-only~data-processing-skill:20260127164105

---

---
name: data-processing-skill
description: Processes user behavior events and store catalog data to generate behavioral summaries and product/discount embeddings in a vector database.
---

# Data Processing Skill

Batch processes behavioral and catalog data to produce structured summaries and vector embeddings used in retrieval and personalization workflows.

## Your Mission

Process and embed two types of data:

1. **User Events** → Behavioral Summaries → Vector Embeddings
2. **Store Information** → Product/Discount Embeddings

## Workflow

### Phase 1: Process User Data

```bash
# Run the complete user processing pipeline
python3 .claude/skills/data-processing-skill/scripts/process_all_users.py
```

This will:

1. Fetch all distinct user_ids from the `user_data` table
2. For each user:
   - Fetch all events
   - Fetch the existing summary (if any)
   - Generate or update the structured behavioral summary
   - Upsert to the `user_summary` table (for monitoring)
   - Embed the summary using `all-MiniLM-L6-v2`
   - Upsert the embedding to `user_summary_embeddings`

### Phase 2: Process Store Data

```bash
# Run the complete store processing pipeline
python3 .claude/skills/data-processing-skill/scripts/embed_products.py
```

This will:

1. Fetch all stores from the `store_information` table
2. For each store:
   - Extract individual products from `products_json`
   - Extract discounts from `discounts_available`
   - Embed each product (one embedding per product)
   - Embed each discount (one embedding per discount)
   - Upsert to `store_product_embeddings`

## Individual Operations

You can also run individual operations if needed:

### Fetch Data

```bash
# Get all distinct users
python3 .claude/skills/data-processing-skill/scripts/fetch_user_events.py

# Get events for a specific user
python3 .claude/skills/data-processing-skill/scripts/fetch_user_events.py

# Get all stores
python3 .claude/skills/data-processing-skill/scripts/fetch_store_data.py
```

### Generate Summary for Single User

```bash
python3 .claude/skills/data-processing-skill/scripts/generate_summary.py
```

### Embed Single User Summary

```bash
python3 .claude/skills/data-processing-skill/scripts/embed_user_summary.py
```

### Process Single Store

```bash
python3 .claude/skills/data-processing-skill/scripts/embed_products.py
```

## Execution Guidelines

1. **Use TodoWrite** to track progress when processing multiple items
2. **Report results** clearly showing:
   - Number of users/stores processed
   - Number of embeddings created
   - Any errors encountered
3. **Handle errors gracefully** - continue processing other items if one fails
4. **Monitor output** - all scripts output JSON with status information

## Expected Output

After processing, you should report:

```
User Processing Results:
- Total users: X
- Successfully processed: Y
- Skipped (no events): Z
- Errors: N

Store Processing Results:
- Total stores: X
- Products embedded: Y
- Discounts embedded: Z
- Errors: N
```
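
As an illustration of the "handle errors gracefully" and "report results" guidelines, the following is a minimal Python sketch of a batch loop that keeps going when one user fails and then renders the user report in the format shown above. It is not taken from the skill's scripts: `run_batch`, `format_user_report`, the `process_user` callable, and the `"status"` values `"success"` and `"skipped"` are all hypothetical names chosen for this example, and the real scripts may emit a different JSON shape.

```python
from collections import Counter


def run_batch(user_ids, process_user):
    """Process each user, continue past failures, and tally the outcomes.

    `process_user` is a hypothetical callable assumed to return a dict such
    as {"status": "success"} or {"status": "skipped"}. Any other status, or
    a raised exception, is counted as an error.
    """
    counts = Counter()
    error_details = []
    for user_id in user_ids:
        try:
            result = process_user(user_id)
            status = result.get("status")
            if status not in ("success", "skipped"):
                status = "error"
            counts[status] += 1
        except Exception as exc:
            # One failing user must not stop the rest of the batch.
            counts["error"] += 1
            error_details.append({"user_id": user_id, "error": str(exc)})
    return {
        "total": len(user_ids),
        "success": counts["success"],
        "skipped": counts["skipped"],
        "errors": counts["error"],
        "error_details": error_details,
    }


def format_user_report(summary):
    """Render the tallies in the 'User Processing Results' layout."""
    return "\n".join([
        "User Processing Results:",
        f"- Total users: {summary['total']}",
        f"- Successfully processed: {summary['success']}",
        f"- Skipped (no events): {summary['skipped']}",
        f"- Errors: {summary['errors']}",
    ])


if __name__ == "__main__":
    # Stand-in for the real per-user pipeline, used only for demonstration.
    def fake_process_user(user_id):
        if user_id == "u3":
            raise RuntimeError("simulated failure")
        return {"status": "skipped" if user_id == "u2" else "success"}

    print(format_user_report(run_batch(["u1", "u2", "u3"], fake_process_user)))
```

The same pattern would apply to stores, with product and discount counts in place of the success and skipped tallies.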