# vlm-embed

> Manage VLM (Vision-Language Model) embedding service and jobs. Use this skill to start the Flask embedding service, run embedding jobs, or train the AE3D autoencoder. Invoke with /vlm-embed.

- Author: Daniel Sim
- Repository: wellcomecollection/wc_simd
- Version: 20260109153212
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/wellcomecollection/wc_simd
- Web: https://mule.run/skillshub/@@wellcomecollection/wc_simd~vlm-embed:20260109153212

---

---
name: vlm-embed
description: Manage VLM (Vision-Language Model) embedding service and jobs. Use this skill to start the Flask embedding service, run embedding jobs, or train the AE3D autoencoder. Invoke with /vlm-embed.
---

# VLM Embedding Pipeline

This skill manages the VLM embedding pipeline which generates 1536-dimensional embeddings for images using the GME-Qwen2-VL-2B-Instruct model.

## Architecture

The pipeline separates model loading from Spark orchestration:

1. **Flask Service** (`vlm_embed_service.py`): Loads model once, serves `/embed` endpoint
2. **Spark Job** (`vlm_embed.py`): Reads from Hive tables, calls service, writes embeddings back to Hive

## Hive Tables

- **Input**: `images_without_text_renderings` (default)
- **Output**: `images_without_text_renderings_vlm_embeddings` (default)
- **Sharded output**: `{output_table}_shard_NNNN` when using `--num-shards`

## Start Embedding Service

```bash
# Start the Flask service on port 8081
python -m wc_simd.vlm_embed_service --host 0.0.0.0 --port 8081

# Or with specific GPU
CUDA_VISIBLE_DEVICES=0 python -m wc_simd.vlm_embed_service --host 0.0.0.0 --port 8081
```

The service exposes:

- `GET /health` - Health check with model info
- `POST /embed` - Embed images via `{urls: [...]}` or `{images_b64: [...]}`

## Run Embedding Job

```bash
# Run Spark job to embed all images
python -m wc_simd.vlm_embed --endpoint http://127.0.0.1:8081/embed

# Custom input/output tables
python -m wc_simd.vlm_embed \
  --input-table my_images \
  --output-table my_images_embedded \
  --endpoint http://127.0.0.1:8081/embed
```

## Sharded Processing (Resumable)

For large-scale processing with checkpoint/resume:

```bash
# Split into 100 shards, process shard 0
python -m wc_simd.vlm_embed --num-shards 100 --shard-id 0

# Process all shards sequentially (omit --shard-id)
python -m wc_simd.vlm_embed --num-shards 100

# Resume: existing shard tables are skipped by default
python -m wc_simd.vlm_embed --num-shards 100 --shard-id 5

# Force recompute (drop and recreate)
python -m wc_simd.vlm_embed --num-shards 100 --shard-id 5 --no-skip-existing
```

## Prefetch Mode

Download images in Spark workers and send base64 bytes to service (useful when service has limited network access):

```bash
python -m wc_simd.vlm_embed \
  --prefetch-images \
  --prefetch-workers 8 \
  --endpoint http://127.0.0.1:8081/embed
```

## Multi-Instance Parallelism

Run multiple Spark jobs on different machines, each processing a subset:

```bash
# Machine 1
python -m wc_simd.vlm_embed --instances 4 --instance-no 0

# Machine 2
python -m wc_simd.vlm_embed --instances 4 --instance-no 1

# etc.
```

## Key Options

| Option | Default | Description |
|--------|---------|-------------|
| `--input-table` | `images_without_text_renderings` | Source Hive table |
| `--output-table` | `images_without_text_renderings_vlm_embeddings` | Destination table |
| `--endpoint` | `http://127.0.0.1:8081/embed` | Flask service URL |
| `--batch-size` | 16 | Images per service call |
| `--num-partitions` | 320 | Spark partitions |
| `--num-shards` | None | Split into N shard tables |
| `--shard-id` | None | Process specific shard |
| `--skip-existing` | True | Skip existing output tables |
| `--prefetch-images` | False | Download in Spark, send bytes |

## Train AE3D Autoencoder

Reduce 1536-dim embeddings to 3D for visualization:

```bash
# Export embeddings from Hive/Parquet to NumPy
python -m wc_simd.vlm_embed_train_data \
  --input-glob "path/to/embeddings/*.parquet" \
  --output-npy data/vlm_embed/embeddings.npy \
  --output-index data/vlm_embed/index.parquet

# Train autoencoder
python -m wc_simd.vlm_embed_ae train \
  --data data/vlm_embed/embeddings.npy \
  --out runs/ae3d

# Inference (project to 3D)
python -m wc_simd.vlm_embed_ae infer \
  --model runs/ae3d/ae.pt \
  --data embeddings.npy \
  --out projected.npy
```

## Troubleshooting

### transformers Version Conflict

The GME-Qwen2-VL model requires `transformers<4.52.0`. If you see errors like:

```
transformers<4.52.0 is required for normal functioning of this module, but found transformers==4.57.3
```

Fix by pinning the version:

```bash
pip install 'transformers>=4.37.0,<4.52.0'
```

Or in requirements.txt/Dockerfile:

```
transformers>=4.37.0,<4.52.0
```

### 401 Auth Errors

Some images require authentication (content advisory). These will have `embed_error` set in the output table.

### GPU Memory

The model requires significant GPU memory. Use a GPU with at least 16GB VRAM for the 2B model.
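### Preflight Version Check

The `transformers` pin from the version-conflict section can be checked before the service is started, so that a mismatch surfaces before the model is loaded. The following is a minimal sketch using only the standard library. The helper names (`parse_release`, `version_in_supported_range`, `installed_transformers_version`) are hypothetical and not part of `wc_simd`, and the comparison looks only at the leading numeric release, so it is a rough check rather than a full version-specifier parser.

```python
import re
from importlib import metadata

# Bounds copied from the pin in this document: transformers>=4.37.0,<4.52.0
MIN_VERSION = (4, 37, 0)
MAX_VERSION_EXCLUSIVE = (4, 52, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release of a version string as a 3-tuple.

    Suffixes such as '.dev0' or 'rc1' are ignored, so this is only a rough
    comparison and not a complete version parser.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Missing minor or patch components are treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def version_in_supported_range(version: str) -> bool:
    """True when MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE."""
    return MIN_VERSION <= parse_release(version) < MAX_VERSION_EXCLUSIVE


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif version_in_supported_range(found):
        print(f"transformers=={found} satisfies >=4.37.0,<4.52.0")
    else:
        print(f"transformers=={found} is outside >=4.37.0,<4.52.0")
```

With the version from the error message shown earlier, `version_in_supported_range("4.57.3")` returns `False`, which is the case the pin is meant to prevent.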
### Service Connection

If Spark workers can't reach the service:

- Local: use `http://127.0.0.1:8081/embed`
- Docker workers to host: use `http://172.19.0.1:8081/embed` (Docker gateway IP)
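
To find out which of the addresses above is reachable from a given machine, one option is to probe the `GET /health` route before launching the Spark job. The sketch below uses only the standard library. The function names are hypothetical, and two things are assumptions rather than documented behaviour: that `/health` lives on the same host and port as `/embed`, and that the service accepts a JSON body with a `Content-Type: application/json` header. The response format is not documented here, so the sketch does not parse it.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def health_url(embed_endpoint: str) -> str:
    """Derive the /health URL, assuming it shares host and port with /embed."""
    parts = urlsplit(embed_endpoint)
    return urlunsplit(parts._replace(path="/health", query="", fragment=""))


def build_embed_request(embed_endpoint: str, image_urls: list) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a {"urls": [...]} payload."""
    body = json.dumps({"urls": list(image_urls)}).encode("utf-8")
    return urllib.request.Request(
        embed_endpoint,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, see lead-in
        method="POST",
    )


def service_is_reachable(embed_endpoint: str, timeout: float = 5.0) -> bool:
    """True when GET /health answers with HTTP 200 within the timeout.

    Connection failures, timeouts, and HTTP error statuses all count as
    unreachable; urllib reports them as OSError subclasses.
    """
    try:
        with urllib.request.urlopen(health_url(embed_endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


if __name__ == "__main__":
    # The two endpoints listed in this section.
    candidates = [
        "http://127.0.0.1:8081/embed",
        "http://172.19.0.1:8081/embed",
    ]
    for endpoint in candidates:
        state = "reachable" if service_is_reachable(endpoint) else "not reachable"
        print(f"{endpoint}: {state}")
```

Running this from inside a Spark worker container, rather than from the host, tests the network path that the job itself will use. The first endpoint reported as reachable can then be passed to `--endpoint`.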