# embed

> Generate text embeddings using Qwen3 models via HuggingFace TEI. Use this skill to embed texts, configure the embedding service, or batch process documents. Invoke with /embed.

- Author: Daniel Sim
- Repository: wellcomecollection/wc_simd
- Version: 20260109153212
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/wellcomecollection/wc_simd
- Web: https://mule.run/skillshub/@@wellcomecollection/wc_simd~embed:20260109153212

---

---
name: embed
description: Generate text embeddings using Qwen3 models via HuggingFace TEI. Use this skill to embed texts, configure the embedding service, or batch process documents. Invoke with /embed.
---

# Text Embedding

This skill manages text embedding generation using Qwen3 models via HuggingFace Text Embeddings Inference (TEI).

## Architecture

- **Remote Service**: HuggingFace TEI running on GPU instance
- **Client**: HTTP client with exponential backoff and batching
- **Output**: 1536-dimensional vectors
- **Storage**: Hive/Parquet tables or numpy arrays

## Start TEI Service

On a GPU instance:
```bash
# Using Docker
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id Alibaba-NLP/gte-Qwen2-1.5B-instruct

# Or with specific model
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id Qwen/Qwen3-Embedding-0.6B
```

## Python Client Usage

```python
from wc_simd.embed import EmbedServiceClient

# Initialize client
client = EmbedServiceClient(endpoint="http://gpu-host:8080/embed")

# Embed single text
vector = client.embed(["Hello world"])[0]

# Embed batch
vectors = client.embed(["text1", "text2", "text3"])
```

## PySpark Integration

```python
from wc_simd.embed import create_embed_udf

# Create UDF for Spark
embed_udf = create_embed_udf(endpoint="http://172.19.0.1:8080/embed")

# Apply to DataFrame
df_with_embeddings = df.withColumn("embedding", embed_udf("text_column"))
```

## Text Chunking

For long texts, use the chunker before embedding:

```python
from wc_simd.embed import TextChunker

chunker = TextChunker(chunk_size=1000, overlap=200)
chunks = chunker.split(long_text)

# Embed chunks
embeddings = client.embed(chunks)
```

## Elasticsearch Indexing

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://localhost:9200"])

# Index with dense vector
actions = [
    {
        "_index": "text_embeddings",
        "_source": {
            "text": chunk,
            "embedding": embedding.tolist(),
            "work_id": work_id
        }
    }
    for chunk, embedding in zip(chunks, embeddings)
]

bulk(es, actions)
```

## Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `endpoint` | Required | TEI service URL |
| `batch_size` | 32 | Texts per request |
| `max_retries` | 3 | Retry attempts |
| `timeout` | 30 | Request timeout (seconds) |

## Models

| Model | Dimensions | Notes |
|-------|------------|-------|
| Qwen3-Embedding-0.6B | 1024 | Fast, lightweight |
| gte-Qwen2-1.5B-instruct | 1536 | Higher quality |
| gme-Qwen2-VL | 1536 | Vision-language (use vlm-embed skill) |

## Troubleshooting

### Connection Errors
Ensure TEI service is running and accessible. From Docker Spark, use `172.19.0.1` (gateway IP).

### Rate Limiting
Increase batch size or add delays between requests.

### OOM on GPU
Reduce batch size or use a smaller model.