# processor > Process documents into RAG database. Use when user wants to chunk, embed, or index files into a vector database for semantic search. - Author: SIslamMun - Repository: grc-iit/Phagocyte - Version: 20260113182708 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/grc-iit/Phagocyte - Web: https://mule.run/skillshub/@@grc-iit/Phagocyte~processor:20260113182708 --- --- name: processor description: Process documents into RAG database. Use when user wants to chunk, embed, or index files into a vector database for semantic search. --- # Document Processing This skill helps you process documents, codebases, and papers into a searchable RAG (Retrieval-Augmented Generation) database using LanceDB. ## Quick Start ```bash # 1. Check that services are running uv run processor check # 2. Process files into database uv run processor process ./input -o ./lancedb # 3. Verify results uv run processor stats ./lancedb ``` ## Common Use Cases ### Process a codebase ```bash uv run processor process ./my-project -o ./code_db --content-type code ``` ### Process papers/documents ```bash uv run processor process ./papers -o ./papers_db ``` ### Incremental updates (skip unchanged files) ```bash uv run processor process ./input -o ./lancedb --incremental ``` ### High-quality embeddings (slower, better retrieval) ```bash uv run processor process ./input -o ./lancedb --text-profile high --code-profile high ``` ## Embedding Profiles | Type | Profile | Model | Dimensions | Use Case | |------|---------|-------|------------|----------| | text | low | Qwen3-Embedding-0.6B | 1024 | Fast, good quality | | text | medium | Qwen3-Embedding-4B | 2560 | Balanced | | text | high | Qwen3-Embedding-8B | 4096 | Maximum quality | | code | low | jina-code-0.5b | 896 | Fast code search | | code | high | jina-code-1.5b | 1536 | Best code search | ## Key Options | Option | Values | Description | |--------|--------|-------------| | `--embedder` | ollama, transformers | Embedding backend | | `--text-profile` | low, medium, high | Text embedding quality | | `--code-profile` | low, high | Code embedding quality | | `--table-mode` | separate, unified, both | Table organization | | `--incremental/--full` | - | Skip unchanged files | | `--content-type` | auto, code, paper, markdown | Force content detection | ## MCP Server Start the processor MCP server for programmatic access: ```bash uv run processor-mcp ``` Configure in Claude Desktop (`claude_desktop_config.json`): ```json { "mcpServers": { "processor": { "command": "uv", "args": ["run", "processor-mcp"], "cwd": "/path/to/processor" } } } ``` ### Available MCP Tools - `process_documents` - Process files into LanceDB - `check_services` - Check backend availability - `setup_models` - Download embedding models - `get_db_stats` - Database statistics - `export_db` - Export database ## Troubleshooting ### "Model not found" error ```bash uv run processor setup # Download required models ``` ### Ollama not running ```bash ollama serve # Start Ollama server ``` ### Check available models ```bash uv run processor check ```