# dgx-spark-expert

> Comprehensive expert knowledge for NVIDIA DGX Spark workstation. Use when users ask about DGX Spark hardware, software, playbooks, AI Workbench, fine-tuning, inference, troubleshooting, known issues, container workflows, multi-node setup, or any development task on DGX Spark/GB10 Grace Blackwell systems. Triggers on mentions of "DGX Spark", "GB10", "Grace Blackwell desktop", "Spark workstation", or related NVIDIA AI workstation topics.

- Author: Carlos Crespo Macaya
- Repository: macayaven/docker-neural-memory
- Version: 20260109171134
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/macayaven/docker-neural-memory
- Web: https://mule.run/skillshub/@@macayaven/docker-neural-memory~dgx-spark-expert:20260109171134

---

---
name: dgx-spark-expert
description: Comprehensive expert knowledge for NVIDIA DGX Spark workstation. Use when users ask about DGX Spark hardware, software, playbooks, AI Workbench, fine-tuning, inference, troubleshooting, known issues, container workflows, multi-node setup, or any development task on DGX Spark/GB10 Grace Blackwell systems. Triggers on mentions of "DGX Spark", "GB10", "Grace Blackwell desktop", "Spark workstation", or related NVIDIA AI workstation topics.
---

# DGX Spark Expert

Expert guidance for NVIDIA DGX Spark AI workstation development.

## Quick Reference

| Resource | URL |
|----------|-----|
| Playbooks Hub | https://build.nvidia.com/spark |
| User Guide | https://docs.nvidia.com/dgx/dgx-spark/ |
| Support | https://www.nvidia.com/en-us/support/dgx-spark/ |
| Forums | https://forums.developer.nvidia.com/c/accelerated-computing/dgx-spark-gb10 |
| GitHub Playbooks | https://github.com/NVIDIA/dgx-spark-playbooks |

## Key System Facts

- **Architecture**: ARM64 (not x86); use ARM64 binaries and containers
- **Memory**: 128GB unified (shared between CPU and GPU via UMA)
- **Performance**: 1 PFLOP FP4 with sparsity
- **Max Model Size**: ~200B parameters (single Spark), ~405B (two Sparks stacked)
- **OS**: DGX OS (Ubuntu-based with the NVIDIA stack)

## Reference Files

Load these for detailed information:

- `references/hardware-specs.md`: Full specs, UMA details, Spark stacking
- `references/playbooks-index.md`: All 25+ official playbooks with links
- `references/known-issues.md`: Troubleshooting, diagnostics, support
- `references/software-stack.md`: DGX OS, containers, frameworks, tools
- `references/ai-workbench.md`: AI Workbench projects, RAG, agents

## Common Workflows

### Run Inference

**Quick local chat**: Use the Ollama + Open WebUI playbook

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.2
```

**Production serving**: Use the vLLM or TRT-LLM playbooks

```bash
# vLLM example
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct
```

### Fine-tune a Model

1. **Quick experiments**: LLaMA Factory or Unsloth playbooks
2. **Production**: NeMo playbook
3. **Image models**: FLUX Dreambooth playbook

See `references/playbooks-index.md` for all options.

### Create AI Workbench Project

```bash
# Clone official RAG project
nvwb project clone https://github.com/NVIDIA/workbench-example-agentic-rag

# Start project
cd workbench-example-agentic-rag
nvwb start
```

See `references/ai-workbench.md` for the detailed workflow.

### Connect Two Sparks

1. Connect the two systems with a QSFP/CX7 cable
2. Configure netplan on both nodes
3. Exchange SSH keys
4. Install NCCL and run tests
5. See the "Connect Two Sparks" and "NCCL" playbooks

### Troubleshoot Issues

1. Check `references/known-issues.md` first
2. Common fixes:
   - Memory issues: `sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
   - Check memory: `free -h` (use this rather than `nvidia-smi` for memory)
   - Driver issues: `sudo systemctl status nvidia-persistenced`
3. Forums: https://forums.developer.nvidia.com/c/accelerated-computing/dgx-spark-gb10

## UMA Memory Management

DGX Spark uses a Unified Memory Architecture: the CPU and GPU share 128GB.

**Key points**:

- The `nvidia-smi` memory display may show "Not Supported" (expected)
- Use `free -h` for actual memory status
- Flush the buffer cache under memory pressure: `sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
- Models up to ~200B parameters can run locally

## Container Patterns

```bash
# NGC container example (confirm on NGC that the tag supports ARM64)
IMAGE=nvcr.io/nvidia/pytorch:24.01-py3
docker pull "$IMAGE"

# Standard GPU container
docker run --gpus all --runtime nvidia "$IMAGE"

# With shared memory (required for many ML frameworks)
docker run --gpus all --shm-size=16g "$IMAGE"

# Mount HuggingFace cache
docker run --gpus all \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  "$IMAGE"
```

## ARM64 Compatibility

DGX Spark runs ARM64, not x86.
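
Before installing a binary or pulling an image, it can help to confirm what the machine reports. The sketch below is illustrative only: `is_arm64` is a hypothetical helper name, not part of DGX OS or any NVIDIA tool, and it assumes `uname -m` reports `aarch64` (or `arm64`) on a 64-bit ARM system.

```bash
# Hypothetical helper: succeed if the given machine string (default: this
# host's `uname -m`) names a 64-bit ARM architecture.
is_arm64() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    aarch64|arm64) return 0 ;;
    *) return 1 ;;
  esac
}

# Report which package and container variants apply to this host.
if is_arm64; then
  echo "ARM64 host: use aarch64/arm64 packages and containers"
else
  echo "Not an ARM64 host: $(uname -m)"
fi
```

The optional argument lets the same check be applied to an architecture string taken from elsewhere, such as a package filename suffix.
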
When installing software:

- Use `aarch64` or `arm64` package versions
- NGC CLI must be the ARM64 Linux version
- Some x86-only tools require alternatives or won't work
- Check NGC for ARM64-compatible containers

## Playbook Selection Guide

| Goal | Recommended Playbook |
|------|---------------------|
| Chat with local LLM | Open WebUI + Ollama |
| Serve LLM API | vLLM or TRT-LLM |
| Fine-tune LLM | LLaMA Factory (quick) or NeMo (production) |
| Build RAG app | RAG in AI Workbench |
| Generate images | Comfy UI |
| AI coding assistant | Vibe Coding |
| Remote access | Tailscale |
| Data science | CUDA-X Data Science |
| Large models (>200B) | Connect Two Sparks |

## Decision Logic

**User asks about inference** → Check model size, recommend vLLM (high throughput) or TRT-LLM (optimized latency), or Ollama for simple use.

**User asks about fine-tuning** → Assess complexity: LLaMA Factory/Unsloth for experiments, NeMo for production, PyTorch for custom needs.

**User asks about AI Workbench** → Load `references/ai-workbench.md`, guide through project creation/cloning.

**User reports error/issue** → Load `references/known-issues.md`, check if known issue, provide diagnostic commands.

**User asks about specs/capabilities** → Load `references/hardware-specs.md`, provide relevant details.

**User wants specific playbook** → Load `references/playbooks-index.md`, provide direct link and summary.
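
The routing above can be sketched as a small lookup. This is an illustration rather than part of the skill: `reference_for` and its topic keywords are hypothetical names, and only the four reference paths come from this guide.

```bash
# Hypothetical helper: print the reference file to load for a topic keyword.
# The keywords are illustrative; the paths are the ones named in this guide.
reference_for() {
  case "$1" in
    workbench)      echo "references/ai-workbench.md" ;;
    error|issue)    echo "references/known-issues.md" ;;
    specs|hardware) echo "references/hardware-specs.md" ;;
    playbook)       echo "references/playbooks-index.md" ;;
    *)              return 1 ;;  # no dedicated reference file for this topic
  esac
}

# Example: decide what to load for an error report.
if ref=$(reference_for error); then
  echo "Load $ref"
else
  echo "No reference file to load"
fi
```

Topics such as inference or fine-tuning fall through to the default branch, since the decision logic answers those directly instead of loading a reference file.
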