Generate high-quality training data using powerful LLMs (teacher models) to train smaller models (student models).
This is data-centric knowledge distillation: the teacher generates labeled data rather than logits.
Use this skill when the user needs to: (1) Generate NER/entity annotation data using an LLM,
(2) Create embedding training data (query/positive/negative triplets) with an LLM,
(3) Generate text classification datasets, (4) Create instruction-tuning datasets for fine-tuning,
(5) Synthesize domain-specific training corpora, (6) Augment existing datasets with an LLM,
or (7) Apply quality control and filtering to generated data.
Supports OpenAI GPT-4, Claude, and local LLMs as teacher models.