# gemini-image-generation

> Generate or edit images using the Gemini API (Nano Banana / Nano Banana Pro). Use when you need to select Gemini image models, craft text-to-image or image-to-image requests, set response modalities and imageConfig (aspect ratio/size), send inline or File API images, parse inlineData/base64 outputs, and save images to files via REST or SDKs.

- Author: Jiayao Yu
- Repository: jyu-lt/agent-skills
- Version: 20260206154159
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/jyu-lt/agent-skills
- Web: https://mule.run/skillshub/@@jyu-lt/agent-skills~gemini-image-generation:20260206154159

---

---
name: gemini-image-generation
description: Generate or edit images using the Gemini API (Nano Banana / Nano Banana Pro). Use when you need to select Gemini image models, craft text-to-image or image-to-image requests, set response modalities and imageConfig (aspect ratio/size), send inline or File API images, parse inlineData/base64 outputs, and save images to files via REST or SDKs.
---

# Gemini Image Generation

## Overview
Generate and edit images via the Gemini API using text prompts and optional reference images. Use this skill to select models, build requests (REST or SDK), and decode image outputs.

## Workflow
1. Clarify goal and inputs
- Gather the prompt, style constraints, desired aspect ratio/size, and the number of images wanted (the API does not guarantee the output count).
- Collect input images and decide whether to send them inline (total request under 20 MB) or upload them through the File API for larger or reusable assets.

2. Pick the model
- `gemini-2.5-flash-image` (Nano Banana): speed and low latency for high-volume tasks.
- `gemini-3-pro-image-preview` (Nano Banana Pro): higher fidelity and more complex edits.

3. Build the request
- Use `generateContent` with `contents[].parts` containing `text` and optional `inline_data` (REST) or image parts (SDK).
- Set `generationConfig.responseModalities` to `['Image']` for image-only outputs, or include `Text` when you also want captions.
- Set `generationConfig.imageConfig.aspectRatio` and optional `imageSize` (`1K`, `2K`, `4K`) to control output sizing.
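
The request body described in step 3 can be sketched as a plain dictionary. This is a minimal illustration, not the skill's own script: `build_payload` and its parameters are hypothetical names, and the `mime_type`/`data` keys inside `inline_data` are assumed to follow the API's documented REST shape.

```python
import base64
import json


def build_payload(prompt, image_bytes=None, mime_type="image/png",
                  aspect_ratio=None, image_size=None, want_text=False):
    """Build a generateContent request body (hypothetical helper)."""
    parts = [{"text": prompt}]
    if image_bytes is not None:
        # Inline images travel as base64 text inside the JSON body.
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })

    generation_config = {
        "responseModalities": ["Text", "Image"] if want_text else ["Image"],
    }
    image_config = {}
    if aspect_ratio:
        image_config["aspectRatio"] = aspect_ratio
    if image_size:
        image_config["imageSize"] = image_size  # e.g. "1K", "2K", "4K"
    if image_config:
        generation_config["imageConfig"] = image_config

    return {"contents": [{"parts": parts}], "generationConfig": generation_config}


payload = build_payload("A watercolor fox in a snowy forest", aspect_ratio="16:9")
print(json.dumps(payload, indent=2))
```

Leaving `imageConfig` out entirely when no sizing is requested keeps the payload minimal and lets the model fall back to its defaults.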

4. Send via SDK or REST
- SDK: use the `google-genai` client.
- REST: POST to `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent` with the `x-goog-api-key` header.
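
As a rough sketch of the REST path in step 4, the request can be assembled with the Python standard library alone. `build_request` and `send` are hypothetical helper names, and the 120-second timeout is an arbitrary assumption; only the endpoint and the `x-goog-api-key` header come from the text above.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model, payload, api_key):
    """Prepare (but do not send) the generateContent POST request."""
    return urllib.request.Request(
        f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(model, payload, api_key, timeout=120):
    """Send the request and return the decoded JSON response."""
    request = build_request(model, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call might look like `send("gemini-2.5-flash-image", payload, os.environ["GEMINI_API_KEY"])`, assuming the key is stored in an environment variable of that name. Splitting construction from sending makes the request easy to inspect before any network traffic happens.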

5. Parse outputs
- Read `candidates[0].content.parts`; image parts carry base64 data and a MIME type under `inlineData` (REST JSON) or `inline_data` (Python SDK).
- Decode and save images to disk; keep text parts as captions or logs.
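
The parsing step can be sketched as follows for a REST response already loaded into a dictionary. `save_images`, the `stem` parameter, and the extension table are illustrative assumptions; the code accepts both key spellings and tolerates zero, one, or many image parts.

```python
import base64
from pathlib import Path

# Assumed mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_images(response, out_dir, stem="image"):
    """Write every image part to out_dir; return (saved paths, text parts)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, captions = [], []

    candidates = response.get("candidates") or []
    if not candidates:
        return saved, captions

    parts = candidates[0].get("content", {}).get("parts", [])
    for part in parts:
        inline = part.get("inlineData") or part.get("inline_data")
        if inline:
            mime = inline.get("mimeType") or inline.get("mime_type") or ""
            suffix = EXTENSIONS.get(mime, ".bin")
            path = out / f"{stem}_{len(saved) + 1}{suffix}"
            path.write_bytes(base64.b64decode(inline["data"]))
            saved.append(path)
        elif "text" in part:
            captions.append(part["text"])
    return saved, captions
```

Numbering files by the count of images saved so far avoids overwriting when the model returns more than one image.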

## Script
Use `scripts/gemini_image.py` to build and optionally send requests.
- `--dry-run` prints a ready-to-send JSON payload and curl command.
- `--run` sends the request and writes outputs to the target directory.
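
To give a feel for what a dry run might print, here is a hedged sketch that turns a payload into a curl command. It is not the code of `scripts/gemini_image.py`; `curl_command` and the `GEMINI_API_KEY` variable name are assumptions made for illustration.

```python
import json
import shlex

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def curl_command(model, payload, key_var="GEMINI_API_KEY"):
    """Return a shell-ready curl command for the given payload."""
    url = f"{API_ROOT}/{model}:generateContent"
    return " ".join([
        "curl", "-s", "-X", "POST", shlex.quote(url),
        # Double quotes so the shell expands the key variable at run time.
        "-H", f'"x-goog-api-key: ${key_var}"',
        "-H", shlex.quote("Content-Type: application/json"),
        "-d", shlex.quote(json.dumps(payload)),
    ])


example = {
    "contents": [{"parts": [{"text": "A red bicycle"}]}],
    "generationConfig": {"responseModalities": ["Image"]},
}
print(curl_command("gemini-2.5-flash-image", example))
```

Referencing the key through a shell variable keeps the secret out of the printed command and out of shell history.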

## Limits and tips
- Image generation does not support audio or video inputs.
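- Output count is not guaranteed; write parsing code that handles anywhere from one to several images in a response.
- Input image limits differ by model: Flash Image takes roughly 3 input images; Pro Image Preview takes up to 14 in total, of which up to 5 are treated as high-fidelity.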
- Generated images include a SynthID watermark.
- For text that must appear inside the image, settle the exact wording first, then request the image with that wording quoted in the prompt.

## References
- See `references/gemini-image-generation.md` for API fields, request/response shapes, and configuration details.