# phase2-python

> Build the Python FastAPI AI service: implement the Gemini OCR logic, handle PDF/Excel/Word processing, and create the API endpoints.

- Author: nakano
- Repository: git-hub-nakano/document-ai-ocr
- Version: 20260128194217
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/git-hub-nakano/document-ai-ocr
- Web: https://mule.run/skillshub/@@git-hub-nakano/document-ai-ocr~phase2-python:20260128194217

---

---
name: phase2-python
description: Build the Python FastAPI AI service. Implement the Gemini OCR logic, PDF/Excel/Word processing, and API endpoints.
allowed-tools: Bash, Write, Read, Glob, Grep
model: claude-sonnet-4-20250514
---

# Phase 2: Python AI Service Skill

## Overview

This skill builds an OCR processing service with FastAPI, using the Google Gemini API.

## When to Use

- When the `/phase2` command is run
- When instructed to "build the Python service"
- When instructed to "implement the OCR logic"

## Prerequisites

- Phase 1 is complete
- The `python-ai-service/` directory exists

## File Structure

```
python-ai-service/
├── Dockerfile
├── main.py
├── requirements.txt
└── services/
    ├── __init__.py
    └── gemini_service.py
```

## Tasks

### 1. Create requirements.txt

```text
# FastAPI
fastapi==0.109.0
uvicorn[standard]==0.27.0

# Google Generative AI
google-generativeai==0.8.0

# Document Processing
PyPDF2==3.0.1
pdf2image==1.17.0
python-docx==1.1.0
openpyxl==3.1.2

# Utilities
httpx==0.26.0
python-dotenv==1.0.0
pydantic==2.5.3
```

### 2. Create main.py

**Key Components**:
- FastAPI application setup
- CORS middleware for Laravel
- Health check endpoint: `GET /health`
- OCR endpoint: `POST /ocr`
- Upload endpoint: `POST /ocr/upload`

**Request Model**:
```python
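from typing import Optional

from pydantic import BaseModel


class OCRRequest(BaseModel):
    base64_data: str             # file content, base64-encoded
    mime_type: str               # MIME type of the decoded content
    language: str = "ja"         # language hint passed to the prompt
    comment: Optional[str] = ""  # free-form user context passed to the prompt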
```

**Response Model**:
```python
class OCRResponse(BaseModel):
    success: bool
    markdown: Optional[str] = None  # OCR result on success
    error: Optional[str] = None     # error message on failure
```

### 3. Create gemini_service.py

**Core Logic**:
- GeminiOCRService class
- Async OCR processing
- File type routing (image/pdf/excel/word/text)
- Retry logic for rate limits
- PDF batch processing (3 pages/batch)

**Supported MIME Types**:
```python
SUPPORTED_MIME_TYPES = [
    'image/jpeg', 'image/png', 'image/webp', 'image/gif',
    'application/pdf',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.ms-excel',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/msword',
    'text/plain', 'text/markdown'
]
```
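
The file type routing listed under Core Logic could be sketched as a lookup from MIME type to a handler category. The category names and the `route_mime_type` helper are assumptions for illustration, not names taken from the repository.

```python
from typing import Dict

# Hypothetical handler categories; the actual service may name these differently.
MIME_ROUTES: Dict[str, str] = {
    "image/jpeg": "image",
    "image/png": "image",
    "image/webp": "image",
    "image/gif": "image",
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "excel",
    "application/vnd.ms-excel": "excel",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
    "application/msword": "word",
    "text/plain": "text",
    "text/markdown": "text",
}


def route_mime_type(mime_type: str) -> str:
    """Return the handler category for a MIME type, or raise ValueError."""
    # Normalize case and drop parameters such as "; charset=utf-8".
    key = mime_type.split(";", 1)[0].strip().lower()
    try:
        return MIME_ROUTES[key]
    except KeyError:
        raise ValueError(f"Unsupported file type: {mime_type}") from None
```

Raising `ValueError` with the "Unsupported file type" message keeps the failure easy to map onto the `error` field of `OCRResponse`.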

**System Prompt Template**:
```python
def _get_system_prompt(self, language: str, comment: str) -> str:
    return f"""
# System Identity
You are the "Document AI OCR Engine".

# Context Variables
- @Language: {language}
- @UserContext: {comment or 'None'}

# Core Principles
1. Source grounding: output only information found in the source document
2. Visual fidelity: convert tables and figures appropriately

# Output Format
Output Markdown code only. No preamble.
"""
```
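
Even when the prompt asks for Markdown only, a model may still wrap its whole answer in a code fence. The following is a minimal sketch of what a `_clean_response`-style helper might do, assuming it only needs to strip surrounding whitespace and one outer fence. The name `clean_response` and this behavior are assumptions, not the repository's implementation.

```python
import re

# Matches a single outer Markdown code fence wrapping the whole text.
_OUTER_FENCE = re.compile(r"^```[a-zA-Z]*[ \t]*\n(.*?)\n?```$", re.DOTALL)


def clean_response(text: str) -> str:
    """Strip surrounding whitespace and a single outer code fence, if present."""
    stripped = text.strip()
    match = _OUTER_FENCE.match(stripped)
    if match:
        return match.group(1).strip()
    return stripped
```

This sketch does not try to distinguish an outer wrapper from a response that legitimately starts and ends with separate code blocks; a production version would likely need a stricter check.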

### 4. Create Dockerfile

```dockerfile
FROM python:3.11-slim
WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    poppler-utils curl && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8001

RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
```

## Verification

### Local Test (without Docker)

```powershell
cd python-ai-service

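# Create a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Set the environment variable
$env:GEMINI_API_KEY = "your_key_here"

# Start the server
uvicorn main:app --reload --port 8001

# Test from another terminal
# (in Windows PowerShell 5.1, `curl` is an alias for Invoke-WebRequest; use `curl.exe` for the real curl)
curl http://localhost:8001/health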
```

**Expected Response**:
```json
{
  "status": "healthy",
  "service": "Document AI OCR",
  "version": "2.0.0"
}
```

## Code Patterns

### Async Gemini Call with Retry

```python
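import asyncio


async def _call_gemini(self, content, system_prompt, max_retries=3):
    """Call Gemini, retrying retryable errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            # generate_content is blocking, so run it in a worker thread.
            response = await asyncio.to_thread(
                self.model.generate_content,
                [system_prompt] + content
            )
            return self._clean_response(response.text)
        except Exception as e:
            if self._is_retryable(e) and attempt < max_retries:
                # Back off 1s, 2s, 4s, ... before the next attempt.
                await asyncio.sleep(2 ** attempt)
                continue
            raise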
```
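
The block above relies on a `_is_retryable` helper that this document does not show. The following is a self-contained sketch of the same exponential backoff pattern, assuming rate-limit errors can be recognized by a `429` or `ResourceExhausted` marker in the exception name or message. `is_retryable`, `call_with_retry`, and the `base_delay` parameter are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def is_retryable(exc: Exception) -> bool:
    """Heuristic (an assumption): treat rate-limit style errors as retryable."""
    text = f"{type(exc).__name__} {exc}".lower()
    return "429" in text or "resourceexhausted" in text or "rate limit" in text


async def call_with_retry(
    func: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func, sleeping base_delay * 2**attempt between retryable failures."""
    attempt = 0
    while True:
        try:
            return await func()
        except Exception as exc:
            # Give up on non-retryable errors or once the retry budget is spent.
            if not is_retryable(exc) or attempt >= max_retries:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))
            attempt += 1
```

With the defaults, the waits are 1, 2, and 4 seconds, matching `2 ** attempt` in the block above. Making `base_delay` a parameter mainly helps tests run quickly.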

### PDF Batch Processing

```python
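import base64
import io

from PyPDF2 import PdfReader

PDF_BATCH_SIZE = 3  # pages per batch


async def _process_pdf(self, base64_data, system_prompt):
    reader = PdfReader(io.BytesIO(base64.b64decode(base64_data)))
    total_pages = len(reader.pages)

    # Small PDFs go to Gemini in a single call.
    # `[...]` stands for the content parts, which are elided here.
    if total_pages <= PDF_BATCH_SIZE:
        return await self._call_gemini([...], system_prompt)

    # Batch processing for large PDFs
    results = []
    for i in range(0, total_pages, PDF_BATCH_SIZE):
        chunk = await self._process_pdf_chunk(reader, i, system_prompt)
        results.append(chunk)

    # Join the per-batch Markdown in page order.
    return "\n\n".join(results)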
```
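
How pages are grouped inside `_process_pdf_chunk` is not shown here. A minimal sketch of the page-range arithmetic, assuming each batch covers a half-open range of zero-based page indexes; `batch_ranges` is a hypothetical helper.

```python
from typing import List, Tuple

PDF_BATCH_SIZE = 3  # pages per batch, as stated under Core Logic


def batch_ranges(total_pages: int, batch_size: int = PDF_BATCH_SIZE) -> List[Tuple[int, int]]:
    """Split total_pages into half-open (start, end) page index ranges."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size, total_pages))
        for start in range(0, total_pages, batch_size)
    ]


# A 7-page PDF yields two full batches and one single-page batch.
print(batch_ranges(7))  # [(0, 3), (3, 6), (6, 7)]
```

Clamping the end index with `min` keeps the last batch from running past the final page.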

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
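| GEMINI_API_KEY not set | Environment variable is not set | Check the `.env` file |
| 429 Too Many Requests | Rate limit exceeded | Automatic retry with exponential backoff |
| Unsupported file type | Unsupported MIME type | Return an error message |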
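
One way to surface the rows above to callers is to map each failure onto the `success` and `error` fields of `OCRResponse`. A sketch under that assumption, using plain dictionaries instead of the Pydantic model; `load_api_key` and `error_response` are hypothetical helpers.

```python
import os
from typing import Dict, Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read GEMINI_API_KEY, failing fast with a clear message if it is missing."""
    source = os.environ if env is None else env
    key = source.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY not set")
    return key


def error_response(exc: Exception) -> Dict[str, object]:
    """Build a failure payload shaped like OCRResponse."""
    return {"success": False, "markdown": None, "error": str(exc)}
```

Checking the key at startup, rather than on the first request, makes a missing `.env` entry visible immediately.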

## Handoff to Phase 3

After Phase 2 is complete, hand the following over to Phase 3 (laravel-agent):

- API endpoint specification
  - `GET /health` - health check
  - `POST /ocr` - OCR processing (JSON)
  - `POST /ocr/upload` - OCR processing (multipart)
- Request/response formats
- Recommended timeout (120 seconds)
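
Before the Laravel client exists, the JSON contract can be exercised from Python. This is a sketch that assumes the service runs at `http://localhost:8001`, as in the Verification section. `build_ocr_payload` and `post_ocr` are hypothetical helpers, and the timeout constant follows the 120-second recommendation.

```python
import base64
import json
import urllib.request
from typing import Dict

OCR_TIMEOUT_SECONDS = 120  # recommended timeout from the handoff list


def build_ocr_payload(
    data: bytes, mime_type: str, language: str = "ja", comment: str = ""
) -> Dict[str, str]:
    """Encode raw file bytes into the fields of OCRRequest."""
    return {
        "base64_data": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
        "language": language,
        "comment": comment,
    }


def post_ocr(payload: Dict[str, str], base_url: str = "http://localhost:8001") -> dict:
    """POST the payload to /ocr and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=OCR_TIMEOUT_SECONDS) as response:
        return json.load(response)
```

A call such as `post_ocr(build_ocr_payload(pdf_bytes, "application/pdf"))` should return a dictionary with the `success`, `markdown`, and `error` keys of `OCRResponse`, provided the server is running.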

## Notes

- Obtain the Gemini API key from Google AI Studio
- PDF processing requires poppler-utils (included in the Dockerfile)
- Large files are handled with batch processing