# phase2-python

> Build the Python FastAPI AI service: implement the Gemini OCR logic, PDF/Excel/Word processing, and the API endpoints.

- Author: nakano
- Repository: git-hub-nakano/document-ai-ocr
- Version: 20260128194217
- Last Updated: 2026-02-06
- Source: https://github.com/git-hub-nakano/document-ai-ocr
- Web: https://mule.run/skillshub/@@git-hub-nakano/document-ai-ocr~phase2-python:20260128194217

---

---
name: phase2-python
description: Build the Python FastAPI AI service. Implement the Gemini OCR logic, PDF/Excel/Word processing, and the API endpoints.
allowed-tools: Bash, Write, Read, Glob, Grep
model: claude-sonnet-4-20250514
---

# Phase 2: Python AI Service Skill

## Overview

Build an OCR processing service with FastAPI that uses the Google Gemini API.

## When to Use

- When the `/phase2` command is run
- When instructed to "build the Python service" ("Python サービスを構築")
- When instructed to "implement the OCR logic" ("OCR ロジックを実装")

## Prerequisites

- Phase 1 is complete
- The `python-ai-service/` directory exists

## File Structure

```
python-ai-service/
├── Dockerfile
├── main.py
├── requirements.txt
└── services/
    ├── __init__.py
    └── gemini_service.py
```

## Tasks

### 1. Create requirements.txt

```text
# FastAPI
fastapi==0.109.0
uvicorn[standard]==0.27.0
# FastAPI needs this for multipart form/file uploads (POST /ocr/upload)
python-multipart==0.0.7

# Google Generative AI
google-generativeai==0.8.0

# Document Processing
PyPDF2==3.0.1
pdf2image==1.17.0
python-docx==1.1.0
openpyxl==3.1.2

# Utilities
httpx==0.26.0
python-dotenv==1.0.0
pydantic==2.5.3
```

### 2. Create main.py

**Key Components**:

- FastAPI application setup
- CORS middleware for Laravel
- Health check endpoint: `GET /health`
- OCR endpoint: `POST /ocr`
- Upload endpoint: `POST /ocr/upload`

**Request Model**:

```python
from typing import Optional

from pydantic import BaseModel


class OCRRequest(BaseModel):
    base64_data: str             # file contents, base64-encoded
    mime_type: str               # one of SUPPORTED_MIME_TYPES
    language: str = "ja"         # output language passed to the prompt
    comment: Optional[str] = ""  # optional user context for the prompt
```

**Response Model**:

```python
class OCRResponse(BaseModel):
    success: bool
    markdown: Optional[str] = None  # OCR result when success is True
    error: Optional[str] = None     # error message when success is False
```

### 3. Create gemini_service.py

**Core Logic**:

- `GeminiOCRService` class
- Async OCR processing
- File type routing (image/pdf/excel/word/text)
- Retry logic for rate limits
- PDF batch processing (3 pages per batch)

**Supported MIME Types**:

```python
SUPPORTED_MIME_TYPES = [
    # Images
    'image/jpeg',
    'image/png',
    'image/webp',
    'image/gif',
    # PDF
    'application/pdf',
    # Excel (.xlsx, .xls); openpyxl itself reads .xlsx only
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.ms-excel',
    # Word (.docx, .doc); python-docx itself reads .docx only
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/msword',
    # Text
    'text/plain',
    'text/markdown',
]
```

**System Prompt Template**:

```python
def _get_system_prompt(self, language: str, comment: str) -> str:
    # Builds the system prompt; `comment` is optional user context
    return f"""
# System Identity
You are the "Document AI OCR Engine".

# Context Variables
- @Language: {language}
- @UserContext: {comment or 'None'}

# Core Principles
1. Source grounding: output only information found in the source document.
2. Visual fidelity: convert tables and figures appropriately.

# Output Format
Output Markdown only. No preamble.
"""
```

### 4. Create Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    poppler-utils curl && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8001

RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
```

## Verification

### Local Test (without Docker)

```powershell
cd python-ai-service

# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Set the environment variable
$env:GEMINI_API_KEY = "your_key_here"

# Start the server
uvicorn main:app --reload --port 8001

# Test from another terminal (curl.exe bypasses the Invoke-WebRequest alias in Windows PowerShell)
curl.exe http://localhost:8001/health
```

**Expected Response**:

```json
{
  "status": "healthy",
  "service": "Document AI OCR",
  "version": "2.0.0"
}
```

## Code Patterns

### Async Gemini Call with Retry

```python
import asyncio


async def _call_gemini(self, content, system_prompt, max_retries=3):
    # Up to max_retries + 1 attempts in total
    for attempt in range(max_retries + 1):
        try:
            # generate_content is blocking, so run it in a worker thread
            response = await asyncio.to_thread(
                self.model.generate_content,
                [system_prompt] + content,
            )
            return self._clean_response(response.text)
        except Exception as e:
            if self._is_retryable(e) and attempt < max_retries:
                # Exponential backoff: 1 s, 2 s, 4 s, ...
                await asyncio.sleep(2 ** attempt)
                continue
            # Non-retryable error, or retries exhausted
            raise
```

### PDF Batch Processing

```python
import base64
import io

from PyPDF2 import PdfReader

PDF_BATCH_SIZE = 3  # pages per Gemini request


async def _process_pdf(self, base64_data, system_prompt):
    reader = PdfReader(io.BytesIO(base64.b64decode(base64_data)))
    total_pages = len(reader.pages)

    # Small PDFs go to Gemini in a single request (content parts elided)
    if total_pages <= PDF_BATCH_SIZE:
        return await self._call_gemini([...], system_prompt)

    # Batch processing for large PDFs: one request per PDF_BATCH_SIZE pages
    results = []
    for i in range(0, total_pages, PDF_BATCH_SIZE):
        chunk = await self._process_pdf_chunk(reader, i, system_prompt)
        results.append(chunk)
    return "\n\n".join(results)
```

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| GEMINI_API_KEY not set | Environment variable not set | Check the `.env` file |
| 429 Too Many Requests | Rate limit | Automatic retry (exponential backoff) |
| Unsupported file type | MIME type not supported | Return an error message |

## Handoff to Phase 3

After Phase 2 is complete, hand off the following to Phase 3 (laravel-agent):

- API endpoint specifications
  - `GET /health` - health check
  - `POST /ocr` - OCR processing (JSON)
  - `POST /ocr/upload` - OCR processing (multipart)
- Request/response formats
- Recommended timeout (120 seconds)

## Notes

- Obtain the Gemini API key from Google AI Studio
- PDF processing requires poppler-utils (included in the Dockerfile)
- Large PDFs are processed in batches of `PDF_BATCH_SIZE` (3) pages
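For the `POST /ocr` request format handed off to Phase 3, a minimal client-side sketch of building and sending the JSON body might look like the following. It assumes standard base64 encoding of the raw file bytes and reuses the field names from `OCRRequest`. `build_ocr_request` and `post_ocr` are hypothetical helpers and not part of the service. The 120-second default mirrors the recommended timeout.

```python
import base64
import json
import urllib.request


def build_ocr_request(file_bytes: bytes, mime_type: str,
                      language: str = "ja", comment: str = "") -> str:
    """Hypothetical helper: serialize a file into a JSON body shaped like OCRRequest."""
    payload = {
        # Standard base64, decoded to str so it can be embedded in JSON
        "base64_data": base64.b64encode(file_bytes).decode("ascii"),
        "mime_type": mime_type,
        "language": language,
        "comment": comment,
    }
    # ensure_ascii=False keeps non-ASCII comments (e.g. Japanese) readable
    return json.dumps(payload, ensure_ascii=False)


def post_ocr(body: str, url: str = "http://localhost:8001/ocr",
             timeout: float = 120.0) -> dict:
    """Hypothetical helper: POST the body and return the OCRResponse JSON as a dict."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_ocr_request(b"hello", "text/plain", comment="sample")
print(body)
```

`urlopen` raises `urllib.error.HTTPError` for non-2xx statuses. This includes the 422 that FastAPI returns when a body fails validation against `OCRRequest`. A caller may therefore want to catch that error separately from the `success: false` responses.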
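The file type routing listed under Core Logic could be sketched as a lookup table built from the supported MIME types. The route names (`image`, `pdf`, `excel`, `word`, `text`) follow the Core Logic list. `route_mime_type` and `UnsupportedFileType` are hypothetical names. Stripping parameters such as `; charset=utf-8` is an assumption about what clients may send.

```python
# Maps each supported MIME type to a processing route (route names are illustrative)
MIME_ROUTES = {
    "image/jpeg": "image",
    "image/png": "image",
    "image/webp": "image",
    "image/gif": "image",
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "excel",
    "application/vnd.ms-excel": "excel",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
    "application/msword": "word",
    "text/plain": "text",
    "text/markdown": "text",
}


class UnsupportedFileType(ValueError):
    """Hypothetical error for MIME types outside the supported list."""


def route_mime_type(mime_type: str) -> str:
    # Drop parameters like "; charset=utf-8" and normalize case before lookup
    base = mime_type.split(";", 1)[0].strip().lower()
    try:
        return MIME_ROUTES[base]
    except KeyError:
        raise UnsupportedFileType(f"Unsupported file type: {mime_type}") from None


print(route_mime_type("application/pdf"))  # pdf
```

A dedicated exception lets the endpoint map this case to the "Unsupported file type" row under Error Handling. For example, the endpoint could return `OCRResponse(success=False, error=str(exc))`.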
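Core Principle 2 asks for tables to be reproduced faithfully. Spreadsheet cells might be read locally before anything is sent to Gemini, for example with openpyxl's `iter_rows(values_only=True)`. In that case a hypothetical `rows_to_markdown` helper like the one below could turn the rows into a Markdown table. The skill does not say how Excel content is actually converted, so treat this as one possible approach. It assumes the first row is the header.

```python
from typing import Any, Sequence


def rows_to_markdown(rows: Sequence[Sequence[Any]]) -> str:
    """Hypothetical helper: render rows (first row = header) as a Markdown table."""
    if not rows:
        return ""
    # Pad ragged rows to the widest row so every line has the same column count
    width = max(len(row) for row in rows)

    def cell(value: Any) -> str:
        # None becomes an empty cell; escape pipes and flatten newlines
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    def line(row: Sequence[Any]) -> str:
        padded = list(row) + [None] * (width - len(row))
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    out = [line(header), "|" + " --- |" * width]
    out.extend(line(row) for row in body)
    return "\n".join(out)


print(rows_to_markdown([["Item", "Qty"], ["Pen", 2], ["Ink", None]]))
```

Escaping `|` and replacing newlines keeps each spreadsheet row on a single table line. Without that, a cell containing a pipe or line break would split the row.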
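`_call_gemini` passes the model output through `_clean_response`, which the skill does not define. The system prompt asks for Markdown only, so one plausible job for this helper is removing a code fence that wraps the entire reply. The sketch below assumes exactly that. `clean_response` is a hypothetical stand-in for the real method.

```python
import re

# Matches a reply that is entirely wrapped in one fence, e.g. ```markdown ... ```
_WRAPPING_FENCE = re.compile(r"^```[\w-]*[ \t]*\n(.*?)\n?```$", re.DOTALL)


def clean_response(text: str) -> str:
    """Hypothetical cleanup: drop a fence wrapping the whole reply, then trim whitespace."""
    stripped = text.strip()
    match = _WRAPPING_FENCE.match(stripped)
    # Fences that appear inside the body (not wrapping it) are left untouched
    return (match.group(1) if match else stripped).strip()


print(clean_response("```markdown\n# Invoice\n\n| a | b |\n```"))
```

The pattern is anchored at both ends, so a reply that merely contains a code block, such as a transcribed code sample, is returned unchanged.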
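The retry pattern in `_call_gemini` can be exercised without calling Gemini. The sketch below keeps the same structure: at most `max_retries + 1` attempts, the blocking call moved off the event loop with `asyncio.to_thread`, and a backoff that doubles each time. It adds a `base_delay` parameter so it can run quickly. `RateLimitError` and `is_retryable` are hypothetical stand-ins. With the real SDK, a 429 typically surfaces as `google.api_core.exceptions.ResourceExhausted`, which a real `_is_retryable` would likely check for.

```python
import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's HTTP 429 error."""


def is_retryable(exc: BaseException) -> bool:
    # Assumption: retry on the stand-in type or any error message mentioning 429
    return isinstance(exc, RateLimitError) or "429" in str(exc)


async def call_with_retry(func: Callable[[], T], max_retries: int = 3,
                          base_delay: float = 1.0) -> T:
    for attempt in range(max_retries + 1):
        try:
            # Run the blocking call in a worker thread so the event loop stays responsive
            return await asyncio.to_thread(func)
        except Exception as exc:
            if is_retryable(exc) and attempt < max_retries:
                # Exponential backoff: base_delay * 1, 2, 4, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
                continue
            raise  # non-retryable, or retries exhausted
    # Only reached when max_retries < 0 (no attempts made)
    raise RuntimeError("no attempts were made")


attempts = {"count": 0}


def flaky_call() -> str:
    # Fails twice with a rate-limit error, then succeeds
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


print(asyncio.run(call_with_retry(flaky_call, base_delay=0.01)))  # ok
```

With the defaults (`max_retries=3`, `base_delay=1.0`), a persistent 429 waits 1 + 2 + 4 = 7 seconds before giving up. That fits well inside the 120-second timeout recommended for Phase 3.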
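The batching loop in `_process_pdf` steps through the pages in increments of `PDF_BATCH_SIZE`. A small hypothetical helper can check the resulting page ranges in isolation. Each half-open `(start, end)` range would correspond to one `_process_pdf_chunk` call. That call might, for example, copy `reader.pages[i]` for each `i` in the range into a PyPDF2 `PdfWriter` via `add_page`. How the original `_process_pdf_chunk` builds its request is not shown, so that part is an assumption.

```python
from typing import List, Tuple

PDF_BATCH_SIZE = 3  # pages per Gemini request, as in the Core Logic notes


def batch_ranges(total_pages: int,
                 batch_size: int = PDF_BATCH_SIZE) -> List[Tuple[int, int]]:
    """Return half-open (start, end) page ranges that cover every page exactly once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        # The last batch is clipped to total_pages
        (start, min(start + batch_size, total_pages))
        for start in range(0, total_pages, batch_size)
    ]


print(batch_ranges(7))  # [(0, 3), (3, 6), (6, 7)]
```

A 7-page PDF therefore becomes three Gemini requests, and the chunk results are joined with blank lines as in the original loop. Processing the batches sequentially rather than concurrently also helps keep the request rate below the 429 threshold.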