# speech-recognition

> iOS speech recognition implementation using @react-native-voice/voice. Use when debugging transcription issues, modifying session handling, or understanding the accumulated text tracking mechanism.

- Author: Takashi Matsumura
- Repository: Takashi-Matsumura/webrtc-demo
- Version: 20251211143619
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/Takashi-Matsumura/webrtc-demo
- Web: https://mule.run/skillshub/@@Takashi-Matsumura/webrtc-demo~speech-recognition:20251211143619

---

# Speech Recognition Implementation

## Overview

The `useSpeechRecognition` hook in `mobile/src/hooks/useSpeechRecognition.ts` handles real-time Japanese speech recognition on iOS.

## Key Concepts

### iOS Voice API Behavior

- `@react-native-voice/voice` wraps the iOS Speech framework
- **Accumulated text**: iOS returns ALL text since `Voice.start()`, not just new words
- `onSpeechEnd` fires when iOS decides speech is complete (unpredictable timing)
- There is no native "continuous" mode; recognition must be restarted manually

### Session Management

```
┌─────────────────────────────────────────────────────┐
│  Voice.start()                                      │
│       ↓                                             │
│  onSpeechPartialResults → Update current session    │
│       ↓                                             │
│  [2 sec silence] → Finalize session, track length   │
│       ↓                                             │
│  New speech → Extract only NEW text (subtract old)  │
│       ↓                                             │
│  onSpeechEnd → Reset counter, restart Voice         │
└─────────────────────────────────────────────────────┘
```

## Critical Implementation Details

### 1. Accumulated Text Tracking

```typescript
const lastFinalizedTextLengthRef = useRef(0);

// When session finalizes (silence detected)
lastFinalizedTextLengthRef.current += current.text.length;

// Extract new text from accumulated result
const extractNewText = (fullText: string): string => {
  if (lastFinalizedTextLengthRef.current === 0) return fullText;
  return fullText.substring(lastFinalizedTextLengthRef.current).trim();
};
```

### 2. Silence Detection

```typescript
const SILENCE_TIMEOUT_MS = 2000;

silenceTimerRef.current = setTimeout(() => {
  // Finalize current session
  setTranscripts(prev => prev.map(t =>
    t.id === currentTranscriptIdRef.current
      ? { ...t, isFinal: true }
      : t
  ));
  currentTranscriptIdRef.current = null;
}, SILENCE_TIMEOUT_MS);
```

### 3. Auto-Restart on Speech End

```typescript
const onSpeechEnd = useCallback(() => {
  // Reset counter when iOS naturally ends (buffer cleared)
  lastFinalizedTextLengthRef.current = 0;

  if (isListeningRef.current) {
    restartRecognition(0);
  }
}, [restartRecognition]);
```

## Visual Indicators

In `Transcription.tsx`:

- **Blue border**: Active session (`!entry.isFinal`)
- **"入力中" label** ("input in progress"): Speech in progress
- Border disappears when session finalizes

## Common Issues & Solutions

| Issue | Cause | Solution |
|-------|-------|----------|
| Previous text in new session | Length not accumulated | Use `+=`, not `=`, when updating `lastFinalizedTextLengthRef` |
| App crash on Voice restart | Double `Voice.start()` | Don't restart Voice in the silence timer; use text tracking instead |
| "Speech recognition already started" | Multiple start calls | Check `isListening` before calling `Voice.start()` |
| Empty sessions created | Empty text not filtered | Add `if (!transcript.trim()) return` |

## Debug Logging

Key log messages to watch:

```
"Finalized text length (累積): X"        → Length accumulating correctly
"Session finalized, ready for new input" → Session closed
"Auto-restarting speech recognition..."  → Voice restarting after onSpeechEnd
```

In the first message, 累積 means "cumulative".

## Important: Do NOT

- Restart Voice during the silence timer (causes crashes)
- Overwrite `lastFinalizedTextLengthRef` with `=` (it must accumulate with `+=`)
- Skip the empty-text check (unfiltered empty results create phantom sessions)
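
To make the accumulated-text arithmetic easier to reason about, the sketch below re-creates the offset logic from "Accumulated Text Tracking" and "Auto-Restart on Speech End" as a plain class with no React or Voice dependency. `AccumulatedTextTracker` and its method names are hypothetical and do not exist in the repository; the sketch only assumes what this document states, namely that iOS returns the full text since `Voice.start()` and that the offset resets when `onSpeechEnd` fires.

```typescript
// Hypothetical helper for illustration only; not part of the repository.
// It mirrors the ref-based logic of the hook so it can be run in isolation.
class AccumulatedTextTracker {
  // Number of characters already assigned to finalized sessions.
  private finalizedLength = 0;

  // Return only the text that arrived after the last finalized session.
  extractNewText(fullText: string): string {
    if (this.finalizedLength === 0) return fullText;
    return fullText.substring(this.finalizedLength).trim();
  }

  // Called when the silence timer fires: accumulate with +=, never overwrite.
  finalizeSession(sessionText: string): void {
    this.finalizedLength += sessionText.length;
  }

  // Called from onSpeechEnd: the iOS buffer is cleared, so start from 0 again.
  reset(): void {
    this.finalizedLength = 0;
  }
}

// Simulated sequence of accumulated results from a single Voice.start().
const tracker = new AccumulatedTextTracker();

const first = tracker.extractNewText("こんにちは");
tracker.finalizeSession(first); // silence detected, offset is now 5

const second = tracker.extractNewText("こんにちは元気ですか");
tracker.finalizeSession(second); // offset is now 10

const third = tracker.extractNewText("こんにちは元気ですかはい");

tracker.reset(); // onSpeechEnd fired
const afterReset = tracker.extractNewText("おはよう");

console.log(first, second, third, afterReset);
// prints: こんにちは 元気ですか はい おはよう
```

Replacing `+=` with `=` in `finalizeSession` would set the offset to 5 after the second session, so the third result would come back as `元気ですかはい`: the "previous text in new session" symptom from the table above. One caveat, stated as an assumption about this sketch rather than about the hook: because the extracted text is trimmed before its length is accumulated, the offset could drift for languages that separate words with spaces; that does not arise in the Japanese sequence shown here.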