# fact-check-tuner

> Tune and optimize fact-checking in the research agent. Use when adjusting claim extraction, verification strictness, fixing fact-check loops, reducing API costs, or improving accuracy.

- Author: rajathbharadwaj
- Repository: rakshithvasudev/cc-langgraph-blog-writer
- Version: 20260103002114
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/rakshithvasudev/cc-langgraph-blog-writer
- Web: https://mule.run/skillshub/@@rakshithvasudev/cc-langgraph-blog-writer~fact-check-tuner:20260103002114

---

---
name: fact-check-tuner
description: Tune and optimize fact-checking in the research agent. Use when adjusting claim extraction, verification strictness, fixing fact-check loops, reducing API costs, or improving accuracy.
allowed-tools: Read, Edit, Grep
---

# Fact-Checking Optimization

## Overview

The fact-checking system is a two-node loop:

```
write_article → fact_check → [unverified?] → fix_claims → fact_check (max 3x)
                           → [all verified] → approval_article
```

## Key Files

| File | Location | Purpose |
|------|----------|---------|
| `nodes.py` | `fact_check_node` (~line 463) | Extract & verify claims |
| `nodes.py` | `fix_claims_node` (~line 543) | Rewrite unverified claims |
| `prompts.py` | `EXTRACT_CLAIMS_*` | Claim extraction prompts |
| `prompts.py` | `VERIFY_CLAIM_*` | Claim verification prompts |
| `prompts.py` | `FIX_CLAIMS_*` | Article rewriting prompts |

## Tuning Parameters

### Max Fact-Check Iterations

In `fact_check_node` (nodes.py ~line 520):

```python
# Default: 3 iterations
if unverified and fact_check_iteration < 3:
    return Command(goto="fix_claims")
```

**Trade-offs**:
- Higher = more accurate but more API calls
- Lower = faster but may leave unverified claims in the article

### Claim Extraction Strictness

Edit `EXTRACT_CLAIMS_SYSTEM` in prompts.py:

**More strict (fewer claims)**:
```python
EXTRACT_CLAIMS_SYSTEM = """Extract ONLY claims that:
1. Cite specific statistics, dates, or numbers
2. Make cause-and-effect assertions
3. Attribute statements to named sources

Skip:
- General observations
- Well-known facts
- Opinions or analysis
- Statements with hedging language ("may", "might", "could")
"""
```

**Less strict (more claims)**:
```python
EXTRACT_CLAIMS_SYSTEM = """Extract all factual assertions including:
1. Any statement presented as fact
2. Comparisons or rankings
3. Historical references
4. Technical specifications
"""
```

### Verification Threshold

Edit `VERIFY_CLAIM_SYSTEM` in prompts.py:

**Strict verification**:
```python
VERIFY_CLAIM_SYSTEM = """Verify claims with HIGH standards:
- Claim must have DIRECT support in sources
- Numbers must match exactly
- Attribution must be to the correct source
- Partial support = NOT verified
"""
```

**Lenient verification**:
```python
VERIFY_CLAIM_SYSTEM = """Verify claims reasonably:
- Accept claims supported by general consensus in sources
- Minor number variations are acceptable
- Paraphrased attributions count as verified
- If claim is generally supported, mark as verified
"""
```

## Cost Optimization

### Current Cost Profile

Per article:
- `fact_check_node`: 2 LLM calls per claim (extract + verify each)
- `fix_claims_node`: 1 LLM call to rewrite article
- Worst case: 3 iterations × (N claims × 2 + 1) calls

### Optimization Strategies

#### 1. Limit Claims Per Article

In `EXTRACT_CLAIMS_USER`:
```python
"""Extract the TOP 10 most important factual claims.
Prioritize claims with specific numbers or statistics."""
```

#### 2. Batch Verification

Modify `fact_check_node` to verify multiple claims in one call:

```python
# Instead of verifying one claim at a time:
for claim in claims:
    verify_response = await llm.ainvoke(verify_single_claim)

# Verify all claims at once:
verify_all_prompt = f"Verify these claims: {claims}\nSources: {sources}"
verify_response = await llm.ainvoke(verify_all_prompt)
```

#### 3. Skip Obvious Facts

Add to `EXTRACT_CLAIMS_SYSTEM`:
```python
"""Skip claims that are:
- Common knowledge (e.g., "Water boils at 100°C")
- Definitions of terms
- Direct quotes already attributed
"""
```

#### 4. Cache Similar Claims

Add caching to `fact_check_node`:

```python
# Simple in-memory cache, keyed by claim text
_verification_cache = {}

async def fact_check_node(state):
    # ... extract claims as before
    for claim in claims:
        claim_key = claim["claim"]
        if claim_key in _verification_cache:
            # Reuse the earlier verdict instead of calling the LLM again
            results.append(_verification_cache[claim_key])
            continue
        # ... verify and cache result
```

## Improving Accuracy

### Common False Negatives (valid claims marked unverified)

**Problem**: Sources contain the information but the LLM misses it

**Fix**: Increase source context in `VERIFY_CLAIM_USER`:
```python
# Instead of truncating at 500 chars
sources_text = "\n".join(
    [f"- {s.get('title', '')}: {s.get('content', '')[:1000]}"
     for s in sources[:15]]  # More sources, more content
)
```

### Common False Positives (invalid claims marked verified)

**Problem**: LLM verifies claims too loosely

**Fix**: Add specific verification criteria:
```python
VERIFY_CLAIM_SYSTEM = """To mark a claim as verified, you MUST find:
1. The exact fact stated in the claim
2. From a source that appears credible
3. With matching details (numbers, dates, names)

If ANY part of the claim is unsupported, mark as NOT verified."""
```

## Debugging Fact-Check Issues

### See what claims are extracted

Add logging to `fact_check_node`:
```python
import json

print(f"Extracted claims: {json.dumps(claims_data, indent=2)}")
```

### See verification results

```python
for result in results:
    print(f"Claim: {result['claim'][:50]}...")
    print(f"  Verified: {result['is_verified']}")
    print(f"  Issue: {result.get('issue', 'None')}")
```

### Test claim extraction alone

```python
import asyncio
from research_agent.graph.nodes import fact_check_node

state = {
    "article": "Your test article with claims...",
    "findings": [],
    "sources": [],
}

result = asyncio.run(fact_check_node(state))
```

## Quick Fixes

### "Too many claims flagged"
→ Make `VERIFY_CLAIM_SYSTEM` more lenient

### "Fact-check taking too long"
→ Reduce max iterations to 1-2
→ Limit claims extracted to 5-10

### "Claims still wrong after fixes"
→ Improve `FIX_CLAIMS_SYSTEM` instructions
→ Provide more source context to fix_claims_node

### "High API costs"
→ Batch verification calls
→ Lower claim extraction count
→ Use faster/cheaper model for verification
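
To gauge how much the iteration and claim-count fixes above might save, the worst-case formula from the Current Cost Profile section, 3 iterations × (N claims × 2 + 1), can be turned into a small calculator. This is only a sketch: the function name `worst_case_llm_calls` and its parameters are hypothetical and do not exist in the repository, and the result is an upper bound taken from that formula, not a measurement of real usage.

```python
def worst_case_llm_calls(num_claims: int, max_iterations: int = 3,
                         calls_per_claim: int = 2) -> int:
    """Upper bound on LLM calls per article (hypothetical helper).

    Mirrors the documented formula:
    iterations x (N claims x calls_per_claim + 1 rewrite call).
    """
    if num_claims < 0 or max_iterations < 0 or calls_per_claim < 0:
        raise ValueError("all arguments must be non-negative")
    return max_iterations * (num_claims * calls_per_claim + 1)


if __name__ == "__main__":
    # Compare a few claim-limit / iteration-limit combinations
    for claims, iterations in [(20, 3), (10, 3), (10, 2), (5, 1)]:
        calls = worst_case_llm_calls(claims, iterations)
        print(f"{claims} claims, {iterations} iteration(s): up to {calls} calls")
```

Under that formula, capping extraction at 10 claims with the default 3 iterations gives at most 63 calls, and dropping to 2 iterations lowers the bound to 42. Batch verification would change the per-claim term itself, so this helper would not apply unmodified in that case.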