# ref-check

> Verify BibTeX references for academic papers. Checks citation accuracy against Crossref and OpenAlex databases using the RefCheck_ai algorithm. Use when the user wants to check references, verify citations, validate a .bib file, or find potential citation errors.

- Author: zhou_tianjian
- Repository: Jaywalk18/academic-paper-tools
- Version: 20260126115045
- Stars: 3
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/Jaywalk18/academic-paper-tools
- Web: https://mule.run/skillshub/@@Jaywalk18/academic-paper-tools~ref-check:20260126115045

---

---
name: ref-check
description: Verify BibTeX references for academic papers. Checks citation accuracy against Crossref and OpenAlex databases using the RefCheck_ai algorithm. Use when the user wants to check references, verify citations, validate a .bib file, or find potential citation errors.
---

# RefCheck - BibTeX Reference Verification

This skill verifies academic references using the RefCheck_ai algorithm - querying multiple databases and using fuzzy matching to find the best candidates.

## Activation Triggers

Use this skill when the user:

- Asks to "check references" or "verify citations"
- Wants to validate a `.bib` file
- Asks about potential citation errors
- Mentions bibliography verification or reference checking

## Verification Process

### Step 1: Locate the BibTeX File

Find `.bib` files in the project (common names: `main.bib`, `references.bib`, `ref.bib`).

### Step 2: Run the Verification Script

**Preferred method** - Run the Python script for accurate batch verification.
> **⚠️ IMPORTANT: This is a long-running task!**
> - Each reference queries up to 3 databases with rate limiting (0.3s intervals)
> - 40 references ≈ 120 API calls ≈ **60-90 seconds**
> - **MUST run in background** with output to file, then check results after completion

```bash
# Install dependencies first (if needed)
pip install requests rapidfuzz bibtexparser

# ✅ CORRECT: Run in BACKGROUND with output file
python scripts/check_references.py --bib path/to/references.bib --output report.json &

# ❌ WRONG: Don't run in the foreground - it will time out or hang
# python scripts/check_references.py --bib path/to/references.bib
```

**Execution steps for Agent:**

1. Run the script in the **background** with the `--output` flag
2. Inform the user: "Verifying references in the background; this should take about X seconds..."
3. Wait ~60-90 seconds (or check whether the output file exists)
4. Read the output JSON file
5. Generate a human-readable report

The script:

1. Parses BibTeX and normalizes entries (strips LaTeX, extracts first author surname)
2. Queries **up to 3 databases** for each reference: Crossref, OpenAlex, and Semantic Scholar (the last only if `SEMANTIC_SCHOLAR_API_KEY` is set)
3. Uses **fuzzy matching** (rapidfuzz `token_sort_ratio`) to find the best candidate from all results
4. Classifies each reference as `verified`, `uncertain`, or `suspicious` based on the RefCheck_ai algorithm

### Step 3: Interpret Results

| Status | Criteria |
|--------|----------|
| **verified** | Title sim ≥90% + author match + year match (±1) |
| **uncertain** | Some inconsistencies but not severe |
| **suspicious** | Title sim <55% OR multiple severe mismatches |

### Step 4: Generate Report

Report format:

```markdown
## Reference Verification Report

### Summary
- **Total references**: 45
- ✅ **Verified**: 38 (84%)
- ⚠️ **Uncertain**: 5 (11%)
- ❌ **Suspicious**: 2 (5%)

### Suspicious References (Need Attention)

#### 1. `fabricated2023`
- **BibTeX title**: "A Novel Approach to Everything"
- **Issue**: No matching papers found in any database
- **Action**: Verify this reference exists; may be fabricated

#### 2. `smith2020deep`
- **BibTeX title**: "Deep Learning Methods"
- **Best match**: "Deep Learning Methods for Computer Vision" (sim: 0.72)
- **Issue**: Title significantly different; year mismatch (2020 vs 2019)
- **Action**: Verify correct paper; update title and year if needed

### Uncertain References (Review Recommended)

#### 1. `jones2022neural`
- **Issue**: Author name not found in matched paper's author list
- **Suggestion**: Check if author order or name spelling is correct

### Verified References
[List of verified references - can be collapsed]

### Suggested Corrections

\`\`\`bibtex
% smith2020deep - suggested correction (year changed from 2020 to 2019):
@article{smith2020deep,
  title = {Deep Learning Methods for Computer Vision},
  author = {Smith, John and Doe, Jane},
  year = {2019},
  journal = {IEEE TPAMI}
}
\`\`\`
```

## Algorithm Details (RefCheck_ai)

The script uses the RefCheck_ai algorithm:

1. **Multi-source search**: Query Crossref + OpenAlex + Semantic Scholar (if API key set)
2. **Candidate extraction**: Get the top 5 candidates from each source (up to 15 total)
3. **Fuzzy matching**: Use `rapidfuzz.fuzz.token_sort_ratio` for title similarity
4. **Best selection**: Score = title_sim × 0.90 + author × 0.05 + year × 0.05
5. **Classification**: Based on best match quality

```python
# Classification rules (the function name is illustrative)
def classify(title_sim: float, author_hit: bool, year_match: bool, severe_count: int) -> str:
    """Classify the best match; title_sim is on a 0-1 scale."""
    if title_sim >= 0.90 and author_hit and year_match:
        return "verified"
    if title_sim < 0.55:
        return "suspicious"  # Very low title match
    if severe_count >= 2:
        return "suspicious"  # Multiple severe issues
    return "uncertain"
```

## Common Issues Detected

| Issue | Description | Severity |
|-------|-------------|----------|
| Title mismatch | BibTeX title differs from official | High |
| Year mismatch | Publication year off by ≥2 years | Medium |
| Author mismatch | First author not in matched paper | Medium |
| No candidates | Paper not found in any database | Critical |

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `SEMANTIC_SCHOLAR_API_KEY` | No | Enables S2 search for better coverage |

## Dependencies

```bash
pip install requests rapidfuzz bibtexparser
```

## Example Interaction

**User**: Check the references in release/main.bib

**Agent Actions**:

1. **Start background job:**
   ```bash
   python scripts/check_references.py --bib release/main.bib --output refcheck_report.json &
   ```

2. **Inform user:**
   > "Verifying the references in release/main.bib in the background; this should take about 60-90 seconds. I will generate a report when it finishes."

3. **Wait and check output file** (after ~60-90s):
   ```bash
   # Check if file exists and has content
   ls -la refcheck_report.json
   ```

4. **Read results and summarize:**
   - Total verified/uncertain/suspicious counts
   - List suspicious entries with red flags
   - Suggest corrections for problematic references

## Manual Spot-Check (Optional)

For quick spot-checks of individual references, the agent can manually query:

```
WebFetch: https://api.crossref.org/works?query.bibliographic=TITLE&rows=5
```

Then compare the returned candidates with the BibTeX entry. For batch verification, always prefer the Python script.
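
The manual comparison can be sketched in Python. This is a minimal illustration and not the code in `scripts/check_references.py`. The `Candidate` class and all function names are hypothetical, candidates are assumed to be already extracted from the search response, and `difflib.SequenceMatcher` over sorted tokens stands in for `rapidfuzz.fuzz.token_sort_ratio`, so the numbers will differ somewhat from the script's. The weights and the ±1 year tolerance are the ones stated in this document.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from urllib.parse import urlencode


@dataclass
class Candidate:
    """Hypothetical, already-extracted view of one search result."""
    title: str
    surnames: list[str]
    year: int


def crossref_url(title: str, rows: int = 5) -> str:
    """Build the Crossref spot-check URL for one BibTeX title."""
    query = urlencode({"query.bibliographic": title, "rows": rows})
    return f"https://api.crossref.org/works?{query}"


def title_sim(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and sorting tokens.

    Stdlib stand-in for rapidfuzz's token_sort_ratio (an assumption of
    this sketch); scores are close in spirit but not identical.
    """
    def norm(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def score(bib_title: str, bib_surname: str, bib_year: int, cand: Candidate) -> float:
    """Weighted score: 0.90 title + 0.05 author + 0.05 year."""
    author_hit = bib_surname.lower() in {s.lower() for s in cand.surnames}
    year_match = abs(bib_year - cand.year) <= 1  # ±1 year counts as a match
    return (0.90 * title_sim(bib_title, cand.title)
            + 0.05 * float(author_hit)
            + 0.05 * float(year_match))


def best_match(bib_title: str, bib_surname: str, bib_year: int,
               candidates: list[Candidate]):
    """Return (candidate, score) for the top-scoring candidate, or None."""
    if not candidates:
        return None
    scored = [(c, score(bib_title, bib_surname, bib_year, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(crossref_url("Deep Learning Methods"))
    found = [
        Candidate("Shallow Parsing", ["Lee"], 2001),
        Candidate("Deep Learning Methods for Computer Vision", ["Smith", "Doe"], 2019),
    ]
    print(best_match("Deep Learning Methods", "Smith", 2020, found))
```

Keeping the URL builder separate from the scoring makes it possible to paste the URL into a fetch tool by hand and score only the titles that come back.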