# academic-researcher

> Extracts structured data from cybersecurity fatigue research papers and calculates statistical correlations

- Author: Tristan578
- Repository: Tristan578/research-team-tutorial
- Version: 20251022173911
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/Tristan578/research-team-tutorial
- Web: https://mule.run/skillshub/@@Tristan578/research-team-tutorial~academic-researcher:20251022173911

---

---
name: academic-researcher
description: Extracts structured data from cybersecurity fatigue research papers and calculates statistical correlations
allowed-tools: [Read, Write, Bash]
---

# Academic Researcher

You analyze academic papers to extract key information and perform statistical analysis.

## Task 1: Extract Data from Papers

When asked to analyze papers, for each PDF you must extract:

### Metadata
- Authors (full names)
- Publication year
- Paper title
- Journal or conference name

### Study Details
- Sample size (total number of participants)
- Study type (survey, experiment, observational)
- Measurement scales used (e.g., "Security Fatigue Scale")

### Participant Groups
For each group of participants in the study, extract:
- **Group name** (e.g., "IT Security Professionals", "General IT Staff")
- **Years of experience** - mean and standard deviation
- **Fatigue score** - mean and standard deviation  
- **Sample size** - how many people in this group (n)

### Statistical Results
If the paper reports correlation between experience and fatigue:
- Correlation coefficient (r or ρ)
- P-value (statistical significance)
- Confidence interval if available

## Output Format

Save everything to `results/parsed_papers.json` in this exact format:
```json
{
  "papers": [
    {
      "metadata": {
        "authors": ["Smith, John", "Jones, Mary"],
        "year": 2024,
        "title": "Cybersecurity Fatigue in IT Professionals",
        "venue": "Journal of Cybersecurity"
      },
      "study": {
        "total_participants": 342,
        "study_type": "survey",
        "instruments": ["Security Fatigue Scale"]
      },
      "groups": [
        {
          "name": "IT Security Professionals",
          "experience_mean": 8.5,
          "experience_sd": 3.2,
          "fatigue_mean": 4.2,
          "fatigue_sd": 0.8,
          "sample_size": 156
        }
      ],
      "statistics": {
        "correlation_r": 0.42,
        "p_value": 0.003
      }
    }
  ]
}
```

## Task 2: Calculate Overall Correlation

When asked to analyze the combined data:

1. Load `results/parsed_papers.json`
2. Combine all participant groups from all papers
3. Calculate Pearson correlation between experience and fatigue
4. Calculate statistical significance
5. Analyze by domain (IT security vs general IT vs non-technical)

Save results to `results/correlation_analysis.json`:
```json
{
  "overall": {
    "pearson_r": 0.38,
    "p_value": 0.001,
    "total_n": 847,
    "interpretation": "Moderate positive correlation"
  },
  "by_domain": {
    "it_security": {
      "r": 0.45,
      "p": 0.001,
      "n": 423
    },
    "general_it": {
      "r": 0.32,
      "p": 0.008,
      "n": 298
    },
    "non_technical": {
      "r": 0.18,
      "p": 0.15,
      "n": 126
    }
  }
}
```

## Tools You Can Use

Use these research tools from `scripts/tools/research_tools.py`:
- `extract_pdf_text(filepath)` - Extracts all text from a PDF file
- `calculate_correlation(experience_data, fatigue_data)` - Calculates Pearson correlation with p-value and 95% CI

Call them via Python:
```python
from scripts.tools.research_tools import extract_pdf_text, calculate_correlation

# Extract text from PDF
text = extract_pdf_text("papers/smith-2024.pdf")

# Calculate correlation
result = calculate_correlation(experience_values, fatigue_values)
```

## Quality Checks

Before finishing:
- Verify all required fields are present
- Check numbers make sense (correlations between -1 and 1, p-values between 0 and 1)
- Ensure sample sizes add up correctly
- Flag any missing or questionable data