# Agent2_Repository_Analyzer > print(f"Found {len(ready_rows)} repositories to analyze") ``` - Author: hadarwayn - Repository: hadarwayn/L19-AI-Agents-Auto-Homework-Grading-System - Version: 20251204172734 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/hadarwayn/L19-AI-Agents-Auto-Homework-Grading-System - Web: https://mule.run/skillshub/@@hadarwayn/L19-AI-Agents-Auto-Homework-Grading-System~Agent2_Repository_Analyzer:20251204172734 --- # Agent 2: Repository Analyzer **Description**: Clone GitHub repositories, analyze Python files, calculate grades based on 150-line compliance ## Responsibilities 1. Read Excel1.xlsx (only "Ready" rows) 2. Clone repositories using multi-threading (5 workers) 3. Find all Python files (.py) 4. Count lines in each file 5. Calculate grade: 100 * (compliant_lines / total_lines) 6. Create Excel2.xlsx with grade data ## Prerequisites - Excel1.xlsx must exist in results/excel/ - Git must be installed - temp/repos/ directory exists ## Instructions ### Step 1: Read Excel1.xlsx ```python import openpyxl import git from pathlib import Path from concurrent.futures import ThreadPoolExecutor, as_completed import time # Load Excel1 wb = openpyxl.load_workbook('results/excel/Excel1.xlsx') ws = wb.active # Extract rows with status = "Ready" ready_rows = [] for row in ws.iter_rows(min_row=2, values_only=True): email_id, received_time, subject, sender, hashed, github_url, thread_id, status = row if status == "Ready": ready_rows.append({ 'email_id': email_id, 'github_url': github_url }) print(f"Found {len(ready_rows)} repositories to analyze") ``` ### Step 2: Multi-threaded Repository Cloning ```python def analyze_repository(email_id, github_url): """Analyze a single repository""" try: # Create unique directory for this repo repo_dir = Path(f"temp/repos/{email_id[:8]}") # Clone repository (timeout: 60 seconds) print(f"Cloning {github_url}...") repo = git.Repo.clone_from(github_url, repo_dir, depth=1) # Find all Python files python_files = list(repo_dir.rglob("*.py")) # Count lines total_lines = 0 compliant_lines = 0 file_count = 0 for py_file in python_files: try: with open(py_file, 'r', encoding='utf-8') as f: lines = len(f.readlines()) total_lines += lines # Compliant if <= 150 lines if lines <= 150: compliant_lines += lines file_count += 1 except Exception: continue # Calculate grade if total_lines > 0: grade = round(100 * (compliant_lines / total_lines), 2) else: grade = 0 return { 'email_id': email_id, 'github_url': github_url, 'total_files': file_count, 'total_lines': total_lines, 'compliant_lines': compliant_lines, 'grade': grade, 'status': 'Ready' } except git.GitCommandError: return { 'email_id': email_id, 'github_url': github_url, 'total_files': 0, 'total_lines': 0, 'compliant_lines': 0, 'grade': 0, 'status': 'Failed: clone' } except Exception as e: return { 'email_id': email_id, 'github_url': github_url, 'total_files': 0, 'total_lines': 0, 'compliant_lines': 0, 'grade': 0, 'status': f'Failed: {str(e)[:50]}' } # Process repositories in parallel (5 workers) results = [] with ThreadPoolExecutor(max_workers=5) as executor: futures = { executor.submit(analyze_repository, row['email_id'], row['github_url']): row for row in ready_rows } for future in as_completed(futures): result = future.result() results.append(result) print(f" ✓ {result['email_id'][:8]} - Grade: {result['grade']}") ``` ### Step 3: Create Excel2.xlsx ```python # Create workbook wb2 = openpyxl.Workbook() ws2 = wb2.active ws2.title = "Repository Analysis" # Headers headers = [ "email_id", "github_url", "total_files", "total_lines", "compliant_lines", "grade", "status" ] ws2.append(headers) # Add rows for result in results: ws2.append([ result['email_id'], result['github_url'], result['total_files'], result['total_lines'], result['compliant_lines'], result['grade'], result['status'] ]) # Save wb2.save('results/excel/Excel2.xlsx') ``` ### Step 4: Output ```python print(f"\n✅ Analysis complete!") print(f" - Total repositories: {len(results)}") print(f" - Successful: {sum(1 for r in results if r['status'] == 'Ready')}") print(f" - Failed: {sum(1 for r in results if 'Failed' in r['status'])}") print(f" - Average grade: {sum(r['grade'] for r in results) / len(results):.2f}") print(f"✅ Created Excel2.xlsx") ``` ## Grading Formula ``` Grade = 100 × (compliant_lines / total_lines) Where: - compliant_lines = sum of lines in files with ≤ 150 lines - total_lines = sum of all lines in all .py files ``` ## Example ``` Repository contains: - main.py: 45 lines ✅ (compliant) - utils.py: 200 lines ❌ (non-compliant) - helpers.py: 80 lines ✅ (compliant) Total lines = 45 + 200 + 80 = 325 Compliant lines = 45 + 80 = 125 Grade = 100 × (125 / 325) = 38.46 ``` ## Expected Output **Excel2.xlsx** with columns: - `email_id`: Links to Agent 1 - `github_url`: Repository URL - `total_files`: Count of .py files - `total_lines`: Sum of all lines - `compliant_lines`: Lines in files ≤150 - `grade`: 0-100 score - `status`: "Ready" or error message ## Error Handling - Clone timeout (60s): Set status = "Failed: clone" - Repository not found: Set status = "Failed: clone" - No Python files: grade = 0, status = "Ready" - File read errors: Skip file, continue - Thread failures: Isolated per repository ## Success Criteria - Excel2.xlsx exists - All ready rows from Excel1 processed - Grades calculated correctly - Multi-threading completed without crashes