# r-statistical-computing

> Use when performing statistical analysis in R, creating ggplot2 visualizations, manipulating data with tidyverse (dplyr/tidyr), building statistical models, or generating R Markdown reports. Provides tidyverse patterns, ggplot2 templates, statistical tests, and time series analysis.

- Author: Yannik Pitcan
- Repository: pitcany/claude-config
- Version: 20260103063103
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/pitcany/claude-config
- Web: https://mule.run/skillshub/@@pitcany/claude-config~r-statistical-computing:20260103063103

---

---
name: r-statistical-computing
description: Use when performing statistical analysis in R, creating ggplot2 visualizations, manipulating data with tidyverse (dplyr/tidyr), building statistical models, or generating R Markdown reports. Provides tidyverse patterns, ggplot2 templates, statistical tests, and time series analysis.
---

# R Statistical Computing

## Overview

Patterns for statistical analysis in R, data visualization with ggplot2, and data manipulation with the tidyverse ecosystem.

## When to Use

- Data manipulation with dplyr/tidyr
- Creating ggplot2 visualizations
- Statistical testing and regression
- Time series forecasting with forecast/prophet
- Building ML models with tidymodels

## Quick Reference

| Task | Function | Example |
|------|----------|---------|
| Filter rows | `filter()` | `df %>% filter(age > 18)` |
| Select columns | `select()` | `df %>% select(name, age)` |
| Create column | `mutate()` | `df %>% mutate(age_group = cut(age, breaks))` |
| Summarize | `summarise()` | `df %>% group_by(x) %>% summarise(mean = mean(y))` |
| Reshape long | `pivot_longer()` | `df %>% pivot_longer(cols = starts_with("sales_"))` |
| Reshape wide | `pivot_wider()` | `df %>% pivot_wider(names_from = month, values_from = sales)` |
| Join tables | `left_join()` | `left_join(df1, df2, by = "id")` |
| T-test | `t.test()` | `t.test(value ~ group, data = df)` |
| Linear model | `lm()` | `lm(y ~ x1 + x2, data = df)` |
| Plot | `ggplot()` | `ggplot(df, aes(x, y)) + geom_point()` |

## Core Patterns

### Tidyverse Data Manipulation

```r
library(tidyverse)

result <- data %>%
  # Keep active adults only
  filter(age > 18, status == "active") %>%
  # Conditions are checked in order; TRUE is the catch-all
  mutate(
    age_group = case_when(
      age < 30 ~ "young",
      age < 50 ~ "middle",
      TRUE ~ "senior"
    )
  ) %>%
  group_by(age_group) %>%
  summarise(
    count = n(),
    avg_revenue = mean(revenue, na.rm = TRUE),
    total_revenue = sum(revenue, na.rm = TRUE)
  ) %>%
  arrange(desc(total_revenue))
```

### ggplot2 Visualization

```r
library(ggplot2)

ggplot(data, aes(x = date, y = sales, color = category)) +
  # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
  geom_line(linewidth = 1.2) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_brewer(palette = "Set2") +
  labs(title = "Sales Trends", x = "Date", y = "Sales ($)") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")
```

### Statistical Testing

```r
# T-test (Welch's by default; does not assume equal variances)
t.test(value ~ group, data = data)

# Linear regression
model <- lm(sales ~ advertising + price, data = data)
summary(model)

# ANOVA with post-hoc pairwise comparisons
anova_model <- aov(value ~ category, data = data)
TukeyHSD(anova_model)
```

### Time Series Forecasting

```r
library(forecast)

# Monthly series (frequency = 12) starting January 2020
ts_data <- ts(data$value, start = c(2020, 1), frequency = 12)
model <- auto.arima(ts_data)

# Forecast the next 12 months and plot with prediction intervals
forecast_result <- forecast(model, h = 12)
autoplot(forecast_result)
```

## Common Mistakes

| Mistake | Problem | Fix |
|---------|---------|-----|
| Forgetting `na.rm = TRUE` | NA propagates to result | `mean(x, na.rm = TRUE)` |
| Using `==` in `aes()` | `==` compares values; aesthetics are mapped with `=` | `aes(color = category)` not `aes(color == category)` |
| Not grouping before summarise | Summarises the entire data frame | Add `group_by()` before `summarise()` |
| Modifying values in a loop | Slow, not idiomatic R | Use `mutate()` or `lapply()` |
| Wrong data type for dates | Sorting/filtering fails | `mutate(date = as.Date(date))` |
| Overwriting `data` variable | Confusing, error-prone | Use descriptive names: `sales_data` |

## Statistical Tests Quick Reference

| Test | Use When | R Function |
|------|----------|------------|
| t-test | Compare 2 group means | `t.test(y ~ group)` |
| Paired t-test | Before/after measurements | `t.test(before, after, paired = TRUE)` |
| ANOVA | Compare 3+ group means | `aov(y ~ group)` |
| Chi-square | Categorical association | `chisq.test(table(x, y))` |
| Correlation | Linear relationship | `cor.test(x, y)` |
| Wilcoxon | Non-parametric comparison of 2 groups | `wilcox.test(y ~ group)` |
| Shapiro-Wilk | Test normality | `shapiro.test(x)` |

## Related Skills

- **data-analysis-workflow** - Python equivalent with pandas/NumPy
- **statistics-explainer** - Creating educational content about statistical concepts
- **analytics-dashboard-builder** - Visualization patterns across languages

---

**See r-reference.md for:** Complete tidyverse examples, advanced ggplot2, tidymodels ML workflows, time series with Prophet, R Markdown templates, and performance optimization.
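
As a rough sketch of how entries in the Statistical Tests Quick Reference might be combined, the example below checks normality per group with `shapiro.test()` and then picks between `t.test()` and `wilcox.test()`. The helper name `compare_two_groups`, the 0.05 cutoff, and the use of the built-in `ToothGrowth` dataset are all assumptions made for illustration; the decision rule is a common heuristic, not something this skill prescribes.

```r
# Hypothetical helper (not part of this skill): choose a two-group test
# based on a per-group normality check.
# Assumption: alpha = 0.05 as the Shapiro-Wilk cutoff, for illustration only.
compare_two_groups <- function(df, value_col, group_col, alpha = 0.05) {
  groups <- split(df[[value_col]], df[[group_col]])
  stopifnot(length(groups) == 2)

  # Shapiro-Wilk p-value for each group
  normal_p <- vapply(groups, function(x) shapiro.test(x)$p.value, numeric(1))

  # Build the formula value_col ~ group_col from the column names
  test_formula <- reformulate(group_col, response = value_col)

  if (all(normal_p > alpha)) {
    # No evidence against normality in either group: Welch t-test
    result <- t.test(test_formula, data = df)
  } else {
    # Otherwise fall back to the rank-based Wilcoxon test;
    # exact = FALSE avoids a warning when the data contain ties
    result <- wilcox.test(test_formula, data = df, exact = FALSE)
  }

  list(
    normality_p = normal_p,
    method = result$method,
    p_value = result$p.value
  )
}

# ToothGrowth ships with R: tooth length (`len`) by supplement type (`supp`)
out <- compare_two_groups(ToothGrowth, "len", "supp")
print(out$method)
print(out$p_value)
```

Only base R is used here, so the sketch runs without installing packages. In practice, a Q-Q plot and the sample size are usually worth looking at alongside the Shapiro-Wilk p-value before settling on a test.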