Data Analysis for Research with RStudio
Statistics for Research
Purpose:
Statistics helps in designing studies, analyzing data, testing hypotheses, and drawing conclusions.
Types of Research:
Descriptive – summarize data (mean, SD)
Inferential – test hypotheses, generalize to populations
Experimental – cause-effect (e.g., clinical trials)
Observational – analyze associations without intervening (e.g., stroke incidence vs. age)
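The descriptive step above (mean, SD) can be sketched as follows. This is a minimal illustration in Python using only the standard library; the blood pressure readings are invented for the example and do not come from any study. In R, the corresponding base functions are mean(), sd(), and median().

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [118, 125, 130, 122, 140, 135, 128, 121]

n = len(readings)
mean = statistics.mean(readings)
sd = statistics.stdev(readings)        # sample SD (n - 1 in the denominator)
median = statistics.median(readings)

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, median = {median:.1f}")
```

Descriptive summaries like these describe only the sample at hand; generalizing to a population is the job of the inferential methods.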
Key Statistical Methods:
t-test / ANOVA – compare means between two groups (t-test) or three or more groups (ANOVA)
Chi-square test – test for association between categorical variables
Linear Regression – predict continuous outcomes
Logistic Regression – predict binary outcomes (e.g., stroke risk)
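Poisson / Negative Binomial Regression – model count data (negative binomial when counts are overdispersed)
Cox Regression / Kaplan-Meier curve – survival (time-to-event) analysis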
ARIMA / Time Series – trend forecasting (e.g., pollution, GDP)
Panel Data Models – repeated observations of the same units over time (e.g., economic or environmental data)
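Two of the methods listed above, the chi-square test and simple linear regression, are small enough to sketch by hand. The Python code below uses only the standard library, and both the 2x2 table and the (hours, score) pairs are hypothetical numbers chosen for illustration. The chi-square function computes the uncorrected Pearson statistic; note that R's chisq.test() applies a continuity correction to 2x2 tables by default, so its result would differ unless correct = FALSE is set.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical counts: rows = exposed / unexposed, columns = event / no event.
table = [[30, 70], [15, 85]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")

# Hypothetical continuous data: study hours vs. test score.
hours = [1, 2, 3, 4, 5]
score = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = simple_linear_regression(hours, score)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

In practice a statistics package handles these tests along with their assumption checks; the hand-rolled versions are only meant to show what the methods compute.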
Visualization Tools:
Histogram, Boxplot, Violin plot, Scatterplot, ROC curve, Survival (Kaplan-Meier) curves
Important Concepts:
p-value, Confidence Interval, Effect Size, Correlation
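Three of the concepts above can be made concrete with a short sketch: effect size (here Cohen's d with a pooled SD), Pearson correlation, and a confidence interval for a mean. The code is Python with the standard library only, and all data values are invented. The interval uses a normal approximation, which is an assumption of this sketch; for small samples a t-based interval (as R's t.test() reports) is the usual choice and is somewhat wider.

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def mean_ci_normal(values, level=0.95):
    """Approximate CI for the mean (normal approximation, not t-based)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - z * se, m + z * se

# Hypothetical outcome measurements for two groups.
treated = [5.1, 4.8, 5.6, 5.3, 4.9, 5.5]
control = [4.2, 4.6, 4.1, 4.8, 4.4, 4.3]

d = cohens_d(treated, control)
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # perfectly linear pair
ci_low, ci_high = mean_ci_normal(treated)

print(f"Cohen's d = {d:.2f}")
print(f"Pearson r = {r:.2f}")
print(f"Approx. 95% CI for treated mean: [{ci_low:.2f}, {ci_high:.2f}]")
```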
Common Mistakes:
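Misusing p-values (e.g., treating p < 0.05 as proof of an effect, or p > 0.05 as proof of no effect)
Ignoring model assumptions (e.g., normality, independence, equal variances)
Overfitting models (too many predictors for the sample size)
Not adjusting for confounders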
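The last mistake in the list, not adjusting for confounders, is easiest to see with numbers. The sketch below is Python with invented counts, constructed so that disease severity affects both who gets treated and who recovers. Within each severity stratum the treated group does better, yet the crude (pooled) comparison points the other way.

```python
# Hypothetical counts: (recovered, total) by severity stratum and treatment arm.
data = {
    "mild":   {"treated": (90, 100),   "untreated": (800, 1000)},
    "severe": {"treated": (500, 1000), "untreated": (40, 100)},
}

def rate(recovered, total):
    return recovered / total

# Stratum-specific recovery rates (comparison adjusted for severity).
stratum_rates = {
    stratum: {arm: rate(*counts) for arm, counts in arms.items()}
    for stratum, arms in data.items()
}

# Crude recovery rates, pooling over severity (confounder ignored).
crude = {}
for arm in ("treated", "untreated"):
    recovered = sum(data[s][arm][0] for s in data)
    total = sum(data[s][arm][1] for s in data)
    crude[arm] = rate(recovered, total)

for stratum, rates in stratum_rates.items():
    print(f"{stratum}: treated {rates['treated']:.2f}, "
          f"untreated {rates['untreated']:.2f}")
print(f"crude: treated {crude['treated']:.2f}, "
      f"untreated {crude['untreated']:.2f}")
```

Here the treated group is mostly severe cases, so the pooled rate mixes the treatment effect with the effect of severity. Stratifying, or including the confounder as a covariate in a regression model, separates the two.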
Software Tools:
R (widely used for academic research and statistical modeling)
SPSS, Stata, Python (also widely used)
Reporting:
Always include: sample size, effect size, p-values, confidence intervals, tables, and clear plots
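One way to keep reports consistent is to build each result line from the same template. The Python sketch below is only an illustration: the function names, the label, and every number passed in are hypothetical placeholders, and the "p < 0.001" cutoff is a common reporting convention rather than a rule stated in these notes.

```python
def format_p(p):
    """Report very small p-values as a bound instead of a string of zeros."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"

def format_result(label, n, effect, ci_low, ci_high, p):
    """One-line summary with sample size, effect size, 95% CI, and p-value."""
    return (f"{label}: n = {n}, effect size = {effect:.2f}, "
            f"95% CI [{ci_low:.2f}, {ci_high:.2f}], {format_p(p)}")

# Placeholder values, not results from any real analysis.
line = format_result("Treatment vs control", 120, 0.42, 0.06, 0.78, 0.0213)
print(line)
```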