Topics List
Summarizing Data: One Variable
One variable statistics for a quantitative variable: five number summary, percentile, mean, trimmed mean, range, interquartile range (IQR), variance, standard deviation, z-score
One variable statistics for a categorical variable: frequency, relative frequency
One variable plots and shape: histogram, dot plot, boxplot, pie chart, bar chart, distribution, skew, modes
Types of variables: categorical, ordinal, quantitative
Features of data and statistics: observations, observational units (cases), variables, missing values, outliers, robust statistics, the empirical rule
Summarizing Data: Relationships Between Variables
One quantitative and one categorical variable: side-by-side boxplots
Two categorical variables: contingency tables (also known as cross-tabulations or two-way tables) including marginal, joint, and conditional distributions; side-by-side and stacked bar plots (stacked bar plots for conditional distributions are sometimes called segmented bar plots); independence
Three categorical variables: Simpson’s Paradox
Two quantitative variables: explanatory and response variable, scatterplot, correlation
Probability: Events
Vocabulary of probability of events: random, probability distribution, event, complement, conditional probability, independence, mutually exclusive / disjoint
Methods for calculating probabilities of events and combinations of events: the complement rule, the conditional probability formula, multiplication rules for independent and non-independent events, addition rules for mutually exclusive and non-mutually exclusive events, tree diagrams, Bayes’ rule
Probability: Random Variables
Vocabulary: random variable, discrete, continuous, probability mass function, probability density function
Notation: X, x, μ, x̅, σ, s
Discrete probability distributions: discrete uniform, Bernoulli, binomial
Continuous probability distributions: probability density function, continuous uniform, normal
Expectation (expected value, mean) and variance (standard deviation): formula for discrete random variables, expectation and variance of linear transformations of random variables, expectation and variance of the average of independent and identically distributed random variables
Normal distributions: parameters and properties, the standard normal distribution, the normal probability table, normal quantile plots
Sampling Distributions
Sampling distributions: parameters and their estimators, unbiased, mean and variance of estimators of proportions and means
Long-run behaviour of estimators of proportions and means: Central Limit Theorem
Data Collection
Vocabulary related to collecting data to make inferences: population and samples, scientific versus anecdotal evidence, parameter versus statistic
Sampling ideas: simple random sample (SRS), stratified and cluster sampling, sampling bias, use of randomization in order to generalize results to a population
Types of studies: observational studies, experiments
Mechanisms to explain observed associations: causation, confounding, and other alternative explanations
Experiments: randomization, factor, level, treatment, placebo, blinding, block, control, replication, experimental bias, use of randomization to allow causal conclusions
Confidence Intervals Part 1
Vocabulary of confidence intervals: confidence level, margin of error, lower and upper limits, critical value
Interpretation of confidence intervals: statistical inference as the process of drawing conclusions based on data; the purpose of confidence intervals; correct and incorrect interpretations of confidence intervals
Calculation of confidence intervals: large sample confidence interval for a proportion; determination of the required sample size for a specified margin of error for a confidence interval for a proportion; relation of sample size, confidence level, and estimated proportion to the width of a confidence interval
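As an illustration of the calculations named in this section, here is a minimal Python sketch of the large sample confidence interval for a proportion and of the sample size needed for a specified margin of error. The function names (critical_value, proportion_ci, required_sample_size) and the example figures are hypothetical, chosen only for illustration. The sketch assumes the usual normal approximation, with the interval computed as the sample proportion plus or minus the critical value times sqrt(p(1 - p)/n), and it uses p = 0.5 as the conservative planning value when no estimate is available.

```python
from math import ceil, sqrt
from statistics import NormalDist


def critical_value(confidence_level):
    """Standard normal critical value z* for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def proportion_ci(successes, n, confidence_level=0.95):
    """Large sample confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    margin = critical_value(confidence_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def required_sample_size(margin, confidence_level=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`.

    p_guess = 0.5 is an assumed planning value; it gives the largest
    (most conservative) sample size.
    """
    z = critical_value(confidence_level)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Made-up example: 520 successes in a sample of 1000.
lower, upper = proportion_ci(520, 1000)
# Sample size for a margin of error of 0.03 at 95% confidence.
n_needed = required_sample_size(0.03)
print(round(lower, 3), round(upper, 3), n_needed)
```

Raising the confidence level or shrinking the margin of error increases the required sample size, which is the relation between sample size, confidence level, and interval width listed above.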
Confidence Intervals Part 2
Additional vocabulary of confidence intervals: robust, coverage
Interpretation of confidence intervals: how independence, sample size, and the shape of the probability distribution of the data affect the coverage of confidence intervals for proportions and means
Calculation of confidence intervals: confidence interval for a mean using the t-distribution
The t-distribution: properties of the t-distribution, degrees of freedom
The Process of Statistical Tests
The structure of statistical tests: specifying hypotheses, test statistic, P-value, interpreting and making conclusions based on P-values.
Vocabulary: null and alternative hypotheses, one- and two-sided hypotheses, test statistic, P-value, statistical significance.
Tests of proportions: large sample sampling distribution and test statistic for a single proportion.
Tests of means: sampling distribution and test statistic for a single mean.
The Effective Use of Statistical Tests
Vocabulary: significance level (alpha), power, Type I errors, Type II errors, practical versus statistical significance, multiple testing, data snooping, robust.
Power and Type I and Type II errors: the elements of statistical tests that affect power, including trade-offs.
The appropriate application and interpretation of statistical tests: how the data were collected, the initial examination of the data, the necessary assumptions and the robustness of testing procedures, the role of sample size, statistical versus practical significance, multiple tests and data snooping.
The relationship between confidence intervals and statistical tests: how two-sided confidence intervals are related to two-sided significance tests.
Comparing Two Groups
Matched pairs: how to recognize matched pairs situations, the advantage of matched pairs in comparing two treatments or conditions, why it is not appropriate to carry out analyses ignoring matched pairs, how to carry out a test for the difference in the mean between the treatments or conditions in a matched pairs setting.
Proportions from two independent samples: the sampling distribution for the difference between two proportions from independent samples, how to construct and interpret confidence intervals for the difference of two proportions from independent samples, how to carry out a statistical test for the equality of two proportions from independent samples.
Means from two independent samples: the sampling distribution for the difference between two means from independent samples, how to construct and interpret confidence intervals for the difference of two means from independent samples, how to carry out a statistical test for the equality of two means from independent samples under the assumption of equal variances (pooled) or the assumption of unequal variances (Welch-Satterthwaite).
The sign test: when it is appropriate, how to carry it out, how to interpret the results.
Simple Linear Regression
Vocabulary of regression: Dependent / response variable, independent / explanatory / predictor variable, fitted values, residuals
Estimating the model parameters: The method of least squares (minimizing the sum of the squares of the residuals) for calculating the intercept and slope of line of best fit
Interpreting the fitted line: Interpretation of the slope and intercept of a regression line, The relationship between the regression slope and correlation, The coefficient of determination (R²), Interpreting linear regression with log-transformed variables
Model diagnostics: Using residual plots (scatterplot of residuals versus the independent variable or versus the fitted values) to assess how well the regression line summarizes the data, looking in particular for curvature and non-constant variance, Normal quantile plots of the residuals to assess normality, The effect of unusual points on the line of best fit, Use of log transformations to satisfy model conditions
Inference for regression: Confidence interval and statistical test for the slope
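To make the least squares topics above concrete, here is a minimal Python sketch that computes the slope and intercept of the line of best fit, the residuals, the correlation, and the coefficient of determination. The function name least_squares_fit and the small data set are hypothetical, made up for illustration only. The sketch uses the standard closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means.

```python
from math import sqrt
from statistics import mean, stdev


def least_squares_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = least_squares_fit(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Sample correlation r, computed from deviations about the means.
x_bar, y_bar = mean(x), mean(y)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sqrt(
    sum((xi - x_bar) ** 2 for xi in x) * sum((yi - y_bar) ** 2 for yi in y)
)

# Coefficient of determination: 1 - SSE / SST.
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(intercept, slope, r, r_squared)
```

For these data the fitted line has slope 0.6 and intercept 2.2. The slope also equals r times the ratio of the standard deviations (stdev(y) / stdev(x)), the residuals sum to zero, and R² equals the square of the correlation, which are the relationships between the regression slope, correlation, and R² named in the list.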