Topics List

Summarizing Data: One Variable

One-variable statistics for a quantitative variable: five-number summary, percentile, mean, trimmed mean, range, interquartile range (IQR), variance, standard deviation, z-score
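A minimal Python sketch of several of these statistics, using the standard-library `statistics` module on a small made-up data set. The helper names are illustrative. The quartile rule (`method="inclusive"`) is an assumption: textbooks and software differ in how they compute quartiles, so Q1 and Q3 may differ slightly from hand calculations. The 10% trimming proportion is also just an example choice.

```python
import statistics

# Illustrative helpers; the names are hypothetical, not from a library.

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Quartiles use statistics.quantiles with method="inclusive"; other
    quartile conventions give slightly different Q1 and Q3 values.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def trimmed_mean(data, proportion=0.1):
    """Mean after dropping `proportion` of the sorted values from each end."""
    xs = sorted(data)
    k = int(len(xs) * proportion)  # number of values dropped from each end
    return statistics.mean(xs[k:len(xs) - k])

def z_score(x, data):
    """How many sample standard deviations x lies above the sample mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

data = [2, 4, 4, 5, 7, 9, 10, 12, 13, 30]  # made-up values with one large outlier

minimum, q1, median, q3, maximum = five_number_summary(data)
data_range = maximum - minimum
iqr = q3 - q1
variance = statistics.variance(data)  # sample variance (divides by n - 1)
sd = statistics.stdev(data)           # sample standard deviation
```

Here the outlier 30 pulls the mean (9.6) above the median (8.0), while the median, the IQR, and the 10% trimmed mean (8) are barely affected by it, which is what makes them robust statistics.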

One-variable statistics for a categorical variable: frequency, relative frequency
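Frequency is the count of observations in each category; relative frequency divides each count by the total number of observations. A short sketch with `collections.Counter` on made-up survey responses:

```python
from collections import Counter

# Made-up responses to a yes/no/unsure survey question (illustrative only)
responses = ["yes", "no", "yes", "unsure", "yes", "no"]

frequency = Counter(responses)   # count per category
n = sum(frequency.values())      # total number of observations
relative_frequency = {category: count / n for category, count in frequency.items()}
```

The relative frequencies always sum to 1 (up to rounding), which is why they are the natural input for a pie chart.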

One-variable plots and shape: histogram, dot plot, boxplot, pie chart, bar chart, distribution, skew, modes

Types of variables: categorical, ordinal, quantitative

Features of data and statistics: observations, observational units (cases), variables, missing values, outliers, robust statistics, the empirical rule
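The empirical rule says that for roughly normal data about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean. A quick check of those percentages against the standard normal distribution, using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of a normal distribution within k standard deviations of the mean
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.4f}")
```

The printed values are approximately 0.6827, 0.9545, and 0.9973. For data that are skewed or have outliers, these percentages can be far off.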

Summarizing Data: Relationships Between Variables

One quantitative and one categorical variable: side-by-side boxplots

Two categorical variables: contingency tables (also known as cross-tabulations or two-way tables) including marginal, joint, and conditional distributions; side-by-side and stacked bar plots (stacked bar plots for conditional distributions are sometimes called segmented bar plots); independence
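A sketch of computing joint, marginal, and conditional distributions from a two-way table of counts. The variables, counts, and the helper `conditional_given_row` are hypothetical. Two categorical variables are independent when every conditional distribution of one variable equals its marginal distribution.

```python
# Hypothetical counts: rows = smoking status, columns = exercise level
table = {
    "smoker":     {"low": 30, "high": 10},
    "non-smoker": {"low": 40, "high": 70},
}
rows = list(table)
cols = list(table["smoker"])
total = sum(sum(counts.values()) for counts in table.values())

# Joint distribution: proportion of all cases in each cell
joint = {(r, c): table[r][c] / total for r in rows for c in cols}

# Marginal distributions: row and column totals divided by the grand total
row_marginal = {r: sum(table[r].values()) / total for r in rows}
col_marginal = {c: sum(table[r][c] for r in rows) / total for c in cols}

def conditional_given_row(r):
    """Illustrative helper: distribution of the column variable within row r."""
    row_total = sum(table[r].values())
    return {c: table[r][c] / row_total for c in cols}
```

For example, `conditional_given_row("smoker")` gives 75% low exercise, whereas the marginal proportion of low exercise is 70/150, about 47%, so in these made-up data the two variables are associated rather than independent.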

Three categorical variables: Simpson’s Paradox
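Simpson's Paradox occurs when an association that holds within every level of a third variable reverses once the levels are combined. A small Python check with hypothetical success counts for two treatments split by case severity (the counts and the helper `success_rate` are made up for illustration): treatment A has the higher success rate among both mild and severe cases, yet the lower rate overall, because A was given mostly to severe cases.

```python
# Hypothetical (successes, trials) for each (treatment, severity) combination
counts = {
    ("A", "mild"): (8, 10),
    ("A", "severe"): (30, 100),
    ("B", "mild"): (70, 100),
    ("B", "severe"): (2, 10),
}

def success_rate(treatment, severity=None):
    """Illustrative helper: success rate within one severity level, or pooled
    over both levels when severity is None."""
    cells = [cell for key, cell in counts.items()
             if key[0] == treatment and (severity is None or key[1] == severity)]
    successes = sum(s for s, _ in cells)
    trials = sum(t for _, t in cells)
    return successes / trials

for level in ("mild", "severe", None):
    print(level or "overall", success_rate("A", level), success_rate("B", level))
```

Within levels A wins (0.8 vs 0.7 and 0.3 vs 0.2), but pooled A loses (38/110 vs 72/110). Severity is the lurking third variable.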

Two quantitative variables: explanatory and response variables, scatterplot, correlation
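The correlation r between an explanatory variable x and a response variable y can be computed by summing the products of their z-scores and dividing by n − 1. A minimal sketch using only the standard library (the helper name and data are illustrative; Python 3.10+ also provides `statistics.correlation`):

```python
import statistics

def correlation(x, y):
    """Illustrative helper: Pearson correlation as the sum of products of
    z-scores divided by n - 1."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
               for xi, yi in zip(x, y)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory variable
score = [52, 55, 61, 64, 70]   # hypothetical response variable
r = correlation(hours, score)
```

Here r is about 0.99, a strong positive linear association. Swapping the roles of x and y leaves r unchanged.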

Probability: Events

Vocabulary of probability of events: random, probability distribution, event, complement, conditional probability, independence, mutually exclusive / disjoint

Methods for calculating probabilities of events and combinations of events: the complement rule, the conditional probability formula, multiplication rules for independent and non-independent events, addition rules for mutually exclusive and non-mutually exclusive events, tree diagrams, Bayes’ rule
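Bayes' rule reverses a conditional probability: P(H | E) = P(E | H) P(H) / P(E), where P(E) = P(E | H) P(H) + P(E | not H) P(not H) comes from adding the two branches of a tree diagram that lead to E. A minimal sketch with hypothetical numbers for a screening test (1% prevalence, 90% sensitivity, 5% false positive rate); the helper name is illustrative:

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Illustrative helper: P(H | E) from P(H), P(E | H), and P(E | not H)."""
    joint_h = prior * p_e_given_h                  # branch: H and E
    joint_not_h = (1 - prior) * p_e_given_not_h    # branch: not H and E
    p_e = joint_h + joint_not_h                    # addition rule (disjoint branches)
    return joint_h / p_e                           # conditional probability formula

# Hypothetical screening test
posterior = bayes_posterior(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
```

Even with a fairly accurate test the posterior is only 2/13, about 0.15, because the true-positive branch (0.009) is outweighed by the false-positive branch (0.0495).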

Probability: Random Variables

Vocabulary: random variable, discrete, continuous, probability mass function, probability density function

Notation: X, x, μ, x̅, σ, s

Discrete probability distributions: discrete uniform, Bernoulli, binomial
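The probability mass functions of these three distributions are short enough to write directly. A sketch with illustrative helper names; the binomial uses `math.comb` for the number of ways to place k successes among n trials.

```python
from math import comb

# Illustrative helpers; names are hypothetical.

def discrete_uniform_pmf(x, values):
    """P(X = x) when X is equally likely to be any element of `values`."""
    return 1 / len(values) if x in values else 0.0

def bernoulli_pmf(x, p):
    """P(X = x) for one trial with success probability p (x is 0 or 1)."""
    return p if x == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)
```

A binomial with n = 1 is a Bernoulli distribution, so `binomial_pmf(k, 1, p)` matches `bernoulli_pmf(k, p)`, and the binomial probabilities for k = 0, ..., n sum to 1.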

Continuous probability distributions: probability density function, continuous uniform, normal

Expectation (expected value, mean) and variance (standard deviation): formula for discrete random variables, expectation and variance of linear transformations of random variables, expectation and variance of the average of independent and identically distributed random variables
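For a discrete random variable, E(X) = Σ x P(X = x) and Var(X) = Σ (x − μ)² P(X = x). A sketch with exact fractions for a fair six-sided die (an assumed example; helper names are illustrative) that checks two standard results: for Y = a + bX, E(Y) = a + bE(X) and Var(Y) = b²Var(X); and the average of n independent copies of X has mean μ and variance σ²/n, verified here by enumerating every outcome of two rolls.

```python
from fractions import Fraction
from itertools import product

def expectation(values, probs):
    """Illustrative helper: E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Illustrative helper: Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expectation(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6                                  # fair die, exact
mu, var = expectation(faces, probs), variance(faces, probs)   # 7/2 and 35/12

# Linear transformation Y = a + bX (a and b chosen arbitrarily)
a, b = 10, 3
y_values = [a + b * x for x in faces]

# Average of two independent rolls: all 36 equally likely pairs
pairs = list(product(faces, repeat=2))
averages = [Fraction(x1 + x2, 2) for x1, x2 in pairs]
pair_probs = [Fraction(1, 36)] * len(pairs)
```

Exact fractions make the identities hold with `==` rather than only approximately.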

Normal distributions: parameters and properties, the standard normal distribution, the normal probability table, normal quantile plots
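Standardizing with z = (x − μ)/σ turns any normal probability into a standard normal one, which is what a normal probability table tabulates. A sketch using `statistics.NormalDist`, assuming an example distribution X ~ N(100, 15):

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)   # assumed example: X ~ N(100, 15)
Z = NormalDist()                   # standard normal N(0, 1)

x = 130
z = (x - X.mean) / X.stdev         # standardize: z = (x - mu) / sigma
p_below = Z.cdf(z)                 # the value a normal table gives for z
p_direct = X.cdf(x)                # same probability without standardizing
z_975 = Z.inv_cdf(0.975)           # 97.5th percentile of the standard normal
```

Here z = 2.0, so P(X < 130) = P(Z < 2.00), about 0.977, and `z_975` is about 1.96.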

Sampling Distributions

Sampling distributions: parameters and their estimators, unbiasedness, mean and variance of estimators of proportions and means
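An estimator is unbiased when its mean over all possible samples equals the parameter it estimates. A minimal exact check, assuming a tiny hypothetical population {1, 2, 4} and samples of size 2 drawn with replacement (so draws are independent and identically distributed): the sample mean and the sample variance s² (divisor n − 1) are unbiased, while the variance computed with divisor n underestimates σ² by the factor (n − 1)/n. The helper name is illustrative.

```python
import statistics
from fractions import Fraction
from itertools import product

population = [Fraction(1), Fraction(2), Fraction(4)]   # hypothetical population
mu = statistics.mean(population)                        # population mean, 7/3
sigma2 = statistics.pvariance(population)               # population variance, 14/9

n = 2
samples = list(product(population, repeat=n))   # all 9 equally likely samples

def average_over_samples(estimator):
    """Illustrative helper: mean of an estimator across every sample."""
    return sum(estimator(s) for s in samples) / len(samples)

mean_of_xbar = average_over_samples(statistics.mean)             # equals mu
mean_of_s2 = average_over_samples(statistics.variance)           # divisor n - 1
mean_of_divide_by_n = average_over_samples(statistics.pvariance) # divisor n
```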

Long-run behaviour of estimators of proportions and means: Central Limit Theorem
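A small simulation illustrating the Central Limit Theorem: sample means from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here only as an example) still center on the population mean, with standard deviation close to σ/√n. A histogram of `sample_means` would look roughly normal.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is reproducible
n = 30           # size of each sample
reps = 5000      # number of simulated samples

# Population: exponential with rate 1 (mean 1, SD 1), heavily right-skewed
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

print(statistics.fmean(sample_means))   # close to the population mean, 1
print(statistics.stdev(sample_means))   # close to 1 / sqrt(30), about 0.18
```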

Data Collection

Vocabulary related to collecting data to make inferences: population and samples, scientific versus anecdotal evidence, parameter versus statistic

Sampling ideas: simple random sample (SRS), stratified and cluster sampling, sampling bias, use of randomization in order to generalize results to a population
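A sketch of drawing a simple random sample, a stratified sample, and a cluster sample with Python's `random` module. The population of 100 numbered units, the two strata, and the ten clusters are all hypothetical.

```python
import random

rng = random.Random(42)              # seeded generator for reproducibility
population = list(range(1, 101))     # hypothetical units numbered 1..100

# Simple random sample: every set of 10 units is equally likely
srs = rng.sample(population, k=10)

# Stratified sample: a separate SRS of 10% within each (hypothetical) stratum
strata = {"north": population[:60], "south": population[60:]}
stratified = {name: rng.sample(units, k=len(units) // 10)
              for name, units in strata.items()}

# Cluster sample: randomly choose 2 whole clusters of 10 units, keep every unit
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen = rng.sample(clusters, k=2)
cluster_sample = [unit for cluster in chosen for unit in cluster]
```

Stratifying guarantees each stratum is represented in proportion to its size; cluster sampling is often cheaper but units within a cluster tend to resemble each other.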

Types of studies: observational studies, experiments

Mechanisms to explain observed associations: causation, confounding, and other alternative explanations

Experiments: randomization, factor, level, treatment, placebo, blinding, block, control, replication, experimental bias, use of randomization to allow causal conclusions
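Random assignment of experimental units to treatments can be sketched as shuffling the units and dealing them out in turn; in a randomized block design the same shuffle is done separately within each block. The helper `randomize`, the subject labels, treatments, and blocks are hypothetical.

```python
import random

def randomize(units, treatments, rng):
    """Illustrative helper for a completely randomized design: shuffle the
    units, then deal them to treatments in turn so group sizes differ by at
    most one."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

rng = random.Random(7)
subjects = [f"S{i}" for i in range(1, 21)]    # hypothetical subjects
crd = randomize(subjects, ["drug", "placebo"], rng)

# Randomized block design: randomize separately within each block
blocks = {"younger": subjects[:10], "older": subjects[10:]}
rbd = {name: randomize(members, ["drug", "placebo"], rng)
       for name, members in blocks.items()}
```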

Confidence Intervals Part 1

Vocabulary of confidence intervals: confidence level, margin of error, lower and upper limits, critical value

Interpretation of confidence intervals: statistical inference as the process of drawing conclusions based on data; the purpose of confidence intervals; correct and incorrect interpretations of confidence intervals

Calculation of confidence intervals: large sample confidence interval for a proportion; determination of the required sample size for a specified margin of error for a confidence interval for a proportion; relation of sample size, confidence level, and estimated proportion to the width of a confidence interval
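A minimal Python sketch of the large-sample interval p̂ ± z* √(p̂(1 − p̂)/n) and of the sample size needed for a chosen margin of error m, n = (z*/m)² p*(1 − p*), rounded up. Using the guess p* = 0.5 is the conservative choice, since p*(1 − p*) is largest there. Helper names and counts are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def critical_value(confidence):
    """z* such that the middle `confidence` of N(0, 1) lies within +/- z*."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes, n, confidence=0.95):
    """Illustrative helper: large-sample CI for a single proportion."""
    p_hat = successes / n
    margin = critical_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Illustrative helper: smallest n with margin of error at most `margin`."""
    z_star = critical_value(confidence)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

lo95, hi95 = proportion_ci(520, 1000)                    # hypothetical: 520 of 1000
lo99, hi99 = proportion_ci(520, 1000, confidence=0.99)   # higher confidence, wider
```

`required_sample_size(0.03)` returns 1068. Higher confidence levels and smaller margins demand larger samples, and the margin shrinks only in proportion to 1/√n.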

Confidence Intervals Part 2

Additional vocabulary of confidence intervals: robust, coverage

Interpretation of confidence intervals: how independence, sample size, and the shape of the probability distribution of the data affect the coverage of confidence intervals for proportions and means

Calculation of confidence intervals: confidence interval for a mean using the t-distribution
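The t interval for a mean is x̄ ± t* s/√n, where t* comes from the t-distribution with n − 1 degrees of freedom. The Python standard library has no t-distribution, so this sketch takes t* as an argument; it can be read from a t table or, if SciPy is available, computed with `scipy.stats.t.ppf(0.975, n - 1)` for 95% confidence. The helper name and measurements are hypothetical.

```python
import statistics
from math import sqrt

def t_interval(data, t_star):
    """Illustrative helper: CI xbar +/- t* * s / sqrt(n) for a population mean.

    t_star must be the critical value for n - 1 degrees of freedom at the
    desired confidence level (from a t table or statistical software).
    """
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / sqrt(n)   # standard error of the mean
    return xbar - t_star * se, xbar + t_star * se

data = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.0, 5.1, 5.2]   # hypothetical, n = 10
lo, hi = t_interval(data, t_star=2.262)   # t* for 95% confidence with 9 df
```

Because the t-distribution has heavier tails than the standard normal, 2.262 is larger than the normal critical value 1.96; the gap shrinks as the degrees of freedom grow.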

The t-distribution: properties of the t-distribution, degrees of freedom

The Process of Statistical Tests

The structure of statistical tests: specifying hypotheses, test statistic, P-value, interpreting and making conclusions based on P-values.

Vocabulary: null and alternative hypotheses, one- and two-sided hypotheses, test statistic, P-value, statistical significance.

Tests of proportions: large sample sampling distribution and test statistic for a single proportion.
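For H0: p = p0, the large-sample test statistic is z = (p̂ − p0)/√(p0(1 − p0)/n); the standard error uses p0, the value assumed under the null hypothesis. A standard-library sketch (helper name and counts are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Illustrative helper: return (z, P-value) for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # SE computed under H0
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value
```

For instance, 60 successes in 100 trials against p0 = 0.5 gives z = 2.0 and a two-sided P-value of about 0.046.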

Tests of means: sampling distribution and test statistic for a single mean.
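For H0: μ = μ0 the test statistic is t = (x̄ − μ0)/(s/√n), compared with a t-distribution with n − 1 degrees of freedom. A minimal sketch that returns the statistic and degrees of freedom; the P-value then comes from a t table or software (for example `2 * scipy.stats.t.sf(abs(t), df)` for a two-sided test, if SciPy is available). The helper name and data are hypothetical.

```python
import statistics
from math import sqrt

def one_sample_t(data, mu0):
    """Illustrative helper: return (t statistic, degrees of freedom)."""
    n = len(data)
    se = statistics.stdev(data) / sqrt(n)   # estimated standard error of xbar
    t = (statistics.mean(data) - mu0) / se
    return t, n - 1

t, df = one_sample_t([12.1, 11.8, 12.4, 12.6, 11.9, 12.3], mu0=12.0)   # hypothetical
```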

The Effective Use of Statistical Tests

Vocabulary: significance level (alpha), power, Type I errors, Type II errors, practical versus statistical significance, multiple testing, data snooping, robust.

Power and Type I and Type II errors: the elements of statistical tests that affect power, including trade-offs.
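To show the trade-offs numerically, here is a sketch of the power of a one-sided z-test for a mean, assuming (for simplicity, unlike the t-test) that the population standard deviation σ is known. Power rises with a larger true effect, a larger sample size, or a larger significance level α, but raising α also raises the Type I error rate. The function name and numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Illustrative helper: power of the test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu0 + effect and sigma is known."""
    Z = NormalDist()
    z_crit = Z.inv_cdf(1 - alpha)          # reject H0 when z > z_crit
    shift = effect / (sigma / sqrt(n))     # mean of the z statistic under Ha
    return 1 - Z.cdf(z_crit - shift)       # P(reject H0 | Ha true)

for n in (10, 25, 100):
    print(n, round(power_one_sided_z(effect=0.5, sigma=1, n=n), 3))
```

With an effect of 0 the "power" equals α, the Type I error rate; the Type II error rate is 1 minus the power.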

The appropriate application and interpretation of statistical tests: how the data were collected, the initial examination of the data, the necessary assumptions and the robustness of testing procedures, the role of sample size, statistical versus practical significance, multiple tests and data snooping.
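Multiple testing inflates the chance of a false positive. If m independent tests are each run at significance level α and every null hypothesis is true, the probability of at least one Type I error is 1 − (1 − α)^m, by the complement and multiplication rules. A quick calculation (helper name illustrative):

```python
def prob_at_least_one_false_positive(alpha, m):
    """Illustrative helper: P(at least one Type I error) across m independent
    tests when all null hypotheses are true."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20, 100):
    print(m, round(prob_at_least_one_false_positive(0.05, m), 3))
```

With 20 tests at α = 0.05 the chance is about 0.64, which is why searching through many comparisons and reporting only the significant ones (data snooping) is misleading.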

The relationship between confidence intervals and statistical tests: how two-sided confidence intervals are related to two-sided significance tests.
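For a mean, a two-sided t-test at level α rejects H0: μ = μ0 exactly when μ0 falls outside the (1 − α) confidence interval built with the same t*. (For proportions the correspondence is only approximate, because the test's standard error uses p0 while the interval's uses p̂.) A sketch checking this for several hypothetical null values; t* = 2.262 is the 95% critical value for 9 degrees of freedom, and the helper names are illustrative.

```python
import statistics
from math import sqrt

def t_interval(data, t_star):
    """Illustrative helper: two-sided CI xbar +/- t* * s / sqrt(n)."""
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / sqrt(len(data))
    return xbar - t_star * se, xbar + t_star * se

def t_test_rejects(data, mu0, t_star):
    """Illustrative helper: two-sided t-test rejects H0 when |t| > t*."""
    se = statistics.stdev(data) / sqrt(len(data))
    t = (statistics.mean(data) - mu0) / se
    return abs(t) > t_star

data = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.0, 5.1, 5.2]   # hypothetical, n = 10
t_star = 2.262                     # 95% critical value for 9 df, so alpha = 0.05
lo, hi = t_interval(data, t_star)
for mu0 in (4.9, 5.0, 5.1, 5.2, 5.3):
    # Rejected at alpha = 0.05 exactly when mu0 lies outside the 95% interval
    print(mu0, t_test_rejects(data, mu0, t_star), lo <= mu0 <= hi)
```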

Comparing Two Groups

Matched pairs: how to recognize matched pairs situations, the advantage of matched pairs in comparing two treatments or conditions, why it is not appropriate to carry out analyses that ignore the pairing, how to carry out a test for the mean difference between the treatments or conditions in a matched pairs setting.
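In a matched pairs design the analysis works with the within-pair differences, running a one-sample t procedure on them with n − 1 degrees of freedom (n = number of pairs). The sketch below, with hypothetical before/after scores for five subjects, also computes the two-independent-samples statistic to show why ignoring the pairing is inappropriate: the large subject-to-subject variation swamps a small but consistent improvement.

```python
import statistics
from math import sqrt

before = [10, 12, 9, 15, 11]   # hypothetical scores, one pair per subject
after = [11, 14, 10, 16, 13]

# Correct analysis: one-sample t on the differences, df = n - 1
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(n))

# Incorrect analysis that ignores the pairing (treats the groups as independent)
se_unpaired = sqrt(statistics.variance(after) / n + statistics.variance(before) / n)
t_unpaired = (statistics.mean(after) - statistics.mean(before)) / se_unpaired
```

Here `t_paired` is about 5.7 on 4 degrees of freedom, while `t_unpaired` is below 1.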

Proportions from two independent samples: the sampling distribution for the difference between two proportions from independent samples, how to construct and interpret confidence intervals for the difference of two proportions from independent samples, how to carry out a statistical test for the equality of two proportions from independent samples.
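For two independent samples, the confidence interval for p1 − p2 uses the unpooled standard error √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), while the test of H0: p1 = p2 commonly uses the pooled proportion p̂ = (x1 + x2)/(n1 + n2), because under H0 both groups share one proportion. A standard-library sketch (helper names and counts are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Illustrative helper: large-sample CI for p1 - p2 (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

def two_proportion_z_test(x1, n1, x2, n2):
    """Illustrative helper: two-sided z-test of H0: p1 = p2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)   # common proportion assumed under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```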

Means from two independent samples: the sampling distribution for the difference between two means from independent samples, how to construct and interpret confidence intervals for the difference of two means from independent samples, how to carry out a statistical test for the equality of two means from independent samples under the assumption of equal variances (pooled) or the assumption of unequal variances (Welch-Satterthwaite).
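Both two-sample t statistics compare x̄1 − x̄2 with its estimated standard error. The pooled version combines the two sample variances and uses n1 + n2 − 2 degrees of freedom; the Welch version keeps the variances separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. A sketch returning each statistic and its degrees of freedom (P-values would come from a t table or software; helper names and data are hypothetical):

```python
import statistics
from math import sqrt

def welch_t(x, y):
    """Illustrative helper: Welch t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x) / n1, statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def pooled_t(x, y):
    """Illustrative helper: pooled (equal-variance) t statistic and df."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

x = [1, 2, 3, 4, 5]    # hypothetical group 1
y = [2, 4, 6, 8, 10]   # hypothetical group 2, more spread out
```

With equal sample sizes, as here, the two t statistics coincide, but the degrees of freedom differ (about 5.9 for Welch versus 8 pooled).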

The sign test: when it is appropriate, how to carry it out, how to interpret the results.
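The sign test uses only the signs of the differences (for example, from matched pairs), so it is useful when the differences are clearly non-normal or only their direction is trustworthy. Under H0 (median difference 0), the number of positive signs among the n nonzero differences is Binomial(n, 1/2). A sketch of the exact two-sided version; dropping zero differences is a common convention assumed here, and the helper name and data are hypothetical.

```python
from math import comb

def sign_test(differences):
    """Illustrative helper: exact two-sided sign test of H0: median = 0.

    Returns (number of positive signs, number of nonzero differences, P-value).
    """
    nonzero = [d for d in differences if d != 0]   # drop ties (zero differences)
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)            # count of positive signs
    smaller = min(k, n - k)
    # P(X <= smaller) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    one_tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return k, n, min(1.0, 2 * one_tail)

diffs = [1.2, 0.4, 2.0, 0.0, 3.1, 0.7, -0.5, 1.5, 0.9, 2.2, 0.3]   # hypothetical
```

Here 9 of the 10 nonzero differences are positive, giving a two-sided P-value of 22/1024, about 0.021.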

Simple Linear Regression

Vocabulary of regression: Dependent / response variable, independent / explanatory / predictor variable, fitted values, residuals

Estimating the model parameters: The method of least squares (minimizing the sum of the squares of the residuals) for calculating the intercept and slope of the line of best fit
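Minimizing the sum of squared residuals gives the closed-form estimates b1 = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and b0 = ȳ − b1 x̄. A minimal sketch (helper name and data are hypothetical; Python 3.10+ also offers `statistics.linear_regression`):

```python
import statistics

def least_squares(x, y):
    """Illustrative helper: intercept b0 and slope b1 minimizing the sum of
    squared residuals."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

The fitted line is ŷ = 2.2 + 0.6x, and the least-squares residuals sum to zero.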

Interpreting the fitted line: Interpretation of the slope and intercept of a regression line; the relationship between the regression slope and correlation; the coefficient of determination (R²); interpreting linear regression with log-transformed variables
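The slope can be written as b1 = r · sy/sx, so it always has the same sign as the correlation, and in simple linear regression R² = r², the fraction of the variation in y explained by the line (1 − SSE/SST). A sketch checking both identities on hypothetical data (the helper name is illustrative):

```python
import statistics

def fit_summary(x, y):
    """Illustrative helper: return (b0, b1, r, R^2) for a least-squares line."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x              # slope from the correlation
    b0 = y_bar - b1 * x_bar         # line passes through (x_bar, y_bar)
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
    return b0, b1, r, 1 - sse / sst

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1, r, r_squared = fit_summary(x, y)
```

Here r is about 0.775 and R² = 0.6, so the line explains 60% of the variation in y.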

Model diagnostics: Using residual plots (scatterplots of residuals versus the independent variable or versus the fitted values) to assess whether the regression line suitably summarizes the data, looking particularly for curvature and non-constant variance; normal quantile plots of the residuals to assess normality; the effect of unusual points on the line of best fit; the use of log transformations to satisfy model conditions

Inference for regression: Confidence interval and statistical test for the slope
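Inference for the slope uses SE(b1) = s/√Σ(x − x̄)², where s = √(SSE/(n − 2)) is the residual standard error, with n − 2 degrees of freedom. A sketch returning the slope, its standard error, the t statistic for H0: slope = 0, and the confidence interval b1 ± t* SE(b1). As with the t interval for a mean, t* is passed in (from a t table or software); the helper name and data are hypothetical.

```python
import statistics
from math import sqrt

def slope_inference(x, y, t_star):
    """Illustrative helper: (b1, SE(b1), t for H0: slope = 0, CI for slope).
    t_star must be the critical value for n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))     # residual standard error
    se_b1 = s / sqrt(sxx)       # standard error of the slope
    t = b1 / se_b1
    return b1, se_b1, t, (b1 - t_star * se_b1, b1 + t_star * se_b1)

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b1, se_b1, t, ci = slope_inference(x, y, t_star=3.182)   # t* for 95% with 3 df
```

Here the interval (about −0.30 to 1.50) contains 0, consistent with |t| ≈ 2.12 being below t* = 3.182, so the slope is not statistically significant at the 5% level with only five points.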