The study of population health often involves the use of observational data from existing data sets, complex survey designs, and longitudinal follow–up. Ordinary regression analysis, familiar to most researchers and practitioners, is inadequate for analyzing such data and answering important questions about the relationship of risk factors to health. Nonstatisticians such as epidemiologists and health services researchers require a working knowledge of the sophisticated modeling techniques used by professional statisticians. Quantitative Methods in Population Health provides an accessible guide for students in an applied statistics sequence as well as for practicing researchers and professionals.
Mari Palta's timely text assumes some background in mathematics and in applied statistics and regression analysis, but little knowledge of statistical theory. The Statistical Analysis System® (SAS) is a ubiquitous tool in the field, and some basic knowledge of its structure is assumed. Each topic starts with an explanation of the theoretical background that is necessary for understanding the technique as well as for establishing a basis to adopt more advanced methods or software in the future. The author endeavors to keep the material immediately applicable by providing detailed instructions for how to run and interpret procedures in SAS. Topics covered include:
- Regression analysis with weights
- Unequal variance
- Correlated and longitudinal outcomes
- Mixed effects
- Generalized linear models
- Generalized estimating equations
SAS commands for applying the methods, including PROC REG, PROC MIXED, and PROC GENMOD, are provided, and each section includes real–life examples. Quantitative Methods in Population Health provides a seamless meshing of the theoretical and practical in this vital field.
I.1 Newborn Lung Project.
I.2 Wisconsin Diabetes Registry.
I.3 Wisconsin Sleep Cohort Study.
1 Review of Ordinary Linear Regression and Its Assumptions.
1.1 The Ordinary Linear Regression Equation and Its Assumptions.
1.1.1 Straight–Line Relationship.
1.1.2 Equal Variance Assumption.
1.1.3 Normality Assumption.
1.1.4 Independence Assumption.
1.2 A Note on How the Least–Squares Estimators are Obtained.
Output Packet I: Examples of Ordinary Regression Analyses.
2 The Maximum Likelihood Approach to Ordinary Regression.
2.1 Maximum Likelihood Estimation.
2.3 Properties of Maximum Likelihood Estimators.
2.4 How to Obtain a Residual Plot with PROC MIXED.
Output Packet II: Using PROC MIXED and Comparisons to PROC REG.
3 Reformulating Ordinary Regression Analysis in Matrix Notation.
3.1 Writing the Ordinary Regression Equation in Matrix Notation.
3.2 Obtaining the Least–Squares Estimator in Matrix Notation.
3.2.1 Example: Matrices in Regression Analysis.
3.3 List of Matrix Operations to Know.
4 Variance Matrices and Linear Transformations.
4.1 Variance and Correlation Matrices.
4.2 How to Obtain the Variance of a Linear Transformation.
4.2.1 Two Variables.
4.2.2 Many Variables.
5 Variance Matrices of Estimators of Regression Coefficients.
5.1 Usual Standard Error of Least–Squares Estimator of Regression Slope in Nonmatrix Formulation.
5.2 Standard Errors of Least–Squares Regression Estimators in Matrix Notation.
5.3 The Large Sample Variance Matrix of Maximum Likelihood Estimators.
5.4 Tests and Confidence Intervals.
5.4.1 Example: Comparing PROC REG and PROC MIXED.
6 Dealing with Unequal Variance Around the Regression Line.
6.1 Ordinary Least Squares with Unequal Variance.
6.2 Analysis Taking Unequal Variance into Account.
6.2.1 The Functional Transformation Approach.
6.2.2 The Linear Transformation Approach.
6.2.3 Standard Errors of Weighted Regression Estimators.
Output Packet III: Applying the Empirical Option to Adjust Standard Errors.
Output Packet IV: Analyses with Transformation of the Outcome Variable to Equalize Residual Variance.
Output Packet V: Weighted Regression Analyses of GHb Data on Age.
7 Application of Weighting with Probability Sampling and Nonresponse.
7.1 Sample Surveys with Unequal Probability Sampling.
7.2 Examining the Impact of Nonresponse.
7.2.1 Example (of Reweighting as Well as Some SAS Manipulations).
7.2.2 A Few Comments on Weighting by a Variable Versus Including it in the Regression Model.
Output Packet VI: Survey and Missing Data Weights.
8 Principles in Dealing with Correlated Data.
8.1 Analysis of Correlated Data by Ordinary Unweighted Least–Squares Estimation.
8.1.2 Deriving the Variance Estimator.
8.2 Specifying Correlation and Variance Matrices.
8.3 The Least–Squares Equation Incorporating Correlation.
8.3.1 Another Application of the Spectral Theorem.
8.4 Applying the Spectral Theorem to the Regression Analysis of Correlated Data.
8.5 Analysis of Correlated Data by Maximum Likelihood.
8.5.1 Non–equal Variance.
8.5.2 Correlated Errors.
Output Packet VII: Analysis of Longitudinal Data in Wisconsin Sleep Cohort.
9 A Further Study of How the Transformation Works with Correlated Data.
9.1 Why Would β_W and β_B Differ?
9.2 How the Between– and Within–Individual Estimators are Combined.
9.3 How to Proceed in Practice.
Output Packet VIII: Investigating and Fitting Within– and Between–Individual Effects.
10 Random Effects.
10.1 Random Intercept.
10.2 Random Slopes.
10.3 Obtaining the Best Estimates of Individual Intercepts and Slopes.
Output Packet IX: Fitting Random Effects Models.
11 The Normal Distribution and Likelihood Revisited.
11.1 PROC GENMOD.
Output Packet X: Introducing PROC GENMOD.
12 The Generalization to Non–normal Distributions.
12.1 The Exponential Family.
12.1.1 The Binomial Distribution.
12.1.2 The Poisson Distribution.
12.2 Score Equations for the Exponential Family and the Canonical Link.
12.3 Other Link Functions.
13 Modeling Binomial and Binary Outcomes.
13.1 A Brief Review of Logistic Regression.
13.1.1 Example: Review of the Output from PROC LOGIST.
13.2 Analysis of Binomial Data in the Generalized Linear Models Framework.
13.2.1 Example of Logistic Regression with Binary Outcome.
13.2.2 Example with Binomial Outcome.
13.2.3 Some More Examples of Goodness–of–Fit Tests.
13.3 Other Links for Binary and Binomial Data.
Output Packet XI: Logistic Regression Analysis with PROC LOGIST and PROC GENMOD.
Output Packet XII: Analysis of Grouped Binomial Data.
Output Packet XIII: Some Goodness–of–Fit Tests for Binomial Outcome.
Output Packet XIV: Three Link Functions for Binary Outcome.
Output Packet XV: Poisson Regression.
Output Packet XVI: Dealing with Overdispersion in Rates.
14 Modeling Poisson Outcomes: The Analysis of Rates.
14.1 Review of Rates.
14.1.1 Relationship Between Rate and Risk.
14.2 Regression Analysis.
14.3 Example with Cancer Mortality Rates.
14.3.1 Example with Hospitalization of Infants.
14.4.1 Fitting a Dispersion Parameter.
14.4.2 Fitting a Different Distribution.
14.4.3 Using Robust Standard Errors.
14.4.4 Applying Adjustments for Overdispersion to the Examples.
15 Modeling Correlated Outcomes with Generalized Estimating Equations.
15.1 A Brief Review and Reformulation of the Normal Distribution, Least Squares and Likelihood.
15.2 Further Developments for the Exponential Family.
15.3 How are the Generalized Estimating Equations Justified?
15.3.1 Analysis of Longitudinal Systolic Blood Pressure by PROC MIXED and GENMOD.
15.3.2 Analysis of Longitudinal Hypertension Data by PROC GENMOD.
15.3.3 Analysis of Hospitalizations Among VLBW Children Up to Age 5.
15.4 Another Way to Deal with Correlated Binary Data.
Output Packet XVII: Mixed Versus GENMOD for Longitudinal SBP and Hypertension Data.
Output Packet XVIII: Longitudinal Analysis of Rates.
Output Packet XIX: Conditional Logistic Regression of Hypertension Data.
Appendix: Matrix Operations.
A.1 Adding Matrices.
A.2 Multiplying Matrices by a Number.
A.3 Multiplying Matrices by Each Other.
A.4 The Inverse of a Matrix.