Correctly understanding and using medical statistics is a key skill for all medical students and health professionals.

In an informal and friendly style, *Medical Statistics from Scratch* provides a practical foundation for everyone whose first interest is probably not medical statistics. Keeping the level of mathematics to a minimum, it clearly illustrates statistical concepts and practice with numerous real-world examples and cases drawn from current medical literature.

*Medical Statistics from Scratch* is an ideal learning partner for all medical students and health professionals who need an accessible introduction to, or a friendly refresher on, the fundamentals of medical statistics.

Preface to the 4th Edition xix

Preface to the 3rd Edition xxi

Preface to the 2nd Edition xxiii

Preface to the 1st Edition xxv

Introduction xxvii

**I Some Fundamental Stuff 1**

**1 First things first – the nature of data 3**

Variables and data 3

Where are we going …? 5

The good, the bad, and the ugly – types of variables 5

Categorical data 6

Nominal categorical data 6

Ordinal categorical data 7

Metric data 8

Discrete metric data 8

Continuous metric data 9

How can I tell what type of variable I am dealing with? 10

The baseline table 11

**II Descriptive Statistics 15**

**2 Describing data with tables 17**

Descriptive statistics. What can we do with raw data? 18

Frequency tables – nominal data 18

The frequency distribution 19

Relative frequency 20

Frequency tables – ordinal data 20

Frequency tables – metric data 22

Frequency tables with discrete metric data 22

Cumulative frequency 24

Frequency tables with continuous metric data – grouping the raw data 25

Open‐ended groups 27

Cross‐tabulation – contingency tables 28

Ranking data 30

**3 Every picture tells a story – describing data with charts 31**

Picture it! 32

Charting nominal and ordinal data 32

The pie chart 32

The simple bar chart 34

The clustered bar chart 35

The stacked bar chart 37

Charting discrete metric data 39

Charting continuous metric data 39

The histogram 39

The box (and whisker) plot 42

Charting cumulative data 44

The cumulative frequency curve with discrete metric data 44

The cumulative frequency curve with continuous metric data 44

Charting time‐based data – the time series chart 47

The scatterplot 48

The bubbleplot 49

**4 Describing data from its shape 51**

The shape of things to come 51

Skewness and kurtosis as measures of shape 52

Kurtosis 55

Symmetric or mound‐shaped distributions 56

Normalness – the Normal distribution 56

Bimodal distributions 58

Determining skew from a box plot 59

**5 Measures of location – Numbers R us 62**

Numbers, percentages, and proportions 62

Preamble 63

Numbers, percentages, and proportions 64

Handling percentages – for those of us who might need a reminder 65

Summary measures of location 67

The mode 68

The median 69

The mean 70

Percentiles 71

Calculating a percentile value 72

What is the most appropriate measure of location? 73

**6 Measures of spread – Numbers R us – (again) 75**

Preamble 76

The range 76

The interquartile range (IQR) 76

Estimating the median and interquartile range from the cumulative frequency curve 77

The boxplot (also known as the box and whisker plot) 79

Standard deviation 82

Standard deviation and the Normal distribution 84

Testing for Normality 86

Using SPSS 86

Using Minitab 87

Transforming data 88

**7 Incidence, prevalence, and standardisation 92**

Preamble 93

The incidence rate and the incidence rate ratio (IRR) 93

The incidence rate ratio 94

Prevalence 94

A couple of difficulties with measuring incidence and prevalence 97

Some other useful rates 97

Crude mortality rate 97

Case fatality rate 98

Crude maternal mortality rate 99

Crude birth rate 99

Attack rate 99

Age‐specific mortality rate 99

Standardisation – the age‐standardised mortality rate 101

The direct method 102

The standard population and the comparative mortality ratio (CMR) 103

The indirect method 106

The standardised mortality rate 107

**III The Confounding Problem 111**

**8 Confounding – like the poor, (nearly) always with us 113**

Preamble 114

What is confounding? 114

Confounding by indication 117

Residual confounding 119

Detecting confounding 119

Dealing with confounding – if confounding is such a problem, what can we do about it? 120

Using restriction 120

Using matching 121

Frequency matching 121

One‐to‐one matching 121

Using stratification 122

Using adjustment 122

Using randomisation 122

**IV Design and Data 125**

**9 Research design – Part I: Observational study designs 127**

Preamble 128

Hey ho! Hey ho! it’s off to work we go 129

Types of study 129

Observational studies 130

Case reports 130

Case series studies 131

Cross‐sectional studies 131

Descriptive cross‐sectional studies 132

Confounding in descriptive cross‐sectional studies 132

Analytic cross‐sectional studies 133

Confounding in analytic cross‐sectional studies 134

From here to eternity – cohort studies 135

Confounding in the cohort study design 139

Back to the future – case–control studies 139

Confounding in the case–control study design 141

Another example of a case–control study 142

Comparing cohort and case–control designs 143

Ecological studies 144

The ecological fallacy 145

**10 Research design – Part II: getting stuck in – experimental studies 146**

Clinical trials 147

Randomisation and the randomised controlled trial (RCT) 148

Block randomisation 149

Stratification 149

Blinding 149

The crossover RCT 150

Selection of participants for an RCT 153

Intention to treat analysis (ITT) 154

**11 Getting the participants for your study: ways of sampling 156**

From populations to samples – statistical inference 157

Collecting the data – types of sample 158

The simple random sample and its offspring 159

The systematic random sample 159

The stratified random sample 160

The cluster sample 160

Consecutive and convenience samples 161

How many participants should we have? Sample size 162

Inclusion and exclusion criteria 162

Getting the data 163

**V Chance Would Be a Fine Thing 165**

**12 The idea of probability 167**

Preamble 167

Calculating probability – proportional frequency 168

Two useful rules for simple probability 169

Rule 1. The multiplication rule for independent events 169

Rule 2. The addition rule for mutually exclusive events 170

Conditional and Bayesian statistics 171

Probability distributions 171

Discrete versus continuous probability distributions 172

The binomial probability distribution 172

The Poisson probability distribution 173

The Normal probability distribution 174

**13 Risk and odds 175**

Absolute risk and the absolute risk reduction (ARR) 176

The risk ratio 178

The reduction in the risk ratio (or relative risk reduction (RRR)) 178

A general formula for the risk ratio 179

Reference value 179

Number needed to treat (NNT) 180

What happens if the initial risk is small? 181

Confounding with the risk ratio 182

Odds 183

Why you can’t calculate risk in a case–control study 185

The link between probability and odds 186

The odds ratio 186

Confounding with the odds ratio 189

Approximating the risk ratio from the odds ratio 189

**VI The Informed Guess – An Introduction to Confidence Intervals 191**

**14 Estimating the value of a single population parameter – the idea of confidence intervals 193**

Confidence interval estimation for a population mean 194

The standard error of the mean 195

How we use the standard error of the mean to calculate a confidence interval for a population mean 197

Confidence interval for a population proportion 200

Estimating a confidence interval for the median of a single population 203

**15 Using confidence intervals to compare two population parameters 206**

What’s the difference? 207

Comparing two *independent* population means 207

An example using birthweights 208

Assessing the evidence using the confidence interval 211

Comparing two *paired* population means 215

Within‐subject and between‐subject variations 215

Comparing two *independent* population proportions 217

Comparing two *independent* population medians – the Mann–Whitney rank sums method 219

Comparing two *matched* population medians – the Wilcoxon signed‐ranks method 220

**16 Confidence intervals for the ratio of two population parameters 224**

Getting a confidence interval for the *ratio* of two independent population means 225

Confidence interval for a population risk ratio 226

Confidence intervals for a population odds ratio 229

Confidence intervals for hazard ratios 232

**VII Putting it to the Test 235**

**17 Testing hypotheses about the difference between two population parameters 237**

Answering the question 238

The hypothesis 238

The null hypothesis 239

The hypothesis testing process 240

The p‐value and the decision rule 241

A brief summary of a few of the commonest tests 242

Using the *p*‐value to compare the means of two independent populations 244

Interpreting computer hypothesis test results for the difference in two independent population means – the two‐sample *t* test 245

Output from Minitab – two‐sample *t* test of difference in mean birthweights of babies born to white mothers and to non‐white mothers 245

Output from SPSS – two‐sample *t* test of difference in mean birthweights of babies born to white mothers and to non‐white mothers 246

Comparing the means of two paired populations – the matched‐pairs *t* test 248

Using *p*‐values to compare the medians of two independent populations: the Mann–Whitney rank‐sums test 248

How the Mann–Whitney test works 249

Correction for multiple comparisons 250

The Bonferroni correction for multiple testing 250

Interpreting computer output for the Mann–Whitney test 252

With Minitab 252

With SPSS 252

Two matched medians – the Wilcoxon signed‐ranks test 254

Confidence intervals versus hypothesis testing 254

What could possibly go wrong? 255

Types of error 256

The power of a test 257

Maximising power – calculating sample size 258

Rule of thumb 1. Comparing the means of two independent populations (metric data) 258

Rule of thumb 2. Comparing the proportions of two independent populations (binary data) 259

**18 The Chi‐squared (χ²) test – what, why, and how? 261**

Of all the tests in all the world – you had to walk into my hypothesis testing procedure 262

Using chi‐squared to test for related‐ness or for the equality of proportions 262

Calculating the chi‐squared statistic 265

Using the chi‐squared statistic 267

Yates’s correction (continuity correction) 268

Fisher’s exact test 268

The chi‐squared test with Minitab 269

The chi‐squared test with SPSS 270

The chi‐squared test for trend 272

SPSS output for chi‐squared trend test 274

**19 Testing hypotheses about the ratio of two population parameters 276**

Preamble 276

The chi‐squared test with the risk ratio 277

The chi‐squared test with odds ratios 279

The chi‐squared test with hazard ratios 281

**VIII Becoming Acquainted 283**

**20 Measuring the association between two variables 285**

Preamble – plotting data 286

Association 287

The scatterplot 287

The correlation coefficient 290

Pearson’s correlation coefficient 290

Is the correlation coefficient statistically significant in the population? 292

Spearman’s rank correlation coefficient 294

**21 Measuring agreement 298**

To agree or not agree: that is the question 298

Cohen’s kappa (*κ*) 300

Some shortcomings of kappa 303

Weighted kappa 303

Measuring the agreement between two metric continuous variables, the Bland–Altman plot 303

**IX Getting into a Relationship 307**

**22 Straight line models: linear regression 309**

Health warning! 310

Relationship and association 310

A causal relationship – explaining variation 312

Refresher – finding the equation of a straight line from a graph 313

The linear regression model 314

First, is the relationship linear? 315

Estimating the regression parameters – the method of ordinary least squares (OLS) 316

Basic assumptions of the ordinary least squares procedure 317

Back to the example – is the relationship statistically significant? 318

Using SPSS to regress birthweight on mother’s weight 318

Using Minitab 319

Interpreting the regression coefficients 320

Goodness‐of‐fit, *R*² 320

Multiple linear regression 322

Adjusted goodness‐of‐fit: *R̄*² 324

Including nominal covariates in the regression model: design variables and coding 326

Building your model. Which variables to include? 327

Automated variable selection methods 328

Manual variable selection methods 329

Adjustment and confounding 330

Diagnostics – checking the basic assumptions of the multiple linear regression model 332

Analysis of variance 333

**23 Curvy models: logistic regression 334**

A second health warning! 335

The binary outcome variable 335

Finding an appropriate model when the outcome variable is binary 335

The logistic regression model 337

Estimating the parameter values 338

Interpreting the regression coefficients 338

Have we got a significant result? statistical inference in the logistic regression model 340

The Odds Ratio 341

The multiple logistic regression model 343

Building the model 344

Goodness‐of‐fit 346

**24 Counting models: Poisson regression 349**

Preamble 350

Poisson regression 350

The Poisson regression equation 351

Estimating β0 and β1 with the estimators *b*0 and *b*1 352

Interpreting the estimated coefficients of a Poisson regression, *b*0 and *b*1 352

Model building – variable selection 355

Goodness‐of‐fit 357

Zero‐inflated Poisson regression 358

Negative binomial regression 359

Zero‐inflated negative binomial regression 361

**X Four More Chapters 363**

**25 Measuring survival 365**

Preamble 366

Censored data 366

A simple example of survival in a single group 366

Calculating survival probabilities and the proportion surviving: the Kaplan–Meier table 368

The Kaplan–Meier curve 369

Determining median survival time 369

Comparing survival with two groups 370

The log‐rank test 371

An example of the log‐rank test in practice 372

The hazard ratio 372

The proportional hazards (Cox’s) regression model – introduction 373

The proportional hazards (Cox’s) regression model – the detail 376

Checking the assumptions of the proportional hazards model 377

An example of proportional hazards regression 377

**26 Systematic review and meta‐analysis 380**

Introduction 381

Systematic review 381

The forest plot 383

Publication and other biases 384

The funnel plot 386

Significance tests for bias – Begg’s and Egger’s tests 387

Combining the studies: meta‐analysis 389

The problem of heterogeneity – the *Q* and *I*² tests 389

**27 Diagnostic testing 393**

Preamble 393

The measures – sensitivity and specificity 394

The positive prediction and negative prediction values (PPV and NPV) 395

The sensitivity–specificity trade‐off 396

Using the ROC curve to find the optimal sensitivity versus specificity trade‐off 397

**28 Missing data 400**

The missing data problem 400

Types of missing data 403

Missing completely at random (MCAR) 403

Missing at Random (MAR) 403

Missing not at random (MNAR) 404

Consequences of missing data 405

Dealing with missing data 405

Do nothing – the wing and prayer approach 406

List‐wise deletion 406

Pair‐wise deletion 407

Imputation methods – simple imputation 408

Replacement by the Mean 408

Last observation carried forward 409

Regression‐based imputation 410

Multiple imputation 411

Full Information Maximum Likelihood (FIML) and other methods 412

Appendix: Table of random numbers 414

References 415

Solutions to Exercises 424

Index 457