Item Response Theory. Edition No. 1

  • ID: 5211778
  • Book
  • June 2021
  • 400 Pages
  • John Wiley and Sons Ltd
To date, much of the application of IRT has been in the field of educational measurement, where, for example, IRT has been used extensively by the Educational Testing Service for the development of scholastic aptitude tests. IRT has played a major role in all major college and graduate school admission tests (SAT, ACT, GRE, GMAT, MCAT, …). Unlike traditional tests based on classical test theory, which summarizes the test result by simply counting the number of correct responses, IRT provides model-based measurements in which the difficulty of the items, their discrimination between high and low levels of the underlying latent variable(s), and the corresponding ability of the respondents can be estimated. In IRT scoring of tests, a certain number of items can be arbitrarily added, deleted, or replaced without losing comparability of scores on the scale; only the precision of measurement at some points on the scale is affected. This property of scaled measurement, as opposed to counts of events, is the most salient advantage of IRT over classical methods of educational and psychological measurement. The evolution of IRT is now going beyond educational measurement. Recent advances in multidimensional extensions of IRT and computerized adaptive testing are leading to major advances in patient-reported outcome measures of physical and emotional well-being. In mental health research, IRT is now leading to a major paradigm shift in the screening and measurement of mental health disorders, substance abuse and suicidality, one of the leading causes of death in the world. Multidimensional IRT extends the tools used to evaluate essentially unidimensional constructs, such as mathematical ability, to the measurement of complex traits such as depression, anxiety and psychosis.
In the next five years we expect that the use of multidimensional IRT for the measurement of complex traits will extend to other areas of health sciences and to problems in marketing research and practice, where rapid adaptive tests administered through the internet will be able to precisely measure consumer affinity for different products, events, and market sectors. The methods described in this book will provide the foundation for these future developments.
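The contrast the description draws between counting correct responses and model-based measurement can be illustrated with a small sketch. It assumes the two-parameter logistic (2-PL) model listed in the table of contents, uses item parameters invented purely for illustration, and estimates ability with a plain grid search, which is a simple stand-in and not necessarily the estimation method the book presents. The function names (`p_correct`, `log_likelihood`, `ml_ability`) are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2-PL item response function: P(correct | theta) for slope a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def ml_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of theta on [lo, hi]."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical item parameters (slope a, difficulty b), made up for illustration.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

# Two respondents with the same number-correct score (2 of 3 items).
pattern_x = [1, 1, 0]  # misses the hardest, most discriminating item
pattern_y = [0, 1, 1]  # misses the easiest, least discriminating item

count_x, count_y = sum(pattern_x), sum(pattern_y)
theta_x = ml_ability(items, pattern_x)
theta_y = ml_ability(items, pattern_y)

print("number correct:", count_x, count_y)
print("estimated ability:", theta_x, theta_y)
```

Under these assumed parameters the two respondents receive the same count but different positions on the latent scale, because the model weights each response by the item's discrimination and locates it by the item's difficulty.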
Note: Product cover images may vary from those shown

Preface xvii

Acknowledgments xix

1 Foundations 1

1.1 The Logic of Item Response Theory 3

1.2 Model-based Data Analysis 4

1.3 Origins 5

1.3.1 Psychometric Scaling 6

1.3.2 Classical Test Theory 9

1.3.3 Contributions from Statistics 10

1.4 The Population Concept in IRT 11

1.5 Generalizability Theory 14

2 Selected Mathematical and Statistical Results 21

2.1 Points, Point Sets, and Set Operations 21

2.2 Probability 24

2.3 Sampling 25

2.4 Joint, Conditional, and Marginal Probability 26

2.5 Probability Distributions and Densities 28

2.6 Describing Distributions 32

2.7 Functions of Random Variables 34

2.7.1 Linear Functions 34

2.7.2 Nonlinear Functions 37

2.8 Elements of Matrix Algebra 37

2.8.1 Partitioned Matrices 41

2.8.2 The Kronecker Product 42

2.8.3 Row and Column Matrices 43

2.8.4 Matrix Inversion 43

2.9 Determinants 45

2.10 Matrix Differentiation 45

2.10.1 Scalar Functions of Vector Variables 46

2.10.2 Vector Functions of a Vector Variable 47

2.10.3 Scalar Functions of a Matrix Variable 48

2.10.4 Chain Rule for Scalar Functions of a Matrix Variable 49

2.10.5 Matrix Functions of a Matrix Variable 49

2.10.6 Derivatives of a Scalar Function with Respect to a Symmetric Matrix 50

2.10.7 Second-order Differentiation 52

2.11 Theory of Estimation 53

2.11.1 Analysis of Variance 56

2.11.2 Estimating Variance Components 57

2.12 Maximum Likelihood Estimation (MLE) 59

2.12.1 Likelihood Functions 59

2.12.2 The Likelihood Equations 60

2.12.3 Examples of Maximum Likelihood Estimation 60

2.12.4 Sampling Distribution of the Estimator 62

2.12.5 The Fisher-scoring Solution of the Likelihood Equations 63

2.12.6 Properties of the Maximum Likelihood Estimator (MLE) 63

2.12.7 Constrained Estimation 64

2.12.8 Admissibility 64

2.13 Bayes Estimation 65

2.14 The Maximum A Posteriori (MAP) Estimator 68

2.15 Marginal Maximum Likelihood Estimation (MMLE) 69

2.15.1 The Marginal Likelihood Equations 70

2.15.2 Application in the “Normal-Normal” Case 72

2.15.3 The EM Solution 75

2.15.4 The Fisher-scoring Solution 75

2.16 Probit and Logit Analysis 77

2.16.1 Probit Analysis 77

2.16.2 Logit Analysis 79

2.16.3 Logit-linear Analysis 80

2.16.4 Extension of Logit-linear Analysis to Multinomial Data 82

2.16.4.1 Graded Categories 83

2.16.4.2 Nominal Categories 85

2.17 Some Results from Classical Test Theory 88

2.17.1 Test Reliability 90

2.17.2 Estimating Reliability 91

2.17.2.1 Bayes Estimation of True Scores 96

2.17.3 When are the Assumptions of Classical Test Theory Reasonable? 97

3 Unidimensional IRT Models 101

3.1 The General IRT Framework 103

3.2 Item Response Models 104

3.2.1 Dichotomous Categories 105

3.2.1.1 Normal Ogive Model 105

3.2.1.2 2-PL Model 109

3.2.1.3 3-PL Model 111

3.2.1.4 1-PL Model 113

3.2.1.5 Illustration 114

3.2.2 Polytomous Categories 115

3.2.2.1 Graded Categories Model 118

3.2.2.2 Illustration 120

3.2.2.3 The Nominal Categories Model 122

3.2.2.4 Nominal Multiple-Choice Model 130

3.2.2.5 Illustration 132

3.2.2.6 Partial Credit Model 135

3.2.2.7 Generalized Partial Credit Model 136

3.2.2.8 Illustration 136

3.2.2.9 Rating Scale Models 136

3.2.3 Ranking Model 139

4 Item Parameter Estimation - Binary Data 141

4.1 Estimation of Item Parameters Assuming Known Attribute Values of the Respondents 142

4.1.1 Estimation 143

4.1.1.1 The 1-parameter Model 143

4.1.1.2 The 2-parameter Model 144

4.1.1.3 The 3-parameter Model 145

4.2 Estimation of Item Parameters Assuming Unknown Attribute Values of the Respondents 146

4.2.1 Joint Maximum Likelihood Estimation (JML) 147

4.2.1.1 The 1-parameter Logistic Model 147

4.2.1.2 Logit-linear Analysis 148

4.2.1.3 Proportional Marginal Adjustments 153

4.2.2 Marginal Maximum Likelihood Estimation (MML) 158

4.2.2.1 The 2-parameter Model 162

5 Item Parameter Estimation - Polytomous Data 177

5.1 General Results 177

5.2 The Normal Ogive Model 182

5.3 The Nominal Categories Model 183

5.4 The Graded Categories Model 185

5.5 The Generalized Partial Credit Model 188

5.5.1 The Unrestricted Version 189

5.5.2 The EM Solution 190

5.5.2.1 The GPCM Newton-Gauss Joint Solution 191

5.5.3 Rating Scale Models 191

5.5.3.1 The EM Solution for the RSM 192

5.5.3.2 The Newton-Gauss Solution for the RSM 193

5.6 Boundary Problems 194

5.7 Multiple Group Models 196

5.8 Discussion 197

5.9 Conclusions 200

6 Multidimensional IRT Models 201

6.1 Classical Multiple Factor Analysis of Test Scores 202

6.2 Classical Item Factor Analysis 203

6.3 Item Factor Analysis Based on Item Response Theory 205

6.4 Maximum Likelihood Estimation of Item Slopes and Intercepts 206

6.4.1 Estimating Parameters of the Item Response Model 208

6.5 Indeterminacies of Item Factor Analysis 212

6.5.1 Direction of Response 212

6.5.2 Indeterminacy of Location and Scale 212

6.5.3 Rotational Indeterminacy of Factor Loadings in Exploratory Factor Analysis 213

6.5.3.1 Varimax Factor Pattern 214

6.5.3.2 Promax Factor Pattern 214

6.5.3.3 General and Group Factors 215

6.5.3.4 Confirmatory Item Factor Analysis and the Bifactor Pattern 215

6.6 Estimation of Item Parameters and Respondent Scores in Item Bifactor Analysis 218

6.7 Estimating Factor Scores 219

6.8 Example 220

6.8.1 Exploratory Item Factor Analysis 221

6.8.2 Confirmatory Item Bifactor Analysis 223

6.9 Two-tier Model 227

6.10 Summary 230

7 Analysis of Dimensionality 233

7.1 Unidimensional Models and Multidimensional Data 234

7.2 Limited-Information Goodness of Fit Tests 237

7.3 Example 240

7.3.1 Exploratory Item Factor Analysis 240

7.3.2 Confirmatory Item Bifactor Analysis 241

7.4 Discussion 242

8 Computerized Adaptive Testing 243

8.1 What is Computerized Adaptive Testing? 243

8.2 Computerized Adaptive Testing - An Overview 244

8.3 Item Selection 245

8.3.1 Unidimensional Computerized Adaptive Testing (UCAT) 246

8.3.1.1 Fisher Information in IRT Model 246

8.3.1.2 Maximizing Fisher Information (MFI) and Its Limitations 248

8.3.1.3 Modifications to MFI 249

8.3.2 Multidimensional Computerized Adaptive Testing (MCAT) 251

8.3.2.1 Two Conceptualizations of the Information Function in Multidimensional Space 252

8.3.2.2 Selection Methods in MCAT 253

8.3.3 Bifactor IRT 256

8.4 Terminating an Adaptive Test 257

8.5 Additional Considerations 258

8.6 An Example from Mental Health Measurement 260

8.6.1 The CAT-Mental Health 261

8.6.2 Discussion 264

9 Differential Item Functioning 267

9.1 Introduction 267

9.2 Types of DIF 268

9.3 The Mantel-Haenszel Procedure 270

9.4 Lord’s Wald Test 271

9.5 Lagrange Multiplier Test 272

9.6 Logistic Regression 273

9.7 Assessing DIF for the Bifactor Model 275

9.8 Assessing DIF from CAT Data 276

10 Estimating Respondent Attributes 279

10.1 Introduction 279

10.2 Ability Estimation 279

10.2.1 Maximum Likelihood 280

10.2.2 Bayes MAP 281

10.2.3 Bayes EAP 281

10.2.4 Ability Estimation for Polytomous Data 282

10.2.5 Ability Estimation for Multidimensional IRT Models 283

10.2.6 Ability Estimation for the Bifactor Model 284

10.2.7 Estimation of the Ability Distribution 284

10.2.8 Domain Scores 285

11 Multiple Group Item Response Models 287

11.1 Introduction 287

11.2 IRT Estimation when the Grouping Structure is Known: Traditional Multiple Group IRT 288

11.2.1 Example 291

11.3 IRT Estimation when the Grouping Structure is Unknown: Mixtures of Gaussian Components 292

11.3.1 The Mixture Distribution 293

11.3.2 The Likelihood Component 295

11.3.3 Algorithm 296

11.3.4 Unequal Variances 297

11.4 Multivariate Probit Analysis 297

11.4.1 The Model 299

11.4.2 Identification 300

11.4.3 Estimation 300

11.4.4 Tests of Fit 301

11.4.5 Illustration 302

11.5 Multilevel IRT Models 306

11.5.1 The Rasch Model 306

11.5.2 The Two-parameter Logistic Model 308

11.5.3 Estimation 308

11.5.4 Illustration 309

12 Test and Scale Development and Maintenance 311

12.1 Introduction 311

12.2 Item Banking 311

12.3 Item Calibration 314

12.3.1 The OEM Method 315

12.3.2 The MEM Method 315

12.3.3 Stocking’s Method A 315

12.3.4 Stocking’s Method B 316

12.4 IRT Equating 318

12.4.1 Linking, Scale Aligning and Equating 318

12.4.2 Experimental Designs for Equating 319

12.4.2.1 Single Group (SG) Design 319

12.4.2.2 Equivalent Groups (EG) Design 319

12.4.2.3 Counterbalanced (CB) Design 319

12.4.2.4 The Anchor Test or Nonequivalent Groups with Anchor Test (NEAT) Design 319

12.5 Harmonization 320

12.6 Item Parameter Drift 322

12.7 Summary 323

13 Some Interesting Applications 325

13.1 Introduction 325

13.2 Bio-behavioral Synthesis 325

13.3 Mental Health Measurement 328

13.3.1 The CAT-Depression Inventory 328

13.3.2 The CAT-Anxiety Scale 330

13.3.3 The Measurement of Suicidality and the Prediction of Future Suicidal Attempt 331

13.3.4 Clinician and Self-rated Psychosis Measurement 332

13.3.5 Substance Use Disorder 334

13.3.6 Special Populations and Differential Item Functioning 335

13.3.6.1 Perinatal 335

13.3.6.2 Emergency Medicine 336

13.3.6.3 Latinos Taking Tests in Spanish 336

13.3.6.4 Criminal Justice 338

13.3.7 Intensive Longitudinal Data 339

13.4 IRT in Machine Learning 340

Bibliography 343

Index 361

R. Darrell Bock
Robert D. Gibbons University of Illinois at Chicago.