Numerical Issues in Statistical Computing for the Social Scientist. Wiley Series in Probability and Statistics

  • ID: 2172600
  • Book
  • 352 Pages
  • John Wiley and Sons Ltd
At last, a social scientist's guide through the pitfalls of modern statistical computing

Addressing the current deficiency in the literature on statistical methods as they apply to the social and behavioral sciences, Numerical Issues in Statistical Computing for the Social Scientist seeks to provide readers with a unique practical guidebook to the numerical methods underlying computerized statistical calculations specific to these fields. The authors demonstrate that knowledge of these numerical methods and how they are used in statistical packages is essential for making accurate inferences. With the aid of key contributors from both the social and behavioral sciences, the authors have assembled a rich set of interrelated chapters designed to guide empirical social scientists through the potential minefield of modern statistical computing.

Uniquely accessible and abounding in modern-day tools, tricks, and advice, the text successfully bridges the gap between the current level of social science methodology and the more sophisticated technical coverage usually associated with the statistical field.

Highlights include:

  • A focus on problems occurring in maximum likelihood estimation
  • Integrated examples of statistical computing (using software packages such as SAS, GAUSS, S-PLUS, R, Stata, LIMDEP, SPSS, WinBUGS, and MATLAB®)
  • A guide to choosing accurate statistical packages
  • Discussions of a multitude of computationally intensive statistical approaches such as ecological inference, Markov chain Monte Carlo, and spatial regression analysis
  • Emphasis on specific numerical problems, statistical procedures, and their applications in the field
  • Replications and re-analysis of published social science research, using innovative numerical methods
  • Key numerical estimation issues along with the means of avoiding common pitfalls
  • A related Web site that includes test data for use in demonstrating numerical problems, code for applying the original methods described in the book, and an online bibliography of Web resources for statistical computation

Designed as an independent research tool, a professional reference, or a classroom supplement, the book presents a well-thought-out treatment of a complex and multifaceted field.

Note: Product cover images may vary from those shown

Preface xi

1 Introduction: Consequences of Numerical Inaccuracy 1

1.1 Importance of Understanding Computational Statistics 1

1.2 Brief History: Duhem to the Twenty-First Century 3

1.3 Motivating Example: Rare Events Counts Models 6

1.4 Preview of Findings 10

2 Sources of Inaccuracy in Statistical Computation 12

2.1 Introduction 12

2.1.1 Revealing Example: Computing the Coefficient Standard Deviation 12

2.1.2 Some Preliminary Conclusions 13

2.2 Fundamental Theoretical Concepts 15

2.2.1 Accuracy and Precision 15

2.2.2 Problems, Algorithms, and Implementations 15

2.3 Accuracy and Correct Inference 18

2.3.1 Brief Digression: Why Statistical Inference Is Harder in Practice Than It Appears 20

2.4 Sources of Implementation Errors 21

2.4.1 Bugs, Errors, and Annoyances 22

2.4.2 Computer Arithmetic 23

2.5 Algorithmic Limitations 29

2.5.1 Randomized Algorithms 30

2.5.2 Approximation Algorithms for Statistical Functions 31

2.5.3 Heuristic Algorithms for Random Number Generation 32

2.5.4 Local Search Algorithms 39

2.6 Summary 41

3 Evaluating Statistical Software 44

3.1 Introduction 44

3.1.1 Strategies for Evaluating Accuracy 44

3.1.2 Conditioning 47

3.2 Benchmarks for Statistical Packages 48

3.2.1 NIST Statistical Reference Datasets 49

3.2.2 Benchmarking Nonlinear Problems with StRD 51

3.2.3 Analyzing StRD Test Results 53

3.2.4 Empirical Tests of Pseudo-Random Number Generation 54

3.2.5 Tests of Distribution Functions 58

3.2.6 Testing the Accuracy of Data Input and Output 60

3.3 General Features Supporting Accurate and Reproducible Results 63

3.4 Comparison of Some Popular Statistical Packages 64

3.5 Reproduction of Research 65

3.6 Choosing a Statistical Package 69

4 Robust Inference 71

4.1 Introduction 71

4.2 Some Clarification of Terminology 71

4.3 Sensitivity Tests 73

4.3.1 Sensitivity to Alternative Implementations and Algorithms 73

4.3.2 Perturbation Tests 75

4.3.3 Tests of Global Optimality 84

4.4 Obtaining More Accurate Results 91

4.4.1 High-Precision Mathematical Libraries 92

4.4.2 Increasing the Precision of Intermediate Calculations 93

4.4.3 Selecting Optimization Methods 95

4.5 Inference for Computationally Difficult Problems 103

4.5.1 Obtaining Confidence Intervals with Ill-Behaved Functions 104

4.5.2 Interpreting Results in the Presence of Multiple Modes 106

4.5.3 Inference in the Presence of Instability 114

5 Numerical Issues in Markov Chain Monte Carlo Estimation 118

5.1 Introduction 118

5.2 Background and History 119

5.3 Essential Markov Chain Theory 120

5.3.1 Measure and Probability Preliminaries 120

5.3.2 Markov Chain Properties 121

5.3.3 The Final Word (Sort of) 125

5.4 Mechanics of Common MCMC Algorithms 126

5.4.1 Metropolis–Hastings Algorithm 126

5.4.2 Hit-and-Run Algorithm 127

5.4.3 Gibbs Sampler 128

5.5 Role of Random Number Generation 129

5.5.1 Periodicity of Generators and MCMC Effects 130

5.5.2 Periodicity and Convergence 132

5.5.3 Example: The Slice Sampler 135

5.5.4 Evaluating WinBUGS 137

5.6 Absorbing State Problem 139

5.7 Regular Monte Carlo Simulation 140

5.8 So What Can Be Done? 141

6 Numerical Issues Involved in Inverting Hessian Matrices 143
Jeff Gill and Gary King

6.1 Introduction 143

6.2 Means versus Modes 145

6.3 Developing a Solution Using Bayesian Simulation Tools 147

6.4 What Is It That Bayesians Do? 148

6.5 Problem in Detail: Noninvertible Hessians 149

6.6 Generalized Inverse/Generalized Cholesky Solution 151

6.7 Generalized Inverse 151

6.7.1 Numerical Examples of the Generalized Inverse 154

6.8 Generalized Cholesky Decomposition 155

6.8.1 Standard Algorithm 156

6.8.2 Gill–Murray Cholesky Factorization 156

6.8.3 Schnabel–Eskow Cholesky Factorization 158

6.8.4 Numerical Examples of the Generalized Cholesky Decomposition 158

6.9 Importance Sampling and Sampling Importance Resampling 160

6.9.1 Algorithm Details 160

6.9.2 SIR Output 162

6.9.3 Relevance to the Generalized Process 163

6.10 Public Policy Analysis Example 163

6.10.1 Texas 164

6.10.2 Florida 168

6.11 Alternative Methods 171

6.11.1 Drawing from the Singular Normal 171

6.11.2 Aliasing 173

6.11.3 Ridge Regression 173

6.11.4 Derivative Approach 174

6.11.5 Bootstrapping 174

6.11.6 Respecification (Redux) 175

6.12 Concluding Remarks 176

7 Numerical Behavior of King’s EI Method 177

7.1 Introduction 177

7.2 Ecological Inference Problem and Proposed Solutions 179

7.3 Numeric Accuracy in Ecological Inference 180

7.3.1 Case Study 1: Examples from King (1997) 182

7.3.2 Nonlinear Optimization 186

7.3.3 Pseudo-Random Number Generation 187

7.3.4 Platform and Version Sensitivity 188

7.4 Case Study 2: Burden and Kimball (1998) 189

7.4.1 Data Perturbation 191

7.4.2 Option Dependence 194

7.4.3 Platform Dependence 195

7.4.4 Discussion: Summarizing Uncertainty 196

7.5 Conclusions 197

8 Some Details of Nonlinear Estimation 199
B. D. McCullough

8.1 Introduction 199

8.2 Overview of Algorithms 200

8.3 Some Numerical Details 204

8.4 What Can Go Wrong? 206

8.5 Four Steps 210

8.5.1 Step 1: Examine the Gradient 211

8.5.2 Step 2: Inspect the Trace 211

8.5.3 Step 3: Analyze the Hessian 212

8.5.4 Step 4: Profile the Objective Function 212

8.6 Wald versus Likelihood Inference 215

8.7 Conclusions 217

9 Spatial Regression Models 219
James P. LeSage

9.1 Introduction 219

9.2 Sample Data Associated with Map Locations 219

9.2.1 Spatial Dependence 219

9.2.2 Specifying Dependence Using Weight Matrices 220

9.2.3 Estimation Consequences of Spatial Dependence 222

9.3 Maximum Likelihood Estimation of Spatial Models 223

9.3.1 Sparse Matrix Algorithms 224

9.3.2 Vectorization of the Optimization Problem 225

9.3.3 Trade-offs between Speed and Numerical Accuracy 226

9.3.4 Applied Illustrations 228

9.4 Bayesian Spatial Regression Models 229

9.4.1 Bayesian Heteroscedastic Spatial Models 230

9.4.2 Estimation of Bayesian Spatial Models 231

9.4.3 Conditional Distributions for the SAR Model 232

9.4.4 MCMC Sampler 234

9.4.5 Illustration of the Bayesian Model 234

9.5 Conclusions 236

10 Convergence Problems in Logistic Regression 238
Paul Allison

10.1 Introduction 238

10.2 Overview of Logistic Maximum Likelihood Estimation 238

10.3 What Can Go Wrong? 240

10.4 Behavior of the Newton–Raphson Algorithm under Separation 243

10.4.1 Specific Implementations 244

10.4.2 Warning Messages 244

10.4.3 False Convergence 246

10.4.4 Reporting of Parameter Estimates and Standard Errors 247

10.4.5 Likelihood Ratio Statistics 247

10.5 Diagnosis of Separation Problems 247

10.6 Solutions for Quasi-Complete Separation 248

10.6.1 Deletion of Problem Variables 248

10.6.2 Combining Categories 248

10.6.3 Do Nothing and Report Likelihood Ratio Chi-Squares 249

10.6.4 Exact Inference 249

10.6.5 Bayesian Estimation 250

10.6.6 Penalized Maximum Likelihood Estimation 250

10.7 Solutions for Complete Separation 251

10.8 Extensions 252

11 Recommendations for Replication and Accurate Analysis 253

11.1 General Recommendations for Replication 253

11.1.1 Reproduction, Replication, and Verification 254

11.1.2 Recreating Data 255

11.1.3 Inputting Data 256

11.1.4 Analyzing Data 257

11.2 Recommendations for Producing Verifiable Results 259

11.3 General Recommendations for Improving the Numeric Accuracy of Analysis 260

11.4 Recommendations for Particular Statistical Models 261

11.4.1 Nonlinear Least Squares and Maximum Likelihood 261

11.4.2 Robust Hessian Inversion 262

11.4.3 MCMC Estimation 263

11.4.4 Logistic Regression 265

11.4.5 Spatial Regression 266

11.5 Where Do We Go from Here? 266

Bibliography 267

Author Index 303

Subject Index 315

"Uniquely accessible and abounding in modern-day tools, tricks, and advice, the text successfully bridges the gap between the current level of social science methodology and the more sophisticated technical coverage." (Zentralblatt Math 1130, May 2008)

"Clarity of presentations is excellent. Applied statisticians and computer scientists will like this book and find it very useful." (Journal of Statistical Computation and Simulation, November 2005)

"[The authors] …have succeeded in providing...a good understanding of the potential pitfalls involved in the implementation of methodology computationally, and...good advice on dealing with the problems that can arise." (Statistics in Medical Research, June 2005)

“This book provides the researcher with an overview of the issues involved in the implementation and computation of common statistical procedures….” (Statistical Methods in Medical Research, Vol. 14, 2005)

"…this book is a good reference for social scientists that are involved in computational statistics." (Journal of Statistical Software, April 2005)

"…timely and interesting, and on the whole provides a good balance of theory, application, and computation." (Technometrics, May 2005)

"…an excellent text. It has the potential to be enormously influential across the social sciences…It should be required reading for everyone who performs statistical computing at the advanced level…" (Journal of the American Statistical Association, June 2005)

“…a compact guide to the voluminous literature on optimisation, numerical analysis, and computational statistics. This is no small achievement.” (Statistical Software Newsletter in Computational Statistics and Data Analysis)

"…a very important one for researchers, social scientists, and…graduate and post-graduate students in various disciplines..." (Computing Reviews.com, July 6, 2004)

"This comprehensive research and guidebook by Altman, Gill, and McDonald offers to social scientists modern tools and tricks previously lacking in other works." (Choice, June 2004, Vol. 41 No. 10)
