INFORMS Analytics Body of Knowledge. Edition No. 1. Wiley Series in Operations Research and Management Science

  • Book

  • 400 Pages
  • December 2018
  • John Wiley and Sons Ltd

Standardizes the definition and framework of analytics

#2 on Book Authority’s list of the Best New Analytics Books to Read in 2019 (January 2019)

We all want to make a difference. We all want our work to enrich the world. As analytics professionals, we are fortunate: this is our time!

We live in a world of pervasive data and ubiquitous, powerful computation. This convergence has inspired and accelerated the development of both analytic techniques and tools, and the potential for analytics to have an impact has been a huge call to action for organizations, universities, and governments.

This title from the Institute for Operations Research and the Management Sciences (INFORMS) represents the perspectives of some of the most respected experts on analytics.

Readers with various backgrounds in analytics – from novices to experienced professionals – will benefit from reading about and implementing the concepts and methods covered here.

Peer-reviewed chapters provide readers with in-depth insights and a better understanding of the dynamic field of analytics.

The INFORMS Analytics Body of Knowledge documents the core concepts and skills with which an analytics professional should be familiar; establishes a dynamic resource that practitioners can use to increase their understanding of analytics; and presents instructors with a framework for developing academic courses and programs in analytics.

Table of Contents

Preface xv

List of Contributors xix

1 Introduction to Analytics 1
Philip T. Keenan, Jonathan H. Owen, and Kathryn Schumacher

1.1 Introduction 1

1.2 Conceptual Framework 3

1.2.1 Data-Centric Analytics 3

1.2.2 Decision-Centric Analytics 4

1.2.3 Combining Data- and Decision-Centric Approaches 5

1.3 Categories of Analytics 6

1.3.1 Descriptive Analytics 7

Data Modeling 7

Reporting 10

Visualization 10

Software 10

1.3.2 Predictive Analytics 10

Data Mining and Pattern Recognition 11

Predictive Modeling, Simulation, and Forecasting 11

Leveraging Expertise 12

1.3.3 Prescriptive Analytics 14

1.4 Analytics Within Organizations 16

1.4.1 Projects 17

1.4.2 Communicating Analytics 21

1.4.3 Organizational Capability 21

1.5 Ethical Implications 23

1.6 The Changing World of Analytics 25

1.7 Conclusion 28

References 28

2 Getting Started with Analytics 31
Karl G. Kempf

2.1 Introduction 31

2.2 Five Manageable Tasks 32

2.2.1 Task 1: Selecting the Target Problem 33

2.2.2 Task 2: Assemble the Team 34

Executive Sponsor 35

Project Manager 35

Domain Expert 35

IT Expert 35

Data Scientist 36

Stakeholders 36

2.2.3 Task 3: Prepare the Data 36

2.2.4 Task 4: Selecting Analytics Tools 39

Analytical Specificity or Breadth 39

Access to Data 40

Execution Performance 40

Visualization Capability 40

Data Scientist Skillset 40

Vendor Pricing 41

Team Budget 41

Sharing and Collaboration 41

2.2.5 Task 5: Execute 42

2.3 Real Examples 43

Case 1: Sensor Data and High-Velocity Analytics to Save Operating Costs 43

Case 2: Social Media and High-Velocity Analytics for Quick Response to Customers 44

Case 3: Sensor Data and High-Velocity Analytics to Save Maintenance Costs 44

Case 4: Using Old Data and Analytics to Detect New Fraudulent Claims 45

Case 5: Using Old and New Data Plus Analytics to Decrease Crime 45

Case 6: Collecting the Data and Applying the Analytics Is the Business 45

References 46

Further Reading: Papers 47

Further Reading: Books 48

3 The Analytics Team 49
Thomas H. Davenport

3.1 Introduction 49

3.2 Skills Necessary for Analytics 50

3.2.1 More Advanced or Recent Analytical and Data Science Skills 51

3.2.2 The Larger Team 53

3.3 Managing Analytical Talent 57

3.3.1 Developing Talent 58

3.3.2 Working with the HR Organization 59

3.4 Organizing Analytics 61

3.4.1 Goals of a Particular Analytics Organization 62

3.4.2 Basic Models for Organizing Analytics 63

3.4.3 Coordination Approaches 65

Program Management Office 66

Federation 67

Community 67

Matrix 67

Rotation 67

Assigned Customers 67

What Model Fits Your Business? 68

3.4.4 Organizational Structures for Specific Analytics Strategies and Scenarios 70

3.4.5 Analytical Leadership and the Chief Analytics Officer 70

3.5 To Where Should Analytical Functions Report? 72

Information Technology 72

Strategy 72

Shared Services 72

Finance 73

Marketing or Other Specific Function 73

Product Development 73

3.5.1 Building an Analytical Ecosystem 73

3.5.2 Developing the Analytical Organization over Time 74

References 75

4 The Data 77
Brian T. Downs

4.1 Introduction 77

4.2 Data Collection 77

4.2.1 Data Types 77

4.2.2 Data Discovery 80

4.3 Data Preparation 86

4.4 Data Modeling 93

4.4.1 Relational Databases 93

4.4.2 Nonrelational Databases 95

4.5 Data Management 97

5 Solution Methodologies 99
Mary E. Helander

5.1 Introduction 99

5.1.1 What Exactly Do We Mean by “Solution,” “Problem,” and “Methodology?” 99

5.1.2 It’s All About the Problem 101

5.1.3 Solutions versus Products 101

5.1.4 How This Chapter Is Organized 103

5.1.5 The “Descriptive–Predictive–Prescriptive” Analytics Paradigm 105

5.1.6 The Goals of This Chapter 105

5.2 Macro-Solution Methodologies for the Analytics Practitioner 106

5.2.1 The Scientific Research Methodology 106

5.2.2 The Operations Research Project Methodology 109

5.2.3 The Cross-Industry Standard Process for Data Mining (CRISP-DM) Methodology 112

5.2.4 Software Engineering-Related Solution Methodologies 114

5.2.5 Summary of Macro-Methodologies 114

5.3 Micro-Solution Methodologies for the Analytics Practitioner 116

5.3.1 Micro-Solution Methodology Preliminaries 116

5.3.2 Micro-Solution Methodology Description Framework 117

5.3.3 Group I: Micro-Solution Methodologies for Exploration and Discovery 119

Group I: Problems of Interest 119

Group I: Relevant Models 119

Group I: Data Considerations 120

Group I: Solution Techniques 120

Group I: Relationship to Macro-Methodologies 126

Group I: Takeaways 126

5.3.4 Group II: Micro-Solution Methodologies Using Models Where Techniques to Find Solutions Are Independent of Data 127

Group II: Problems of Interest 127

Group II: Relevant Models 127

Group II: Data Considerations 128

Group II: Solution Techniques 128

Group II: Relationship to Macro-Methodologies 135

Group II: Takeaways 137

5.3.5 Group III: Micro-Solution Methodologies Using Models Where Techniques to Find Solutions Are Dependent on Data 137

Group III: Problems of Interest 137

Group III: Relevant Models 138

Group III: Data Considerations 138

Group III: Solution Techniques 139

Group III: Relationship to Macro-Methodologies 140

Group III: Takeaways 141

5.3.6 Micro-Methodology Summary 141

5.4 General Methodology-Related Considerations 142

5.4.1 Planning an Analytics Project 142

5.4.2 Software and Tool Selection 142

5.4.3 Visualization 143

5.4.4 Fields with Related Methodologies 144

5.5 Summary and Conclusions 144

5.5.1 “Ding Dong, the Scientific Method Is Dead!” 145

5.5.2 “Methodology Cramps My Analytics Style” 145

5.5.3 “There Is Only One Way to Solve This” 146

5.5.4 Perceived Success Is More Important Than the Right Answer 148

5.6 Acknowledgments 149

References 149

6 Modeling 155
Gerald G. Brown

6.1 Introduction 155

6.2 When are Models Appropriate 155

6.2.1 What Is the Problem with This System? 159

6.2.2 Is This Problem Important? 159

6.2.3 How Will This Problem Be Solved Without a New Model? 159

6.2.4 What Modeling Technique Will Be Used? 159

6.2.5 How Will We Know When We Have Succeeded? 160

Who Are the System Operator Stakeholders? 160

6.3 Types of Models 161

6.3.1 Descriptive Models 161

6.3.2 Predictive Models 161

6.3.3 Prescriptive Models 161

6.4 Models Can Also Be Characterized by Whether They Are Deterministic or Stochastic (Random) 161

6.5 Counting 162

6.6 Probability 163

6.7 Probability Perspectives and Subject Matter Experts 165

6.8 Subject Matter Experts 165

6.9 Statistics 166

6.9.1 A Random Sample 166

6.9.2 Descriptive Statistics 166

6.9.3 Parameter Estimation with a Confidence Interval 166

6.9.4 Regression 167

6.10 Inferential Statistics 169

6.11 A Stochastic Process 170

6.12 Digital Simulation 173

6.12.1 Static versus Dynamic Simulations 174

6.13 Mathematical Optimization 174

6.14 Measurement Units 175

6.15 Critical Path Method 176

6.16 Portfolio Optimization Case Study Solved By a Variety of Methods 178

6.16.1 Linear Program 178

6.16.2 Heuristic 179

6.16.3 Assessing Our Progress 179

6.16.4 Relaxations and Bounds 179

6.16.5 Are We Finished Yet? 180

6.17 Game Theory 181

6.18 Decision Theory 184

6.19 Susceptible, Exposed, Infected, Recovered (SEIR) Epidemiology 187

6.20 Search Theory 189

6.21 Lanchester Models of Warfare 189

6.22 Hughes’ Salvo Model of Combat 192

6.23 Single-Use Models 193

6.24 The Principle of Optimality and Dynamic Programming 195

6.25 Stack-Based Enumeration 197

6.25.1 Data Structures 197

6.25.2 Discussion 199

6.25.3 Generating Permutations and Combinations 199

6.26 Traveling Salesman Problem: Another Case Study in Alternate Solution Methods 200

6.27 Model Documentation, Management, and Performance 206

6.27.1 Model Formulation 206

6.27.2 Choice of Implementation Language 207

6.27.3 Supervised versus Automated Models 207

6.27.4 Model Fidelity 208

6.27.5 Sensitivity Analysis 210

6.27.6 With Different Methods 211

6.27.7 With Different Variables 212

6.27.8 Stability 213

6.27.9 Reliability 213

6.27.10 Scalability 213

6.27.11 Extensibility 214

6.28 Rules for Data Use 215

6.28.1 Proprietary Data 215

6.28.2 Licensed Data 215

6.28.3 Personally Identifiable Information 216

6.28.4 Protected Critical Infrastructure Information System (PCIIMS) 216

6.28.5 Institutional Review Board (IRB) 216

6.28.6 Department of Defense and Department of Energy Classification 216

6.28.7 Law Enforcement Data 216

6.28.8 Copyright and Trademark 216

6.28.9 Paraphrased and Plagiarized 217

6.28.10 Displays of Model Outputs 217

6.28.11 Data Integrity 217

6.28.12 Multiple Data Evolutions 217

6.29 Data Interpolation and Extrapolation 217

6.30 Model Verification and Validation 218

6.30.1 Verifying 219

6.30.2 Validating 219

6.30.3 Comparing Models 219

6.30.4 Sample Data 220

6.30.5 Data Diagnostics 220

6.30.6 Data Vintage and Provenance 220

6.31 Communicate with Stakeholders 220

6.31.1 Training 221

6.31.2 Report Writers 221

6.31.3 Standard Form Model Statement 222

6.31.4 Persistence and Monotonicity: Examples of Realistic Model Restrictions 223

6.31.5 Model Solutions Require a Lot of Polish and Refinement Before They Can Directly Influence Policy 224

6.31.6 Model Obsolescence and Model-Advised Thumb Rules 226

6.32 Software 227

6.33 Where to Go from Here 228

6.34 Acknowledgments 228

References 229

7 Machine Learning 231
Samuel H. Huddleston and Gerald G. Brown

7.1 Introduction 231

7.2 Supervised, Unsupervised, and Reinforcement Learning 232

7.3 Model Development, Selection, and Deployment for Supervised Learning 235

7.3.1 Goals and Guiding Principles in Machine Learning 235

7.3.2 Algorithmic Modeling Overview 236

7.3.3 Data Acquisition and Cleaning 236

7.3.4 Feature Engineering 237

7.3.5 Modeling Overview 238

7.3.6 Model Fitting (Training) and Feature Selection 240

7.3.7 Model (Algorithm) Selection 241

7.3.8 Model Performance Assessment 242

7.3.9 Model Implementation 242

7.4 Model Fitting, Model Error, and the Bias-Variance Trade-Off 243

7.4.1 Components of (Regression) Model Error 243

7.4.2 Model Fitting: Balancing Bias and Variance 245

7.5 Predictive Performance Evaluation 247

7.5.1 Regression Performance Evaluation 248

7.5.2 Classification Performance Evaluation 249

7.5.3 Performance Evaluation for Time-Dependent Data 253

7.6 An Overview of Supervised Learning Algorithms 254

7.6.1 k-Nearest Neighbors (KNN) 255

7.6.2 Extensions to Regression 256

7.6.3 Classification and Regression Trees 257

7.6.4 Time Series Forecasting 259

7.6.5 Support Vector Machines 261

7.6.6 Artificial Neural Networks 262

7.6.7 Ensemble Methods 265

7.7 Unsupervised Learning Algorithms 267

7.7.1 Kernel Density Estimation 267

7.7.2 Association Rule Mining 268

7.7.3 Clustering Methods 269

7.7.4 Principal Components Analysis (PCA) 270

7.7.5 Bag-of-Words and Vector Space Models 271

7.8 Conclusion 272

7.9 Acknowledgments 272

References 273

8 Deployment and Life Cycle Management 275
Arnie Greenland

8.1 Introduction 275

8.2 The Analytics Methodology: Understanding the Critical Steps in Deployment and Life Cycle Management 276

8.2.1 CRISP-DM Phase 1: Business Understanding 278

8.2.2 JTA Domain I, Task 1: Obtain or Receive Problem Statement and Usability 278

8.2.3 JTA Domain I, Task 2: Identify Stakeholders 279

8.2.4 JTA Domain I, Task 3: Determine if the Problem Is Amenable to an Analytics Solution 281

8.2.5 JTA Domain I, Task 4: Refine the Problem Statement and Delineate Constraints 281

8.2.6 JTA Domain I, Task 5: Define an Initial Set of Business Benefits 281

8.2.7 JTA Domain I, Task 6: Obtain Stakeholder Agreement on the Business Statement 282

8.2.8 JTA Domain II, Task 1: Reformulate the Problem Statement as an Analytics Problem 283

8.2.9 JTA Domain II, Task 2: Develop a Proposed Set of Drivers and Relationships to Outputs 285

8.2.10 JTA Domain II, Task 3: State the Set of Assumptions Related to the Problem 286

8.2.11 JTA Domain II, Task 4: Define the Key Metrics of Success 287

8.2.12 JTA Domain II, Task 5: Obtain Stakeholder Agreement 287

8.2.13 CRISP-DM Phases 2 and 3: Data Understanding and Data Preparation 288

8.2.14 JTA Domain III, Task 1: Identify and Prioritize Data Needs and Sources 290

8.2.15 JTA Domain III, Task 2: Acquire Data 290

8.2.16 JTA Domain III, Task 3: Harmonize, Rescale, Clean, and Share Data 291

8.2.17 JTA Domain III, Task 4: Identify Relationships in the Data 292

8.2.18 JTA Domain III, Task 5: Document and Report Finding 293

8.2.19 JTA Domain III, Task 6: Refine the Business and Analytics Problem Statements 293

8.2.20 CRISP-DM Phase 4: Modeling 293

8.2.21 CRISP-DM Phase 5: Evaluation 294

8.2.22 CRISP-DM Phase 6: Deployment 297

8.2.23 Deployment of the Analytics Model (Up to Delivery) 298

8.2.24 Post-deployment Activities (Domain VI: Model Life Cycle Management) 301

8.3 Overarching Issues of Life Cycle Management 303

8.3.1 Documentation 303

8.3.2 Communication 305

8.3.3 Testing 307

8.3.4 Metrics 308

9 The Blossoming Analytics Talent Pool: An Overview of the Analytics Ecosystem 311
Ramesh Sharda and Pankush Kalgotra

9.1 Introduction 311

9.2 Analytics Industry Ecosystem 312

9.2.1 Data Generation Infrastructure Providers 314

9.2.2 Data Management Infrastructure Providers 315

9.2.3 Data Warehouse Providers 316

9.2.4 Middleware Providers 316

9.2.5 Data Service Providers 316

9.2.6 Analytics-Focused Software Developers 317

Reporting/Descriptive Analytics 317

Predictive Analytics 318

Prescriptive Analytics 318

9.2.7 Application Developers: Industry-Specific or General 319

9.2.8 Analytics Industry Analysts and Influencers 321

9.2.9 Academic Institutions and Certification Agencies 322

9.2.10 Regulators and Policy Makers 323

9.2.11 Analytics User Organizations 323

9.3 Conclusions 325

References 326

Appendix: Writing and Teaching Analytics with Cases 327
James J. Cochran

Index 355

Authors

James J. Cochran