
Analysis of Big Dependent Data. Edition No. 1. Wiley Series in Probability and Statistics

  • ID: 5186232
  • Book
  • May 2021
  • John Wiley and Sons Ltd

Master advanced topics in the analysis of large, dynamically dependent datasets with this insightful resource

Statistical Learning with Big Dependent Data delivers a comprehensive presentation of the statistical and machine learning methods useful for analyzing and forecasting large and dynamically dependent data sets. The book presents automatic procedures for modeling and forecasting large sets of time series data. Beginning with some visualization tools, the book discusses procedures and methods for finding outliers, clusters, and other types of heterogeneity in big dependent data. It then introduces various dimension reduction methods, including regularization, such as the Lasso in the presence of dynamic dependence, and factor models, such as dynamic factor models. The book also covers other forecasting procedures, including index models, partial least squares, boosting, and nowcasting. It further presents machine learning methods, including neural networks, deep learning, classification and regression trees, and random forests. Finally, procedures for modeling and forecasting spatio-temporal dependent data are also presented.
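For readers new to the subject, the "dynamic dependence" referred to above is usually summarized first through the sample autocorrelation function (ACF). The sketch below is a minimal pure-Python illustration of that idea, not taken from the book (whose examples use R packages); the function name `acf` and the toy series are hypothetical.

```python
def acf(x, max_lag):
    """Sample autocorrelations of a series x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n  # lag-0 autocovariance
    out = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        out.append(ck / c0)
    return out

# A trending series is strongly positively autocorrelated at lag 1;
# an alternating series is strongly negatively autocorrelated.
trend = list(range(100))
alternating = [(-1) ** t for t in range(100)]
print(acf(trend, 1)[0])        # close to 1
print(acf(alternating, 1)[0])  # close to -1
```

Series whose ACF is far from zero at low lags are "dependent" in the sense used throughout the book, and it is this dependence that complicates the naive application of standard machine learning tools.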

Throughout the book, the advantages and disadvantages of the methods discussed are given. The book uses real-world examples to demonstrate applications, including the use of many R packages. Finally, an R package associated with the book is available to help readers reproduce the analyses in the examples and to facilitate real applications.

Analysis of Big Dependent Data includes a wide variety of topics for modeling and understanding big dependent data, including:

  • New ways to plot large sets of time series
  • An automatic procedure to build univariate ARMA models for individual components of a large data set
  • Powerful outlier detection procedures for large sets of related time series
  • New methods for finding the number of clusters of time series and discrimination methods, including support vector machines, for time series
  • Broad coverage of dynamic factor models including new representations and estimation methods for generalized dynamic factor models
  • Discussion of the usefulness of the Lasso with time series and an evaluation of several machine learning procedures for forecasting large sets of time series
  • Forecasting large sets of time series with exogenous variables, including discussions of index models, partial least squares, and boosting
  • Introduction of modern procedures for modeling and forecasting spatio-temporal data
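The automatic univariate model-building listed above can be sketched in miniature: fit AR(p) models for increasing p and pick the order by AIC. The pure-Python sketch below uses the Levinson-Durbin recursion on sample autocovariances; it is only an illustrative stand-in for the book's procedure, which builds full ARMA models (in R) for every component of a large panel, and all names in it are hypothetical.

```python
import math
import random

def autocov(x, max_lag):
    """Sample autocovariances at lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson(gamma, p):
    """AR(p) coefficients and innovation variance via Levinson-Durbin."""
    phi, sigma2 = [], gamma[0]
    for k in range(1, p + 1):
        ref = (gamma[k] - sum(phi[j] * gamma[k - 1 - j]
                              for j in range(k - 1))) / sigma2
        phi = [phi[j] - ref * phi[k - 2 - j] for j in range(k - 1)] + [ref]
        sigma2 *= 1.0 - ref * ref
    return phi, sigma2

def select_ar_order(x, p_max=5):
    """Choose the AR order minimizing AIC = n*log(sigma2) + 2p."""
    n, gamma = len(x), autocov(x, p_max)
    aics = [n * math.log(levinson(gamma, p)[1]) + 2 * p
            for p in range(p_max + 1)]
    return min(range(p_max + 1), key=lambda p: aics[p])

# Simulated AR(1) series with coefficient 0.8: the AIC criterion
# detects the dependence and selects a positive order.
random.seed(1)
x = [0.0]
for _ in range(500):
    x.append(0.8 * x[-1] + random.gauss(0.0, 1.0))
print(select_ar_order(x))  # a small positive order for this dependent series
```

Applied in a loop over thousands of series, a selection rule of this kind is what makes "automatic" modeling of big dependent data feasible.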

Perfect for PhD students and researchers in business, economics, engineering, and science, Statistical Learning with Big Dependent Data also belongs on the bookshelves of practitioners in these fields who hope to improve their understanding of statistical and machine learning methods for analyzing and forecasting big dependent data.


1 Introduction to Big Dependent Data 1

1.1 Examples of Dependent Data 2

1.2 Stochastic Processes 11

1.2.1 Scalar Processes 11

1.2.1.1 Stationarity 12

1.2.1.2 White Noise Process 14

1.2.1.3 Conditional Distribution 14

1.2.2 Vector Processes 14

1.2.2.1 Vector White Noises 17

1.2.2.2 Invertibility 17

1.3 Sample Moments of Stationary Vector Process 18

1.3.1 Sample Mean 18

1.3.2 Sample Covariance and Correlation Matrices 19

1.3.3 Example 1.1 20

1.3.4 Example 1.2 23

1.4 Nonstationary Processes 23

1.5 Principal Component Analysis 26

1.5.1 Discussion 30

1.5.2 Properties of the Principal Components 30

1.5.3 Example 1.3 31

1.6 Effects of Serial Dependence 35

1.6.1 Example 1.4 37

1.7 Appendix 1: Some Matrix Theory 38

2 Linear Univariate Time Series 43

2.1 Visualizing a Large Set of Time Series 45

2.1.1 Dynamic Plots 45

2.1.2 Static Plots 51

2.1.3 Example 2.1 55

2.2 Stationary ARMA Models 56

2.2.1 The Autoregressive Process 58

2.2.1.1 Autocorrelation Functions 59

2.2.2 The Moving Average Process 60

2.2.3 The ARMA Process 62

2.2.4 Linear Combinations of ARMA Processes 63

2.2.5 Example 2.2 64

2.3 Spectral Analysis of Stationary Processes 65

2.3.1 Fitting Harmonic Functions to a Time Series 65

2.3.2 The Periodogram 67

2.3.3 The Spectral Density Function and its Estimation 70

2.3.4 Example 2.3 71

2.4 Integrated Processes 72

2.4.1 The Random Walk Process 72

2.4.2 ARIMA Models 74

2.4.3 Seasonal ARIMA Models 75

2.4.3.1 The Airline Model 77

2.4.4 Example 2.4 78

2.5 Structural and State Space Models 80

2.5.1 Structural Time Series Models 80

2.5.2 State-Space Models 81

2.5.3 The Kalman Filter 85

2.6 Forecasting with Linear Models 88

2.6.1 Computing Optimal Predictors 88

2.6.2 Variances of the Predictions 90

2.6.3 Measuring Predictability 91

2.7 Modeling a Set of Time Series 92

2.7.1 Data Transformation 93

2.7.2 Testing for White Noise 95

2.7.3 Determination of the Difference Order 95

2.7.4 Example 2.5 96

2.7.5 Model Identification 97

2.8 Estimation and Information Criteria 97

2.8.1 Conditional Likelihood 97

2.8.2 On-line Estimation 99

2.8.3 Maximum Likelihood Estimation 100

2.8.4 Model Selection 101

2.8.4.1 The Akaike Information Criterion 102

2.8.4.2 The BIC Criterion 103

2.8.4.3 Other Criteria 103

2.8.4.4 Cross-Validation 104

2.8.5 Example 2.6 104

2.9 Diagnostic Checking 107

2.9.1 Residual Plot 107

2.9.2 Portmanteau Test for Residual Serial Correlations 107

2.9.3 Homoscedastic Tests 109

2.9.4 Normality Tests 109

2.9.5 Checking for Deterministic Components 109

2.9.6 Example 2.7 110

2.10 Forecasting 111

2.10.1 Out-of-Sample Forecasts 111

2.10.2 Forecasting with Model Averaging 112

2.10.3 Forecasting with Shrinkage Estimators 113

2.10.4 Example 2.8 114

2.11 Appendix 2: Difference Equations 115

3 Analysis of Multivariate Time Series 125

3.1 Transfer Function Models 126

3.1.1 Single Input and Single Output 126

3.1.2 Example 3.1 131

3.1.3 Multiple Inputs and Multiple Outputs 132

3.2 Vector AR Models 133

3.2.1 Impulse Response Function 135

3.2.2 Some Special Cases 136

3.2.3 Estimation 137

3.2.4 Model Building 139

3.2.5 Prediction 141

3.2.6 Forecast Error Variance Decomposition 143

3.2.7 Example 3.2 144

3.3 Vector Moving-Average Models 152

3.3.1 Properties of VMA Models 153

3.3.2 VMA Modeling 153

3.4 Stationary VARMA Models 157

3.4.1 Are VAR Models Sufficient? 157

3.4.2 Properties of VARMA Models 158

3.4.3 Modeling VARMA Processes 159

3.4.4 Use of VARMA Models 159

3.4.5 Example 3.4 160

3.5 Unit Roots and Co-integration 165

3.5.1 Spurious Regression 165

3.5.2 Linear Combinations of a Vector Process 166

3.5.3 Co-integration 167

3.5.4 Over-Differencing 167

3.6 Error-Correction Models 169

3.6.1 Co-integration Test 170

3.6.2 Example 3.5 171

4 Handling Heterogeneity in Many Time Series 179

4.1 Intervention Analysis 180

4.1.1 Intervention with Indicator Variables 182

4.1.2 Intervention with Step Functions 184

4.1.3 Intervention with General Exogenous Variables 185

4.1.4 Building an Intervention Model 185

4.1.5 Example 4.1 186

4.2 Estimation of Missing Values 187

4.2.1 Univariate Interpolation 187

4.2.2 Multivariate Interpolation 192

4.2.3 Example 4.2 193

4.3 Outliers in Vector Time Series 194

4.3.1 Multivariate Additive Outliers 195

4.3.1.1 Effects on Residuals and Estimation 195

4.3.2 Multivariate Level Shift or Structural Break 197

4.3.2.1 Effects on Residuals and Estimation 197

4.3.3 Other Types of Outliers 198

4.3.3.1 Multivariate Innovative Outliers 198

4.3.3.2 Transitory Change 199

4.3.3.3 Ramp Shift 200

4.3.4 Masking and Swamping 200

4.4 Univariate Outlier Detection 201

4.4.1 Other Procedures for Univariate Outlier Detection 203

4.4.2 Example 4.3 204

4.4.3 New Approaches to Outlier Detection 205

4.4.4 Example 4.4 207

4.4.5 Example 4.5 209

4.5 Multivariate Outlier Detection 210

4.5.1 VARMA Outlier Detection 210

4.5.2 Outlier Detection by Projections 212

4.5.3 A Projection Algorithm for Outlier Detection 214

4.5.4 The Nonstationary Case 215

4.5.5 Example 4.6 216

4.5.6 Example 4.7 217

4.6 Robust Estimation 218

4.6.1 Example 4.8 220

4.7 Heterogeneity for Parameter Changes 221

4.7.1 Parameter Changes in Univariate Time Series 221

4.7.2 Covariance Changes in Multivariate Time Series 223

4.7.2.1 Detecting Multiple Covariance Changes 224

4.7.2.2 LR Test 225

4.8 Appendix 4: Cusum Algorithms 227

5 Clustering and Classification of Time Series 235

5.1 Distances and Dissimilarities 236

5.1.1 Distance Between Univariate Time Series 236

5.1.2 Dissimilarities Between Univariate Series 239

5.1.3 Example 5.1 241

5.1.4 Dissimilarities Based on Cross-Linear Dependency 247

5.1.5 Example 5.2 250

5.2 Hierarchical Clustering of Time Series 252

5.2.1 Criteria for Defining Distances Between Groups 253

5.2.2 The Dendrogram 254

5.2.3 Selecting the Number of Groups 254

5.2.3.1 The Height and Step Plots 254

5.2.3.2 Silhouette Statistic 255

5.2.3.3 The Gap Statistic 258

5.2.4 Example 5.3 260

5.3 Clustering by Variables 270

5.3.1 The K-means Algorithm 271

5.3.1.1 Number of Groups 273

5.3.2 Example 5.4 273

5.3.3 K-Medoids 277

5.3.4 Model-Based Clustering by Variables 279

5.3.4.1 ML Estimation of the AR Mixture Model 280

5.3.4.2 The EM Algorithm 282

5.3.4.3 Estimation of Mixture of Multivariate Normals 283

5.3.4.4 Bayesian Estimation 284

5.3.4.5 Clustering with Structural Breaks 285

5.3.5 Example 5.5 286

5.3.6 Clustering by Projections 287

5.3.7 Example 5.6 290

5.4 Classification with Time Series 292

5.4.1 Classification Among a Set of Models 293

5.4.2 Checking the Classification Rule 295

5.5 Classification with Features 296

5.5.1 Linear Discriminant Function 296

5.5.2 Quadratic Classification and Admissible Functions 297

5.5.3 Logistic Regression 298

5.5.4 Example 5.7 300

5.6 Nonparametric Classification 307

5.6.1 Nearest Neighbors 307

5.6.2 Support Vector Machines 308

5.6.2.1 Linearly Separable Problems 309

5.6.2.2 Nonlinearly Separable Problems 312

5.6.3 Density Estimation 314

5.6.4 Example 5.8 315

5.7 Other Classification Problems and Methods 317

6 Dynamic Factor Models 323

6.1 The Dynamic Factor Model for Stationary Series 325

6.1.1 Properties of the Covariance Matrices 327

6.1.1.1 The Exact DFM 328

6.1.1.2 The Approximate DFM 329

6.1.2 Example 6.1 330

6.1.3 Dynamic Factor and VARMA Models 333

6.2 Fitting a Stationary DFM to Data 334

6.2.1 Principal Component Estimation 334

6.2.2 Pooled Principal Component Estimator 336

6.2.3 Generalized Principal Component Estimator 337

6.2.4 Maximum Likelihood Estimation 337

6.2.5 Selecting the Number of Factors 339

6.2.5.1 Rank Testing via Canonical Correlation 339

6.2.5.2 Testing a Jump in Eigenvalues 340

6.2.5.3 Using Information Criteria 341

6.2.6 Forecasting with DFM 341

6.2.7 Example 6.2 342

6.2.8 Example 6.3 343

6.2.9 Alternative Formulations of the EDFM 348

6.3 Generalized Dynamic Factor Models for Stationary Series 350

6.3.1 Some Properties of the GDFM 351

6.3.2 GDFM and VARMA Models 352

6.4 Dynamic Principal Components 352

6.4.1 DPC for Optimal Reconstruction 352

6.4.2 One-sided Dynamic Principal Components 353

6.4.3 Model Selection and Forecasting 356

6.4.4 One-sided DPC and GDFM Estimation 357

6.4.5 Example 6.4 357

6.5 Dynamic Factor Models for Nonstationary Series 360

6.5.1 Example 6.5 362

6.5.2 Cointegration and DFM 366

6.6 Generalized Dynamic Factor Models for Nonstationary Series 366

6.6.1 Estimation by Generalized Dynamic Principal Component 367

6.6.2 Example 6.6 369

6.7 Outliers in Dynamic Factor Models 371

6.7.1 Factor and Idiosyncratic Outliers 371

6.7.2 A Procedure to Find Outliers in DFM 372

6.8 DFM with Cluster Structure 373

6.8.1 Fitting DFMCS 374

6.8.2 Example 6.7 377

6.9 Some Extensions of DFM 381

6.10 High-Dimensional Case 383

6.10.1 Sparse Principal Components 383

6.10.2 A Structural-Factor Model Approach 386

6.10.3 Estimation 386

6.10.4 Selecting the Number of Common Factors 388

6.10.5 Asymptotic Properties of Loading Estimates 389

7 Forecasting With Big Dependent Data 401

7.1 Regularized Linear Models 402

7.1.1 Properties of Lasso Estimator 405

7.1.2 Some Extensions of Lasso Regression 409

7.1.2.1 Adaptive Lasso 409

7.1.2.2 Group Lasso 410

7.1.2.3 Elastic Net 411

7.1.2.4 Fused Lasso 411

7.1.2.5 SCAD Penalty 411

7.1.3 Example 7.1 412

7.1.4 Example 7.2 418

7.2 Impacts of Dynamic Dependence on Lasso 422

7.3 Lasso for Dependent Data 428

7.3.1 Example 7.5 433

7.4 Principal Component Regression and Diffusion Index 435

7.4.1 Example 7.6 436

7.5 Partial Least Squares 440

7.5.1 Example 7.7 443

7.6 Boosting 446

7.6.1 L2 Boosting 447

7.6.2 Choices of Weak Learner 448

7.6.3 Example 7.8 449

7.6.4 Boosting for Classification 452

7.7 Mixed-Frequency Data and Nowcasting 454

7.7.1 MIDAS Regression 455

7.7.2 Nowcasting 456

7.7.3 Example 7.9 457

7.7.4 Example 7.10 461

7.8 Strong Serial Dependence 464

8 Machine Learning of Big Dependent Data 471

8.1 Regression Trees and Random Forest 472

8.1.1 Growing Tree 472

8.1.2 Pruning 473

8.1.3 Classification Trees 474

8.1.4 Example 8.1 475

8.1.5 Random Forests 477

8.1.6 Example 8.2 478

8.2 Neural Networks 480

8.2.1 Network Training 482

8.2.2 Example 8.3 488

8.3 Deep Learning 490

8.3.1 Types of Deep Networks 490

8.3.2 Recurrent Neural Network 492

8.3.3 Activation Functions for Deep Learning 493

8.3.4 Training Deep Networks 494

8.3.4.1 Long Short-Term Memory Model 495

8.3.4.2 Training Algorithm 496

8.4 Some Applications 497

8.4.1 The Package: keras 498

8.4.2 Example 8.4 498

8.4.3 Example 8.5 502

8.4.4 Dropout Layer 505

8.4.5 Application of Convolution Networks 506

8.4.6 Application of LSTM 513

8.4.7 Example 8.6 518

8.5 Deep Generative Models 524

8.6 Reinforcement Learning 524

9 Spatio-Temporal Dependent Data 529

9.1 Examples and Visualization 530

9.2 Spatial Processes and Data Analysis 536

9.3 Geostatistical Processes 538

9.3.1 Stationary Variogram 539

9.3.2 Examples of Semivariogram 539

9.3.3 Stationary Covariance Function 541

9.3.4 Estimation of Variogram 542

9.3.5 Testing Spatial Dependence 543

9.3.6 Kriging 543

9.3.6.1 Simple Kriging 544

9.3.6.2 Ordinary Kriging 546

9.3.6.3 Universal Kriging 547

9.4 Lattice Processes 547

9.4.1 Markov-Type Models 548

9.5 Spatial Point Processes 550

9.5.1 Second-Order Intensity 551

9.6 Example 9.1 552

9.7 Spatio-Temporal Processes and Analysis 555

9.7.1 Basic Properties 556

9.7.2 Some Nonseparable Covariance Functions 559

9.7.3 Spatio-Temporal Variogram 560

9.7.4 Spatio-Temporal Kriging 560

9.7.5 Example 9.2 562

9.8 Descriptive Spatio-Temporal Models 565

9.8.1 Random Effects with Spatio-Temporal Basis Functions 566

9.8.2 Random Effects with Spatial Basis Functions 567

9.8.3 Fixed Rank Kriging 568

9.8.4 Example 9.3 570

9.8.5 Spatial Principal Component Analysis 572

9.8.6 Example 9.4 573

9.8.7 Random Effects with Temporal Basis Functions 576

9.8.8 Example 9.5 576

9.9 Dynamic Spatio-Temporal Models 582

9.9.1 Space-Time Autoregressive Moving-Average Models 582

9.9.2 Spatio-Temporal Component Models 584

9.9.3 Spatio-Temporal Factor Models 584

9.9.4 Spatio-Temporal Hierarchical Models 585

Daniel Peña
Ruey S. Tsay, University of Chicago, IL, USA