The Book of Alternative Data. A Guide for Investors, Traders and Risk Managers. Edition No. 1

  • Book

  • 416 Pages
  • August 2020
  • John Wiley and Sons Ltd
  • ID: 5842174

The first and only book to systematically address methodologies and processes of leveraging non-traditional information sources in the context of investing and risk management

Harnessing non-traditional data sources to generate alpha, analyze markets, and forecast risk is a subject of intense interest for financial professionals. A growing number of regularly held conferences on alternative data are being established, complemented by an upsurge in new papers on the subject. Alternative data is steadily being incorporated by conventional institutional investors and risk managers throughout the financial world. Yet methodologies for analyzing and extracting value from alternative data, and guidance on how to source data and integrate data flows within existing systems, are currently not treated in the literature. Filling this significant gap in knowledge, The Book of Alternative Data is the first and only book to offer a coherent, systematic treatment of the subject.

This groundbreaking volume provides readers with a roadmap for navigating the complexities of an array of alternative data sources, and delivers the appropriate techniques to analyze them. The authors, leading experts in financial modeling, machine learning, and quantitative research and analytics, employ a step-by-step approach to guide readers through the dense jungle of generated data. A first-of-its-kind treatment of alternative data types, sources, and methodologies, this innovative book:

  • Provides an integrated modeling approach to extract value from multiple types of datasets
  • Treats the processes needed to make alternative data signals operational
  • Helps investors and risk managers rethink how they engage with alternative datasets
  • Features practical use-case studies across many financial markets, along with real-world techniques
  • Describes how to avoid potential pitfalls and missteps in starting the alternative data journey
  • Explains how to integrate information from different datasets to maximize informational value

The Book of Alternative Data is an indispensable resource for anyone wishing to analyze or monetize different non-traditional datasets, including Chief Investment Officers, Chief Risk Officers, risk professionals, investment professionals, traders, economists, and machine learning developers and users.

Table of Contents

Preface

Acknowledgments

Part 1 Introduction and Theory

1 Alternative Data: The Lay of the Land

1.1 Introduction

1.2 What is “Alternative Data”?

1.3 Segmentation of Alternative Data

1.4 The Many Vs of Big Data

1.5 Why Alternative Data?

1.6 Who is Using Alternative Data?

1.7 Capacity of a Strategy and Alternative Data

1.8 Alternative Data Dimensions

1.9 Who Are the Alternative Data Vendors?

1.10 Usage of Alternative Datasets on the Buy Side

1.11 Conclusion

2 The Value of Alternative Data

2.1 Introduction

2.2 The Decay of Investment Value

2.3 Data Markets

2.4 The Monetary Value of Data (Part I)

2.4.1 Cost Value

2.4.2 Market Value

2.4.3 Economic Value

2.5 Evaluating (Alternative) Data Strategies with and without Backtesting

2.5.1 Systematic Investors

2.5.2 Discretionary Investors

2.5.3 Risk Managers

2.6 The Monetary Value of Data (Part II)

2.6.1 The Buyer’s Perspective

2.6.2 The Seller’s Perspective

2.7 The Advantages of Maturing Alternative Datasets

2.8 Summary

3 Alternative Data Risks and Challenges

3.1 Legal Aspects of Data

3.2 Risks of Using Alternative Data

3.3 Challenges of Using Alternative Data

3.3.1 Entity Matching

3.3.2 Missing Data

3.3.3 Structuring the Data

3.3.4 Treatment of Outliers

3.4 Aggregating the Data

3.5 Summary

4 Machine Learning Techniques

4.1 Introduction

4.2 Machine Learning: Definitions and Techniques

4.2.1 Bias, Variance, and Noise

4.2.2 Cross-Validation

4.2.3 Introducing Machine Learning

4.2.4 Popular Supervised Machine Learning Techniques

4.2.5 Clustering-Based Unsupervised Machine Learning Techniques

4.2.6 Other Unsupervised Machine Learning Techniques

4.2.7 Machine Learning Libraries

4.2.8 Neural Networks and Deep Learning

4.2.9 Gaussian Processes

4.3 Which Technique to Choose?

4.4 Assumptions and Limitations of the Machine Learning Techniques

4.4.1 Causality

4.4.2 Non-stationarity

4.4.3 Restricted Information Set

4.4.4 The Algorithm Choice

4.5 Structuring Images

4.5.1 Features and Feature Detection Algorithms

4.5.2 Deep Learning and CNNs for Image Classification

4.5.3 Augmenting Satellite Image Data with Other Datasets

4.5.4 Imaging Tools

4.6 Natural Language Processing (NLP)

4.6.1 What is Natural Language Processing (NLP)?

4.6.2 Normalization

4.6.3 Creating Word Embeddings: Bag-of-Words

4.6.4 Creating Word Embeddings: Word2vec and Beyond

4.6.5 Sentiment Analysis and NLP Tasks as Classification Problems

4.6.6 Topic Modeling

4.6.7 Various Challenges in NLP

4.6.8 Different Languages and Different Texts

4.6.9 Speech in NLP

4.6.10 NLP Tools

4.7 Summary

5 The Processes behind the Use of Alternative Data

5.1 Introduction

5.2 Steps in the Alternative Data Journey

5.2.1 Step 1. Set up a Vision and Strategy

5.2.2 Step 2. Identify the Appropriate Datasets

5.2.3 Step 3. Perform Due Diligence on Vendors

5.2.4 Step 4. Pre-assess Risks

5.2.5 Step 5. Pre-assess the Existence of Signals

5.2.6 Step 6. Data Onboarding

5.2.7 Step 7. Data Preprocessing

5.2.8 Step 8. Signal Extraction

5.2.9 Step 9. Implementation (or Deployment in Production)

5.2.10 Maintenance Process

5.3 Structuring Teams to Use Alternative Data

5.4 Data Vendors

5.5 Summary

6 Factor Investing

6.1 Introduction

6.1.1 The CAPM

6.2 Factor Models

6.2.1 The Arbitrage Pricing Theory

6.2.2 The Fama-French 3-Factor Model

6.2.3 The Carhart Model

6.2.4 Other Approaches (Data Mining)

6.3 The Difference between Cross-Sectional and Time Series Trading Approaches

6.4 Why Factor Investing?

6.5 Smart Beta Indices Using Alternative Data Inputs

6.6 ESG Factors

6.7 Direct and Indirect Prediction

6.8 Summary

Part 2 Practical Applications

7 Missing Data: Background

7.1 Introduction

7.2 Missing Data Classification

7.2.1 Missing Data Treatments

7.3 Literature Overview of Missing Data Treatments

7.3.1 Luengo et al. (2012)

7.3.2 Garcia-Laencina et al. (2010)

7.3.3 Grzymala-Busse et al. (2000)

7.3.4 Zou et al. (2005)

7.3.5 Jerez et al. (2010)

7.3.6 Farhangfar et al. (2008)

7.3.7 Kang et al. (2013)

7.4 Summary

8 Missing Data: Case Studies

8.1 Introduction

8.2 Case Study: Imputing Missing Values in Multivariate Credit Default Swap Time Series

8.2.1 Missing Data Classification

8.2.2 Imputation Metrics

8.2.3 CDS Data and Test Data Generation

8.2.4 Multiple Imputation Methods

8.2.5 Deterministic and EOF-Based Techniques

8.2.6 Results

8.3 Case Study: Satellite Images

8.4 Summary

8.5 Appendix: General Description of the MICE Procedure

8.6 Appendix: Software Libraries Used in This Chapter

9 Outliers (Anomalies)

9.1 Introduction

9.2 Outliers Definition, Classification, and Approaches to Detection

9.3 Temporal Structure

9.4 Global Versus Local Outliers, Point Anomalies, and Micro-Clusters

9.5 Outlier Detection Problem Setup

9.6 Comparative Evaluation of Outlier Detection Algorithms

9.7 Approaches to Outlier Explanation

9.7.1 Micenkova et al.

9.7.2 Duan et al.

9.7.3 Angiulli et al.

9.8 Case Study: Outlier Detection on Fed Communications Index

9.9 Summary

9.10 Appendix

9.10.1 Model-Based Techniques

9.10.2 Distance-Based Techniques

9.10.3 Density-Based Techniques

9.10.4 Heuristics-Based Approaches

10 Automotive Fundamental Data

10.1 Introduction

10.2 Data

10.3 Approach 1: Indirect Approach

10.3.1 The Steps Followed

10.3.2 Stage 1

10.4 Approach 2: Direct Approach

10.4.1 The Data

10.4.2 Factor Generation

10.4.3 Factor Performance

10.4.4 Detailed Factor Results

10.5 Gaussian Processes Example

10.6 Summary

10.7 Appendix

10.7.1 List of Companies

10.7.2 Description of Financial Statement Items

10.7.3 Ratios Used

10.7.4 IHS Markit Data Features

10.7.5 Reporting Delays by Country

11 Surveys and Crowdsourced Data

11.1 Introduction

11.2 Survey Data as Alternative Data

11.3 The Data

11.4 The Product

11.5 Case Studies

11.5.1 Case Study: Company Event Study (Pooled Survey)

11.5.2 Case Study: Oil and Gas Production (Q&A Survey)

11.6 Some Technical Considerations on Surveys

11.7 Crowdsourcing Analyst Estimates Survey

11.8 Alpha Capture Data

11.9 Summary

11.10 Appendix

12 Purchasing Managers’ Index

12.1 Introduction

12.2 PMI Performance

12.3 Nowcasting GDP Growth

12.4 Impacts on Financial Markets

12.5 Summary

13 Satellite Imagery and Aerial Photography

13.1 Introduction

13.2 Forecasting US Export Growth

13.3 Car Counts and Earnings Per Share for Retailers

13.4 Measuring Chinese PMI Manufacturing with Satellite Data

13.5 Summary

14 Location Data

14.1 Introduction

14.2 Shipping Data to Track Crude Oil Supplies

14.3 Mobile Phone Location Data to Understand Retail Activity

14.3.1 Trading REIT ETF Using Mobile Phone Location Data

14.3.2 Estimating Earnings per Share with Mobile Phone Location Data

14.4 Taxi Ride Data and New York Fed Meetings

14.5 Corporate Jet Location Data and M&A

14.6 Summary

15 Text, Web, Social Media, and News

15.1 Introduction

15.2 Collecting Web Data

15.3 Social Media

15.3.1 Hedonometer Index

15.3.2 Using Twitter Data to Help Forecast US Change in Nonfarm Payrolls

15.3.3 Twitter Data to Forecast Stock Market Reaction to FOMC

15.3.4 Liquidity and Sentiment from Social Media

15.4 News

15.4.1 Machine-Readable News to Trade FX and Understand FX Volatility

15.4.2 Federal Reserve Communications and US Treasury Yields

15.5 Other Web Sources

15.5.1 Measuring Consumer Price Inflation

15.6 Summary

16 Investor Attention

16.1 Introduction

16.2 Readership of Payrolls to Measure Investor Attention

16.3 Google Trends Data to Measure Market Themes

16.4 Investopedia Search Data to Measure Investor Anxiety

16.5 Using Wikipedia to Understand Price Action in Cryptocurrencies

16.6 Online Attention for Countries to Inform EMFX Trading

16.7 Summary

17 Consumer Transactions

17.1 Introduction

17.2 Credit and Debit Card Transaction Data

17.3 Consumer Receipts

17.4 Summary

18 Government, Industrial, and Corporate Data

18.1 Introduction

18.2 Using Innovation Measures to Trade Equities

18.3 Quantifying Currency Crisis Risk

18.4 Modeling Central Bank Intervention in Currency Markets

18.5 Summary

19 Market Data

19.1 Introduction

19.2 Relationship between Institutional FX Flow Data and FX Spot

19.3 Understanding Liquidity Using High-Frequency FX Data

19.4 Summary

20 Alternative Data in Private Markets

20.1 Introduction

20.2 Defining Private Equity and Venture Capital Firms

20.3 Private Equity Datasets

20.4 Understanding the Performance of Private Firms

20.5 Summary

Conclusions

Some Last Words

References

About the Authors

Index

Authors

Alexander Denev, Saeed Amen