Heavy–tailed distributions are typical for phenomena in complex multi–component systems. They possess a number of specific features, including slower than exponential decay of the tail to zero, the violation of Cramér's condition, the possible non–existence of some moments, and sparse observations in the tail of the distribution. Consequently, the analysis of such distributions requires special statistical methods. Nonparametric Analysis of Univariate Heavy–Tailed Data introduces these statistical techniques. It provides a survey of classical results and explores recent developments in the theory of nonparametric estimation of the heavy–tailed probability density function and its application to classification when objects belong to heavy–tailed populations, as well as of the tail index, high quantiles, the hazard rate, and the renewal function.
Presents non–asymptotic methods of heavy–tailed data analysis.
Demonstrates preliminary data analysis and how to detect heavy tails and dependence.
Presents special data transformations that improve estimation of the heavy–tailed probability density function at infinity.
Discusses a regularization theory of the solution of inverse ill–posed stochastic operator equations, and its application to the estimation of the probability density function, the hazard rate and the identification of Markov models.
Examines smoothing methods for nonparametric estimates, which are key to accurate approximation.
Features numerous exercises and examples of real–life applications in teletraffic theory, population analysis and finance.
Nonparametric Analysis of Univariate Heavy–Tailed Data assumes only an introductory knowledge of probability theory, statistical methods and functional analysis. It is ideally suited for statisticians, researchers and PhD students in statistics and probability theory. There is also much to benefit those working and studying in a wide range of disciplines from computer science, telecommunications and performance evaluation, to demography and population analysis.
1. Definitions and rough detection of tail heaviness.
1.1 Definitions and basic properties of classes of heavy–tailed distributions.
1.2 Tail index estimation.
1.2.1 Estimators of a positive–valued tail index.
1.2.2 The choice of k in Hill's estimator.
1.2.3 Estimators of a real–valued tail index.
1.2.4 On–line estimation of the tail index.
1.3 Detection of tail heaviness and dependence.
1.3.1 Rough tests of tail heaviness.
1.3.2 Analysis of Web traffic and TCP flow data.
1.3.3 Dependence detection from univariate data.
1.3.4 Dependence detection from bivariate data.
1.3.5 Bivariate analysis of TCP flow data.
1.4 Notes and comments.
2. Classical methods of probability density estimation.
2.1 Principles of density estimation.
2.2 Methods of density estimation.
2.2.1 Kernel estimators.
2.2.2 Projection estimators.
2.2.3 Spline estimators.
2.2.4 Smoothing methods.
2.2.5 Illustrative examples.
2.3 Kernel estimation from dependent data.
2.3.1 Statement of the problem.
2.3.2 Numerical calculation of the bandwidth.
2.3.3 Data–driven selection of the bandwidth.
2.4.1 Finance: evaluation of market risk.
2.4.3 Population analysis.
3. Heavy–tailed density estimation.
3.1 Problems of the estimation of heavy–tailed densities.
3.2 Combined parametric–nonparametric method.
3.2.1 Nonparametric estimation of the density by structural risk minimization.
3.2.2 Illustrative examples.
3.2.3 Web data analysis by a combined parametric–nonparametric method.
3.3 Barron's estimator and χ²–optimality.
3.4 Kernel estimators with variable bandwidth.
3.5 Retransformed nonparametric estimators.
4. Transformations and heavy–tailed density estimation.
4.1 Problems of data transformations.
4.2 Estimates based on a fixed transformation.
4.3 Estimates based on an adaptive transformation.
4.3.1 Estimation algorithm.
4.3.2 Analysis of the algorithm.
4.3.3 Further remarks.
4.4 Estimating the accuracy of retransformed estimates.
4.5 Boundary kernels.
4.6 Accuracy of a nonvariable bandwidth kernel estimator.
4.7 The D method for a nonvariable bandwidth kernel estimator.
4.8 The D method for a variable bandwidth kernel estimator.
4.8.1 Method and results.
4.8.2 Application to Web traffic characteristics.
4.9 The ω² method for the projection estimator.
5. Classification and retransformed density estimates.
5.1 Classification and quality of density estimation.
5.2 Convergence of the estimated probability of misclassification.
5.3 Simulation study.
5.4 Application of the classification technique to Web data analysis.
5.4.1 Intelligent browser.
5.4.2 Web data analysis by traffic classification.
5.4.3 Web prefetching.
6. Estimation of high quantiles.
6.2 Estimators of high quantiles.
6.3 Distribution of high quantile estimates.
6.4 Simulation study.
6.4.1 Comparison of high quantile estimates in terms of relative bias and mean squared error.
6.4.2 Comparison of high quantile estimates in terms of confidence intervals.
6.5 Application to Web traffic data.
7. Nonparametric estimation of the hazard rate function.
7.1 Definition of the hazard rate function.
7.2 Statistical regularization method.
7.3 Numerical solution of ill–posed problems.
7.4 Estimation of the hazard rate function of heavy–tailed distributions.
7.5 Hazard rate estimation for compactly supported distributions.
7.5.1 Estimation of the hazard rate from the simplest equations.
7.5.2 Estimation of the hazard rate from a special kernel equation.
7.6 Estimation of the ratio of hazard rates.
7.6.1 Failure time detection.
7.6.2 Hormesis detection.
7.7 Hazard rate estimation in teletraffic theory.
7.7.1 Teletraffic processes at the packet level.
7.7.2 Estimation of the intensity of a nonhomogeneous Poisson process.
7.8 Semi–Markov modeling in teletraffic engineering.
7.8.1 The Gilbert–Elliott model.
7.8.2 Estimation of a retrial process.
8. Nonparametric estimation of the renewal function.
8.1 Traffic modeling by recurrent marked point processes.
8.2 Introduction to renewal function estimation.
8.3 Histogram–type estimator of the renewal function.
8.4 Convergence of the histogram–type estimator.
8.5 Selection of k by a bootstrap method.
8.6 Selection of k by a plot.
8.7 Simulation study.
8.8 Application to the inter–arrival times of TCP connections.
8.9 Conclusions and discussion.
A Proofs of Chapter 2.
B Proofs of Chapter 4.
C Proofs of Chapter 5.
D Proofs of Chapter 6.
E Proofs of Chapter 7.
F Proofs of Chapter 8.
List of Main Symbols and Abbreviations.
"It is ideally suited for statisticians, researchers and Ph.D. students in statistics and probability theory. There is also much to benefit those working and studying a wide range of disciplines from computer science, telecommunications and performance evaluation, to demography and population analysis." (Mathematical Review, Issue 2009e)