A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

Released Saturday, 1st November 2014

Good episode? Give it some love!

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

Saturday, 1st November 2014

Good episode? Give it some love!

Rate Episode

The mean prediction error of a classification or regression procedure can be estimated using resampling designs such as the cross-validation design. We decompose the variance of such an estimator associated with an arbitrary resampling procedure into a small linear combination of covariances between elementary estimators, each of which is a regular parameter as described in the theory of $U$-statistics. The enumerative combinatorics of the occurrence frequencies of these covariances govern the linear combination's coefficients and, therefore, the variance's large scale behavior. We study the variance of incomplete U-statistics associated with kernels which are partly but not entirely symmetric. This leads to asymptotic statements for the prediction error's estimator, under general non-empirical conditions on the resampling design. In particular, we show that the resampling based estimator of the average prediction error is asymptotically normally distributed under a general and easily verifiable condition. Likewise, we give a sufficient criterion for consistency. We thus develop a new approach to understanding small-variance designs as they have recently appeared in the literature. We exhibit the $U$-statistics which estimate these variances.We present a case from linear regression where the covariances between the elementary estimators can be computed analytically. We illustrate our theory by computing estimators of the studied quantities in an artificial data example.

Rate