Optimal Estimation of the Null Distribution in Large-Scale Inference


Abstract:

The advent of large-scale inference has spurred reexamination of conventional statistical thinking. In a series of highly original articles, Efron persuasively illustrated the danger for downstream inference in assuming the veracity of a posited null distribution. In a Gaussian model for $n$ many z-scores with at most $k < \frac{n}{2}$ nonnulls, Efron suggests estimating the parameters of an empirical null $N(\theta, \sigma^{2})$ instead of assuming the theoretical null $N(0, 1)$. Looking to the robust statistics literature by viewing the nonnulls as outliers is unsatisfactory as the question of optimal rates is still open; even consistency is not known in the regime $k \asymp n$ which is especially relevant to many large-scale inference applications. However, provably rate-optimal robust estimators have been developed in other models (e.g. Huber contamination) which appear quite close to Efron’s proposal. Notably, the impossibility of consistency when $k \asymp n$ in these other models may suggest the same major weakness afflicts Efron’s popularly adopted recommendation. A sound evaluation thus requires a complete understanding of information-theoretic limits. We characterize the regime of $k$ for which consistent estimation is possible, notably without imposing any assumptions at all on the nonnull effects. Unlike in other robust models, it is shown consistent estimation of the location parameter is possible if and only if $\frac{n}{2} - k = \omega(\sqrt{n})$, and of the scale parameter in the entire regime $k < \frac{n}{2}$. Furthermore, we establish sharp minimax rates and show estimators based on the empirical characteristic function are optimal by exploiting the Gaussian character of the data.
Published in: IEEE Transactions on Information Theory ( Volume: 71, Issue: 3, March 2025)
Page(s): 2075 - 2103
Date of Publication: 14 January 2025


I. Introduction

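Consider the model \begin{equation*} X_{j} = \theta + \gamma_{j} + \sigma Z_{j} \tag{1}\end{equation*} for $j = 1, \ldots, n$, where $\theta$ is the location parameter of interest, $\gamma_{1}, \ldots, \gamma_{n}$ are the unknown effects, $Z_{1}, \ldots, Z_{n} \overset{iid}{\sim} N(0, 1)$ are the noise variables, and $\sigma > 0$ is the scale parameter. We are concerned with estimation of $\theta$ and $\sigma$ given observations $X_{1}, \ldots, X_{n}$ from (1). We denote the joint distribution of $X_{1}, \ldots, X_{n}$ by $P_{\theta, \gamma, \sigma}$ when the data are given by (1); the expectation with respect to $P_{\theta, \gamma, \sigma}$ is denoted by $E_{\theta, \gamma, \sigma}$. Of course, the parameters of interest are not identifiable in (1) as written. To ensure identifiability, we assume a limited number of the $\gamma_{j}$ are nonzero, that is, $|\{j : \gamma_{j} \neq 0\}| \le k$ for some $k < \frac{n}{2}$.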
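To make model (1) concrete, the following is a minimal simulation sketch in Python with NumPy. It is not the estimator studied in this paper: the function names `simulate_model` and `naive_and_robust`, the common effect size `effect`, and all numeric values are assumptions chosen for illustration. The median and the scaled median absolute deviation stand in as generic robust summaries, whereas the abstract's optimal procedures are based on the empirical characteristic function.

```python
import numpy as np


def simulate_model(n, k, theta, sigma, effect, seed=0):
    """Draw X_j = theta + gamma_j + sigma * Z_j with exactly k nonzero gamma_j.

    `effect` is a hypothetical common value for the nonzero gamma_j; model (1)
    itself places no restriction on the nonnull effects.
    """
    if not 0 <= k < n / 2:
        raise ValueError("need 0 <= k < n/2 for identifiability")
    rng = np.random.default_rng(seed)
    gamma = np.zeros(n)
    gamma[:k] = effect               # the first k coordinates are the nonnulls
    z = rng.standard_normal(n)       # Z_j iid N(0, 1)
    return theta + gamma + sigma * z, gamma


def naive_and_robust(x):
    """Return the sample mean/sd and the median/scaled-MAD summaries of x."""
    med = np.median(x)
    # 1.4826 makes the MAD consistent for sigma under a pure Gaussian sample.
    mad = 1.4826 * np.median(np.abs(x - med))
    return {"mean": x.mean(), "sd": x.std(ddof=1), "median": med, "mad": mad}


# Assumed illustrative values: 20% nonnulls, all shifted by the same amount.
x, gamma = simulate_model(n=10_000, k=2_000, theta=0.5, sigma=1.5, effect=6.0)
est = naive_and_robust(x)
for name, value in est.items():
    print(f"{name}: {value:.3f}")
```

In this setup the sample mean absorbs the average nonnull effect, while the median is pulled less far from $\theta$. Both remain biased when $k$ is a constant fraction of $n$, which is the difficulty that motivates the question of optimal rates.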
