In multiple testing problems, the notions of power and protected inference must be redefined to incorporate multiplicity. The first distinction between single hypothesis testing and multiple testing is that some null hypotheses are true and some are false. We operate under a Bayesian model in which we posit a prior probability, r.1, that the alternative hypothesis is true. Suppose that in your experiment the alternative hypothesis is true for a random number, \(M\), of the \(m\) tests. Suppose we apply a given procedure and reject the null hypothesis for a random number of tests, \(R\), and that of these, the alternative hypothesis is true for \(T\) and the null hypothesis is true for \(V=R-T\). The proportion of rejected null hypotheses for which the null hypothesis is true, \(\mathrm{FDP}=V/R\), is called the false discovery proportion. Its expected value, \(\mathrm{FDR}=\mathrm{E}[\mathrm{FDP}]\), is called the false discovery rate. The proportion of tests for which the alternative hypothesis is true and the null hypothesis is rejected, \(\mathrm{TPP}=T/M\), is called the true positive proportion.
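To make these definitions concrete, the following R sketch (purely illustrative; the effect size, prior probability, and fixed per-test rejection threshold are assumed values, and this is not pwrFDR code) simulates one experiment and computes the realized \(M\), \(R\), \(V\), \(\mathrm{FDP}\) and \(\mathrm{TPP}\).

```r
## Illustrative sketch (assumed values): one simulated experiment with m tests,
## z-statistics, and a fixed per-test rejection threshold, to show M, R, T, V, FDP, TPP.
set.seed(1)
m      <- 1000                        # number of simultaneous tests
r.1    <- 0.05                        # prior probability that the alternative is true
theta  <- 2.5                         # common effect size (z scale) under the alternative
H      <- rbinom(m, 1, r.1)           # 1 = alternative true, 0 = null true
z      <- rnorm(m, mean = theta * H)  # test statistics
p      <- 1 - pnorm(z)                # one-sided p-values
reject <- p < 0.05                    # fixed per-test threshold, for illustration only

M   <- sum(H)                         # tests with the alternative true
R   <- sum(reject)                    # rejections
TP  <- sum(reject & H == 1)           # rejections with the alternative true (T)
V   <- R - TP                         # rejections with the null true
FDP <- if (R > 0) V / R else 0        # false discovery proportion
TPP <- if (M > 0) TP / M else 0       # true positive proportion
c(M = M, R = R, V = V, FDP = FDP, TPP = TPP)
```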
Arguments
- Definition of Power:
    - Average Power: the expected value of the true positive proportion, \(\mathrm{AvgPwr}=\mathrm{E}[\mathrm{TPP}]\). This is the most commonly used definition, but for moderate \(m\), when the distribution of the \(\mathrm{TPP}\) is dispersed, it is not a reliable summary of that distribution.
    - TPX: the power is defined as the true positive exceedance (TPX), the probability that the \(\mathrm{TPP}\) exceeds a given value, \(\mathrm{P}(\mathrm{TPP} > \lambda)\), for some chosen threshold \(\lambda\).
    - TPX threshold: If the TPX is the chosen power definition, then you must specify the threshold \(\lambda\) mentioned above.
    - Power: The desired power.
- Type of Protected Inference:
    - FDR: In moderate to large \(m\) multiple testing problems, the most commonly used procedure offering protected inference is the Benjamini-Hochberg False Discovery Rate (FDR) procedure. It guarantees control of the \(\mathrm{FDR}=\mathrm{E}[\mathrm{FDP}]=\mathrm{E}[V/R]\leq \alpha\). However, when \(m\) is moderate, the distribution of the \(\mathrm{FDP}=V/R\) is dispersed, so that controlling the FDR, the mean of the FDP distribution, is not reliable. (This procedure and the Romano procedure below are illustrated in the sketch following this list.)
    - FDX CLT: When \(m\) is moderate, it is advisable to directly control the tail of the distribution of the \(\mathrm{FDP}\), that is, the false discovery exceedance (\(\mathrm{FDX}\)): we guarantee that \(\mathrm{P}(\mathrm{FDP} > \alpha) < \alpha\). There are two ways to do this. This choice uses an asymptotic (CLT) approximation, which is less conservative but requires either independence or a known correlation structure, and may not have a solution.
    - FDX Romano: A procedure due to Lehmann and Romano which also guarantees \(\mathrm{P}(\mathrm{FDP}>\alpha)<\alpha\). It is more conservative, but requires fewer assumptions and always has a solution (see the sketch following this list).
    - Auto: This option lets the function pick the best type of protected inference.
- Protected Inference Rate (alpha): The desired level, \(\alpha\), of control over protected inference.
- Effect Size: The common effect size for all tests for which the alternative hypothesis is true.
- Prior Prob that HA is true: As mentioned above, in multiple testing problems, some null hypotheses are true and some are false. We operate under a Bayesian model in which the user posits a prior probability that the alternative hypothesis is true.
- # Simultaneous Tests: The number of simultaneous tests, \(m\).
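As referenced in the FDR and FDX Romano items above, the following R sketch illustrates both finite-sample procedures: Benjamini-Hochberg FDR control via base R's p.adjust(), and a step-down implementation of the Lehmann-Romano FDX procedure. The simulated data and parameter values are illustrative assumptions, and the Lehmann-Romano critical constants follow the published step-down form; this is a sketch, not the package's own implementation.

```r
## Sketch: BH FDR control via p.adjust(), and a step-down Lehmann-Romano FDX procedure.
## Simulated data and settings are assumed values for illustration only.
set.seed(2)
m     <- 200
H     <- rbinom(m, 1, 0.1)                     # which alternatives are true
p     <- 1 - pnorm(rnorm(m, mean = 2.5 * H))   # one-sided p-values
alpha <- 0.15

## Benjamini-Hochberg: controls FDR = E[FDP] <= alpha
bh.rej <- p.adjust(p, method = "BH") <= alpha

## Lehmann-Romano step-down: controls P(FDP > gamma) <= alpha; here gamma = alpha
lr.fdx <- function(p, alpha, gamma = alpha) {
  m    <- length(p)
  i    <- seq_len(m)
  ## critical constants: (floor(gamma*i)+1) * alpha / (m + floor(gamma*i) + 1 - i)
  crit <- (floor(gamma * i) + 1) * alpha / (m + floor(gamma * i) + 1 - i)
  ord  <- order(p)
  ok   <- p[ord] <= crit
  k    <- if (all(ok)) m else which(!ok)[1] - 1   # last index before the first failure
  rej  <- rep(FALSE, m)
  if (k > 0) rej[ord[seq_len(k)]] <- TRUE
  rej
}
lr.rej <- lr.fdx(p, alpha)
c(BH.rejections = sum(bh.rej), Romano.rejections = sum(lr.rej))
```

The Romano procedure typically rejects fewer hypotheses than BH at the same \(\alpha\), reflecting the trade-off between conservatism and the strength of its guarantee.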
Plots show how dispersion in the FDP and TPP increases from negligible for large \(m\) to very worrisome for moderate \(m\), and how the presence of correlated tests worsens this problem.
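As a rough illustration of that dispersion (a sketch with assumed simulation settings, not the package's own plotting code), the following repeats a BH analysis many times and compares the spread of the FDP for moderate and large \(m\):

```r
## Sketch: dispersion of the FDP under BH for moderate vs large m (assumed settings).
sim.fdp <- function(m, r.1 = 0.1, theta = 2.5, alpha = 0.15, n.rep = 1000) {
  replicate(n.rep, {
    H   <- rbinom(m, 1, r.1)
    p   <- 1 - pnorm(rnorm(m, mean = theta * H))
    rej <- p.adjust(p, method = "BH") <= alpha
    R   <- sum(rej)
    if (R > 0) sum(rej & H == 0) / R else 0
  })
}
set.seed(3)
fdp.moderate <- sim.fdp(m = 50)
fdp.large    <- sim.fdp(m = 5000)
## Both settings have FDR <= alpha, but the FDP is far more dispersed for moderate m.
c(sd.moderate = sd(fdp.moderate), sd.large = sd(fdp.large))
hist(fdp.moderate, breaks = 30, main = "FDP under BH, m = 50", xlab = "FDP")
```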
Much more is possible with the R package pwrFDR.
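As a hedged example of how such a calculation might be invoked, the following sketches a call to the package's pwrFDR() function; the argument names and values shown are assumptions based on the package documentation and should be checked against ?pwrFDR for the installed version.

```r
## Sketch of a pwrFDR call (argument names and values assumed from the package
## documentation; verify with ?pwrFDR).
library(pwrFDR)
pwrFDR(effect.size = 0.79, n.sample = 46, r.1 = 0.05, alpha = 0.15,
       N.tests = 1000, FDP.control.method = "BHFDR")
```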