In multiple testing problems, the notions of power and error control (protected inference) must be redefined to incorporate multiplicity. Two adjustment procedures are currently in wide use. The oldest and best known is Bonferroni's procedure. It provides protected inference by controlling the family-wise error rate (FWER), the probability that one or more tests distributed as the null are called significant. Put another way, the FWER is the probability of one or more false positives. If \(R\) is the number of tests called significant by the procedure and \(V\) is the number of these distributed as the null (false positives), then the FWER is \(\mathrm{P}(V > 0)\). The second well-known procedure, in popular use in the -omics setting since the early 2000s, is the Benjamini-Hochberg False Discovery Rate procedure (BH-FDR). It provides protected inference by controlling the false discovery rate (FDR), which is the expected value of the false discovery proportion (FDP): \(\mathrm{FDR} = \mathbf{E}[\mathrm{FDP}] = \mathbf{E}[V/R]\). Not so well known is the fact that for a small to moderate number of tests, \(m < 500\), the distribution of the FDP is highly dispersed, so that control of its expected value guarantees little for any single multiple testing experiment. Other, less common procedures control the probability of false discovery exceedance (FDX), \(\mathrm{P}(V/R > \delta)\). One, due to Lehmann and Romano, requires only very mild assumptions on the conditional distribution of the null-distributed statistics given those distributed as the alternative, but it is only slightly less conservative than Holm's procedure (another FWER-controlling method). My procedure controls the probability of FDX and is only slightly more conservative than the BH-FDR procedure. It allows for dependent test statistics, but a correlation structure must be specified.
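The two classical rejection rules above can be sketched in a few lines. The following is a plain-Python illustration, not the pwrFDR package, and the p-values are made up for the example: Bonferroni applies the fixed cutoff \(\alpha/m\) to every test, while BH-FDR is a step-up rule on the sorted p-values.

```python
# Illustration (plain Python, not pwrFDR): Bonferroni vs. BH-FDR rejection
# sets for a vector of p-values at level alpha. The p-values are invented.

def bonferroni_reject(pvals, alpha=0.05):
    """FWER control: reject H_i exactly when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def bh_reject(pvals, alpha=0.05):
    """FDR control (Benjamini-Hochberg step-up): reject the k smallest
    p-values, where k = max{ i : p_(i) <= (i/m) * alpha }."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):   # rank = i in p_(i)
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

pvals = [0.001, 0.004, 0.012, 0.015, 0.02, 0.3, 0.4, 0.5, 0.6, 0.7]
print(sum(bonferroni_reject(pvals)))  # -> 2 (cutoff 0.05/10 = 0.005)
print(sum(bh_reject(pvals)))          # -> 5 (step-up admits more tests)
```

On this toy vector BH-FDR rejects five tests where Bonferroni rejects two, the usual trade of a weaker error guarantee (FDR rather than FWER) for more discoveries.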

While it is common in clinical trials involving two or so primary endpoints to power the study under FWER control, the notion of multiple testing power, and the variety of ways in which it can be defined, is not widely discussed. Let \(T = R - V\) be the number of tests called significant that are distributed as the alternative (true positives). The following definitions of multiple testing power are fairly well known. The total power is the probability that all tests distributed as the alternative are called significant. To describe this, we must be able to write down the number of tests distributed as the alternative, and there are two approaches. The frequentist approach posits fixed numbers of null- and alternative-distributed tests, while the Bayesian approach posits a prior probability, \(r_1\), that a given test is distributed as the alternative. We use the latter, as it is more flexible. Thus we envision prior variables drawn as i.i.d. Bernoulli 0/1 variables determining to which population each test statistic belongs, the alternative or the null. The total number distributed as the alternative is then the binomial sum \(M\), and the total power is \(\mathrm{P}(T = M)\). This is typically too high a hurdle, requiring infeasible sample sizes. The average power is the expected value of the true positive proportion (TPP), \(\mathbf{E}[T/M]\). For a procedure that applies a fixed criterion to all test statistics, such as the Bonferroni procedure, the average power is the power per test. As with the caveats discussed in connection with protected inference via FDR control, the average power may not guarantee anything specific for a single multiple testing experiment when the number of tests is less than 500 or so. The TPX power is the probability that the true positive proportion exceeds a given level, \(\mathrm{P}(T/M > \lambda)\). This can be determined for any given procedure via simulation. My package calculates it via asymptotic methods.
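As the text notes, these power notions can be estimated by simulation for any procedure. Below is a plain-Python Monte Carlo sketch (not the package's asymptotic method) under the Bayesian mixture model just described; the settings \(m\), \(r_1\), the common effect size \(\theta\) (in standard-error units), \(\alpha\), and \(\lambda\) are all illustrative choices. The Bonferroni rule is used because its fixed cutoff makes the per-test power explicit.

```python
# Monte Carlo sketch (plain Python, illustrative settings) of total power
# P(T = M), average power E[T/M], and TPX power P(T/M > lambda) under the
# Bernoulli(r1) mixture model, using one-sided Bonferroni at level alpha.
import random
from statistics import NormalDist

def power_summary(m=100, r1=0.2, theta=4.5, alpha=0.05, lam=0.8,
                  reps=4000, seed=1):
    random.seed(seed)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / m)   # one-sided Bonferroni cutoff
    pow1 = 1 - nd.cdf(z_crit - theta)    # per-test power = average power here
    total = tpx = 0
    tpp_sum = used = 0.0
    for _ in range(reps):
        M = sum(random.random() < r1 for _ in range(m))    # alternatives
        if M == 0:
            continue                     # T/M undefined when M = 0
        T = sum(random.random() < pow1 for _ in range(M))  # true positives
        used += 1
        tpp_sum += T / M
        total += (T == M)
        tpx += (T / M > lam)
    return {"avg": tpp_sum / used, "total": total / used, "tpx": tpx / used}

res = power_summary()
print(res)
```

With these settings the estimated average power sits near the analytic per-test power, while the total power is far smaller than the TPX power: demanding every alternative be detected is a much higher hurdle than demanding the TPP exceed \(\lambda\).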

Arguments

Plots showing how dispersion in the FDP and TPP increases from negligible at large \(m\) to very worrisome at moderate \(m\), and how the presence of correlated tests worsens this problem.
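The independent-tests half of the dispersion argument can be checked numerically. The following plain-Python sketch (not a pwrFDR computation; the settings \(r_1\), \(\theta\), \(\alpha\) and the two values of \(m\) are illustrative) applies the BH step-up rule to simulated mixture z-statistics and compares the spread of the FDP at moderate versus large \(m\).

```python
# Simulation sketch (plain Python, illustrative settings): the FDP of the
# BH procedure has a similar mean (the FDR) at moderate and large m, but
# its dispersion is far larger when m is moderate.
import random
from statistics import NormalDist, mean, stdev

def one_fdp(m, r1=0.2, theta=3.0, alpha=0.05):
    nd = NormalDist()
    pv, is_null = [], []
    for _ in range(m):                      # Bernoulli(r1) mixture draws
        alt = random.random() < r1
        z = random.gauss(theta if alt else 0.0, 1.0)
        pv.append(1 - nd.cdf(z))            # one-sided p-value
        is_null.append(not alt)
    # BH step-up: k = max{ i : p_(i) <= (i/m) * alpha }
    order = sorted(range(m), key=lambda i: pv[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pv[i] <= rank / m * alpha:
            k = rank
    V = sum(is_null[i] for i in order[:k])  # false positives among rejects
    return V / k if k > 0 else 0.0          # FDP, with 0/0 taken as 0

random.seed(7)
results = {}
for m in (50, 5000):
    fdps = [one_fdp(m) for _ in range(200)]
    results[m] = (mean(fdps), stdev(fdps))
    print(m, round(results[m][0], 3), round(results[m][1], 3))
```

The standard deviation of the FDP at \(m = 50\) comes out several times that at \(m = 5000\), which is the point of the plots: the FDR guarantee is an average over a quantity that, at moderate \(m\), varies widely from experiment to experiment.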

Much more is possible with the R package pwrFDR. Have a look at the package vignette, Using pwrFDR.