In multiple testing problems, the notions of power and error control (protected inference) must be redefined to incorporate multiplicity. Two adjustment procedures are currently in wide use. The oldest and best known is Bonferroni's procedure. It provides protected inference by controlling the family-wise error rate (FWER), the probability that one or more tests distributed as the null are called significant. Put another way, the FWER is the probability of one or more false positives. If \(R\) is the number of tests called significant by the procedure and \(V\) is the number of these distributed as the null (false positives), then the FWER is \(\mathrm{P}(V > 0)\). The second well-known procedure, in popular use in the -omics setting since the early 2000s, is the Benjamini-Hochberg False Discovery Rate procedure (BH-FDR). It provides protected inference by controlling the false discovery rate (FDR), which is the expected value of the false discovery proportion (FDP): \(\mathrm{FDR} = \mathbf{E}[\mathrm{FDP}] = \mathbf{E}[V/R]\). Not so well known is the fact that for a small to moderate number of tests, \(m < 500\), the distribution of the FDP is highly dispersed, so that control of its expected value guarantees little for any single multiple testing experiment. Other, less common procedures control the probability of false discovery exceedance (FDX), \(\mathrm{P}(V/R > \delta)\). One, due to Lehmann and Romano, requires only very mild assumptions on the conditional distribution of the null-distributed statistics given those distributed as the alternative, but it is only slightly less conservative than Holm's procedure (another FWER-controlling method). My procedure controls the probability of FDX and is only slightly more conservative than the BH-FDR procedure. It allows for dependent test statistics, but a correlation structure must be specified.
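The two classical rejection rules above can be sketched in a few lines. The following is a plain-Python illustration, not the pwrFDR package, and the p-values are made up for the example: Bonferroni applies the fixed cutoff \(\alpha/m\) to every test, while BH-FDR is a step-up rule on the sorted p-values.

```python
# Illustration (plain Python, not pwrFDR): Bonferroni vs. BH-FDR rejection
# sets for a vector of p-values at level alpha. The p-values are invented.

def bonferroni_reject(pvals, alpha=0.05):
    """FWER control: reject H_i exactly when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def bh_reject(pvals, alpha=0.05):
    """FDR control (Benjamini-Hochberg step-up): reject the k smallest
    p-values, where k = max{ i : p_(i) <= (i/m) * alpha }."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):   # rank = i in p_(i)
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

pvals = [0.001, 0.004, 0.012, 0.015, 0.02, 0.3, 0.4, 0.5, 0.6, 0.7]
print(sum(bonferroni_reject(pvals)))  # -> 2 (cutoff 0.05/10 = 0.005)
print(sum(bh_reject(pvals)))          # -> 5 (step-up admits more tests)
```

On this toy vector BH-FDR rejects five tests where Bonferroni rejects two, the usual trade of a weaker error guarantee (FDR rather than FWER) for more discoveries.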

While it is common in clinical trials involving two or so primary endpoints to power the study under FWER control, the notion of multiple testing power, and the variety of ways in which it can be defined, is not widely discussed. Let \(T = R - V\) be the number of tests called significant that are distributed as the alternative (true positives). The following definitions of multiple testing power are fairly well known. The total power is the probability that all tests distributed as the alternative are called significant. To describe this, we must be able to write down the number of tests distributed as the alternative, and there are two approaches. The frequentist approach posits fixed numbers of null- and alternative-distributed tests, while the Bayesian approach posits a prior probability, \(r_1\), that a given test is distributed as the alternative. We use the latter, as it is more flexible. Thus we envision prior variables drawn as i.i.d. Bernoulli 0/1 variables determining to which population each test statistic belongs, the alternative or the null. The total number distributed as the alternative is then the binomial sum \(M\), and the total power is \(\mathrm{P}(T = M)\). This is typically too high a hurdle, requiring infeasible sample sizes. The average power is the expected value of the true positive proportion (TPP), \(\mathbf{E}[T/M]\). For a procedure that applies a fixed criterion to all test statistics, such as the Bonferroni procedure, the average power is the power per test. As with the caveats discussed in connection with protected inference via FDR control, the average power may not guarantee anything specific for a single multiple testing experiment when the number of tests is less than 500 or so. The TPX power is the probability that the true positive proportion exceeds a given level, \(\mathrm{P}(T/M > \lambda)\). This can be determined for any given procedure via simulation. My package calculates it via asymptotic methods.
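As the text notes, these power notions can be estimated by simulation for any procedure. Below is a plain-Python Monte Carlo sketch (not the package's asymptotic method) under the Bayesian mixture model just described; the settings \(m\), \(r_1\), the common effect size \(\theta\) (in standard-error units), \(\alpha\), and \(\lambda\) are all illustrative choices. The Bonferroni rule is used because its fixed cutoff makes the per-test power explicit.

```python
# Monte Carlo sketch (plain Python, illustrative settings) of total power
# P(T = M), average power E[T/M], and TPX power P(T/M > lambda) under the
# Bernoulli(r1) mixture model, using one-sided Bonferroni at level alpha.
import random
from statistics import NormalDist

def power_summary(m=100, r1=0.2, theta=4.5, alpha=0.05, lam=0.8,
                  reps=4000, seed=1):
    random.seed(seed)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / m)   # one-sided Bonferroni cutoff
    pow1 = 1 - nd.cdf(z_crit - theta)    # per-test power = average power here
    total = tpx = 0
    tpp_sum = used = 0.0
    for _ in range(reps):
        M = sum(random.random() < r1 for _ in range(m))    # alternatives
        if M == 0:
            continue                     # T/M undefined when M = 0
        T = sum(random.random() < pow1 for _ in range(M))  # true positives
        used += 1
        tpp_sum += T / M
        total += (T == M)
        tpx += (T / M > lam)
    return {"avg": tpp_sum / used, "total": total / used, "tpx": tpx / used}

res = power_summary()
print(res)
```

With these settings the estimated average power sits near the analytic per-test power, while the total power is far smaller than the TPX power: demanding every alternative be detected is a much higher hurdle than demanding the TPP exceed \(\lambda\).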

Arguments

Plots showing how dispersion in the FDP and TPP increases from negligible at large \(m\) to very worrisome at moderate \(m\), and how the presence of correlated tests worsens this problem.
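The independent-tests half of the dispersion argument can be checked numerically. The following plain-Python sketch (not a pwrFDR computation; the settings \(r_1\), \(\theta\), \(\alpha\) and the two values of \(m\) are illustrative) applies the BH step-up rule to simulated mixture z-statistics and compares the spread of the FDP at moderate versus large \(m\).

```python
# Simulation sketch (plain Python, illustrative settings): the FDP of the
# BH procedure has a similar mean (the FDR) at moderate and large m, but
# its dispersion is far larger when m is moderate.
import random
from statistics import NormalDist, mean, stdev

def one_fdp(m, r1=0.2, theta=3.0, alpha=0.05):
    nd = NormalDist()
    pv, is_null = [], []
    for _ in range(m):                      # Bernoulli(r1) mixture draws
        alt = random.random() < r1
        z = random.gauss(theta if alt else 0.0, 1.0)
        pv.append(1 - nd.cdf(z))            # one-sided p-value
        is_null.append(not alt)
    # BH step-up: k = max{ i : p_(i) <= (i/m) * alpha }
    order = sorted(range(m), key=lambda i: pv[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pv[i] <= rank / m * alpha:
            k = rank
    V = sum(is_null[i] for i in order[:k])  # false positives among rejects
    return V / k if k > 0 else 0.0          # FDP, with 0/0 taken as 0

random.seed(7)
results = {}
for m in (50, 5000):
    fdps = [one_fdp(m) for _ in range(200)]
    results[m] = (mean(fdps), stdev(fdps))
    print(m, round(results[m][0], 3), round(results[m][1], 3))
```

The standard deviation of the FDP at \(m = 50\) comes out several times that at \(m = 5000\), which is the point of the plots: the FDR guarantee is an average over a quantity that, at moderate \(m\), varies widely from experiment to experiment.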

Much more is possible with the R package pwrFDR. Have a look at the package vignette, Using pwrFDR.