Question: TopTable vs DecideTests: can I just extract DEs with decidetest rowsums !=0?
0
gravatar for rc1253
9 months ago by
rc12530
rc12530 wrote:

Hi everyone,

I am looking for critique on the methods I've used to find DEGs in my data.

I initially used Ebayes with 'BH' FDR adjustment but found this stringency wasn't enough. I switched to trying bonferroni but had issues with my decidetests and toptable results not matching?

dt <-decideTests(efit, method = "global", adjust.method = "bonferroni", p.value = 0.01)

Degs <- topTable(efit, number = Inf, adjust.method = "bonferroni", p.value = 0.01)

When I subset the write.fit(efit, dt) in regards to the toptable results its apparent that there are many genes being selected by TopTable that have all 0's in the decidetests.

Instead of using TopTable should I just subset by rowsums!=0 from the decidetests?

Cheers.

limma • 337 views
ADD COMMENTlink modified 9 months ago by Aaron Lun25k • written 9 months ago by rc12530
Answer: TopTable vs DecideTests: can I just extract DEs with decidetest rowsums !=0?
2
gravatar for Aaron Lun
9 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

You don't mention your experimental design, but decideTests and topTable are probably not even testing the same null hypothesis. decideTests will give you a matrix where each column corresponds to the null hypothesis for each coefficient in your design matrix. By comparison, topTable will do an ANOVA testing the null hypothesis that all (non-intercept) coefficients are equal to zero. An additional difference is that decideTests will apply the multiple testing correction across all gene/coefficient combinations, while the ANOVA in topTable only gives one p-value per gene and thus applies the correction across genes only. So, before you do anything else, you should clarify to yourself what it is you actually want to test.

Having said all that, your multiple testing correction strategy is rather silly for several reasons.

  • If you want a more stringent threshold... just lower the FDR threshold! If 5% is too high, just use 1%, or 0.1%, or whatever. No need to switch to a different correction method, especially if you do not appreciate the difference between FDR control (Benjamini-Hochberg) and strong FWER control (Bonferroni).
  • If you must achieve strong FWER control across the set of detected genes, use Holm's method by setting adjust.method="holm", as this dominates Bonferroni under all circumstances (see comments in ?p.adjust).
  • If you want more stringent defintions of DE genes, use treat and topTreat with a non-zero log-fold change threshold.
ADD COMMENTlink modified 9 months ago • written 9 months ago by Aaron Lun25k

Hi Aaron,

Thank you for the response.

My design is:

z <- two conditions across 16-time points.
design <- model.matrix(~0 + z) (basically testing DE at each time point between control and treated conditions)

That makes sense about them having a different hypothesis, decidetests is probably better then if I want to find genes differentially expressed in at least one-time point as it is testing at each time point(coefficient) separately.

I have since tried to perform the Treat methodology if you'd be able to take a look:

tfit <- treat(vfit, lfc = log2(1.5))

dt_t <-decideTests(tfit, method = "global", adjust.method = "BH", p.value = 0.01)

write.fit(tfit, dtt, file="../outputtreat/treatresults").
treat
degs <- read.csv(file="../outputtreat/treatresults", sep = "")

treatdegs <- treatdegs[rowSums(treat_degs[,50:65])!=0,] (these rows correspond to the decidetest results)

Essentially I have ended up with half the genes as before which is good.

ADD REPLYlink modified 9 months ago • written 9 months ago by rc12530
1

I hope you did a contrasts.fit somewhere before calling treat, otherwise your tests won't make any sense.

I should also add that the rowSums approach is not-quite-right in terms of FDR control of the resulting set of genes. This is because the BH method is applied globally to the set of gene- and coefficient-specific tests. There is no guarantee that the FDR across genes is controlled at the nominal threshold, or even close. For example:

set.seed(0)
pvalues <- matrix(runif(100000), ncol=10) # 10 contrasts, one per column.
pvalues[1:1000,] <- 1e-8 # true positives

adj.p <- pvalues
adj.p[] <- p.adjust(pvalues, method="BH") # global BH.

is.sig <- rowSums(adj.p <= 0.05) > 0 # significant in any contrast.
sum(is.sig[-(1:1000)])/sum(is.sig)

When I run the above code, I get an actual FDR of ~30%, which is much greater than my nominal FDR of 5% - not good. The correct way to do things would be to properly combine the p-values across contrasts for each gene, using Simes' method:

pval.list <- lapply(1:ncol(pvalues), FUN=function(i) { pvalues[,i] })
per.gene <- do.call(scran::combinePValues, c(pval.list, list(method="simes")))
adj.per.gene <- p.adjust(per.gene, method="BH")
sig.per.gene <- adj.per.gene <= 0.05
sum(sig.per.gene[-(1:1000)])/sum(sig.per.gene) # 0.043, which is better.

Another issue with rowSums that I forgot to mention earlier is that decideTests will happily give values of differing signs in the same row, depending on the direction of the log-fold change for each contrast. This means that your rowSums approach will fail to detect genes that change in opposite directions in an equal number of contrasts. You might say that this is intended behaviour but I would find that weird; it means that you'd be happy to retain a gene that goes up in 8 time points and goes down in 7 time points, but not if it goes down in 8 time points.

ADD REPLYlink modified 9 months ago • written 9 months ago by Aaron Lun25k

Hi Aaron,

Thanks again for the detailed response, you're a great help to this bioinformatics newbie!

I did indeed run a contrast matrix:

contr.matrix <- makeContrasts(
tp09 = Infected09 - Mock09,
tp12 = Infected12 - Mock12,
tp15 = Infected15 - Mock15,
tp18 = Infected18 - Mock18,
tp21 = Infected21 - Mock21,
tp24 = Infected24 - Mock24,
tp27 = Infected27 - Mock27,
tp30 = Infected30 - Mock30,
tp33 = Infected33 - Mock33,
tp36 = Infected36 - Mock36,
tp39 = Infected39 - Mock39,
tp42 = Infected42 - Mock42,
tp45 = Infected45 - Mock45,
tp48 = Infected48 - Mock48,
tp51 = Infected51 - Mock51,
tp54 = Infected54 - Mock54,
levels = design1
)

That makes sense about rowSums, thank you I hadn't realised the FDR method was global across all the contrasts for a given gene.

I have attempted to replicate your code with the matrix of P-values in the object generated by Treat:

treatpvalues <- tfit$p.value
treat
pval.list <- lapply(1:ncol(treatpvalues), FUN=function(i) { treatpvalues[,i] })
treatper.gene <- do.call(scran::combinePValues, c(treatpval.list, list(method="simes")))
treatadj.per.gene <- p.adjust(treatper.gene, method="BH")
treatsig.per.gene <- treatadj.per.gene <= 0.01

I changed the cutoff to 0.01 for a more stringent selection of DE genes.

Does this look better?

Thanks so much for your help.

ADD REPLYlink modified 9 months ago • written 9 months ago by rc12530
1

That looks fine to me.

ADD REPLYlink written 9 months ago by Aaron Lun25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 399 users visited in the last hour