topTable vs decideTests: can I just extract DEGs where the decideTests rowSums != 0?

rc1253:

Hi everyone,

I am looking for critique on the methods I've used to find DEGs in my data.

I initially used eBayes with 'BH' FDR adjustment but found this wasn't stringent enough. I switched to Bonferroni, but then my decideTests and topTable results didn't match:

dt <- decideTests(efit, method = "global", adjust.method = "bonferroni", p.value = 0.01)

degs <- topTable(efit, number = Inf, adjust.method = "bonferroni", p.value = 0.01)

When I compare the write.fit(efit, dt) output against the topTable results, it's apparent that many genes selected by topTable have all 0s in the decideTests matrix.

Instead of using topTable, should I just subset to the genes with rowSums != 0 in the decideTests matrix?

Cheers.

Aaron Lun:

You don't mention your experimental design, but decideTests and topTable are probably not even testing the same null hypothesis. decideTests will give you a matrix where each column corresponds to the null hypothesis for each coefficient in your design matrix. By comparison, topTable will do an ANOVA testing the null hypothesis that all (non-intercept) coefficients are equal to zero. An additional difference is that decideTests will apply the multiple testing correction across all gene/coefficient combinations, while the ANOVA in topTable only gives one p-value per gene and thus applies the correction across genes only. So, before you do anything else, you should clarify to yourself what it is you actually want to test.
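To make that concrete, here is a minimal sketch of the two calls (reusing your efit; the thresholds are for illustration only):

# Per-coefficient t-tests; with method="global", the correction spans
# all gene/coefficient combinations.
dt <- decideTests(efit, method = "global", adjust.method = "BH", p.value = 0.05)

# With multiple coefficients and 'coef' left unspecified, topTable does an
# F-test on all (non-intercept) coefficients, giving one p-value per gene;
# the correction is then across genes only.
anova.tab <- topTable(efit, coef = NULL, number = Inf, adjust.method = "BH", p.value = 0.05)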

Having said all that, your multiple testing correction strategy is rather silly, for several reasons:

  • If you want a more stringent threshold... just lower the FDR threshold! If 5% is too high, just use 1%, or 0.1%, or whatever. No need to switch to a different correction method, especially if you do not appreciate the difference between FDR control (Benjamini-Hochberg) and strong FWER control (Bonferroni).
  • If you must achieve strong FWER control across the set of detected genes, use Holm's method by setting adjust.method="holm", as this dominates Bonferroni under all circumstances (see comments in ?p.adjust).
  • If you want a more stringent definition of a DE gene, use treat and topTreat with a non-zero log-fold-change threshold (see the sketch below).
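A minimal sketch of that last point, assuming fit is your linear model fit (after contrasts.fit, if you are using contrasts); the 1.5-fold threshold is illustrative, not a recommendation:

# Test against H0: |log2-fold change| <= log2(1.5), rather than zero.
tfit <- treat(fit, lfc = log2(1.5))
topTreat(tfit, coef = 1, number = 20) # top genes for the first coefficient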

rc1253:

Hi Aaron,

Thank you for the response.

My design is:

z is a factor combining two conditions with 16 time points, and:

design <- model.matrix(~0 + z)

(basically, testing for DE between the control and treated conditions at each time point)
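For context, z was built roughly like this (a sketch; condition and time are placeholder vectors describing each sample):

# condition is Mock/Infected and time is the time-point label per sample
z <- factor(paste0(condition, time))
colnames(design) <- levels(z) # so the coefficient names match the contrasts below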

That makes sense about them testing different null hypotheses. decideTests is probably better, then, if I want to find genes differentially expressed in at least one time point, since it tests each time point (coefficient) separately.

I have since tried the treat methodology, if you'd be able to take a look:

tfit <- treat(vfit, lfc = log2(1.5))

dt_t <- decideTests(tfit, method = "global", adjust.method = "BH", p.value = 0.01)

write.fit(tfit, dt_t, file = "../outputtreat/treatresults")
treat_degs <- read.csv(file = "../outputtreat/treatresults", sep = "")

treat_degs <- treat_degs[rowSums(treat_degs[, 50:65]) != 0, ] # columns 50:65 hold the decideTests results

Essentially I have ended up with half as many genes as before, which is good.

Aaron Lun:

I hope you did a contrasts.fit somewhere before calling treat, otherwise your tests won't make any sense.
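That is, something along these lines before the treat call (contr.matrix standing in for whatever contrast matrix you built):

# Re-parameterize so the tested coefficients are the between-condition
# comparisons rather than the raw group means.
vfit <- contrasts.fit(vfit, contrasts = contr.matrix)
tfit <- treat(vfit, lfc = log2(1.5))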

I should also add that the rowSums approach is not quite right in terms of FDR control for the resulting set of genes. This is because the BH method is applied globally to the set of gene- and coefficient-specific tests. There is no guarantee that the FDR across genes is controlled at the nominal threshold, or even close. For example:

set.seed(0)
pvalues <- matrix(runif(100000), ncol=10) # 10 contrasts, one per column.
pvalues[1:1000,] <- 1e-8 # true positives

adj.p <- pvalues
adj.p[] <- p.adjust(pvalues, method="BH") # global BH.

is.sig <- rowSums(adj.p <= 0.05) > 0 # significant in any contrast.
sum(is.sig[-(1:1000)])/sum(is.sig)

When I run the above code, I get an actual FDR of ~30%, which is much greater than my nominal FDR of 5% - not good. The correct way to do things would be to properly combine the p-values across contrasts for each gene, using Simes' method:

pval.list <- lapply(1:ncol(pvalues), FUN=function(i) { pvalues[,i] })
per.gene <- do.call(scran::combinePValues, c(pval.list, list(method="simes")))
adj.per.gene <- p.adjust(per.gene, method="BH")
sig.per.gene <- adj.per.gene <= 0.05
sum(sig.per.gene[-(1:1000)])/sum(sig.per.gene) # 0.043, which is better.

Another issue with rowSums that I forgot to mention earlier is that decideTests will happily give values of differing signs in the same row, depending on the direction of the log-fold change for each contrast. This means that your rowSums approach will fail to detect genes that change in opposite directions in an equal number of contrasts. You might say that this is intended behaviour but I would find that weird; it means that you'd be happy to retain a gene that goes up in 8 time points and goes down in 7 time points, but not if it goes down in 8 time points.
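If you do want an "any contrast, either direction" rule in the counting style (setting aside the FDR problem above), count the non-zero entries instead of summing the signed values:

# Keep genes significant in any contrast, regardless of direction.
keep <- rowSums(dt_t != 0) > 0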

rc1253:

Hi Aaron,

Thanks again for the detailed response, you're a great help to this bioinformatics newbie!

I did indeed run contrasts.fit with a contrast matrix:

contr.matrix <- makeContrasts(
    tp09 = Infected09 - Mock09,
    tp12 = Infected12 - Mock12,
    tp15 = Infected15 - Mock15,
    tp18 = Infected18 - Mock18,
    tp21 = Infected21 - Mock21,
    tp24 = Infected24 - Mock24,
    tp27 = Infected27 - Mock27,
    tp30 = Infected30 - Mock30,
    tp33 = Infected33 - Mock33,
    tp36 = Infected36 - Mock36,
    tp39 = Infected39 - Mock39,
    tp42 = Infected42 - Mock42,
    tp45 = Infected45 - Mock45,
    tp48 = Infected48 - Mock48,
    tp51 = Infected51 - Mock51,
    tp54 = Infected54 - Mock54,
    levels = design1
)

That makes sense about rowSums; thank you, I hadn't realised the BH correction was applied globally across all the gene/contrast combinations.

I have attempted to replicate your code with the matrix of p-values in the object generated by treat:

treatpvalues <- tfit$p.value

treatpval.list <- lapply(1:ncol(treatpvalues), FUN=function(i) { treatpvalues[,i] })
treatper.gene <- do.call(scran::combinePValues, c(treatpval.list, list(method="simes")))
treatadj.per.gene <- p.adjust(treatper.gene, method="BH")
treatsig.per.gene <- treatadj.per.gene <= 0.01

I changed the cutoff to 0.01 for a more stringent selection of DE genes.
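To pull out the final list I then take the significant rows, along these lines (assuming the gene IDs are the row names of tfit):

treat_sig_genes <- rownames(tfit)[treatsig.per.gene]
length(treat_sig_genes) # number of DE genes at 1% FDR across genes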

Does this look better?

Thanks so much for your help.

Aaron Lun:

That looks fine to me.
