Hi everyone,
I am looking for critique on the methods I've used to find DEGs in my data.
I initially used Ebayes with 'BH' FDR adjustment but found this stringency wasn't enough. I switched to trying bonferroni but had issues with my decidetests and toptable results not matching?
dt <-decideTests(efit, method = "global", adjust.method = "bonferroni", p.value = 0.01)
Degs <- topTable(efit, number = Inf, adjust.method = "bonferroni", p.value = 0.01)
When I subset the write.fit(efit, dt) in regards to the toptable results its apparent that there are many genes being selected by TopTable that have all 0's in the decidetests.
Instead of using TopTable should I just subset by rowsums!=0 from the decidetests?
Cheers.
Hi Aaron,
Thank you for the response.
My design is:
z <- two conditions across 16-time points.
design <- model.matrix(~0 + z) (basically testing DE at each time point between control and treated conditions)
That makes sense about them having a different hypothesis, decidetests is probably better then if I want to find genes differentially expressed in at least one-time point as it is testing at each time point(coefficient) separately.
I have since tried to perform the Treat methodology if you'd be able to take a look:
tfit <- treat(vfit, lfc = log2(1.5))
dt_t <-decideTests(tfit, method = "global", adjust.method = "BH", p.value = 0.01)
write.fit(tfit, dtt, file="../outputtreat/treatresults").
treatdegs <- read.csv(file="../outputtreat/treatresults", sep = "")
treatdegs <- treatdegs[rowSums(treat_degs[,50:65])!=0,] (these rows correspond to the decidetest results)
Essentially I have ended up with half the genes as before which is good.
I hope you did a
contrasts.fit
somewhere before callingtreat
, otherwise your tests won't make any sense.I should also add that the
rowSums
approach is not-quite-right in terms of FDR control of the resulting set of genes. This is because the BH method is applied globally to the set of gene- and coefficient-specific tests. There is no guarantee that the FDR across genes is controlled at the nominal threshold, or even close. For example:When I run the above code, I get an actual FDR of ~30%, which is much greater than my nominal FDR of 5% - not good. The correct way to do things would be to properly combine the p-values across contrasts for each gene, using Simes' method:
Another issue with
rowSums
that I forgot to mention earlier is thatdecideTests
will happily give values of differing signs in the same row, depending on the direction of the log-fold change for each contrast. This means that yourrowSums
approach will fail to detect genes that change in opposite directions in an equal number of contrasts. You might say that this is intended behaviour but I would find that weird; it means that you'd be happy to retain a gene that goes up in 8 time points and goes down in 7 time points, but not if it goes down in 8 time points.Hi Aaron,
Thanks again for the detailed response, you're a great help to this bioinformatics newbie!
I did indeed run a contrast matrix:
contr.matrix <- makeContrasts(
tp09 = Infected09 - Mock09,
tp12 = Infected12 - Mock12,
tp15 = Infected15 - Mock15,
tp18 = Infected18 - Mock18,
tp21 = Infected21 - Mock21,
tp24 = Infected24 - Mock24,
tp27 = Infected27 - Mock27,
tp30 = Infected30 - Mock30,
tp33 = Infected33 - Mock33,
tp36 = Infected36 - Mock36,
tp39 = Infected39 - Mock39,
tp42 = Infected42 - Mock42,
tp45 = Infected45 - Mock45,
tp48 = Infected48 - Mock48,
tp51 = Infected51 - Mock51,
tp54 = Infected54 - Mock54,
levels = design1
)
That makes sense about rowSums, thank you I hadn't realised the FDR method was global across all the contrasts for a given gene.
I have attempted to replicate your code with the matrix of P-values in the object generated by Treat:
treatpvalues <- tfit$p.value
treatpval.list <- lapply(1:ncol(treatpvalues), FUN=function(i) { treatpvalues[,i] })
treatper.gene <- do.call(scran::combinePValues, c(treatpval.list, list(method="simes")))
treatadj.per.gene <- p.adjust(treatper.gene, method="BH")
treatsig.per.gene <- treatadj.per.gene <= 0.01
I changed the cutoff to 0.01 for a more stringent selection of DE genes.
Does this look better?
Thanks so much for your help.
That looks fine to me.