Does DESeq2 analysis make GOseq redundant?
david.rinker ▴ 10
@davidrinker-13538

I'm running a differential expression pipeline that first calculates DE with DESeq2 and then looks for enrichment in the DE genes. This pipeline seems somewhat common in the literature, so I'm sure it's fine.

Although this approach is well represented in the literature, I'm wondering whether it is redundant. GOseq is supposed to prevent biases introduced by the greater power to detect DE in longer transcripts. However, since the variance stabilization approach of DESeq2 should already have effectively compensated for any gene length effects, it seems like the subsequent use of GOseq could result in more false negatives by effectively subjecting longer genes to a second round of increased scrutiny (and possible devaluation).

My feeling now is that if a gene makes it through DESeq2 and shows up as being DE, then it should remain in the GO enrichment analysis with no additional weighting based upon length.

EDIT: In my initial wording, I made some very poor word choices and have tried to amend them. My initial question was poorly formed, and I am hoping that it is now more to the point of what I am asking.

deseq2 goseq gene ontology differential gene expression
@mikelove
United States
The difference in power persists even after normalization and transformation of counts. I believe this is discussed in the GOseq paper.
@gordon-smyth
WEHI, Melbourne, Australia

Gene length bias (in an enrichment analysis) and RNA-seq normalization (variance stabilization or otherwise) are quite different things. Normalization doesn't prevent gene length bias; in fact, quite the opposite.

DE analysis, when done properly, will always show a gene length bias because longer genes generate more reads and hence more statistical power, and any efficient DE method will use the extra power when it is available.
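A minimal sketch of this point, under loudly simplified assumptions: counts are drawn from a Poisson distribution (not the negative binomial model that DESeq2 fits), library sizes are equal, and the test is a plain normal-approximation comparison of two group totals rather than DESeq2's actual test. The function names and parameter values (`detection_rate`, `reads_per_kb`, the 1.5-fold change) are hypothetical and chosen only for illustration. Two genes with the same true fold change but different lengths are simulated many times, and the fraction of simulations called DE is compared.

```python
import math
import random


def rpois(rng, lam):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def detection_rate(length_kb, fold_change=1.5, reads_per_kb=5.0,
                   n_reps=3, n_genes=2000, z_crit=1.96, seed=1):
    """Fraction of simulated genes called DE for a given gene length.

    Assumption: expected reads per sample scale linearly with length.
    """
    rng = random.Random(seed)
    base = reads_per_kb * length_kb
    hits = 0
    for _ in range(n_genes):
        # Total reads over replicates in the control and treated groups.
        a = sum(rpois(rng, base) for _ in range(n_reps))
        b = sum(rpois(rng, base * fold_change) for _ in range(n_reps))
        if a + b == 0:
            continue  # no reads at all: cannot be called DE
        # Under the null (equal rates), (b - a) / sqrt(a + b) is roughly N(0, 1).
        z = (b - a) / math.sqrt(a + b)
        hits += abs(z) > z_crit
    return hits / n_genes


# Same true fold change, different lengths.
power_short = detection_rate(length_kb=1.0)
power_long = detection_rate(length_kb=10.0)
print(f"short gene (1 kb):  called DE in {power_short:.0%} of simulations")
print(f"long gene (10 kb): called DE in {power_long:.0%} of simulations")
```

The fold change is identical for both genes, so any difference in detection rate comes only from the number of reads available, which is the length bias described above.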

You seem to assume that goseq subjects genes to a second round of scrutiny to judge which are truly DE, but that is not at all how it works. goseq takes the DE list as given. It simply analyses how the DE genes are distributed across annotation categories, conditional on their lengths. In other words, it does indeed keep all the DE genes in the analysis, just as you say it should.
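The idea of testing category membership conditional on length can be sketched as follows. This is a simplified stand-in, not goseq's implementation: goseq fits a smooth probability weighting function (PWF) and by default uses a Wallenius approximation, whereas this sketch estimates the PWF crudely from length bins and builds the null distribution by weighted resampling. All names (`length_aware_enrichment`, `n_bins`, `n_draws`) and the toy data are hypothetical. Note that the DE list is never altered; only the null expectation for the category overlap changes.

```python
import heapq
import random


def length_aware_enrichment(lengths, is_de, in_category, use_length=True,
                            n_bins=5, n_draws=500, seed=1):
    """Resampling p-value for over-representation of a category among DE genes."""
    n = len(lengths)
    weight = [1.0] * n  # uniform weights: ordinary length-blind test
    if use_length:
        # Crude PWF: proportion of DE genes within each length-quantile bin.
        order = sorted(range(n), key=lambda i: lengths[i])
        for b in range(n_bins):
            members = order[b * n // n_bins:(b + 1) * n // n_bins]
            if not members:
                continue
            frac = sum(is_de[i] for i in members) / len(members)
            for i in members:
                weight[i] = frac

    n_de = sum(is_de)
    observed = sum(1 for i in range(n) if is_de[i] and in_category[i])

    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_draws):
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # genes with larger weight are more likely to enter the null DE set.
        keys = [rng.random() ** (1.0 / w) if w > 0 else 0.0 for w in weight]
        drawn = heapq.nlargest(n_de, range(n), key=keys.__getitem__)
        overlap = sum(1 for i in drawn if in_category[i])
        at_least += overlap >= observed
    return (1 + at_least) / (1 + n_draws)


# Toy data: DE calls and category membership both become more likely with
# length, but are otherwise unrelated to each other.
rng = random.Random(0)
lengths = [rng.randint(300, 10000) for _ in range(600)]
is_de = [rng.random() < 0.05 + 0.4 * (L / 10000) for L in lengths]
in_category = [rng.random() < 0.02 + 0.2 * (L / 10000) for L in lengths]

p_naive = length_aware_enrichment(lengths, is_de, in_category, use_length=False)
p_aware = length_aware_enrichment(lengths, is_de, in_category, use_length=True)
print(f"length-blind p-value: {p_naive:.3f}")
print(f"length-aware p-value: {p_aware:.3f}")
```

In a setup like this, where the category is linked to DE only through length, one would typically expect the length-aware p-value to be the less significant of the two, although the exact values depend on the simulated data.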


Thanks for the response. My language was very imprecise. I meant to ask whether false negatives (in the GO enrichment results) could be introduced by unnecessarily devaluing longer genes.

As I understand the paper, GOseq assigns weights based on transcript length:

 "The PWF quantifies how the probability of a gene selected as DE changes as a function of its transcript length."

But if the DE gene set of interest has already been exposed to a prior round of scrutiny using DESeq2, the rlog transformation should have already minimized the bias that might be due to length differences.


Gordon and I agree: there is a difference in statistical power across genes. DESeq2's variance stabilizing transformations can't change the fact that genes with higher counts will have more power for DE (note that neither the VST nor the rlog is used in DESeq2's testing routines). Even gene sets of the same size are not equally powered.


OK. Thanks to you both for helping to clarify this. Much appreciated!
