Question: Does DESeq2 analysis make GOseq redundant?
11 months ago by
david.rinker10 wrote:

I'm running a differential expression pipeline that first calculates DE with DESeq2 and then looks for enrichment in the DE genes. This pipeline seems somewhat common in the literature, so I'm sure it's fine.

While it is well represented in the literature, I'm wondering whether this approach is redundant. GOseq is supposed to correct for the bias introduced by the greater power to detect DE in longer transcripts. However, since the variance stabilization approach of DESeq2 should already have effectively compensated for any gene length effects, it seems like the subsequent use of GOseq could result in more false negatives by effectively subjecting longer genes to a second round of increased scrutiny (and possible devaluation).

My feeling now is that if a gene makes it through DESeq2 and shows up as being DE, then it should remain in the GO enrichment analysis with no additional weighting based upon length.

EDIT: In my initial wording, I made some very poor word choices and have tried to amend them. My initial question was poorly formed, and I am hoping that it is now more to the point of what I am asking.

11 months ago by
Michael Love18k
United States
Michael Love18k wrote:
The difference in power persists even after normalization and transformation of counts. This is discussed in the GOseq paper, I believe.
11 months ago by
Gordon Smyth33k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth33k wrote:

Gene length bias (in an enrichment analysis) and RNA-seq normalization (variance stabilization or otherwise) are quite different things. Normalization doesn't prevent gene length bias, in fact quite the opposite.

DE analysis when done properly will always show a gene length bias because longer genes generate more reads and hence more statistical power, and any efficient DE method will utilize the extra power when it is available.
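The link between length and power can be illustrated with a rough calculation. The sketch below is not DESeq2's test: it assumes plain Poisson counts (no overdispersion), a Wald test on the log fold change, and made-up values for coverage, fold change, and replicate number. The function name and all parameters are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(length_kb, reads_per_kb=20.0, fold_change=1.5,
                 n_reps=3, alpha=0.05):
    """Approximate power of a two-sided Wald test on the log fold change.

    Assumes Poisson counts whose mean is proportional to gene length.
    Real RNA-seq data are overdispersed, so absolute values are optimistic.
    """
    mu_a = reads_per_kb * length_kb   # expected count per sample, condition A
    mu_b = mu_a * fold_change         # expected count per sample, condition B
    # Delta-method standard error of the log ratio of the two group means.
    se = sqrt(1.0 / (n_reps * mu_a) + 1.0 / (n_reps * mu_b))
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    effect = abs(log(fold_change)) / se
    # P(|Z| > z_crit) when Z ~ N(effect, 1).
    return std_normal.cdf(effect - z_crit) + std_normal.cdf(-effect - z_crit)


# Same true fold change for every gene; only the length differs.
for length_kb in (0.5, 1, 2, 5, 10):
    print(f"{length_kb:>4} kb: approximate power {approx_power(length_kb):.2f}")
```

The fold change is identical for every gene here, yet the approximate power rises with length purely because the expected counts are larger. No normalization step changes the amount of information in those counts.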

You seem to assume that goseq subjects genes to a second round of scrutiny to judge which are truly DE but that is not at all how it works. goseq takes the DE list as given. It simply analyses how the DE genes assign to annotation categories, conditional on their lengths. In other words, it does indeed keep all the DE genes in the analysis, just as you say it should.
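To make "conditional on their lengths" concrete, here is a toy sketch in Python. It is not goseq's implementation. As far as I know, goseq fits the PWF with a monotone spline and by default uses a Wallenius approximation, with resampling as an option. The sketch substitutes a crude binned PWF and a simple weighted resampling scheme, and all data and names in it are made up. The DE list is never changed; only the null distribution for the category count differs.

```python
import heapq
import random

N_GENES, BIN_SIZE = 1000, 100

# Toy data, ordered by length: gene i is the i-th shortest gene.
# DE calls depend only on length: bin b (0 = shortest) has 5*(b+1)% DE genes.
is_de = [(i % BIN_SIZE) < 5 * (i // BIN_SIZE + 1) for i in range(N_GENES)]
# Hypothetical category made up of every second gene among the 200 longest.
in_cat = [i >= 800 and i % 2 == 0 for i in range(N_GENES)]


def binned_pwf(is_de, bin_size):
    """Crude stand-in for a PWF: proportion of DE genes per length bin.

    Expects genes sorted by length.
    """
    weights = []
    for start in range(0, len(is_de), bin_size):
        chunk = is_de[start:start + bin_size]
        weights.extend([sum(chunk) / len(chunk)] * len(chunk))
    return weights


def enrichment_p(is_de, in_cat, weights=None, n_sim=1000, seed=0):
    """Resampling p-value for over-representation of DE genes in a category.

    The observed DE list is taken as given. Null DE sets of the same size
    are drawn uniformly (weights=None) or with probability increasing in
    the weight (weighted sampling without replacement via random keys).
    """
    rng = random.Random(seed)
    genes = range(len(is_de))
    n_de = sum(is_de)
    observed = sum(d and c for d, c in zip(is_de, in_cat))
    hits = 0
    for _ in range(n_sim):
        if weights is None:
            draw = rng.sample(genes, n_de)
        else:
            draw = heapq.nlargest(
                n_de, genes,
                key=lambda g: rng.random() ** (1.0 / weights[g]))
        if sum(in_cat[g] for g in draw) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


pwf = binned_pwf(is_de, BIN_SIZE)
p_naive = enrichment_p(is_de, in_cat)
p_length_aware = enrichment_p(is_de, in_cat, weights=pwf)
print(f"ignoring length: p ~ {p_naive:.4f}")
print(f"conditioning on length: p ~ {p_length_aware:.4f}")
```

In this toy setup the category has no real association with DE beyond the fact that its genes are long. Ignoring length makes it look strongly enriched, while the length-aware null largely removes that signal. Every DE gene stays in the analysis in both cases.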


Thanks for the response. My language was very imprecise. I meant to ask whether false negatives (in the GO enrichment results) could be introduced by unnecessarily devaluing longer genes?

The way that I understand the paper, GOseq assigns weights based upon size:

 "The PWF quantifies how the probability of a gene selected as DE changes as a function of its transcript length."

But if the DE gene set of interest has already been exposed to a prior round of scrutiny using DESeq2, the rlog transformation should have already minimized the bias that might be due to length differences.

written 11 months ago by david.rinker

Gordon and I agree: there is a difference in statistical power across genes. DESeq2's variance stabilizing transformations can't change the fact that genes with higher counts will have more power for DE (note that the VST and rlog are not used in DESeq2's testing routines). Even gene sets of the same size are not equally powered.

written 11 months ago by Michael Love

OK. Thanks to you both for helping to clarify this. Much appreciated!

written 11 months ago by david.rinker