Question

Does DESeq2 analysis make GOseq redundant?

david.rinker ▴ 10

@davidrinker-13538

I'm running a differential expression pipeline that first calculates DE with DESeq2 and then uses GOseq to look for enrichment in the DE genes. This pipeline seems fairly common in the literature, so I'm sure it's fine.

While it is well represented in the literature, I'm wondering whether this approach is redundant. GOseq is supposed to correct for the bias introduced by the greater power to detect DE in longer transcripts. However, since the variance stabilization approach of DESeq2 should already have effectively compensated for any gene length effects, it seems like the subsequent use of GOseq could result in more false ~~positives~~ negatives by effectively subjecting longer genes to a second round of increased scrutiny (and possible devaluation ~~rejection~~).

My feeling now is that if a gene makes it through DESeq2 and shows up as being DE, then it should remain in the GO enrichment analysis with no additional weighting based upon length.

EDIT: In my initial wording, I made some very poor word choices and have tried to amend them. My initial question was poorly formed, and I am hoping that it is now more to the point of what I am asking.

deseq2 goseq gene ontology differential gene expression • 1.7k views

score 1 · Answer 1 · 2017-07-20

Michael Love 42k

@mikelove

United States

The difference in power persists even after normalization and transformation of counts. I believe this is discussed in the GOseq paper.

score 1 · Answer 2 · 2017-07-20

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

Gene length bias (in an enrichment analysis) and RNA-seq normalization (variance stabilization or otherwise) are quite different things. Normalization doesn't prevent gene length bias; in fact, quite the opposite.

DE analysis when done properly will always show a gene length bias because longer genes generate more reads and hence more statistical power, and any efficient DE method will utilize the extra power when it is available.
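
As a rough illustration of this point, here is a minimal sketch (not how DESeq2 computes anything) that uses a delta-method normal approximation for a Wald test on the log fold change under a negative binomial model. The function name `approx_power` and every parameter value (reads per kb, dispersion, number of replicates) are assumptions made up for the example.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(length_kb, reads_per_kb=20.0, fold_change=1.5,
                 n_reps=3, dispersion=0.05, alpha=0.05):
    """Approximate power to detect a fixed fold change for one gene.

    Assumes expected counts proportional to gene length and a negative
    binomial variance mu + dispersion * mu^2 (illustrative values only).
    """
    mu_a = reads_per_kb * length_kb   # expected count per sample, condition A
    mu_b = mu_a * fold_change         # expected count per sample, condition B
    # Delta-method variance of the log of each group mean, summed over groups.
    se = sqrt((1 / mu_a + dispersion) / n_reps +
              (1 / mu_b + dispersion) / n_reps)
    z = abs(log(fold_change)) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Two-sided test: probability that |Z| exceeds the critical value.
    return std_normal.cdf(z - z_crit) + std_normal.cdf(-z - z_crit)


for length_kb in (0.5, 1, 2, 5, 10):
    print(f"{length_kb:>4} kb  approx. power = {approx_power(length_kb):.2f}")
```

Under these assumptions the fold change is identical for every gene, yet the approximate power rises with length simply because the expected count rises.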

You seem to assume that goseq subjects genes to a second round of scrutiny to judge which are truly DE but that is not at all how it works. goseq takes the DE list as given. It simply analyses how the DE genes assign to annotation categories, conditional on their lengths. In other words, it does indeed keep all the DE genes in the analysis, just as you say it should.
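
To make "conditional on their lengths" concrete, here is a toy sketch of the idea. It is not the goseq implementation: goseq fits a smooth monotonic spline for the PWF and, by default, uses a Wallenius approximation to get p-values, whereas this example just bins genes by length and compares expected DE counts. The gene names, lengths, DE calls, category, and the helpers `binned_pwf` and `category_summary` are all hypothetical.

```python
def binned_pwf(lengths, is_de, n_bins=4):
    """Estimate P(DE | length) as the DE fraction within length-ordered bins."""
    ordered = sorted(lengths, key=lengths.get)
    bin_size = -(-len(ordered) // n_bins)  # ceiling division
    pwf = {}
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        de_fraction = sum(is_de[g] for g in chunk) / len(chunk)
        for g in chunk:
            pwf[g] = de_fraction
    return pwf


def category_summary(category, is_de, pwf):
    """Observed DE count in a category vs. two null expectations."""
    observed = sum(is_de[g] for g in category)
    # Naive null: every gene is equally likely to be DE.
    naive_expected = len(category) * sum(is_de.values()) / len(is_de)
    # Length-aware null: each gene's DE probability follows the PWF.
    length_aware_expected = sum(pwf[g] for g in category)
    return observed, naive_expected, length_aware_expected


# Toy data: 40 genes of increasing length; longer genes are DE more often.
genes = [f"g{i}" for i in range(40)]
lengths = {g: 500 + 250 * i for i, g in enumerate(genes)}
de_indices = {3, 12, 17, 21, 24, 26, 29, 30, 32, 33, 35, 37, 39}
is_de = {g: (i in de_indices) for i, g in enumerate(genes)}

# A hypothetical annotation category made up of long genes.
category = [f"g{i}" for i in range(25, 40, 2)]

pwf = binned_pwf(lengths, is_de)
observed, naive, length_aware = category_summary(category, is_de, pwf)
print(f"observed DE in category:  {observed}")
print(f"expected, ignoring length: {naive:.2f}")
print(f"expected, given lengths:   {length_aware:.2f}")
```

Note that the DE calls are never changed or reweighted here. Only the null expectation for the category moves, so a category of long genes needs more DE members before it looks surprising.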

ADD COMMENT • link 6.8 years ago Gordon Smyth 50k

Thanks for the response. My language was very imprecise. I meant to ask whether false negatives (in the GO enrichment results) could be introduced by unnecessarily devaluing longer genes?

As I understand the paper, GOseq assigns weights based on transcript length:

"The PWF quantifies how the probability of a gene selected as DE changes as a function of its transcript length."

But if the DE gene set of interest has already been exposed to a prior round of scrutiny using DESeq2, the rlog transformation should have already minimized any bias due to length differences.

ADD REPLY • link 6.8 years ago david.rinker ▴ 10

Gordon and I agree: there is a difference in statistical power across genes. DESeq2's variance stabilizing transformations cannot change the fact that genes with higher counts have more power for DE (note that neither the VST nor the rlog is used in DESeq2's testing routines). Even gene sets of the same size are not equally powered.
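
A small sketch of why stabilizing the variance does not equalize power. It assumes pure Poisson counts and the textbook transform `2 * sqrt(x)`, which is a simplification and not the VST or rlog that DESeq2 implements. The function name `vst_signal_to_noise` and the example counts are hypothetical.

```python
from math import sqrt


def vst_signal_to_noise(mean_count, fold_change=1.5):
    """Shift between conditions on the 2*sqrt(x) scale, in noise SD units.

    For Poisson counts, 2*sqrt(x) has variance of roughly 1 regardless of
    the mean, so the noise SD is about the same for every gene.
    """
    shift = 2 * sqrt(mean_count * fold_change) - 2 * sqrt(mean_count)
    noise_sd = 1.0  # approximately constant after the transform
    return shift / noise_sd


low = vst_signal_to_noise(25)    # e.g. a short or lowly expressed gene
high = vst_signal_to_noise(400)  # e.g. a long or highly expressed gene
print(f"signal-to-noise at mean 25:  {low:.2f}")
print(f"signal-to-noise at mean 400: {high:.2f}")
```

The noise is the same for both genes after the transform, but the same fold change produces a shift that grows with the square root of the mean count, so the higher-count gene is still easier to detect.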