Question

What is the difference between DEG and topVarGenes in DESeq2?

0

ecg1g15 ▴ 30

@ecg1g15-19970

Last seen 4.3 years ago

I am interested in selecting the top genes that carry signal in the dataset. What is the difference between the most variable genes across samples and the differentially expressed genes (DEG), as obtained below?

library("genefilter")
topVarGenes <- head(order(-rowVars(assay(vsd))),30)

# These differ from the top 30 above (some overlap, some don't):
DEG <- subset(res, padj <0.1)

Is this because res holds the results from DESeq2 without the VST transformation?
- Would it make any sense to call DEGs after vst, to obtain the most variable genes after normalisation? (This is what topVarGenes does.)
- If they are different, which one should be used for what?

DESeq2 R RNASeqR • 4.6k views

ADD COMMENT • link 4.3 years ago ecg1g15 ▴ 30

1

Hi, were you not asking these same questions on Biostars, or am I confusing myself?

ADD REPLY • link 4.3 years ago Kevin Blighe ★ 4.0k

0

Sorry, I did, I am just overthinking as I am working with a complex environmental dataset and there are infinite ways of analysing the data depending on the questions, so I should simplify the objective and stick to it.

ADD REPLY • link 4.3 years ago ecg1g15 ▴ 30

score 5 · Accepted Answer · 2020-12-01

5

ATpoint ★ 4.7k

@atpoint-13662

Last seen 1 day ago

Germany

That is actually pretty simple. rowVars selects genes based on row-wise variance alone, whereas the DEG results come from the full differential expression framework, which is described in both the paper and the vignette. The latter is more reliable in terms of which genes have actual statistical support for being DE; the former is more of a quick-and-dirty feature selection, e.g. for exploratory purposes such as PCA. The DEGs have nothing to do with the vst function, whereas it makes sense to run rowVars on the output of vst, as vst corrects for the mean/variance trend and puts the data on a log2-like scale. If you want DEGs, follow the standard pipeline in the vignette (which explicitly mentions that vst is not used in the DE pipeline); if you want a quick feature selection for QC/exploration, use rowVars.

Edit: Oh yeah, asked before: https://www.biostars.org/p/472945/

What is unclear? What don't you understand? There are in fact multiple threads where you asked or commented on this over at Biostars. Don't overthink it. I always do it like this: use the top variable genes for a quick initial QC, and use the DEGs for anything else (heatmaps, classifications...). Does that make sense to you?
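To make the contrast concrete, here is a minimal sketch on simulated data rather than the dataset from this thread. It assumes DESeq2 is installed from Bioconductor. The simulated dataset, the seed and the `overlap` variable are illustrative choices, `apply(mat, 1, var)` stands in for `rowVars` so that no extra package is needed, and `varianceStabilizingTransformation()` is used in place of the faster `vst()` wrapper to avoid depending on its gene subsampling.

```r
# Sketch only: simulated data, not the poster's dataset.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
# 2000 genes, 8 samples in two conditions, with simulated fold changes (betaSD = 1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 8, betaSD = 1)

# Differential expression is fitted on the raw counts; no VST is involved
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds)
DEG <- subset(res, padj < 0.1)

# Variance ranking is done on variance-stabilised data and ignores the design
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
mat <- assay(vsd)
gene_var <- apply(mat, 1, var)                      # row-wise variance, as rowVars would give
topVarGenes <- head(order(gene_var, decreasing = TRUE), 30)
topVarNames <- rownames(mat)[topVarGenes]

# Genes found by both approaches; the two sets need not coincide
overlap <- intersect(topVarNames, rownames(DEG))
length(overlap)
```

The variance ranking always returns exactly 30 genes regardless of the experimental design, while the size of `DEG` depends on the statistical test and the `padj` cutoff, which is why the two lists only partly agree.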

ADD COMMENT • link 4.3 years ago ATpoint ★ 4.7k

0

Apologies, I had asked it before (which questions should be addressed here and which on Biostars?)

I believe I understand: vst applies a log-like transformation for visualisation and clustering, whereas for differential expression and other analyses DESeq2 normalises the counts for library depth.

Therefore, how do I extract a data.frame containing the DESeq2-normalised counts?

# This seems to give me the original count table
norm_counts_dds <- as.data.frame(assay(dds))

ADD REPLY • link 4.3 years ago ecg1g15 ▴ 30

1

I am not a Bioc developer but I personally think technical questions that require the very expertise of the developers should be posted here, and everything else over at more general fora such as biostars or StackExchange Bioinformatics.

What you ask is written in the manual. Make it easy, just use vst.

ADD REPLY • link 4.3 years ago ATpoint ★ 4.7k

0

norm_counts_dds <- counts(dds, normalized=TRUE)
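As a hedged illustration of what that call returns, the sketch below uses a small simulated dataset (an assumption, not the data from this thread) to show that the normalised counts are the raw counts divided by each sample's size factor, and how to get them as a data.frame. The names `raw_counts`, `manual` and `norm_counts_df` are illustrative.

```r
# Sketch only: simulated data; assumes DESeq2 is installed from Bioconductor.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 6)

# Size factors must be estimated first (DESeq() also does this internally)
dds <- estimateSizeFactors(dds)

raw_counts  <- counts(dds)                     # raw counts, the same table assay(dds) gives
norm_counts <- counts(dds, normalized = TRUE)  # counts scaled for library depth

# Manual equivalent: divide each sample (column) by its size factor
manual <- sweep(raw_counts, 2, sizeFactors(dds), "/")
all.equal(norm_counts, manual)

# The data.frame asked for earlier in the thread
norm_counts_df <- as.data.frame(norm_counts)
```

These normalised counts are on the original count scale; they are not log-transformed or variance-stabilised, so for PCA or clustering the vst output remains the more suitable input.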

ADD REPLY • link 4.3 years ago ecg1g15 ▴ 30