Question: DESeq2 normalization prior to identification of highly-expressed functional categories
Matt wrote:

I've been using DESeq2 for differential expression analysis of microbial (meta)transcriptomic datasets and have been very happy with its performance. I've started to overlay pathway analyses onto these differential expression results to identify functional groupings of genes (via KEGG or SEED) that are over- or under-represented in the DE gene sets. In parallel, I'd also like to take a dataset, order the genes from most to least expressed, and look for enrichment of certain functional groupings among the most highly expressed genes. My question is whether it makes sense to normalize first, specifically via DESeq2's size factor, rlog, or vst normalization, before ordering the genes from greatest to least expression.

I'm aware of the value of these normalization strategies for preparing datasets for differential expression analyses but would greatly appreciate an opinion on whether these are also appropriate methods for preparing a transcriptional dataset for the types of analysis I described.

modified 10 months ago by Michael Love • written 10 months ago by Matt
Michael Love wrote:

Gene counts alone don't tell you the order of expression across genes, because a longer gene produces more fragments at the same expression level. For that you need to estimate a quantity like TPM, where ideally there is normalization for transcript length, fragment length distribution, and various other sample-specific biases. You can estimate TPMs very quickly with software like Salmon, Sailfish, or kallisto, and then import these into R with the tximport package. This is also my preferred way to generate count matrices for DESeq2, as we mention in the current version of the vignette and workflow.
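
To illustrate why ordering by counts and ordering by TPM can disagree, here is a minimal sketch in Python. The gene names, counts, and lengths are made up for illustration, and the function only applies length normalization; tools like Salmon additionally estimate effective lengths and correct for other biases, which this sketch does not attempt.

```python
def tpm(counts, effective_lengths):
    """Convert per-gene fragment counts to TPM (transcripts per million).

    Simplified: divides each count by its length, then rescales so the
    values sum to one million. No bias correction is modelled here.
    """
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical toy data: three genes with different lengths (in bases).
genes = ["geneA", "geneB", "geneC"]
counts = [1000, 600, 300]
lengths = [5000, 1000, 300]

tpm_values = tpm(counts, lengths)

# Rank genes from most to least expressed under each measure.
rank_by_count = [g for _, g in sorted(zip(counts, genes), reverse=True)]
rank_by_tpm = [g for _, g in sorted(zip(tpm_values, genes), reverse=True)]

print("by count:", rank_by_count)  # geneA, geneB, geneC
print("by TPM:  ", rank_by_tpm)    # geneC, geneB, geneA
```

In this toy case the longest gene has the most fragments but the lowest TPM, so the two orderings are reversed. A size factor, rlog, or vst transformation rescales samples, not genes, so it would not correct for this length effect.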

written 10 months ago by Michael Love

Mike,

Thanks. On further consideration, I agree that TPM is the appropriate measure for comparing different genes within a given library. As for your recommendation of the various software packages and tximport, you don't mean that you import TPM as the primary data type for DE analysis, right? TPM would be an alternative treatment of the counts, used for analyses other than DE calling, wouldn't it?

written 10 months ago by Matt

Take a look at the tximport vignette and the associated citation for details. In short, if you use the suggested code I've laid out there (you can also find it in the DESeq2 vignette), DESeq2 will use the estimated fragment *counts* summarized to the gene level, and then internally it computes normalization factors for those counts which account for technical biases as well as potential changes in average transcript length per gene across samples. So it's still a count-based method, and DESeq2 will round the incoming estimated counts to integers, which are stored in counts(dds). The user-facing part is just: point tximport to the quantification files, then use DESeqDataSetFromTximport instead of the other constructor functions.

written 10 months ago by Michael Love

Thanks again for the clarification. Using an external package and tximport doesn't fit our current workflow, but it may as we evolve our analysis strategy. As for your earlier recommendation to use TPM to compare genes, your advice is well taken.

(Sent by email in reply to Michael Love's comment of Fri, Jan 6, 2017, 3:41 PM.)

written 10 months ago by Matt