DESeq2 normalization prior to identification of highly-expressed functional categories
Matt • 0
@matt-12117

I've been using DESeq2 for differential expression analysis of microbial (meta)transcriptomic datasets and have been very happy with its performance. I've started to overlay pathway analyses onto these differential expression results to identify functional groupings of genes (via KEGG or SEED) that are over- or under-represented in these DE gene sets. In parallel, I'd also like to be able to take a dataset, order the genes from most- to least-expressed, and look for enrichment of certain functional groupings in the most highly-expressed genes in a given dataset. My question is whether it makes sense to normalize, specifically via a DESeq2-performed size factor, rlog, or vst normalization, prior to ordering the genes from greatest to least expression?

I'm aware of the value of these normalization strategies for preparing datasets for differential expression analyses but would greatly appreciate an opinion on whether these are also appropriate methods for preparing a transcriptional dataset for the types of analysis I described.

deseq2 pathway analysis order normalization • 1.3k views
@mikelove

The gene counts don't tell you about the order of expression across genes. For that you need to estimate a quantity like TPM, where ideally there is normalization for transcript length, fragment length distribution, and various other sample-specific biases. You can estimate TPMs very quickly with software like Salmon, Sailfish, or kallisto, and then import these into R with the tximport package. These are also my preferred way to generate count matrices for DESeq2, as we mention in the current version of the vignette and workflow.
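
To illustrate why raw counts do not give the order of expression across genes, here is a minimal sketch of the TPM arithmetic in plain Python. The gene names, counts, and effective lengths are made-up toy values, and the `tpm` helper is hypothetical; it only shows the length normalization and does not model fragment length distributions or the other sample-specific biases that Salmon, Sailfish, or kallisto account for.

```python
def tpm(counts, eff_lengths):
    """Convert per-gene fragment counts to TPM within one sample."""
    # Fragments per base of effective length for each gene.
    rates = [c / length for c, length in zip(counts, eff_lengths)]
    total = sum(rates)
    # Rescale so the values sum to one million within the sample.
    return [r / total * 1e6 for r in rates]

# Hypothetical toy data: three genes in a single sample.
genes = ["geneA", "geneB", "geneC"]
counts = [1000, 800, 300]             # fragments assigned to each gene
eff_lengths = [5000.0, 1000.0, 150.0]  # effective lengths in bases

tpm_values = tpm(counts, eff_lengths)

# Order genes from most to least expressed under each measure.
order_by_count = [g for _, g in sorted(zip(counts, genes), reverse=True)]
order_by_tpm = [g for _, g in sorted(zip(tpm_values, genes), reverse=True)]

print(order_by_count)
print(order_by_tpm)
```

In this toy sample the longest gene has the most fragments but the lowest TPM, so the two orderings are reversed. Any enrichment test on the "top" genes would therefore depend on which measure was used for ranking.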


Mike,

Thanks. On further consideration, I agree that TPM is the appropriate measure for comparing different genes within a given library. As for your recommendation of the various software packages and tximport, you don't mean that you are importing TPM as the primary data type for DE analysis, right? This would be an alternative treatment of the counts, used for analyses other than DE calling, wouldn't it?

 


Take a look at the tximport vignette and the associated citation for details. In short, if you use the suggested code I've laid out there (you can also find it in the DESeq2 vignette), DESeq2 will use the estimated fragment *counts* summarized to the gene level, and then internally it computes normalization factors for those counts which account for technical biases as well as potential changes in average transcript length per gene across samples. So it's still a count-based method, and DESeq2 will round the incoming estimated counts to integers, which are stored in counts(dds). The user-facing part is just: point tximport to the quantification files, then use DESeqDataSetFromTximport instead of the other constructor functions.
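
As a conceptual illustration only (this is not tximport's or DESeq2's code), the sketch below shows in plain Python what "estimated counts summarized to the gene level", rounding to integers, and "average transcript length per gene" can look like for one gene with two isoforms. The function `summarize_to_gene`, the transcript names, and all numbers are hypothetical, and the sketch assumes the average length is weighted by transcript abundance (TPM). It does not show how DESeq2 turns these lengths into normalization factors; in practice you would use tximport followed by DESeqDataSetFromTximport.

```python
from collections import defaultdict

def summarize_to_gene(tx2gene, est_counts, tpm, lengths):
    """Sum transcript estimates per gene; return integer counts and
    an abundance-weighted average transcript length per gene."""
    gene_counts = defaultdict(float)
    gene_tpm = defaultdict(float)
    weighted_len = defaultdict(float)
    for tx, gene in tx2gene.items():
        gene_counts[gene] += est_counts[tx]
        gene_tpm[gene] += tpm[tx]
        weighted_len[gene] += tpm[tx] * lengths[tx]
    # Average length is only defined for genes with nonzero abundance.
    avg_len = {g: weighted_len[g] / gene_tpm[g]
               for g in gene_tpm if gene_tpm[g] > 0}
    # Estimated counts are fractional, so round to integers for a count model.
    int_counts = {g: round(c) for g, c in gene_counts.items()}
    return int_counts, avg_len

# Hypothetical gene with a short and a long isoform.
tx2gene = {"tx1": "gene1", "tx2": "gene1"}
lengths = {"tx1": 1000.0, "tx2": 3000.0}

# Sample 1 mostly expresses the short isoform, sample 2 the long one.
counts_s1, len_s1 = summarize_to_gene(
    tx2gene, {"tx1": 120.4, "tx2": 40.3}, {"tx1": 90.0, "tx2": 10.0}, lengths)
counts_s2, len_s2 = summarize_to_gene(
    tx2gene, {"tx1": 13.2, "tx2": 359.1}, {"tx1": 10.0, "tx2": 90.0}, lengths)

print(counts_s1, len_s1)
print(counts_s2, len_s2)
```

Here the gene's average transcript length differs between the two samples because isoform usage differs, which is the kind of per-sample change in length that the normalization factors described above are meant to account for.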

Thanks again for the clarification. Using an external package and tximport doesn't fit our current workflow, but it may as we evolve our analysis strategy. As for your previous recommendation to use TPM to compare genes, your advice is well taken.