GSEA preranked analysis downstream of DESeq2
EJ (@ej-11019), Harvard Medical School, Boston, USA

Hi,

I have generated differential expression results for my RNA-seq data using DESeq2. From my reading, there are two ways of generating the GSEA preranked list: 1) by log2FC; 2) by p-value.
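For context on how a preranked list is consumed: a minimal Python sketch of the GSEA running-sum enrichment score, assuming the weighted Kolmogorov-Smirnov-style statistic of the original GSEA method with the default weight of 1. The function `enrichment_score` and its arguments are hypothetical illustrations, not part of any package, and the sketch leaves out the permutation test and normalization that real GSEA implementations perform.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score in the style of GSEA (hypothetical helper).

    ranked_genes: gene IDs sorted from most positive to most negative metric.
    scores: the ranking metric, in the same order as ranked_genes.
    gene_set: set of gene IDs to test.
    """
    scores = np.asarray(scores, dtype=float)
    in_set = np.array([g in gene_set for g in ranked_genes], dtype=bool)
    n, n_hit = len(ranked_genes), int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    # Hits step up in proportion to |metric|^weight; misses step down uniformly.
    hit_w = np.where(in_set, np.abs(scores) ** weight, 0.0)
    p_hit = np.cumsum(hit_w) / hit_w.sum()
    p_miss = np.cumsum(~in_set) / (n - n_hit)
    running = p_hit - p_miss
    # The enrichment score is the running sum's largest deviation from zero.
    return float(running[np.argmax(np.abs(running))])

genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, metric, {"A", "B"}))  # 1.0 (set at the top)
print(enrichment_score(genes, metric, {"E", "F"}))  # -1.0 (set at the bottom)
```

Because hits are weighted by the metric itself, the choice of ranking metric affects not only the order but also how much each gene contributes to the score.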

Each metric has its shortcomings. For example, a ranking by log2FC is biased by the larger variance of genes with low counts, while a ranking by p-value is biased toward genes with higher abundance and longer transcripts.
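A small simulation of the low-count problem with naive (unshrunken) log2 fold changes. It assumes Poisson counts, three replicates per condition, no true change, and a pseudocount of 1; real RNA-seq counts are overdispersed, so this is a simplification. `naive_lfc_spread` is a hypothetical helper.

```python
import numpy as np

def naive_lfc_spread(mean_count, n_genes=5000, n_reps=3, seed=0):
    """SD of naive log2 fold changes for genes with NO true change.

    Both conditions draw Poisson(mean_count) counts; the LFC is log2 of the
    ratio of group means with a pseudocount of 1 (assumption, not DESeq2).
    """
    rng = np.random.default_rng(seed)
    cond_a = rng.poisson(mean_count, size=(n_genes, n_reps)).mean(axis=1)
    cond_b = rng.poisson(mean_count, size=(n_genes, n_reps)).mean(axis=1)
    lfc = np.log2((cond_b + 1.0) / (cond_a + 1.0))
    return float(lfc.std())

for mu in (5, 50, 500):
    print(f"mean count {mu:>3}: SD of null log2FC = {naive_lfc_spread(mu):.3f}")
```

The spread shrinks as the mean count grows, so a ranking by raw log2FC tends to push noisy low-count genes toward both ends of the list.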

I have a thought: is it possible to weight log2FC by p-value or padj and then generate the GSEA ranked list? For example, genes A and B have the same log2FC but different p-values, A with a smaller p-value and B with a larger one. We would then give more weight to gene A than to gene B based on their p-values. Does this make sense, or is it completely wrong statistically? If it makes sense, what formula should be used to combine log2FC and p-value?
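One common heuristic for this (not something the reply below endorses) takes the direction from log2FC and the magnitude from -log10(p), or multiplies log2FC by -log10(p). A minimal Python sketch, assuming arrays taken from a DESeq2 results table with the rows whose p-value is NA already dropped; both function names are hypothetical. Another frequent choice is DESeq2's Wald statistic (the `stat` column of `results()`), which is the LFC divided by its standard error and so already combines effect size and precision.

```python
import numpy as np

def signed_log_p(log2fc, pvalue, floor=1e-300):
    """sign(log2FC) * -log10(p): direction from the LFC, magnitude from the p-value."""
    p = np.clip(np.asarray(pvalue, dtype=float), floor, 1.0)  # guard against p = 0
    return np.sign(np.asarray(log2fc, dtype=float)) * -np.log10(p)

def lfc_times_log_p(log2fc, pvalue, floor=1e-300):
    """log2FC * -log10(p): the fold change weighted by the strength of evidence."""
    p = np.clip(np.asarray(pvalue, dtype=float), floor, 1.0)
    return np.asarray(log2fc, dtype=float) * -np.log10(p)

# Genes A and B share a log2FC but A has the smaller p-value; C is down-regulated.
genes = np.array(["A", "B", "C"])
log2fc = np.array([1.5, 1.5, -1.5])
pvalue = np.array([1e-6, 0.05, 1e-6])
metric = lfc_times_log_p(log2fc, pvalue)
ranked = genes[np.argsort(-metric)]  # descending order, as GSEA preranked expects
print(ranked.tolist())  # ['A', 'B', 'C']
```

Note that with the first metric, genes with tiny p-values dominate regardless of effect size, which reintroduces the abundance bias of p-value ranking.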

Thank you!!!

Answer from @mikelove (United States)

hi EJ,

A few comments:

"genes ranked by log2FC are biased by bigger variance in genes with low counts"

Note that this is not the case for DESeq2 log fold changes; it is a unique property of our use of Bayesian posterior estimates for the LFC. See the DESeq2 paper or vignette, and examine an MA plot.
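To see why a posterior estimate tames noisy low-count LFCs, here is a toy normal-normal version, assuming a zero-centered normal prior on the true LFC and a normal likelihood with a known standard error. It only illustrates the idea; it is not DESeq2's estimation code, which works on GLM coefficients and estimates the prior from the data. `shrink_lfc` is a hypothetical name.

```python
import numpy as np

def shrink_lfc(mle_lfc, se, prior_sd):
    """Posterior mean of the LFC under a N(0, prior_sd^2) prior and a
    N(true_lfc, se^2) likelihood (toy model, not DESeq2's implementation)."""
    mle_lfc = np.asarray(mle_lfc, dtype=float)
    se = np.asarray(se, dtype=float)
    prior_var = prior_sd ** 2
    # Shrinkage factor in (0, 1): near 1 for precise estimates, near 0 for noisy ones.
    return mle_lfc * prior_var / (prior_var + se ** 2)

# Same raw LFC of 2, but the second gene (think: low counts) has a much larger SE.
print(shrink_lfc([2.0, 2.0], se=[0.2, 2.0], prior_sd=1.0))
```

The precise estimate barely moves, while the noisy one is pulled most of the way to zero, which is why low-count genes no longer dominate the extremes of an LFC ranking.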

"while genes ranked by p value are biased by genes with higher abundance and longer transcripts"

To address this, you can use goseq following DESeq2. This method is specifically designed to address this problem. There are a few posts on the support site on how to use goseq after DESeq2. I haven't had time to do any comparative analysis of the best methods for gene-set testing after DESeq2. I think goseq is the downstream method I see used most often.
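The core idea in goseq is a probability weighting function (PWF): the probability that a gene is called DE as a function of its length, used to set the expected number of DE genes in each category. A crude Python sketch, assuming binned DE fractions in place of the monotone spline goseq fits, and using the PWF only for an expected count (goseq itself computes p-values with the Wallenius approximation or by resampling). Both function names are hypothetical.

```python
import numpy as np

def binned_pwf(lengths, is_de, n_bins=10):
    """Per-gene probability of being called DE given its length, estimated as
    the DE fraction in length-quantile bins. Assumes enough distinct lengths
    that no bin is empty (goseq fits a monotone spline instead)."""
    lengths = np.asarray(lengths, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    edges = np.quantile(lengths, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each gene to a bin; the longest gene is folded into the last bin.
    bins = np.clip(np.searchsorted(edges, lengths, side="right") - 1, 0, n_bins - 1)
    de_fraction = np.array([is_de[bins == b].mean() for b in range(n_bins)])
    return de_fraction[bins]

def expected_de_in_category(pwf, in_category):
    """Expected DE count in a category if DE status depended only on length."""
    return float(np.asarray(pwf)[np.asarray(in_category, dtype=bool)].sum())

# Extreme toy case: only long genes are called DE.
lengths = np.arange(1, 1001)
is_de = lengths > 500
pwf = binned_pwf(lengths, is_de)
long_genes = lengths > 900
print(expected_de_in_category(pwf, long_genes), int(is_de[long_genes].sum()))
```

Here a category of the 100 longest genes has 100 DE genes, but the length-aware expectation is also 100, so there is no enrichment beyond length; a length-blind expectation (50) would wrongly suggest one.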

I do like the idea of methods that use the LFC or a t-statistic and aggregate across the genes in the set, which allows one to detect, at the level of the gene set, an abundance of marginal signal spread over many genes. I haven't had time to implement something for DESeq2 LFCs, although it's something I'm thinking about.
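A minimal sketch of that aggregate idea: compare the mean gene-level statistic (an LFC or a Wald statistic) inside a set with that of all genes, as a z-score. It assumes genes are independent, which real gene sets usually violate; `set_mean_z` is a hypothetical helper, not a DESeq2 or limma function.

```python
import numpy as np

def set_mean_z(stats, in_set):
    """z-score of the mean statistic within a set relative to all genes,
    assuming independent genes (hypothetical helper)."""
    stats = np.asarray(stats, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    m = int(in_set.sum())
    return (stats[in_set].mean() - stats.mean()) * np.sqrt(m) / stats.std(ddof=1)

# 950 background genes at -1 or +1, and a 50-gene set where every gene sits at 0.8:
# no single set gene is extreme, but together they form a strong shift.
stats = np.concatenate([np.tile([-1.0, 1.0], 475), np.full(50, 0.8)])
in_set = np.arange(stats.size) >= 950
print(round(set_mean_z(stats, in_set), 2))
```

This gives a z-score above 5 even though every gene in the set is less extreme than half of the background genes.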

You might also take a look at the ROAST and CAMERA methods, which are available in limma.
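CAMERA (limma's `camera()`) addresses a weakness of naive set-level tests: correlated genes make the set mean more variable than an independence assumption predicts. A simplified sketch of its variance inflation factor idea, VIF = 1 + (m - 1) * mean correlation, assuming the average inter-gene correlation in the set is already known (camera estimates it from the model residuals). The function names are hypothetical, and the real test differs in detail.

```python
import numpy as np

def variance_inflation(m, mean_corr):
    """Inflation of the variance of a mean of m genes with average pairwise
    correlation mean_corr, relative to independent genes."""
    return 1.0 + (m - 1) * mean_corr

def correlation_adjusted_z(stats, in_set, mean_corr):
    """Set-mean z-score deflated by the VIF (simplified CAMERA-style idea)."""
    stats = np.asarray(stats, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    m = int(in_set.sum())
    z = (stats[in_set].mean() - stats.mean()) * np.sqrt(m) / stats.std(ddof=1)
    return z / np.sqrt(variance_inflation(m, mean_corr))

stats = np.concatenate([np.tile([-1.0, 1.0], 475), np.full(50, 0.8)])
in_set = np.arange(stats.size) >= 950
for rho in (0.0, 0.1):
    print(rho, round(correlation_adjusted_z(stats, in_set, rho), 2))
```

Even a modest average correlation of 0.1 across 50 genes inflates the variance almost six-fold, so the adjusted evidence is much weaker than the naive z-score suggests.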
