Question: GSEA preranked analysis downstream of DESeq2
EJ (USA, Boston, Harvard Medical School) wrote, 15 months ago:

Hi,

I have generated differential expression results for my RNA-seq data using DESeq2. Based on my research, there are two ways of generating the GSEA preranked list: 1) by log2FC; 2) by p value.
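As a rough sketch of the two options, assume the DESeq2 results have been exported as gene IDs with their `log2FoldChange` and `pvalue` values (the toy dictionary and the helper names `rank_by_lfc`, `rank_by_pvalue` and `to_rnk` are hypothetical). Each ranking can then be written in the two-column, tab-separated `.rnk` layout that GSEA Preranked reads. Because a raw p value carries no direction, the p-value ranking here is signed by the fold change. That is a common convention, not something DESeq2 prescribes.

```python
import math

# Hypothetical DESeq2 results: gene -> (log2FoldChange, pvalue)
results = {
    "GENE_A": (2.0, 1e-8),
    "GENE_B": (3.0, 0.2),   # large fold change, weak evidence
    "GENE_C": (-1.5, 1e-4),
    "GENE_D": (0.2, 0.8),
}

def rank_by_lfc(res):
    """Option 1: the ranking score is simply the log2 fold change."""
    return {gene: lfc for gene, (lfc, _p) in res.items()}

def rank_by_pvalue(res):
    """Option 2: score = sign(log2FC) * -log10(p), so direction is kept."""
    return {gene: math.copysign(-math.log10(p), lfc) for gene, (lfc, p) in res.items()}

def to_rnk(scores):
    """Return .rnk text: one 'gene<TAB>score' line per gene, sorted high to low."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{gene}\t{score:.4f}" for gene, score in ordered)

print(to_rnk(rank_by_lfc(results)))
print(to_rnk(rank_by_pvalue(results)))
```

With these made-up values, GENE_B tops the log2FC ranking but drops behind GENE_A once the ranking is driven by the p value.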

Each metric suffers from certain shortcomings. For example, genes ranked by log2FC are biased by bigger variance in genes with low counts, while genes ranked by p value are biased by genes with higher abundance and longer transcripts.
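The low-count problem with raw (unshrunken) log2 fold changes can be seen in a small simulation. This sketch assumes Poisson counts, two samples with no true difference, and a pseudocount of 1. These are illustrative choices, not DESeq2's model, and `raw_lfc_sd` is a hypothetical helper. The spread of the raw log2 ratio is much wider at low expression even though the true fold change is zero.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def raw_lfc_sd(mean_count, n_sim=20_000, pseudocount=1.0):
    """SD of log2((x + c) / (y + c)) when x and y share the same Poisson mean,
    i.e. when the true log2 fold change is 0."""
    x = rng.poisson(mean_count, size=n_sim)
    y = rng.poisson(mean_count, size=n_sim)
    lfc = np.log2((x + pseudocount) / (y + pseudocount))
    return float(np.std(lfc))

sd_low = raw_lfc_sd(5)     # lowly expressed gene
sd_high = raw_lfc_sd(500)  # highly expressed gene
print(f"SD of null LFC, mean=5: {sd_low:.2f}; mean=500: {sd_high:.2f}")
```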

I have a thought: is it possible to weight log2FC by the p value (or padj) and then generate the GSEA ranked list? For example, genes A and B have the same log2FC but different p values: A has a smaller p value and B a larger one. We would then give more weight to gene A than to gene B based on their p values. Does this make sense, or is it statistically wrong? If it makes sense, what mathematical formula should be used to combine log2FC and p value in this way?
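One heuristic often used for this kind of weighting is to multiply the log2 fold change by -log10 of the p value. DESeq2 itself does not prescribe it. Among genes with equal log2FC, the one with the smaller p value then scores higher, and the sign of the fold change is preserved. The function name `weighted_score`, the p-value floor, and the A/B values below are hypothetical.

```python
import math

def weighted_score(log2fc, pvalue, p_floor=1e-300):
    """Hypothetical combined metric: log2FC * -log10(p).
    p values are floored so that p == 0 does not produce infinity."""
    return log2fc * -math.log10(max(pvalue, p_floor))

# Genes A and B share the same log2FC; A has the smaller p value.
score_a = weighted_score(1.5, 1e-6)
score_b = weighted_score(1.5, 1e-2)
print(score_a, score_b)  # A is weighted more heavily than B
```

Keep in mind that the p value itself depends on expression level, so this product can still favor highly expressed genes.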

Thank you!!!

Michael Love (United States) wrote, 15 months ago:

hi EJ,

A few comments:

"genes ranked by log2FC are biased by bigger variance in genes with low counts"

Note that this is not the case for DESeq2 log fold changes, owing to a property unique to our use of Bayesian posterior estimates for the LFC. See the DESeq2 paper or vignette, and examine an MA plot.
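Here is a simplified normal-normal sketch of why posterior LFCs behave differently. Suppose the prior is LFC ~ N(0, tau^2) and the estimate has standard error se. The posterior mean is then the estimate multiplied by tau^2 / (tau^2 + se^2). Genes with few counts have large standard errors, so they are pulled strongly toward zero. The prior width, the gene values and the helper `shrink_lfc` are assumptions for illustration; DESeq2's actual estimator, and the options in its `lfcShrink` function, differ in detail.

```python
def shrink_lfc(lfc_mle, se, prior_sd):
    """Posterior mean of a normal-normal model: prior LFC ~ N(0, prior_sd^2),
    observed MLE ~ N(true LFC, se^2)."""
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    return weight * lfc_mle

prior_sd = 1.0  # assumed prior width on the log2 scale
low_count = shrink_lfc(lfc_mle=3.0, se=2.0, prior_sd=prior_sd)   # noisy estimate
high_count = shrink_lfc(lfc_mle=3.0, se=0.2, prior_sd=prior_sd)  # precise estimate
print(f"low-count gene: {low_count:.2f}, high-count gene: {high_count:.2f}")
```

Both genes start with the same estimate of 3.0, but the noisy one ends up near 0.6 while the precise one stays near 2.9.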

"while genes ranked by p value are biased by genes with higher abundance and longer transcripts"

For this consideration, you can use goseq after DESeq2; it is specifically designed to address this problem. There are a few posts on the support site about how to use goseq after DESeq2. I haven't had time to do a comparative analysis of the best methods for gene-set testing after DESeq2, but goseq is the downstream method I see used most often.
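The core idea behind goseq can be sketched roughly as follows. First, estimate a probability weighting function (PWF): how the chance of a gene being called DE depends on its length. Then use it to compute how many DE genes a category would be expected to contain under that bias. This is a simplified illustration under stated assumptions. The PWF here is a plain binned proportion, whereas goseq fits a monotonic spline. The expectation is compared directly with the observed count rather than through goseq's Wallenius approximation or resampling. The helpers `binned_pwf` and `category_expectation` and all gene data are hypothetical.

```python
from bisect import bisect_right

def binned_pwf(lengths, is_de, edges):
    """Estimate P(DE | length) as the DE fraction within length bins.
    Returns one probability per gene."""
    n_bins = len(edges) + 1
    counts = [0] * n_bins
    de_counts = [0] * n_bins
    bins = [bisect_right(edges, length) for length in lengths]
    for b, de in zip(bins, is_de):
        counts[b] += 1
        de_counts[b] += de
    frac = [d / c if c else 0.0 for d, c in zip(de_counts, counts)]
    return [frac[b] for b in bins]

def category_expectation(pwf, is_de, in_category):
    """Observed DE genes in a category vs the number expected from the PWF alone."""
    observed = sum(de for de, member in zip(is_de, in_category) if member)
    expected = sum(p for p, member in zip(pwf, in_category) if member)
    return observed, expected

# Hypothetical toy data: longer genes are called DE more often.
lengths = [500, 800, 900, 2500, 3000, 3500, 6000, 7000, 8000, 9000]
is_de = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
in_category = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
pwf = binned_pwf(lengths, is_de, edges=[1000, 5000])
observed, expected = category_expectation(pwf, is_de, in_category)
print(observed, expected)
```

If length were ignored, this 4-gene category would be expected to hold 2 DE genes, given the overall DE rate of 5/10. Accounting for length raises the expectation to about 2.58 because the category is made up mostly of long genes, so 2 observed DE genes no longer look like enrichment.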

I do like the idea of methods that use the LFC or a t-statistic and aggregate across the genes in the set. This allows one to detect, at the gene-set level, an abundance of marginal signal from the individual genes. I haven't had time to implement something for DESeq2 LFCs, although it's something I'm thinking about.
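Here is a minimal sketch of the aggregation idea. It assumes per-gene statistics are already in hand, for example the Wald statistics in DESeq2's `stat` column or the LFCs. The statistic is averaged over the genes in the set, and that mean is compared with the means of randomly drawn gene sets of the same size. The helper `set_mean_pvalue`, the statistic values and the set membership are hypothetical. Drawing random genes ignores correlation between genes, which is one of the issues CAMERA was designed to address.

```python
import random

def set_mean_pvalue(stats, gene_set, n_draws=10_000, seed=0):
    """Two-sided gene-sampling test: how often does a random gene set of the
    same size have a mean statistic at least as far from zero as the real set?
    Assumes the statistics are centred near zero under the null."""
    rng = random.Random(seed)
    genes = list(stats)
    members = [g for g in gene_set if g in stats]
    observed = sum(stats[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(genes, len(members))
        null_mean = sum(stats[g] for g in draw) / len(draw)
        if abs(null_mean) >= abs(observed):
            hits += 1
    # add-one correction so the estimated p value is never exactly 0
    return observed, (hits + 1) / (n_draws + 1)

# Hypothetical data: 190 background genes with spread-out statistics and a
# 10-gene set in which every gene has only a modest statistic of 1.0.
stats = {f"bg{i}": (-1) ** i * (i % 7) * 0.3 for i in range(190)}
gene_set = [f"set{i}" for i in range(10)]
stats.update({g: 1.0 for g in gene_set})
observed, p = set_mean_pvalue(stats, gene_set)
print(f"set mean = {observed:.2f}, p = {p:.4f}")
```

No single gene in the set stands out, yet their consistent shift is unusual for a random set of 10 genes.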

You might also take a look at the ROAST and CAMERA methods, which are available in limma.
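The idea behind CAMERA is a competitive test (set genes versus the remaining genes) whose variance is inflated to account for correlation among the genes in the set. For a set of m genes with average pairwise correlation rho, the variance of the set mean is multiplied by a variance inflation factor VIF = 1 + (m - 1) * rho. The sketch below only illustrates that adjustment with a normal approximation and the overall variance of the statistics as a simplification. It is not limma's `camera()` implementation, and the helper `correlation_adjusted_test`, the correlation values and the statistics are assumptions.

```python
import math
from statistics import NormalDist, mean, pvariance

def correlation_adjusted_test(stats, gene_set, inter_gene_corr):
    """Competitive set-vs-rest test with a CAMERA-style variance inflation factor.
    Normal approximation; inter_gene_corr is an assumed average correlation."""
    inside = [s for g, s in stats.items() if g in gene_set]
    outside = [s for g, s in stats.items() if g not in gene_set]
    m, n_out = len(inside), len(outside)
    var = pvariance(list(stats.values()))  # simplification: variance of all statistics
    vif = 1.0 + (m - 1) * inter_gene_corr
    se = math.sqrt(var * (vif / m + 1.0 / n_out))
    z = (mean(inside) - mean(outside)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical statistics: 190 background genes plus a 10-gene set at 1.0 each.
stats = {f"bg{i}": (-1) ** i * (i % 7) * 0.3 for i in range(190)}
gene_set = {f"set{i}" for i in range(10)}
stats.update({g: 1.0 for g in gene_set})
z0, p0 = correlation_adjusted_test(stats, gene_set, inter_gene_corr=0.0)
z1, p1 = correlation_adjusted_test(stats, gene_set, inter_gene_corr=0.1)
print(f"rho=0.0: z={z0:.2f}, p={p0:.4f}")
print(f"rho=0.1: z={z1:.2f}, p={p1:.4f}")
```

Even a small average correlation of 0.1 noticeably weakens the evidence for the set. This is why tests that treat genes as independent can be anti-conservative.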
