large range of LFC in DESeq2
Mariaxi
@mariaxi-18596
Last seen 3.9 years ago

Hi,

I am new to RNA-seq and learning from the DESeq2 tutorials. When I ran my first DE analysis, my log2 fold changes ranged from -26 to 26, which I hadn't seen before. When I mentioned this to experienced bioinformaticians at my company, they all agreed it was odd; however, no one has the time to look at my data. After reading more about how DESeq2 calculates LFCs and looking at the genes with these large values, I realized the cause is that one condition has many reads (e.g. ~80 normalized counts) while the other condition has 0 reads. I came across a Bioconductor post that suggested setting betaPrior=TRUE when running DESeq(), and that reduced my LFC range to -3 to 3. What is the correct way to report DE for these genes?

This is the code I used:

# check that both objects have the same samples, in the same order
all(rownames(meta) %in% colnames(data))

all(rownames(meta) == colnames(data))

dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ gender + condition)

dds <- DESeq(dds)

res <- results(dds)

When I type res, I get the following:

> res
log2 fold change (MLE): condition 2 vs 1
Wald test p-value: condition 2 vs 1
DataFrame with 35118 rows and 6 columns

Example of extreme LFC:

        baseMean  log2FoldChange      lfcSE         stat        pvalue          padj
gene1 27.1703929      -26.086127  2.3313637  -11.1892141  4.604968e-29  5.260255e-25
gene2 26.1996811      -26.085583  2.3469943  -11.1144639  1.066896e-28  8.124766e-25
gene3 36.8962467       -8.289083  3.6075064   -2.2977320  2.157705e-02  1.149870e-01

Normalized counts for these genes (samples in condition 1 are labeled 1_# and samples in condition 2 are labeled 2_#):

          1_1      1_2      1_3      1_4      1_5      1_6    1_7  1_8
gene1 80.0311        0  80.5786  89.5606  89.4903   66.462  78.72    0
gene2 74.9497        0  70.7785  74.4042  97.7193  80.1764  55.76    0
gene3 287.096  251.657        0  345.842        0        0      0    0

          2_1  2_2  2_3  2_4  2_5  2_6  2_7  2_8
gene1       0    0    0    0    0    0    0    0
gene2       0    0    0    0    0    0    0    0
gene3       0    0    0    0    0    0    0    0
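For reference, a quick way to flag such genes and confirm the all-zero pattern, assuming the dds and res objects from the code above (the |LFC| cutoff of 10 is arbitrary):

```r
# Flag genes with implausibly large |LFC| and inspect their normalized counts
extreme <- which(abs(res$log2FoldChange) > 10)
head(counts(dds, normalized = TRUE)[extreme, ])

# Plot one of them split by condition to see the zeros directly
plotCounts(dds, gene = rownames(dds)[extreme[1]], intgroup = "condition")
```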

When I run DESeq with betaPrior = TRUE:

dds2 <- DESeq(dds, betaPrior = TRUE)
res2 <- results(dds2)

log2 fold change (MAP): condition 2 vs 1
Wald test p-value: condition 2 vs 1
DataFrame with 35118 rows and 6 columns

The LFCs for the same genes are now:

       baseMean  log2FoldChange      lfcSE       stat     pvalue       padj
gene1  27.17039      -0.2768540  0.1798314  -1.539520  0.1236774  0.3430296
gene2  26.19968      -0.2651210  0.1787147  -1.483487  0.1379451  0.3679503
gene3  36.89625      -0.1502297  0.1187575  -1.265012  0.2058670  0.4613731

But now my genes don't make the cutoff for being differentially expressed. Any guidance is appreciated.

Let me know if I need to provide more data or clarification.

Thanks!

deseq2 lfc betaprior
@mikelove
Last seen 1 day ago
United States

In the current version of DESeq2, the following paradigm is suggested:

dds <- DESeq(dds)
res <- lfcShrink(dds, coef="condition_B_vs_A", type="apeglm")

The second line runs results() and then LFC shrinkage with a method, apeglm, that beats our earlier implementation, which is now called "normal" because it uses a Normal prior for the LFC. We have more details in the vignette.
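As a concrete sketch for this thread's design (the coefficient name is an assumption; check resultsNames() on your own object):

```r
# List the fitted coefficients; the exact name depends on your factor levels
resultsNames(dds)
# For the condition 2 vs 1 comparison above, it would look something like:
res <- lfcShrink(dds, coef = "condition_2_vs_1", type = "apeglm")
```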


Thanks, Mike! I guess I was looking at a beginner tutorial that was a bit old. I have now found the latest version of the full vignette.


Edit: I see that you recommend setting svalue=TRUE for the low-p, low-LFC situations here, and that works. However, apeglm doesn't seem to support contrasts. Do you have a suggestion for adjusting LFCs and p-values (s-values) for contrasts?

Original question:

The lfcShrink solution indeed fixes the exaggerated LFC values, but it has no effect on p-values, as they are not re-calculated. So I end up with very small LFCs having very small p-values in a similar situation, for example LFC=0.04, p=2e-9. Do you have any suggestions on this?

In my case, the very large original LFCs seem to stem from samples that are in the DESeqDataSet but not in the given comparison. If I create a DESeqDataSet with only the samples in that comparison, I do not see the same large LFCs. The second reason I suspect the out-of-comparison samples is that lfcShrink with apeglm gives reasonable LFCs while ashr does not, and I believe apeglm only considers the samples in the comparison while ashr considers all samples, although I might be mistaken about this.

I didn't want to start a new question because I think the issue is the same but I'd be happy to provide more details in a separate question if it is more appropriate.


You can use lfcThreshold, either in results() or with ashr in lfcShrink(), which can be used with contrasts.
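A sketch of both routes, using this thread's condition levels (the threshold of 1 is illustrative):

```r
# Route 1: test against a minimum effect size |LFC| > 1 rather than 0,
# so tiny-but-significant fold changes no longer get small p-values
res_thr <- results(dds, lfcThreshold = 1)

# Route 2: ashr shrinkage accepts arbitrary contrasts (apeglm needs a coef)
res_ashr <- lfcShrink(dds, contrast = c("condition", "2", "1"),
                      type = "ashr", lfcThreshold = 1)
```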


Michael Love, thanks for your response.

The problem is the artifactually inflated LFCs. While apeglm shrinks these LFCs, results() and ashr do not. So passing a small (or even moderately large) lfcThreshold to these functions does not solve the issue.


If ashr doesn't shrink these, are you sure they are artifacts? It's hard to say for sure. Filtering on the number of samples with positive counts is another approach that may help prioritize genes.
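A minimal sketch of that filter, applied before running DESeq(); requiring a nonzero count in at least 3 samples is an assumed cutoff, not a recommendation from this thread:

```r
# Keep only genes with a positive count in at least 3 samples
keep <- rowSums(counts(dds) > 0) >= 3
dds <- dds[keep, ]
```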