large range of LFC in DESeq2
1
0
Entering edit mode
Mariaxi • 0
@mariaxi-18596
Last seen 5.9 years ago

Hi,

I am new to RNASeq and learning by reading the tutorials for DESEq2. When I ran my first DE analysis, my Log2FC range from -26 to 26, which I hadn't seen before. When I mentioned this fact to experienced bioinformaticians in my company, they all agreed that this was odd; however, no one has the time to take a look at my data. Reading more about how DESeq2 calculates LFC and looking at the genes which have this large LFCs I realized this is due to one condition having many reads (e.g. ~80 - normalized counts) while the other condition has 0 reads. I came across a post in Bioconductor that suggested using betaPrior=TRUE while running DESeq and that reduced my LFC range (-3 to 3). What is the correct way to report the DE for these genes? 

This is the code I used:

data <- read.table("gene-count.txt", header = T, row.names = 1, check.names=F)
meta <- read.table("meta/Meta_ordered.txt", header = T, row.names = 1)

# check that both dataset have the same samples and in the same order
all(rownames(meta) %in% colnames(data))

all(rownames(meta1) == colnames(data1))

dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ gender + condition)

dds <- DESeq(dds)

res <- results(dds)

When I type res, I get the following:

> res
log2 fold change (MLE): contion 2 vs 1 
Wald test p-value: condition 2 vs 1 
DataFrame with 35118 rows and 6 columns

Example of extreme LFC:

baseMean log2FoldChange lfcSE stat pvalue padj
             

gene1

27.1703929

-26.086127

2.3313637

-11.1892141

4.604968e-29

5.260255e-25

gene2

26.1996811

-26.085583

2.3469943

-11.1144639

1.066896e-28

8.124766e-25

gene3

36.8962467

-8.289083

3.6075064

-2.2977320

2.157705e-02

1.149870e-01

 

normalized count for these genes:

samples in condition 1 are labeled 1_# and samples in condition 2 are labeled 2_#
 

1_1

1_2 1_3 1_4 1_5 1_6 1_7 1_8

gene1

80.03106

0

80.57861

89.56059

89.49031

66.46199

78.72004

0

gene2

74.94973

0

70.77851

74.40419

97.7193

80.17636

55.76003

0

gene3

287.0956

251.6573

0

345.8417

0

0

0

0

 

  2_1 2_2 2_3 2_4 2_5 2_6 2_7 2_8

gene1

0

0

0

0

0

0

0

0

gene2

0

0

0

0

0

0

0

0

gene3

0

0

0

0

0

0

0

0

 

When I run DESeq with betaPrior = TRUE:

dds2 <- DESeq(dds1, betaPrior = TRUE)
res2 <- results(dds2)

log2 fold change (MAP): condition 2 vs 1 
Wald test p-value: condition 2 vs 1 
DataFrame with 35118 rows and 6 columns

 

the LFC for the same genes is now:

baseMean log2FoldChange lfcSE stat pvalue padj
             

gene1

27.17039

-0.2768540

0.1798314

-1.539520

0.1236774

0.3430296

gene2

26.19968

-0.2651210

0.1787147

-1.483487

0.1379451

0.3679503

gene3

36.89625

-0.1502297

0.1187575

-1.265012

0.2058670

0.4613731

But now my genes don't make the cutoff for being differentially expressed. Any guidance is appreciated.

Let me know if I need to provide more data or clarification. 

 

Thanks! 

 

deseq2 lfc betaprior • 1.9k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

In the current version of DESeq2, the following paradigm is suggested:

dds <- DESeq(dds)
res <- lfcShrink(dds, coef="condition_B_vs_A", type="apeglm")

The second line runs results() and then LFC shrinkage with a method, apeglm, that beats our earlier implementation, which is now called "normal" because it uses a Normal prior for the LFC. We have more details in the vignette.

 

ADD COMMENT
0
Entering edit mode

Thanks, Mike! I guess I was looking at a tutorial that was for beginners but a bit old. I have found the latest version of the full vignette.

ADD REPLY
0
Entering edit mode

Hi Michael Love

Edit: I see that you are recommending setting svalue=TRUE for dealing with the low p, low lfc situations here, which works. However, apeglm doesn't seem to support contrasts. Do you have a suggestion for adjusting LFCs and pvalues (s-values) for contrasts?

Original question:

The lfcShrink solution indeed fixes the exaggerated LFC values but it has no effect on p-values as they are not re-calculated. So I am ending up with very small LFCs having very small p-values in a similar situation. For example LFC=0.04, p=2e-9. Do you have any suggestions on this?

In my case, the very large original LFCs seem to be stemming from samples that are in the DDS object but not in a given comparison. If I create a DDS only with samples in that comparison I do not see the same large LFCs. The second reason I am suspecting the samples that are not in the comparison is that lfcShrink with apeglm gives reasonable LFCs but ashr does not, and I think apeglm only considers the samples in the comparison while ashr considers all samples, although I might be mistaken in this.

I didn't want to start a new question because I think the issue is the same but I'd be happy to provide more details in a separate question if it is more appropriate.

Thanks in advance, Ozkan

ADD REPLY
0
Entering edit mode

You can use lfcThreshold, either in results or with ashr in lfcShrink which can be used with contrasts.

ADD REPLY
0
Entering edit mode

Michael Love Thanks for your response.

The problem is the artefactually inflated LFCs. While apeglm shrinks these LFCs, results or ashr does not. So passing a small (or even moderately large) lfcThreshold values to these functions does not solve the issue.

ADD REPLY
0
Entering edit mode

If ashr doesn't shrink these, are you sure they are artifact? It's hard to say for sure. Maybe filtering number of positive counts is another approach to help prioritize genes.

ADD REPLY

Login before adding your answer.

Traffic: 886 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6