Search
Question: large range of LFC in DESeq2
0
gravatar for Mariaxi
12 days ago by
Mariaxi0
Mariaxi0 wrote:

Hi,

I am new to RNASeq and learning by reading the tutorials for DESEq2. When I ran my first DE analysis, my Log2FC range from -26 to 26, which I hadn't seen before. When I mentioned this fact to experienced bioinformaticians in my company, they all agreed that this was odd; however, no one has the time to take a look at my data. Reading more about how DESeq2 calculates LFC and looking at the genes which have this large LFCs I realized this is due to one condition having many reads (e.g. ~80 - normalized counts) while the other condition has 0 reads. I came across a post in Bioconductor that suggested using betaPrior=TRUE while running DESeq and that reduced my LFC range (-3 to 3). What is the correct way to report the DE for these genes? 

This is the code I used:

data <- read.table("gene-count.txt", header = T, row.names = 1, check.names=F)
meta <- read.table("meta/Meta_ordered.txt", header = T, row.names = 1)

# check that both dataset have the same samples and in the same order
all(rownames(meta) %in% colnames(data))

all(rownames(meta1) == colnames(data1))

dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ gender + condition)

dds <- DESeq(dds)

res <- results(dds)

When I type res, I get the following:

> res
log2 fold change (MLE): contion 2 vs 1 
Wald test p-value: condition 2 vs 1 
DataFrame with 35118 rows and 6 columns

Example of extreme LFC:

baseMean log2FoldChange lfcSE stat pvalue padj
             

gene1

27.1703929

-26.086127

2.3313637

-11.1892141

4.604968e-29

5.260255e-25

gene2

26.1996811

-26.085583

2.3469943

-11.1144639

1.066896e-28

8.124766e-25

gene3

36.8962467

-8.289083

3.6075064

-2.2977320

2.157705e-02

1.149870e-01

 

normalized count for these genes:

samples in condition 1 are labeled 1_# and samples in condition 2 are labeled 2_#
 

1_1

1_2 1_3 1_4 1_5 1_6 1_7 1_8

gene1

80.03106

0

80.57861

89.56059

89.49031

66.46199

78.72004

0

gene2

74.94973

0

70.77851

74.40419

97.7193

80.17636

55.76003

0

gene3

287.0956

251.6573

0

345.8417

0

0

0

0

 

  2_1 2_2 2_3 2_4 2_5 2_6 2_7 2_8

gene1

0

0

0

0

0

0

0

0

gene2

0

0

0

0

0

0

0

0

gene3

0

0

0

0

0

0

0

0

 

When I run DESeq with betaPrior = TRUE:

dds2 <- DESeq(dds1, betaPrior = TRUE)
res2 <- results(dds2)

log2 fold change (MAP): condition 2 vs 1 
Wald test p-value: condition 2 vs 1 
DataFrame with 35118 rows and 6 columns

 

the LFC for the same genes is now:

baseMean log2FoldChange lfcSE stat pvalue padj
             

gene1

27.17039

-0.2768540

0.1798314

-1.539520

0.1236774

0.3430296

gene2

26.19968

-0.2651210

0.1787147

-1.483487

0.1379451

0.3679503

gene3

36.89625

-0.1502297

0.1187575

-1.265012

0.2058670

0.4613731

But now my genes don't make the cutoff for being differentially expressed. Any guidance is appreciated.

Let me know if I need to provide more data or clarification. 

 

Thanks! 

 

ADD COMMENTlink modified 12 days ago by Michael Love20k • written 12 days ago by Mariaxi0
0
gravatar for Michael Love
12 days ago by
Michael Love20k
United States
Michael Love20k wrote:

In the current version of DESeq2, the following paradigm is suggested:

dds <- DESeq(dds)
res <- lfcShrink(dds, coef="condition_B_vs_A", type="apeglm")

The second line runs results() and then LFC shrinkage with a method, apeglm, that beats our earlier implementation, which is now called "normal" because it uses a Normal prior for the LFC. We have more details in the vignette.

 

ADD COMMENTlink written 12 days ago by Michael Love20k

Thanks, Mike! I guess I was looking at a tutorial that was for beginners but a bit old. I have found the latest version of the full vignette.

ADD REPLYlink written 12 days ago by Mariaxi0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 365 users visited in the last hour