Question

large range of LFC in DESeq2

0

Entering edit mode

Mariaxi • 0

@mariaxi-18596

Last seen 5.3 years ago

Hi,

I am new to RNASeq and learning by reading the tutorials for DESEq2. When I ran my first DE analysis, my Log2FC range from -26 to 26, which I hadn't seen before. When I mentioned this fact to experienced bioinformaticians in my company, they all agreed that this was odd; however, no one has the time to take a look at my data. Reading more about how DESeq2 calculates LFC and looking at the genes which have this large LFCs I realized this is due to one condition having many reads (e.g. ~80 - normalized counts) while the other condition has 0 reads. I came across a post in Bioconductor that suggested using betaPrior=TRUE while running DESeq and that reduced my LFC range (-3 to 3). What is the correct way to report the DE for these genes?

This is the code I used:

data <- read.table("gene-count.txt", header = T, row.names = 1, check.names=F)
meta <- read.table("meta/Meta_ordered.txt", header = T, row.names = 1)

# check that both dataset have the same samples and in the same order
all(rownames(meta) %in% colnames(data))

all(rownames(meta1) == colnames(data1))

dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ gender + condition)

dds <- DESeq(dds)

res <- results(dds)

When I type res, I get the following:

> res
log2 fold change (MLE): contion 2 vs 1 
Wald test p-value: condition 2 vs 1 
DataFrame with 35118 rows and 6 columns

Example of extreme LFC:

baseMean	log2FoldChange	lfcSE	stat	pvalue	padj


gene1	27.1703929	-26.086127	2.3313637	-11.1892141	4.604968e-29	5.260255e-25
gene2	26.1996811	-26.085583	2.3469943	-11.1144639	1.066896e-28	8.124766e-25
gene3	36.8962467	-8.289083	3.6075064	-2.2977320	2.157705e-02	1.149870e-01

normalized count for these genes:

samples in condition 1 are labeled 1_# and samples in condition 2 are labeled 2_#

	1_1	1_2	1_3	1_4	1_5	1_6	1_7	1_8
gene1	80.03106	0	80.57861	89.56059	89.49031	66.46199	78.72004	0
gene2	74.94973	0	70.77851	74.40419	97.7193	80.17636	55.76003	0
gene3	287.0956	251.6573	0	345.8417	0	0	0	0

	2_1	2_2	2_3	2_4	2_5	2_6	2_7	2_8
gene1	0	0	0	0	0	0	0	0
gene2	0	0	0	0	0	0	0	0
gene3	0	0	0	0	0	0	0	0

When I run DESeq with betaPrior = TRUE:

dds2 <- DESeq(dds1, betaPrior = TRUE)
res2 <- results(dds2)

log2 fold change (MAP): condition 2 vs 1 
Wald test p-value: condition 2 vs 1 
DataFrame with 35118 rows and 6 columns

the LFC for the same genes is now:

baseMean	log2FoldChange	lfcSE	stat	pvalue	padj


gene1	27.17039	-0.2768540	0.1798314	-1.539520	0.1236774	0.3430296
gene2	26.19968	-0.2651210	0.1787147	-1.483487	0.1379451	0.3679503
gene3	36.89625	-0.1502297	0.1187575	-1.265012	0.2058670	0.4613731

But now my genes don't make the cutoff for being differentially expressed. Any guidance is appreciated.

Let me know if I need to provide more data or clarification.

Thanks!

deseq2 lfc betaprior • 1.5k views

ADD COMMENT • link updated 18 months ago by Michael Love 41k • written 5.4 years ago by Mariaxi • 0

score 0 · Answer 1 · 2018-12-04

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

In the current version of DESeq2, the following paradigm is suggested:

dds <- DESeq(dds)
res <- lfcShrink(dds, coef="condition_B_vs_A", type="apeglm")

The second line runs results() and then LFC shrinkage with a method, apeglm, that beats our earlier implementation, which is now called "normal" because it uses a Normal prior for the LFC. We have more details in the vignette.

ADD COMMENT • link 5.4 years ago Michael Love 41k

0

Entering edit mode

Thanks, Mike! I guess I was looking at a tutorial that was for beginners but a bit old. I have found the latest version of the full vignette.

ADD REPLY • link 5.4 years ago Mariaxi • 0

0

Entering edit mode

Hi Michael Love

Edit: I see that you are recommending setting svalue=TRUE for dealing with the low p, low lfc situations here, which works. However, apeglm doesn't seem to support contrasts. Do you have a suggestion for adjusting LFCs and pvalues (s-values) for contrasts?

Original question:

The lfcShrink solution indeed fixes the exaggerated LFC values but it has no effect on p-values as they are not re-calculated. So I am ending up with very small LFCs having very small p-values in a similar situation. For example LFC=0.04, p=2e-9. Do you have any suggestions on this?

In my case, the very large original LFCs seem to be stemming from samples that are in the DDS object but not in a given comparison. If I create a DDS only with samples in that comparison I do not see the same large LFCs. The second reason I am suspecting the samples that are not in the comparison is that lfcShrink with apeglm gives reasonable LFCs but ashr does not, and I think apeglm only considers the samples in the comparison while ashr considers all samples, although I might be mistaken in this.

I didn't want to start a new question because I think the issue is the same but I'd be happy to provide more details in a separate question if it is more appropriate.

Thanks in advance, Ozkan

ADD REPLY • link 18 months ago ozkan_aydemir • 0

0

Entering edit mode

You can use lfcThreshold, either in results or with ashr in lfcShrink which can be used with contrasts.

ADD REPLY • link 18 months ago Michael Love 41k

0

Entering edit mode

Michael Love Thanks for your response.

The problem is the artefactually inflated LFCs. While apeglm shrinks these LFCs, results or ashr does not. So passing a small (or even moderately large) lfcThreshold values to these functions does not solve the issue.

ADD REPLY • link 18 months ago ozkan_aydemir • 0

0

Entering edit mode

If ashr doesn't shrink these, are you sure they are artifact? It's hard to say for sure. Maybe filtering number of positive counts is another approach to help prioritize genes.

ADD REPLY • link 18 months ago Michael Love 41k