Question: Difference in Wald and LRT test for two condition RNAseq data.
0
3.3 years ago by
unique3790 wrote:

Hi,

I am wondering which test is suitable for my case two condition i.e. C (3 replicates) and T (3 replicates)) and what is basically makes difference between "Wald" and "LRT" test implemented in DESeq2. I tested both with my dataset but i was surprised when i end up with much differences in terns of DE tags. Then i tested with example dataset and found there is really differences in both tests.

here i m presenting both tests using example of deseq and got differences in terms of DE tags:

# count tables from RNA-Seq data
cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10)
cond <- factor(rep(1:2, each=5))
cond
# object construction
dds <- DESeqDataSetFromMatrix(cnts, DataFrame(cond), ~ cond)
# standard analysis (Default Wald test)
dds <- DESeq(dds)

res <- results(dds)

sum(na.omit(res$pvalue <= 0.05)) ###7 sum(na.omit(res$pvalue <= 0.05 & abs(res$log2FoldChange) >= 1)) ###4 sum(na.omit(res$pvalue <= 0.05 & abs(res$log2FoldChange) >= 1.5)) ###0 sum(na.omit(res$padj <= 0.05))
###0
sum(na.omit(res$padj <= 0.05 & abs(res$log2FoldChange) >= 1))
###0
sum(na.omit(res$padj <= 0.05 & abs(res$log2FoldChange) >= 1.5))
###0

# an alternate analysis: likelihood ratio test
ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1)

resLRT <- results(ddsLRT)

sum(na.omit(resLRT$pvalue <= 0.05)) ###7 sum(na.omit(resLRT$pvalue <= 0.05 & abs(resLRT$log2FoldChange) >= 1)) ###7 sum(na.omit(resLRT$pvalue <= 0.05 & abs(resLRT$log2FoldChange) >= 1.5)) ###2 sum(na.omit(resLRT$padj <= 0.05))
###0
sum(na.omit(resLRT$pvalue <= 0.05 & abs(resLRT$log2FoldChange) >= 1))
###7
sum(na.omit(resLRT$padj <= 0.05 & abs(resLRT$log2FoldChange) >= 1.5))
###0

**I read the manual and understand the concept behind the wald (applicable when we have two condition) and LRT test  (when we have more than 2 conditions and intent to check interaction between conditions). But what if i only have two condition like Male (15 or more replicate) and Female (15 replicates or more replicate).**
Then in this case what test would be more appropriate using DESeq2 ?

deseq2 • 4.8k views
modified 3.3 years ago by Michael Love25k • written 3.3 years ago by unique3790
Answer: Difference in Wald and LRT test for two conditional RNAseq data.
2
3.3 years ago by
Michael Love25k
United States
Michael Love25k wrote:

They really give very similar inference. I think the difference in a few genes you are seeing is because of the fold change thresholding.

Running DESeq() with test="LRT" turns off the shrinkage of log fold changes (see vignette, or DESeq2 paper for more on LFC shrinkage).

We recommend that you use results() with 'lfcThreshold' argument rather than the more arbitrary combination of a null hypothesis test of LFC=0 followed by LFC filtering. See the DESeq2 paper for more on this as well.

Either Wald or LRT is appropriate, it's just that the Wald pipeline incorporates LFC shrinkage which we see as a benefit.

1. Ok i understand that i have posted my question with very small example dataset. Now i am attaching a pdf where i have compared the wald and lrt test with my real dataset. I can see very big inference. Now do tell me which results (logFC, P-value and FDR) should i believe ?? As lrt given me 16 tags at FDR <0.05 with |LFC 1.5| while wald reported nothing. if LFC (LFC shrinkage) is the reason then what is the real LFC for these 16 tags ??

https://www.dropbox.com/s/lro215lng1pclr8/_Native_Wald_vs_LRT.pdf?dl=1

https://www.dropbox.com/s/ls5ccnih706om3c/16_tagsLRT.png?dl=1

2. is there difference between edgeR's  "glmLRT function" with LRT in results function of DESeq2?

From what you're showing, the two methods are giving almost identical results. The only major difference is the logFC values, which are smaller in the Wald test, due to LFC shrinkage. We can't tell you whether As Mike said, you should read the DESeq2 paper to learn more about LFC shrinkage, so you can decide whether it makes sense for your use case. It also looks like you're still not using the lfcThreshold argument to results, which is most likely what you want for testing against a non-zero log fold change.

As for edgeR's glmLRT function, it is nearly identical to the LRT in DESeq2. As far as I know, the only difference is that the two packages use slightly different methods of calculating the dispersions (and, by default, slightly different normalizations). But if you forced both packages to use the same normalization and dispersions, the should give you the same results for the LRT.

Hello again,

Following your advice, I have performed the analysis but I am wondering why LFC is not the same for the lfcThreshold argument to resultsas wald test has calculated. I understand, might p-value could be different but LFC can't be if I enable the option lfcThreshold in results function with LRT test.

lfcThreshold: If lfcThreshold is specified, the results are for Wald tests, and LRT p-values will be overwritten.

## Wald test

dds <- DESeq(dds)
resWald <- results(dds, cooksCutoff=T, independentFiltering = T)
#order all counts by FDR
resWald <- resWald[order(resWald$padj),] head(as.data.frame(resWald)) > head(as.data.frame(resWald)) baseMean log2FoldChange lfcSE stat pvalue ENSDARG00000074645 144.6382 1.2302354 0.2020148 6.089827 1.130329e-09 ENSDARG00000070494 160.7595 1.2515170 0.2098221 5.964657 2.451485e-09 ENSDARG00000059231 636.6678 0.9865171 0.1679251 5.874744 4.234984e-09 ENSDARG00000060202 412.5622 1.0071744 0.1702577 5.915588 3.306924e-09 ENSDARG00000091927 977.2865 1.0206422 0.1789931 5.702132 1.183180e-08 ENSDARG00000086641 337.4890 1.1354132 0.2019136 5.623264 1.873829e-08 padj ENSDARG00000074645 1.559003e-05 ENSDARG00000070494 1.559003e-05 ENSDARG00000059231 1.559003e-05 ENSDARG00000060202 1.559003e-05 ENSDARG00000091927 3.484466e-05 ENSDARG00000086641 4.017659e-05 ## LRT test dds <- DESeq(dds, test="LRT", reduced = ~ 1) resLRT <- results(dds, cooksCutoff=T, independentFiltering = T) #order all counts by FDR resLRT <- resLRT[order(resLRT$padj),]

baseMean log2FoldChange     lfcSE     stat       pvalue
ENSDARG00000074645 144.6382       1.555164 0.2563382 36.09718 1.877181e-09
ENSDARG00000070494 160.7595       1.629318 0.2740430 34.44181 4.391860e-09
ENSDARG00000059231 636.6678       1.134511 0.1932094 33.79412 6.126381e-09
ENSDARG00000060202 412.5622       1.164194 0.1969721 34.29987 4.724122e-09
ENSDARG00000091927 977.2865       1.203468 0.2111049 31.70260 1.796814e-08
ENSDARG00000086641 337.4890       1.432763 0.2549518 30.62353 3.132711e-08
ENSDARG00000074645 2.201974e-05
ENSDARG00000070494 2.201974e-05
ENSDARG00000059231 2.201974e-05
ENSDARG00000060202 2.201974e-05
ENSDARG00000091927 5.166558e-05
ENSDARG00000086641 6.434140e-05

## LRT test with lfcThreshold = 0
dds <- DESeq(dds, test="LRT", reduced = ~ 1)
resLRT_lfcThreshold <- results(dds, lfcThreshold = 0, cooksCutoff=T, independentFiltering = T)
#order all counts by FDR
baseMean log2FoldChange     lfcSE     stat       pvalue
ENSDARG00000074645 144.6382       1.555164 0.2563382 36.09718 1.877181e-09
ENSDARG00000070494 160.7595       1.629318 0.2740430 34.44181 4.391860e-09
ENSDARG00000059231 636.6678       1.134511 0.1932094 33.79412 6.126381e-09
ENSDARG00000060202 412.5622       1.164194 0.1969721 34.29987 4.724122e-09
ENSDARG00000091927 977.2865       1.203468 0.2111049 31.70260 1.796814e-08
ENSDARG00000086641 337.4890       1.432763 0.2549518 30.62353 3.132711e-08
ENSDARG00000074645 2.201974e-05
ENSDARG00000070494 2.201974e-05
ENSDARG00000059231 2.201974e-05
ENSDARG00000060202 2.201974e-05
ENSDARG00000091927 5.166558e-05
ENSDARG00000086641 6.434140e-05