Difference in Wald and LRT test for two condition RNAseq data.
1
0
Entering edit mode
unique379 • 0
@unique379-10694
Last seen 4.7 years ago

Hi,

I am wondering which test is suitable for my case two condition i.e. C (3 replicates) and T (3 replicates)) and what is basically makes difference between "Wald" and "LRT" test implemented in DESeq2. I tested both with my dataset but i was surprised when i end up with much differences in terns of DE tags. Then i tested with example dataset and found there is really differences in both tests.

here i m presenting both tests using example of deseq and got differences in terms of DE tags:

# count tables from RNA-Seq data
cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10)
cond <- factor(rep(1:2, each=5))
cond
# object construction
dds <- DESeqDataSetFromMatrix(cnts, DataFrame(cond), ~ cond)
# standard analysis (Default Wald test)
dds <- DESeq(dds)

res <- results(dds)

sum(na.omit(res$pvalue <= 0.05)) ###7 sum(na.omit(res$pvalue <= 0.05 & abs(res$log2FoldChange) >= 1)) ###4 sum(na.omit(res$pvalue <= 0.05 & abs(res$log2FoldChange) >= 1.5)) ###0 sum(na.omit(res$padj <= 0.05))
###0
sum(na.omit(res$padj <= 0.05 & abs(res$log2FoldChange) >= 1))
###0
sum(na.omit(res$padj <= 0.05 & abs(res$log2FoldChange) >= 1.5))
###0

# an alternate analysis: likelihood ratio test
ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1)

resLRT <- results(ddsLRT)

sum(na.omit(resLRT$pvalue <= 0.05)) ###7 sum(na.omit(resLRT$pvalue <= 0.05 & abs(resLRT$log2FoldChange) >= 1)) ###7 sum(na.omit(resLRT$pvalue <= 0.05 & abs(resLRT$log2FoldChange) >= 1.5)) ###2 sum(na.omit(resLRT$padj <= 0.05))
###0
sum(na.omit(resLRT$pvalue <= 0.05 & abs(resLRT$log2FoldChange) >= 1))
###7
sum(na.omit(resLRT$padj <= 0.05 & abs(resLRT$log2FoldChange) >= 1.5))
###0

**I read the manual and understand the concept behind the wald (applicable when we have two condition) and LRT test  (when we have more than 2 conditions and intent to check interaction between conditions). But what if i only have two condition like Male (15 or more replicate) and Female (15 replicates or more replicate).**
Then in this case what test would be more appropriate using DESeq2 ?

deseq2 • 6.5k views
2
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

They really give very similar inference. I think the difference in a few genes you are seeing is because of the fold change thresholding.

Running DESeq() with test="LRT" turns off the shrinkage of log fold changes (see vignette, or DESeq2 paper for more on LFC shrinkage).

We recommend that you use results() with 'lfcThreshold' argument rather than the more arbitrary combination of a null hypothesis test of LFC=0 followed by LFC filtering. See the DESeq2 paper for more on this as well.

Either Wald or LRT is appropriate, it's just that the Wald pipeline incorporates LFC shrinkage which we see as a benefit.

0
Entering edit mode

1. Ok i understand that i have posted my question with very small example dataset. Now i am attaching a pdf where i have compared the wald and lrt test with my real dataset. I can see very big inference. Now do tell me which results (logFC, P-value and FDR) should i believe ?? As lrt given me 16 tags at FDR <0.05 with |LFC 1.5| while wald reported nothing. if LFC (LFC shrinkage) is the reason then what is the real LFC for these 16 tags ??

https://www.dropbox.com/s/lro215lng1pclr8/_Native_Wald_vs_LRT.pdf?dl=1

https://www.dropbox.com/s/ls5ccnih706om3c/16_tagsLRT.png?dl=1

2. is there difference between edgeR's  "glmLRT function" with LRT in results function of DESeq2?

0
Entering edit mode

From what you're showing, the two methods are giving almost identical results. The only major difference is the logFC values, which are smaller in the Wald test, due to LFC shrinkage. We can't tell you whether As Mike said, you should read the DESeq2 paper to learn more about LFC shrinkage, so you can decide whether it makes sense for your use case. It also looks like you're still not using the lfcThreshold argument to results, which is most likely what you want for testing against a non-zero log fold change.

As for edgeR's glmLRT function, it is nearly identical to the LRT in DESeq2. As far as I know, the only difference is that the two packages use slightly different methods of calculating the dispersions (and, by default, slightly different normalizations). But if you forced both packages to use the same normalization and dispersions, the should give you the same results for the LRT.

0
Entering edit mode

Hello again,

Following your advice, I have performed the analysis but I am wondering why LFC is not the same for the lfcThreshold argument to resultsas wald test has calculated. I understand, might p-value could be different but LFC can't be if I enable the option lfcThreshold in results function with LRT test.

lfcThreshold: If lfcThreshold is specified, the results are for Wald tests, and LRT p-values will be overwritten.

## Wald test

dds <- DESeq(dds)
resWald <- results(dds, cooksCutoff=T, independentFiltering = T)
#order all counts by FDR
resWald <- resWald[order(resWald$padj),] head(as.data.frame(resWald)) > head(as.data.frame(resWald)) baseMean log2FoldChange lfcSE stat pvalue ENSDARG00000074645 144.6382 1.2302354 0.2020148 6.089827 1.130329e-09 ENSDARG00000070494 160.7595 1.2515170 0.2098221 5.964657 2.451485e-09 ENSDARG00000059231 636.6678 0.9865171 0.1679251 5.874744 4.234984e-09 ENSDARG00000060202 412.5622 1.0071744 0.1702577 5.915588 3.306924e-09 ENSDARG00000091927 977.2865 1.0206422 0.1789931 5.702132 1.183180e-08 ENSDARG00000086641 337.4890 1.1354132 0.2019136 5.623264 1.873829e-08 padj ENSDARG00000074645 1.559003e-05 ENSDARG00000070494 1.559003e-05 ENSDARG00000059231 1.559003e-05 ENSDARG00000060202 1.559003e-05 ENSDARG00000091927 3.484466e-05 ENSDARG00000086641 4.017659e-05 ## LRT test dds <- DESeq(dds, test="LRT", reduced = ~ 1) resLRT <- results(dds, cooksCutoff=T, independentFiltering = T) #order all counts by FDR resLRT <- resLRT[order(resLRT$padj),]

baseMean log2FoldChange     lfcSE     stat       pvalue
ENSDARG00000074645 144.6382       1.555164 0.2563382 36.09718 1.877181e-09
ENSDARG00000070494 160.7595       1.629318 0.2740430 34.44181 4.391860e-09
ENSDARG00000059231 636.6678       1.134511 0.1932094 33.79412 6.126381e-09
ENSDARG00000060202 412.5622       1.164194 0.1969721 34.29987 4.724122e-09
ENSDARG00000091927 977.2865       1.203468 0.2111049 31.70260 1.796814e-08
ENSDARG00000086641 337.4890       1.432763 0.2549518 30.62353 3.132711e-08
ENSDARG00000074645 2.201974e-05
ENSDARG00000070494 2.201974e-05
ENSDARG00000059231 2.201974e-05
ENSDARG00000060202 2.201974e-05
ENSDARG00000091927 5.166558e-05
ENSDARG00000086641 6.434140e-05

## LRT test with lfcThreshold = 0
dds <- DESeq(dds, test="LRT", reduced = ~ 1)
resLRT_lfcThreshold <- results(dds, lfcThreshold = 0, cooksCutoff=T, independentFiltering = T)
#order all counts by FDR
baseMean log2FoldChange     lfcSE     stat       pvalue
ENSDARG00000074645 144.6382       1.555164 0.2563382 36.09718 1.877181e-09
ENSDARG00000070494 160.7595       1.629318 0.2740430 34.44181 4.391860e-09
ENSDARG00000059231 636.6678       1.134511 0.1932094 33.79412 6.126381e-09
ENSDARG00000060202 412.5622       1.164194 0.1969721 34.29987 4.724122e-09
ENSDARG00000091927 977.2865       1.203468 0.2111049 31.70260 1.796814e-08
ENSDARG00000086641 337.4890       1.432763 0.2549518 30.62353 3.132711e-08
ENSDARG00000074645 2.201974e-05
ENSDARG00000070494 2.201974e-05
ENSDARG00000059231 2.201974e-05
ENSDARG00000060202 2.201974e-05
ENSDARG00000091927 5.166558e-05
ENSDARG00000086641 6.434140e-05

0
Entering edit mode
It's how DESeq() is called that matters. Again, running DESeq() with test="LRT" turns off the shrinkage of log fold changes. So you have un-shrunken LFCs even though you are performing a Wald test. This information is actually reported to you if you just print the results table in R. It will say either MAP or MLE, for shrunken or un-shrunken respectively (see paper for details) > res