Dear all,
My questions are about the interpretation of results between LRT and Wald test.
In my phenodata, there are two factors: “line” (3 levels in the following order: A, B and C) and “treatment” (2 levels: control and pollutant).
The design formula is written as follows:
m3 <- DESeqDataSetFromMatrix(countData = SC, colData = phenodata, design = ~ line + treatment + line:treatment)
1- LRTs analyses :
For all model analyses, I use the dispersion calculated from the initial full model m3 written above.
1.1 -The global treatment effect (across all lines) :
a<-nbinomLRT(m3, full= ~line + treatment, reduced= ~ line)
results(a)
log2 fold change (MLE): treatment pollutant vs control
LRT p-value: '~ line + treatment' vs '~ line'
DataFrame with 2596 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
Prot1 1.35029077421343 0.110099773339937 0.56420941937807 0.0380249630726155 0.845393021746716 1
Prot2 1.09110379087717 -0.303956103829664 0.575130622150613 0.280648118622068 0.596276731218334 1
Prot3 42.8582598378946 -0.047232281359725 0.0965362661213601 0.239286154847804 0.624722167336395 1
…
- The logFC is calculated for the last level of both factors, so it corresponds to the logFC of treated line C vs control line C. Is it correct ?
- The pvalues correspond to the global effect of the pollutant across all lines. Is it correct ?
1.2 The global line effect :
b<-nbinomLRT(m3, full= ~line + treatment, reduced= ~ treatment) res(b)
log2 fold change (MLE): treatment pollutant vs control
LRT p-value: '~ line + treatment' vs '~ treatment'
DataFrame with 2596 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
Prot1 1.35029077421343 0.110099773339937 0.56420941937807 0.533426184969613 0.765892780340315
Prot2 1.09110379087717 -0.303956103829664 0.575130622150613 0.927368054977961 0.628962257856992
Prot3 42.8582598378946 -0.047232281359725 0.0965362661213601 1.87892768732695 0.390837329103353
…
- The logFC is calculated for the last level of both factors, so it corresponds to the logFC of treated line C vs control line C, i.e., logFC are the same as in the previous command of section 1.1, is it correct ?
- The pvalues here correspond to the global “line” effect (i.e., not only the line C but A and B also) across all treatment attributes. Is it correct ?
1.3 - The global interaction effect :
c<-nbinomLRT(m3, full= ~line + treatment + treatment:line, reduced= ~ line +treatment) results(c)
log2 fold change (MLE): lineC.treatmentpollutant
LRT p-value: '~ line + treatment + treatment:line' vs '~ line + treatment'
DataFrame with 2596 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
Prot1 1.35029077421343 0.273519127005516 1.42292211159674 0.0388981336352145 0.980738846067103 1
Prot2 1.09110379087717 -0.236581829056236 1.41075467002527 0.20737551801291 0.901506740840279 1
Prot3 42.8582598378946 -0.0526182741727665 0.233486903301052 0.0535340911781361 0.973588016732097 1
- The logFCs correspond to the interaction term of lineC.treatmentpollutant only. Is it correct ?
- Do the pvalues here correspond to the global interaction effect ? (i.e., all interaction terms, i.e., not only the interaction effect of line C vs line A but take also into account the interaction of line B vs line A with pollutant and line B vs C with pollutant)?
2- Wald tests:
For all model analyses, I use the dispersion (link to the graph) calculated from the initial full model m3 written below.
2.1 - Pollutant effect for line A only (the main effect):
m3 <- DESeqDataSetFromMatrix(countData = SC, colData = phenodata, design = ~ line + treatment + line:treatment)
m3<- DESeq(m3, fitType='local')
plotDispEsts(m3)
results(m3, contrast=c("treatment", "control", "pollutant"), test="Wald")
log2 fold change (MLE): treatment_pollutant_vs_control effect
Wald test p-value: treatment_pollutant_vs_control effect
DataFrame with 2596 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
Prot1 1.35029077421343 -0.0111964302736192 1.01800777183796 -0.0109983740629058 0.991224744056979
Prot2 1.09110379087717 -0.326531759157502 1.08346183664078 -0.301378182520852 0.763126129203888
Prot3 42.8582598378946 -0.0163049704831442 0.168724478873716 -0.0966366622791459 0.923014940447818
…
- Here the log2FCs correspond to the contrast of pollutant vs control for the line A only (i.e., the main effect = the first level), even if line A is not specified, whereas logFCs correspond to the last level when using nbinomLRT() function (see above, e.g. section 1.1). Is it correct?
Source : ?results :
For analyses using the likelihood ratio test (using nbinomLRT) [… ] log2 fold change is printed in the results table can be controlled using the name argument, or by default this will be the estimated coefficient for the last element of resultsNames(object).
[…]
# the condition effect for genotype I (the main effect) results(dds, contrast=c("condition","B","A"))
(-> here the first level of “genotype” factor is “genotype I” and the test employed is the Wald test)
- The pvalues refer to the contrast of pollutant vs control for the line A only, and not to the global effect across line A, B and C as it is the case when using nbinomLRT() function of section 1.1 ?
2.1 - Pollutant effect for the line B only : (by default, test=”Wald”)
- Is it the pollutant effect for the line A (main effect) + the additional interaction effect of pollutant with line B ?
results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineB.treatmentpollutant")))
…
Source ?results :
# the condition effect for genotype III. # this is the main effect plus the interaction term # (the extra condition effect in genotype III compared to genotype I). results(dds, contrast=list( c("condition_B_vs_A","genotypeIII.conditionB") ))
- I wonder here why we do not have to add also the main line B effect :
results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineB.treatmentpollutant", "line_B_vs_A"))) ?
2.3 - Pollutant effect for the line C only :
results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineC.treatmentpollutant")))
…
2.4 - The interaction term for pollutant effect in line B vs A only:
resm3<- results(m3, contrast=list( c("lineB.treatmentpollutant")))
2.5 - The interaction term for pollutant effect in line C vs A only:
resm3<- results(m3, contrast=list( c("lineC.treatmentpollutant")))
2.6 - The Line B effect vs line A only (regardless "treatment" attributes):
results(m3, contrast=list( c("line_B_vs_A")))
Many thanks in advance for your help,
Best regards,
P. B.
Thank you so much for the quick reply.
You have summarized and answered some of my questions. I have written the details in each case to be sure to understand well and avoid any confusion.
To be sure, do you confirm that regarding each of my bullet-point questions, the response is "yes" ?
My role here, as I conceive of it, is to help users with questions about how to use the software, not to double check statistical analyses. I just don't have any time to do this. I would recommend working with a statistician if you are unsure on the meaning of the coefficients in the linear model here.
I understand perfectly. Your response was actually very clear. I sincerely thank you for your time and I apologize if my questions are above the scope of this website. It would just help me in the understanding of this software and how to use it properly. Many thanks in advance for any further help.