Question: Interpretation of logFC and pvalues: LRT vs Wald tests in DESeq2
0
14 days ago by
paul.boisseaux0 wrote:

Dear all,

My questions are about the interpretation of results between LRT and Wald test.

In my phenodata, there are two factors: “line” (3 levels in the following order: A, B and C) and “treatment” (2 levels: control and pollutant).

The design formula is written as follows:

m3 <- DESeqDataSetFromMatrix(countData = SC,

colData = phenodata,

design = ~ line + treatment + line:treatment)

# 1- LRTs analyses :

For all model analyses, I use the dispersion calculated from the initial full model m3 written above.

## 1.1 -The global treatment effect (across all lines) :

a<-nbinomLRT(m3, full= ~line + treatment, reduced= ~ line)

results(a)

log2 fold change (MLE): treatment pollutant vs control

LRT p-value: '~ line + treatment' vs '~ line'

DataFrame with 2596 rows and 6 columns

baseMean     log2FoldChange              lfcSE                stat            pvalue      padj

<numeric>          <numeric>          <numeric>           <numeric>         <numeric> <numeric>

Prot1  1.35029077421343  0.110099773339937   0.56420941937807  0.0380249630726155 0.845393021746716         1
Prot2  1.09110379087717 -0.303956103829664  0.575130622150613   0.280648118622068 0.596276731218334         1
Prot3  42.8582598378946 -0.047232281359725 0.0965362661213601   0.239286154847804 0.624722167336395         1

• The logFC is calculated for the last level of both factors, so it corresponds to the logFC of treated line C vs control line C. Is it correct ?
• The pvalues correspond to the global effect of the pollutant across all lines. Is it correct ?

## 1.2 The global line effect :

b<-nbinomLRT(m3, full= ~line + treatment, reduced= ~ treatment)

res(b)
log2 fold change (MLE): treatment pollutant vs control

LRT p-value: '~ line + treatment' vs '~ treatment'

DataFrame with 2596 rows and 6 columns

                           baseMean     log2FoldChange              lfcSE               stat            pvalue

                          <numeric>          <numeric>          <numeric>          <numeric>         <numeric>

Prot1  1.35029077421343  0.110099773339937   0.56420941937807  0.533426184969613 0.765892780340315

Prot2  1.09110379087717 -0.303956103829664  0.575130622150613  0.927368054977961 0.628962257856992

Prot3  42.8582598378946 -0.047232281359725 0.0965362661213601   1.87892768732695 0.390837329103353


• The logFC is calculated for the last level of both factors, so it corresponds to the logFC of treated line C vs control line C, i.e., logFC are the same as in the previous command of section 1.1, is it correct ?
• The pvalues here correspond to the global “line” effect (i.e., not only the line C but A and B also) across all treatment attributes. Is it correct ?

## 1.3 - The global interaction effect :

c<-nbinomLRT(m3, full= ~line + treatment + treatment:line, reduced= ~ line +treatment)

results(c)
log2 fold change (MLE): lineC.treatmentpollutant

LRT p-value: '~ line + treatment + treatment:line' vs '~ line + treatment'

DataFrame with 2596 rows and 6 columns

                           baseMean      log2FoldChange             lfcSE                stat            pvalue      padj

                          <numeric>           <numeric>         <numeric>           <numeric>         <numeric> <numeric>

Prot1  1.35029077421343   0.273519127005516  1.42292211159674  0.0388981336352145 0.980738846067103         1

Prot2  1.09110379087717  -0.236581829056236  1.41075467002527    0.20737551801291 0.901506740840279         1

Prot3  42.8582598378946 -0.0526182741727665 0.233486903301052  0.0535340911781361 0.973588016732097         1

• The logFCs correspond to the interaction term of lineC.treatmentpollutant only. Is it correct ?
• Do the pvalues here correspond to the global interaction effect ? (i.e., all interaction terms, i.e., not only the interaction effect of line C vs line A but take also into account the interaction of line B vs line A with pollutant and line B vs C with pollutant)?

# 2- Wald tests:

For all model analyses, I use the dispersion (link to the graph) calculated from the initial full model m3 written below.

## 2.1 - Pollutant effect for line A only (the main effect):

m3 <- DESeqDataSetFromMatrix(countData = SC, colData = phenodata, design = ~ line + treatment + line:treatment)

m3<- DESeq(m3, fitType='local')

plotDispEsts(m3)

results(m3, contrast=c("treatment", "control", "pollutant"), test="Wald")


log2 fold change (MLE): treatment_pollutant_vs_control effect

Wald test p-value: treatment_pollutant_vs_control effect

DataFrame with 2596 rows and 6 columns

                           baseMean      log2FoldChange             lfcSE                 stat            pvalue

                          <numeric>           <numeric>         <numeric>            <numeric>         <numeric>

Prot1  1.35029077421343 -0.0111964302736192  1.01800777183796  -0.0109983740629058 0.991224744056979

Prot2  1.09110379087717  -0.326531759157502  1.08346183664078   -0.301378182520852 0.763126129203888

Prot3  42.8582598378946 -0.0163049704831442 0.168724478873716  -0.0966366622791459 0.923014940447818


• Here the log2FCs correspond to the contrast of pollutant vs control for the line A only (i.e., the main effect = the first level), even if line A is not specified, whereas logFCs correspond to the last level when using nbinomLRT() function (see above, e.g. section 1.1). Is it correct?

Source : ?results :

For analyses using the likelihood ratio test (using nbinomLRT)  [… ] log2 fold change is printed in the results table can be controlled using the name argument, or by default this will be the estimated coefficient for the last element of resultsNames(object).

[…]

# the condition effect for genotype I (the main effect) results(dds, contrast=c("condition","B","A"))

(->  here the first level of “genotype” factor is “genotype I” and the test employed is the Wald test)

• The pvalues refer to the contrast of pollutant vs control for the line A only, and not to the global effect across line A, B and C as it is the case when using nbinomLRT() function of section 1.1 ?

## 2.1 - Pollutant effect for the line B only : (by default, test=”Wald”)

• Is it the pollutant effect for the line A (main effect) + the additional interaction effect of pollutant with line B ?
results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineB.treatmentpollutant")))

Source ?results :

# the condition effect for genotype III. # this is the main effect plus the interaction term # (the extra condition effect in genotype III compared to genotype I). results(dds, contrast=list( c("condition_B_vs_A","genotypeIII.conditionB") ))
• I wonder here why we do not have to add also the main line B effect :
results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineB.treatmentpollutant", "line_B_vs_A"))) ?

## 2.3 - Pollutant effect for the line C only :

results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineC.treatmentpollutant")))

## 2.4 - The interaction term for pollutant effect in line B vs A  only:

resm3<- results(m3, contrast=list( c("lineB.treatmentpollutant")))

## 2.5 - The interaction term for pollutant effect in line C vs A  only:

resm3<- results(m3, contrast=list( c("lineC.treatmentpollutant")))

## 2.6 - The Line B effect vs line A only (regardless "treatment" attributes):

results(m3, contrast=list( c("line_B_vs_A")))

Many thanks in advance for your help,

Best regards,

P. B.

deseq2 • 64 views
modified 14 days ago by Michael Love21k • written 14 days ago by paul.boisseaux0
Answer: Interpretation of logFC and pvalues: LRT vs Wald tests in DESeq2
0
14 days ago by
Michael Love21k
United States
Michael Love21k wrote:

If I read correctly there are two questions here: does the pvalue in a LRT test for multiple coefficients or just the one printed in the table under logFC. This question is answered in the documentation (in the vignette FAQ and ?results): it is the pvalue for the LRT of the multiple coefficients not just the one printed in the table. The other question is whether the main effect Wald test is just for the reference level: yes.

Thank you so much for the quick reply.

You have summarized and answered some of my questions. I have written the details in each case to be sure to understand well and avoid any confusion.

To be sure, do you confirm that regarding each of my bullet-point questions, the response is "yes" ?

My role here, as I conceive of it, is to help users with questions about how to use the software, not to double check statistical analyses. I just don't have any time to do this. I would recommend working with a statistician if you are unsure on the meaning of the coefficients in the linear model here.