**0**wrote:

Dear all,

My questions are about the interpretation of results between LRT and Wald test.

In my phenodata, there are two factors: “line” (3 levels in the following order: A, B and C) and “treatment” (2 levels: control and pollutant).

The design formula is written as follows:

m3 <- DESeqDataSetFromMatrix(countData = SC, colData = phenodata, design = ~ line + treatment + line:treatment)

**1- LRTs analyses :**

For all model analyses, I use the dispersion calculated from the initial full model m3 written above.

## 1.1 -The global treatment effect (across all lines) :

a<-nbinomLRT(m3, full= ~line + treatment, reduced= ~ line)

results(a)

log2 fold change (MLE): treatment pollutant vs control

LRT p-value: '~ line + treatment' vs '~ line'

DataFrame with 2596 rows and 6 columns

baseMean log2FoldChange lfcSE stat pvalue padj

<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>

Prot1 1.35029077421343 0.110099773339937 0.56420941937807 0.0380249630726155 0.845393021746716 1

Prot2 1.09110379087717 -0.303956103829664 0.575130622150613 0.280648118622068 0.596276731218334 1

Prot3 42.8582598378946 -0.047232281359725 0.0965362661213601 0.239286154847804 0.624722167336395 1

…

- The logFC is calculated for the last level of both factors, so it corresponds to the logFC of treated line C vs control line C. Is it correct ?
- The pvalues correspond to the
**global**effect of the pollutant across all lines. Is it correct ?

## 1.2 The global line effect :

b<-nbinomLRT(m3, full= ~line + treatment, reduced= ~ treatment) res(b)

```
log2 fold change (MLE): treatment pollutant vs control
```

```
LRT p-value: '~ line + treatment' vs '~ treatment'
```

```
DataFrame with 2596 rows and 6 columns
```

```
baseMean log2FoldChange lfcSE stat pvalue
```

```
<numeric> <numeric> <numeric> <numeric> <numeric>
```

```
Prot1 1.35029077421343 0.110099773339937 0.56420941937807 0.533426184969613 0.765892780340315
```

```
Prot2 1.09110379087717 -0.303956103829664 0.575130622150613 0.927368054977961 0.628962257856992
```

```
Prot3 42.8582598378946 -0.047232281359725 0.0965362661213601 1.87892768732695 0.390837329103353
```

…

- The logFC is calculated for the last level of both factors, so it corresponds to the logFC of treated line C vs control line C, i.e., logFC are the same as in the previous command of section 1.1, is it correct ?
- The pvalues here correspond to the global “line” effect (i.e., not only the line C but A and B also) across all treatment attributes. Is it correct ?

## 1.3 - The global interaction effect :

c<-nbinomLRT(m3, full= ~line + treatment + treatment:line, reduced= ~ line +treatment) results(c)

```
log2 fold change (MLE): lineC.treatmentpollutant
```

```
LRT p-value: '~ line + treatment + treatment:line' vs '~ line + treatment'
```

```
DataFrame with 2596 rows and 6 columns
```

```
baseMean log2FoldChange lfcSE stat pvalue padj
```

```
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
```

```
Prot1 1.35029077421343 0.273519127005516 1.42292211159674 0.0388981336352145 0.980738846067103 1
```

```
Prot2 1.09110379087717 -0.236581829056236 1.41075467002527 0.20737551801291 0.901506740840279 1
```

```
Prot3 42.8582598378946 -0.0526182741727665 0.233486903301052 0.0535340911781361 0.973588016732097 1
```

- The logFCs correspond to the interaction term of lineC.treatmentpollutant only. Is it correct ?
- Do the pvalues here correspond to the global interaction effect ? (i.e., all interaction terms, i.e., not only the interaction effect of line C vs line A but take also into account the interaction of line B vs line A with pollutant and line B vs C with pollutant)?

# 2- Wald tests:

For all model analyses, I use the dispersion (link to the graph) calculated from the initial full model m3 written below.

## 2.1 - Pollutant effect for line A only (the main effect):

```
m3 <- DESeqDataSetFromMatrix(countData = SC, colData = phenodata, design = ~ line + treatment + line:treatment)
m3<- DESeq(m3, fitType='local')
plotDispEsts(m3)
results(m3, contrast=c("treatment", "control", "pollutant"), test="Wald")
``````
```

```
log2 fold change (MLE): treatment_pollutant_vs_control effect
```

```
Wald test p-value: treatment_pollutant_vs_control effect
```

```
DataFrame with 2596 rows and 6 columns
```

```
baseMean log2FoldChange lfcSE stat pvalue
```

```
<numeric> <numeric> <numeric> <numeric> <numeric>
```

```
Prot1 1.35029077421343 -0.0111964302736192 1.01800777183796 -0.0109983740629058 0.991224744056979
```

```
Prot2 1.09110379087717 -0.326531759157502 1.08346183664078 -0.301378182520852 0.763126129203888
```

```
Prot3 42.8582598378946 -0.0163049704831442 0.168724478873716 -0.0966366622791459 0.923014940447818
```

…

- Here the log2FCs correspond to the contrast of pollutant vs control for the line A only (i.e., the main effect = the first level), even if line A is not specified, whereas logFCs correspond to the last level when using nbinomLRT() function (see above, e.g. section 1.1). Is it correct?

Source : ?results :

For analyses using the likelihood ratio test (using nbinomLRT) [… ] log2 fold change is printed in the results table can be controlled using the name argument, or by default this will be the estimated coefficient for the last element of resultsNames(object).

[…]

# the condition effect for genotype I (the main effect) results(dds, contrast=c("condition","B","A"))

(-> here the first level of “genotype” factor is “genotype I” and the test employed is the Wald test)

- The pvalues refer to the contrast of pollutant vs control for the line A only, and
**not to the global effect across line A, B and C as it is the case when using nbinomLRT()**function of section 1.1 ?

## 2.1 - Pollutant effect for the line B only : (by default, test=”Wald”)

- Is it the pollutant effect for the line A (main effect) + the additional interaction effect of pollutant with line B ?

results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineB.treatmentpollutant")))

…

Source ?results :

# the condition effect for genotype III. # this is the main effectplusthe interaction term # (the extra condition effect in genotype III compared to genotype I). results(dds, contrast=list( c("condition_B_vs_A","genotypeIII.conditionB") ))

- I wonder here why we do not have to add also the main line B effect :

results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineB.treatmentpollutant", "line_B_vs_A"))) ?

## 2.3 - Pollutant effect for the line C only :

results(m3, contrast=list( c("treatment_pollutant_vs_control", "lineC.treatmentpollutant")))

…

## 2.4 - The interaction term for pollutant effect in line B vs A only:

resm3<- results(m3, contrast=list( c("lineB.treatmentpollutant")))

## 2.5 - The interaction term for pollutant effect in line C vs A only:

resm3<- results(m3, contrast=list( c("lineC.treatmentpollutant")))

## 2.6 - The Line B effect vs line A only (regardless "treatment" attributes):

results(m3, contrast=list( c("line_B_vs_A")))

Many thanks in advance for your help,

Best regards,

P. B.