Using DESeq2 for Differential Expression Analysis with Interaction Term
1
0
Entering edit mode
p.l.chen • 0
@8bbd0e94
Last seen 5 days ago
Taiwan

Hello

I'm a beginner at DESeq2. Currently, I'm trying to use the different design formulas to analyze the data from the Bioconductor package airway.

I follow the step in DESeq2 vignette: RNA-seq workflow to compute the statistics results. However, the error message below occurred when I'm specify the interaction term in the design formula.

Error in checkForExperimentalReplicates(object, modelMatrix) :

The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.


My question is, the error occurs when I follow the instructions in example("results") to specify the design formula. Why does this error occur, and How can I generate the result with the interaction effect? ?

I will be very much apperciate if someone can help me with this problem.

1) loading data from package(airway)

> # Loading data
> library("airway")
> library("DESeq2")
> data(gse)
> gse
class: RangedSummarizedExperiment
dim: 58294 8
metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
assays(3): counts abundance length
rownames(58294): ENSG00000000003.14 ENSG00000000005.5 ... ENSG00000285993.1
ENSG00000285994.1
rowData names(1): gene_id
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(3): names donor condition


2) rename and relevel the variable

> # rename variable
> (gse$cell <- gse$donor)
[1] N61311  N61311  N052611 N052611 N080611 N080611 N061011 N061011
Levels: N052611 N061011 N080611 N61311

> (gse$dex <- gse$condition)
[1] Untreated     Dexamethasone Untreated     Dexamethasone Untreated     Dexamethasone Untreated
[8] Dexamethasone
Levels: Untreated Dexamethasone

> levels(gse$dex) = c("untrt", "trt") > levels(gse$dex)
[1] "untrt" "trt"


3) building DESeqDataSet with the design formula ~ cell + dex and do the analysis.

> # building DESeqDataSet
> dds <- DESeqDataSet(gse, design = ~ cell + dex)
using counts and average transcript lengths from tximeta

> dds
class: DESeqDataSet
dim: 58294 8
metadata(7): tximetaInfo quantInfo ... txdbInfo version
assays(3): counts abundance avgTxLength
rownames(58294): ENSG00000000003.14 ENSG00000000005.5 ... ENSG00000285993.1
ENSG00000285994.1
rowData names(1): gene_id
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(5): names donor condition cell dex

> # Filtering
> keep = rowSums(counts(dds)) > 1
> dds = dds[keep,]
> dim(dds)
[1] 31604     8

> # Defferential analysis
> design(dds)
~cell + dex

> dds = DESeq(dds)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> resultsNames(dds)
[1] "Intercept"               "cell_N061011_vs_N052611" "cell_N080611_vs_N052611"
[4] "cell_N61311_vs_N052611"  "dex_trt_vs_untrt"

> results(dds, contrast = c("dex", "untrt", "trt"))
log2 fold change (MLE): dex untrt vs trt
Wald test p-value: dex untrt vs trt
DataFrame with 31604 rows and 6 columns
baseMean log2FoldChange     lfcSE      stat      pvalue       padj
<numeric>      <numeric> <numeric> <numeric>   <numeric>  <numeric>
ENSG00000000003.14 739.940717      0.3611537  0.106869  3.379419 0.000726392 0.00531137
ENSG00000000419.12 511.735722     -0.2063147  0.128665 -1.603509 0.108822318 0.29318870
ENSG00000000457.13 314.194855     -0.0378308  0.158633 -0.238479 0.811509461 0.92255697
ENSG00000000460.16  79.793622      0.1152590  0.314991  0.365912 0.714430444 0.87298038
ENSG00000000938.12   0.307267      1.3691185  3.503764  0.390757 0.695977205         NA
...                       ...            ...       ...       ...         ...        ...
ENSG00000285979.1   38.353886     -0.3423657  0.359511 -0.952310    0.340940   0.600750
ENSG00000285987.1    1.562508     -0.7064145  1.547295 -0.456548    0.647996         NA
ENSG00000285990.1    0.642315     -0.3647333  3.433276 -0.106235    0.915396         NA
ENSG00000285991.1   11.276284      0.1165515  0.748601  0.155692    0.876275   0.952921
ENSG00000285994.1    3.651041      0.0960094  1.068660  0.089841    0.928414         NA


4) Analyze the data with interaction term ~ cell + dex + cell:dex.

In this step, after I specify the design formula with the interaction term ~ cell + dex + cell:dex. The error occur when I tried to run DESeq() function on the data set.

The design formula I used ~ cell + dex + cell:dex is the same as the interaction design formula they demonstrate in example("results").

> # Defferential analysis using interaction term
> dds_int = dds
> design(dds_int) = formula(~ cell + dex + cell:dex)
> dds_int = DESeq(dds_int)
using pre-existing normalization factors
estimating dispersions
found already estimated dispersions, replacing these
Error in checkForExperimentalReplicates(object, modelMatrix) :

The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.


5) I tried to sepcify the design formula while building the DESeqDataSet. However, the same error occurs when I tried the run DESeq() on the dataset

> dds_int = DESeqDataSet(gse, design = ~ cell + dex + cell:dex)
using counts and average transcript lengths from tximeta
> dim(dds_int)
[1] 58294     8
>
> keep = rowSums(counts(dds_int)) > 1
> dds_int = dds_int[keep,]
> dim(dds_int)
[1] 31604     8
>
> design(dds_int)
~cell + dex + cell:dex
>
> dds_int = DESeq(dds_int)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :

The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.


6) I attempted to create model.matrix and use it to run the DESeq Analysis. However, the same error still occurs.

> # model formula
> dds_int = dds
> attach(as.data.frame(colData(dds_int)))
The following objects are masked from as_data_frame(colData(dds_int)):

cell, condition, dex, donor, names

>
> mm = model.matrix( ~ cell + dex + cell:dex)
> design(dds_int) = mm
> design(dds_int)
(Intercept) cellN061011 cellN080611 cellN61311 dextrt cellN061011:dextrt cellN080611:dextrt
1           1           0           0          1      0                  0                  0
2           1           0           0          1      1                  0                  0
3           1           0           0          0      0                  0                  0
4           1           0           0          0      1                  0                  0
5           1           0           1          0      0                  0                  0
6           1           0           1          0      1                  0                  1
7           1           1           0          0      0                  0                  0
8           1           1           0          0      1                  1                  0
cellN61311:dextrt
1                 0
2                 1
3                 0
4                 0
5                 0
6                 0
7                 0
8                 0
attr(,"assign")
[1] 0 1 1 1 2 3 3 3
attr(,"contrasts")
attr(,"contrasts")$cell [1] "contr.treatment" attr(,"contrasts")$dex
[1] "contr.treatment"

>
> dds_int = DESeq(dds_int, test="Wald", modelMatrixType = "standard")
using supplied model matrix
using pre-existing normalization factors
estimating dispersions
found already estimated dispersions, replacing these
Error in checkForExperimentalReplicates(object, modelMatrix) :

The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.


Two of the design formulas for the interaction term I create in the above code chunk are the same as the one demonstrated in example(results). I wonder why is the error keep occurring. How can I generate the result with the interaction effect?

Thank you all for your time.

0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible...

Why does this error occur

This gets into the theory of linear models, but the key thing is that you have specified a design that generates a certain number of coefficients for DESeq2 to estimate, let's call that number p. If p = n, then the fitted values will be identical to the observed values, and there is no "degrees of freedom" remaining to estimate variance. You could consult a lineal model textbook, or ask someone at your institute familiar with linear models, if you want to know more about this.

How can I generate the result with the interaction effect?

You cannot specify that model with such data. You would need to have more samples, or less coefficients, to run DESeq2.

0
Entering edit mode

Michael Love Thank you for your quick answer. That explains a lot! Thanks!!