DESeq2 paired and multi factor comparison
2
1
Entering edit mode
shoko1018 ▴ 10
@shoko1018-7079
Last seen 9.4 years ago
United States

Hi,

I am trying to run paired and multi factor comparison by DESeq2 and keep getting errors. I have read other posts such as DESeq2 paired multifactor test and Yet another DESeq2 nested design contrast matrix question but still cannot figure out why my design does not work.

My sample table looks like:

Treatment Animal Time_point
A m1 d0
A m2 d0
A m3 d0
A m1 d4
A m2 d4
A m3 d4
A m1 d5
A m2 d5
A m3 d5
B m1 d0
B m2 d0
B m3 d0
B m1 d4
B m2 d4
B m3 d4
B m1 d5
B m2 d5
B m3 d5
C m1 d0
C m2 d0
C m3 d0
C m1 d4
C m2 d4
C m3 d4
C m1 d5
C m2 d5
C m3 d5

 The same animals in the same treatment group are paired.

My questions are:

  1. Differences between d0 vs d5 in each treatment
  2. Differences between d4 vs d5 in each treatment
  3. Differences between d0-d5 difference of each treatment

I have tried:

  1. ~ Animal + Time_point + Treatment + Time_point:Treatment
  2. ~ Treatment + Treatment:Animal+ Treatment:Time_point 

In both cases, I got an error:

"Error in DESeqDataSet(se, design = design, ignoreRank) :   the model matrix is not full rank, so the model cannot be fit as specified.   one or more variables or interaction terms in the design formula   are linear combinations of the others and must be removed"

What would be the right design formula in this case?

Thank you in advance.

Shoko

 

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] DESeq2_1.6.2 RcppArmadillo_0.4.500.0 Rcpp_0.11.3 [4] GenomicRanges_1.18.1 GenomeInfoDb_1.2.3 IRanges_2.0.0 [7] S4Vectors_0.4.0 BiocGenerics_0.12.1

loaded via a namespace (and not attached): [1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 [4] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 [7] Biobase_2.26.0 BiocParallel_1.0.0 brew_1.0-6 [10] checkmate_1.5.0 cluster_1.15.3 codetools_0.2-9 [13] colorspace_1.2-4 DBI_0.3.1 digest_0.6.4 [16] fail_1.2 foreach_1.4.2 foreign_0.8-61 [19] Formula_1.1-2 genefilter_1.48.1 geneplotter_1.44.0 [22] ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2 [25] Hmisc_3.14-5 iterators_1.0.7 lattice_0.20-29 [28] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-35 [31] munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 [34] proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.4 [37] rpart_4.1-8 RSQLite_1.0.0 scales_0.2.4 [40] sendmailR_1.2-1 splines_3.1.2 stringr_0.6.2 [43] survival_2.37-7 tools_3.1.2 XML_3.98-1.1 [46] xtable_1.7-4 XVector_0.6.0

deseq2 • 8.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 15 hours ago
United States

hi Shoko,

Can you go back and check over your code? That exact colData and those formula should produce full rank design matrices. Can you paste in the code you are using, and print the colData object you are using? (if colData is a DataFrame, you can use as.data.frame(colData) in order to print all the rows). 

And I would recommend the second design, ~ Treatment + Treatment:Animal + Treatment:Time_point

 

ADD COMMENT
0
Entering edit mode

hi Shoko,

You are encountering this error, because you have missing samples dispersed in the cross of treatment and time, which means you need to do a bit more work to run the model. More detail on what is happening inside DESeq(): when you specify a design with Treatment:Time_point, the base R function used internally (model.matrix) in DESeq() forms an interaction between the non-base level time points and all the treatments. However, as seen in the table you posted, some of these interaction terms cannot be fit because of missing data. For example, what should be the d4 vs d0 effect for the Treatment/Group 12? There are no d4 samples for Treatment/Group 12 so this effect cannot possibly be fit. So when model.matrix is called inside DESeq(), this results in a matrix with some columns with all 0's. You can manually remove these columns and fit such a model, and then you can build results tables, except for those comparisons which are impossible to make given the experimental design (e.g. the one mentioned above).

There is not currently a solution to simply use DESeq() and supply a modified design matrix (though I hope to implement this during this devel cycle).

However, in version 1.6 you can get around this, by calling the nbinomLRT function with a custom model matrix, and then extract contrasts using results() and the name or list type for the 'contrast' argument (see the description of contrast types in ?results). This would look like:

mm1 <- model.matrix(~ Treatment + Treatment:Animal + Treatment:Time_point)
idx <- which(colSums(mm1 == 0) == nrow(mm1))
mm1 <- mm1[,-idx] # removing the columns with all 0's
mm0 <- model.matrix(~ Treatment + Treatment:Animal) # this matrix is necessary to run nbinomLRT
design(dds) <- ~ Treatment + Treatment:Animal + Treatment:Time_point
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, modelMatrix=mm1)
dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds, modelMatrix=mm1)
dds <- nbinomLRT(dds, full=mm1, reduced=mm0)
results(dds, name="TreatmentA.Time_pointd4", test="Wald")
results(dds, contrast=list("TreatmentB.Time_pointd4","TreatmentA.Time_pointd4"), test="Wald")

 

ADD REPLY
0
Entering edit mode

Thanks for the reply. I am trying your code but have problems creating DESeqDataSet object. When I tried "design(dds) <- ~ Treatment + Treatment:Animal + Treatment:Time_point", I had an error "object 'dds' not found". Thus, I tried to create DESeqDataSet object by

dds <- DESeqDataSetFromMatrix(
  countData = as.matrix(abundDF),
  colData = metaDF,
  design = formula(~ Treatment + Treatment:Animal + Treatment:Time_point)
)

However, it gives an error (that I originally had), "Error in DESeqDataSet(se, design = design, ignoreRank) :  the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed"

Thanks again for your help.

Shoko

 

ADD REPLY
0
Entering edit mode

hi Shoko,

Sorry, I see the problem. Just use design = ~ Treatment. It will actually be skipped in all of the steps, where the matrices are provided instead, so it doesn't matter what to put here (except you need to bypass the checks on rank). I still need to work on the steps where user supplies a custom design matrix.

ADD REPLY
0
Entering edit mode

Thank you for all your help. Finally I had it run and got some results. Here is the summary code I used for the future reference.

dds <- DESeqDataSetFromMatrix(
  countData = as.matrix(abundDF),
  colData = metaDF,
  design = formula(~ Treatment)
)

mm1 <- model.matrix(~ Treatment + Treatment:Animal + Treatment:Time_point, metaDF)
idx <- which(colSums(mm1 == 0) == nrow(mm1))
mm1 <- mm1[,-idx]
mm0 <- model.matrix(~ Treatment + Treatment:Animal, metaDF)

design(dds) <- ~ Treatment + Treatment:Animal + Treatment:Time_point
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, modelMatrix=mm1)

dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds, modelMatrix=mm1)

dds <- nbinomLRT(dds, full=mm1, reduced=mm0)

# Compare d5 v d0
Treatment1_d0v5 = results(dds, name="Treatment1.Time_pointd5", test="Wald")

# Compare d4 v d5
Treatment1_d4v5 = results(dds, contrast=list("Treatment1.Time_pointd5", "Treatment1.Time_pointd4"), test="Wald")

# Compare two different treatment
# Treatment1_v_Treatment2 = results(dds, contrast=list("Treatment1.Time_pointd5","Treatment2.Time_pointd5"), test="Wald")

ADD REPLY
0
Entering edit mode
Yes, except you have annotated this one in the wrong order: list("Treatment1.Time_pointd5", "Treatment1.Time_pointd4") is a 5 vs 4 effect. And you have a d5 vs d0 effect, which is correctly labeled in the comment, with a confusing variable name "...d0v5"
ADD REPLY
0
Entering edit mode
shoko1018 ▴ 10
@shoko1018-7079
Last seen 9.4 years ago
United States

Thank you for the quick response! My code follows. I have 434 rows in the colData object, so I printed out tables. Here, Treatment is "Group" and Animal is "Mice2".

dds <- DESeqDataSetFromMatrix(
  countData = as.matrix(abundDF),
  colData = metaDF,
  design = formula(~ Group + Group:Mice2 + Group:Time_point)
)

>table(metaDF$Group,metaDF$Time_point)
 
     d0 d4 d5 dn1
  1   6  4  6   5
  10  5  6  6   5
  11  6  6  6   6
  12  5  0  6   0
  13  6  0  5   0
  14  6  0  5   0
  15  6  0  6   0
  16  6  6  6   5
  17  6  6  6   6
  18  6  6  6   5
  19  6  6  6   5
  2   6  0  6   0
  20  6  6  6   5
  21  5  5  6   5
  22  6  6  6   6
  25  5  5  6   5
  26  5  7  6   6
  3   6  0  6   0
  4   6  0  5   0
  5   5  6  6   6
  6   6  0  5   0
  7   6  0  6   0
  8   6  5  6   6
  9   6  0  6   0
> table(metaDF$Group,metaDF$Mice2)
 
     1 2 3 4 5 6
  1  4 4 4 3 3 3
  10 4 4 3 3 4 4
  11 4 4 4 4 4 4
  12 2 2 2 1 2 2
  13 2 1 2 2 2 2
  14 1 2 2 2 2 2
  15 2 2 2 2 2 2
  16 4 4 3 4 4 4
  17 4 4 4 4 4 4
  18 4 4 4 4 3 4
  19 4 3 4 4 4 4
  2  2 2 2 2 2 2
  20 3 4 4 4 4 4
  21 3 4 4 3 3 4
  22 4 4 4 4 4 4
  25 4 3 4 4 2 4
  26 4 4 4 4 4 4
  3  2 2 2 2 2 2
  4  2 2 2 2 1 2
  5  4 4 3 4 4 4
  6  2 2 2 2 1 2
  7  2 2 2 2 2 2
  8  4 4 4 4 4 3
  9  2 2 2 2 2 2
ADD COMMENT

Login before adding your answer.

Traffic: 905 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6