DESeq2 multifactorial formula
Ugo Borello ▴ 340
@ugo-borello-5753
France
Good morning,

I am trying to run DESeq2 with this design formula:

design <- (~batch+sex+tissue)

This is what I do from a count matrix:

> library(DESeq2)
> countTable <- read.table('matrix.txt', header=TRUE, row.names=1)
> ConDesign <- data.frame(row.names = colnames(countTable),
                          batch = factor(c("1", "2", "3", "1", "2", "3")),
                          sex = factor(c("F", "F", "M", "F", "F", "M")),
                          tissue = factor(c("Cx", "Cx", "Cx",
                                            "BrS", "BrS", "BrS")))
> ConDesign
      batch sex tissue
Cx_1      1   F     Cx
Cx_2      2   F     Cx
Cx_3      3   M     Cx
BrS_1     1   F    BrS
BrS_2     2   F    BrS
BrS_3     3   M    BrS

When I run

> xx <- model.matrix(~batch+sex+tissue, ConDesign)

I get

> xx
      (Intercept) batch2 batch3 sexM tissueBrS
Cx_1            1      0      0    0         0
Cx_2            1      1      0    0         0
Cx_3            1      0      1    1         0
BrS_1           1      0      0    0         1
BrS_2           1      1      0    0         1
BrS_3           1      0      1    1         1

But when I run:

> dse <- DESeqDataSetFromMatrix(countData = countTable,
                                colData = ConDesign,
                                design = (~batch+sex+tissue))

I get this error message:

Error in validObject(.Object) :
  invalid class "DESeqDataSet" object: the model matrix is not full rank,
  i.e. one or more variables in the design formula are linear combinations of
  the others

Where is my mistake?

Thank you for your help,
Ugo

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] org.Hs.eg.db_2.9.0      RSQLite_0.11.4          DBI_0.2-7               AnnotationDbi_1.22.6
 [5] DESeq2_1.0.19           RcppArmadillo_0.3.910.0 Rcpp_0.10.4             lattice_0.20-23
 [9] Biobase_2.20.1          GenomicRanges_1.12.5    IRanges_1.18.3          BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] annotate_1.38.0    genefilter_1.42.0  grid_3.0.1         locfit_1.5-9.1     RColorBrewer_1.0-5
 [6] splines_3.0.1      stats4_3.0.1       survival_2.37-4    tools_3.0.1        XML_3.95-0.2
[11] xtable_1.7-1
@mikelove
United States
Hi Ugo,

The problem with the experimental design is that all the males are in batch 3, so you can't separate the effects of these two variables. There are many ways to name this problem: linearly dependent covariates, rank-deficient design matrix, etc.

My suggestion would be to remove the sex variable. The design would then control for differences due to batch (with the batch 3 term also absorbing the male effect). You cannot test for significance of the male effect anyway, because you wouldn't be able to tell it apart from the batch 3 effect.

Hope this helps,
Mike

On Wed, Nov 20, 2013 at 10:03 AM, Ugo Borello wrote:
> [...]
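The confounding described in this answer can be checked directly in base R before calling DESeq2. What follows is only a sketch under stated assumptions: the sample names are typed in from the printed ConDesign table (the original post takes them from colnames(countTable)), and the variable names full, reduced, full_rank, reduced_rank and confounded are made up for illustration. It compares the number of columns of each model matrix with its rank, and tests whether the batch3 and sexM columns are identical.

```r
# Sample table from the question. Row names are typed in by hand here
# (assumption); the original post uses colnames(countTable).
ConDesign <- data.frame(
  row.names = c("Cx_1", "Cx_2", "Cx_3", "BrS_1", "BrS_2", "BrS_3"),
  batch  = factor(c("1", "2", "3", "1", "2", "3")),
  sex    = factor(c("F", "F", "M", "F", "F", "M")),
  tissue = factor(c("Cx", "Cx", "Cx", "BrS", "BrS", "BrS"))
)

# Full design from the question: more columns than the matrix has rank.
full <- model.matrix(~ batch + sex + tissue, ConDesign)
full_rank <- qr(full)$rank
cat("full design:    columns =", ncol(full), " rank =", full_rank, "\n")

# The source of the deficiency: every male sample is in batch 3,
# so the batch3 and sexM indicator columns are the same vector.
confounded <- all(full[, "batch3"] == full[, "sexM"])
cat("batch3 identical to sexM:", confounded, "\n")

# Reduced design suggested in the answer: drop sex, keep batch and tissue.
reduced <- model.matrix(~ batch + tissue, ConDesign)
reduced_rank <- qr(reduced)$rank
cat("reduced design: columns =", ncol(reduced), " rank =", reduced_rank, "\n")
```

A design is full rank when the rank equals the number of columns. If the reduced design passes that check, passing design = ~ batch + tissue to DESeqDataSetFromMatrix in the original call should avoid the validity error, with the caveat from the answer that the batch 3 coefficient then also carries any male effect.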
Thank you,
Ugo

From: Michael Love
Date: Wed, 20 Nov 2013 10:21:10 -0500
To: Ugo Borello
Cc: bioconductor@r-project.org
Subject: Re: [BioC] DESeq2 multifactorial formula

> [...]