design model EdgeR
1
0
Entering edit mode
@christopher-t-gregg-4973
Last seen 10.4 years ago
Hi, We are working on a glm analysis in EdgeR and voom() and I am hoping we can help with our design. Currently, we are getting errors. We have 18 RNA-Seq biological replicates (samples), each sample has two columns of counts for each tag, one for the maternal allele and one for the paternal allele. Nine samples have mother A and father B, and nine samples have mother B and father A. we wish to test the following models in edgeR and voom() (~sample+allele+parent) vs (~sample+allele) (~sample+allele+allele:parent) vs (~sample+allele) Our design matrix is built as follows. Once concern is that the two columns for each sample are paired, so I give them the same sample ID. sample<-factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12 ,13,13,14,14,15,15,16,16,17,17,18,18)) #provides sample labels title<-c("A", "B") #provides allele labels allele<-factor(c(rep(title, 18))) parent1<-c("father", "mother") #parent labels for the first nine replicates parent2<-c("mother", "father") #parent labels for the second nine replicates parent<-factor(c(c(rep(parent1, 9)), c(rep(parent2, 9)))) model.matrix(~sample+allele+allele:parent)->design The errors we get with this approach are as follows: EdgeR > d <-estimateGLMCommonDisp(d, design) Error in solve.default(R, t(beta)) : system is computationally singular: reciprocal condition number = 3.227e-17 voom() > design<-model.matrix(~sample+allele+allele:parent) > y<-voom(ctx,design, plot=TRUE, lib.size=colSums(ctx)*nf) Coefficients not estimable: alleleA:parentmother Warning message: Partial NA coefficients for 159907 probe(s) Thank you very much for any help! best wishes, Chris > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.6.3 limma_3.12.0 loaded via a namespace (and not attached): [1] annotate_1.34.0 AnnotationDbi_1.18.0 Biobase_2.16.0 BiocGenerics_0.2.0 DBI_0.2-5 DESeq_1.9.7 genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.0 [10] IRanges_1.14.0 RColorBrewer_1.0-5 RSQLite_0.11.1 splines_2.15.0 stats4_2.15.0 survival_2.36-12 xtable_1.7-0 [[alternative HTML version deleted]]
edgeR edgeR • 1.6k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia
Dear Chris, Your data doesn't contain enough information to estimate the linear model that you are trying to fit, hence the error from edgeR and the warning from limma. The first thing to understand is that, in a paired design, the treatment factors are compared within pairs only. So, in your experiment, the various levels of parent and allele are compared within each sample. The basic problem is this. Factors allele and parent each take two levels, so in principle there are four possible combinations of these factors. However the combination A.father occurs only with B.mother, and A.mother occurs only with B.father. Hence the only comparisons you can make are A.father vs B.mother and A.mother vs B.father. In other words, there are only two independent quantities you can estimate. It is impossible for example to estimate the difference between A.father and A.mother because these never occur together for the same sample. In your linear model, you are asking the software to estimate three linear model coefficients to represent the differences between the different levels between allele and parent, when in reality only two are estimable. Specifically you are asking for parent effects within each allele. But there is no way to estimate these quantities, so the software gives an error. limma gives you an informative warning. The code > nonEstimable(design) [1] "alleleB:parentmother" tells you that you cannot estimate the parental mothervsfather comparison for alleleB. It would seem that you need to make a deep decision. Should you be reposing your scientific questions to focus on what can be estimated within samples? Or should you be recovering information from your data by making comparisons between the different samples? The latter can be done in principle using voom by treating sample as a random effect. It's hard to advise you on scientific decisions of this sort that require a deep understanding of the science of your problem. Can you consult a statistician at your university? Show them this email. The issues of setting up a design matrix in edgeR or limma are the same as for any univariate ANOVA type problem, so a good biostatistician should be able to give you reasonable advice on what you can estimate, even if they are not familiar with edgeR or limma or with genomic problems in general. Hope this helps Gordon > Date: Mon, 2 Jul 2012 21:59:27 -0600 > From: chris_utah <chris.gregg at="" neuro.utah.edu=""> > To: bioconductor at r-project.org Subject: > [BioC] design model EdgeR > > Hi, > > We are working on a glm analysis in EdgeR and voom() and I am hoping we > can help with our design. Currently, we are getting errors. > > We have 18 RNA-Seq biological replicates (samples), each sample has two > columns of counts for each tag, one for the maternal allele and one for > the paternal allele. Nine samples have mother A and father B, and nine > samples have mother B and father A. > > we wish to test the following models in edgeR and voom() > > (~sample+allele+parent) vs (~sample+allele) > (~sample+allele+allele:parent) vs (~sample+allele) > > Our design matrix is built as follows. Once concern is that the two > columns for each sample are paired, so I give them the same sample ID. > > sample<-factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12, 12,13,13,14,14,15,15,16,16,17,17,18,18)) > #provides sample labels > title<-c("A", "B") #provides allele labels > allele<-factor(c(rep(title, 18))) > parent1<-c("father", "mother") > #parent labels for the first nine replicates > parent2<-c("mother", "father") #parent labels for the second nine replicates > parent<-factor(c(c(rep(parent1, 9)), c(rep(parent2, 9)))) > > model.matrix(~sample+allele+allele:parent)->design > > The errors we get with this approach are as follows: > > EdgeR >> d <-estimateGLMCommonDisp(d, design) > Error in solve.default(R, t(beta)) : > system is computationally singular: reciprocal condition number = > 3.227e-17 > > > voom() >> design<-model.matrix(~sample+allele+allele:parent) y<-voom(ctx,design, >> plot=TRUE, lib.size=colSums(ctx)*nf) > Coefficients not estimable: alleleA:parentmother Warning message: > Partial NA coefficients for 159907 probe(s) > > Thank you very much for any help! > > best wishes, > > Chris > > >> sessionInfo() > R version 2.15.0 (2012-03-30) Platform: x86_64-apple- darwin9.8.0/x86_64 > (64-bit) > > locale: [1] > en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: [1] stats graphics grDevices utils datasets > methods base > > other attached packages: [1] edgeR_2.6.3 limma_3.12.0 > > loaded via a namespace (and not attached): [1] annotate_1.34.0 > AnnotationDbi_1.18.0 Biobase_2.16.0 BiocGenerics_0.2.0 DBI_0.2-5 > DESeq_1.9.7 genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.0 [10] > IRanges_1.14.0 RColorBrewer_1.0-5 RSQLite_0.11.1 splines_2.15.0 > stats4_2.15.0 survival_2.36-12 xtable_1.7-0 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 534 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6