Question: edgeR - paired samples with multifactorial design - errors
1
gravatar for Guest User
5.3 years ago by
Guest User12k
Guest User12k wrote:
Hi All, I had been trying to do DE analysis of my RNAseq experiment using edgeR and am having some isssues. The details of the Experiment and the R code I tried below: (a) Paired experimental design with 45 pairs (b) Treatment: "Before" and "After" (c) Phenotype: 1 & 2 Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking into account the paired design The R code tried: library(edgeR) counts<-read.delim(file="counts.dat",header=T) pair=factor(pdata$pair) Treat=factor( pdata$treat) Phenotype=factor(pdata$pheno) group<-paste(Treat,Phenotype,sep=".") design=model.matrix(~pair+Treat:Phenotype, data=counts) counts.DGEList<-DGEList(counts, group=group) y<-calcNormFactors(counts.DGEList) y<-estimateCommonDisp(y, design) y<-estimateGLMTrendedDisp(y, design) Error message I get: Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, : Design matrix not of full rank. The following coefficients not estimable: TreatBefore:Phenotype1 TreatBefore:Phenotype2 Any idea to solve this out? Thanks, Preethy -- output of sessionInfo(): R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.4.2 limma_3.18.13 -- Sent via the guest posting facility at bioconductor.org.
rnaseq • 1.4k views
ADD COMMENTlink modified 5.3 years ago by Devon Ryan200 • written 5.3 years ago by Guest User12k
Answer: edgeR - paired samples with multifactorial design - errors
0
gravatar for Devon Ryan
5.3 years ago by
Devon Ryan200
Germany
Devon Ryan200 wrote:
Hi Preethy, You likely want: design=model.matrix(~pair+Treat:Phenotype, data=pdata) If that still yields the error, then you'll need to share "pdata" or "design". Also, please don't crosspost on both this list and biostars ( https://www.biostars.org/p/98907/), it duplicates the community effort. Devon -- Devon Ryan, Ph.D. Email: dpryan@dpryan.com Laboratory for Molecular and Cellular Cognition German Centre for Neurodegenerative Diseases (DZNE) Ludwig-Erhard-Allee 2 53175 Bonn Germany <devon.ryan@dzne.de> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest@bioconductor.org>wrote: > > Hi All, > > I had been trying to do DE analysis of my RNAseq experiment using edgeR > and am having some isssues. The details of the Experiment and the R code I > tried below: > > (a) Paired experimental design with 45 pairs > (b) Treatment: "Before" and "After" > (c) Phenotype: 1 & 2 > Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking > into account the paired design > > The R code tried: > > library(edgeR) > counts<-read.delim(file="counts.dat",header=T) > pair=factor(pdata$pair) > Treat=factor( pdata$treat) > Phenotype=factor(pdata$pheno) > group<-paste(Treat,Phenotype,sep=".") > design=model.matrix(~pair+Treat:Phenotype, data=counts) > counts.DGEList<-DGEList(counts, group=group) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > Error message I get: > > Error in glmFit.default(y, design = design, dispersion = dispersion, > offset = offset, : > Design matrix not of full rank. The following coefficients not > estimable: > TreatBefore:Phenotype1 TreatBefore:Phenotype2 > > > Any idea to solve this out? > > Thanks, > Preethy > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 5.3 years ago by Devon Ryan200
Hi Devon, Thanks for the replies both on biostars and here. Sorry for crossposting. Both mailing lists and discussion have been very helpful to me. But, I rarely see replies from BioC package maintainers at biostars. All the samples are paired and I have them all correct - I mean none of them are empty I tried different designs. But ending up with the same error message. What I want to do: Getting DE genes between (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2.After) pdata here: pair=rep(c(1:45), each=2) Treat=rep(c("Before", "After"),45) Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2) pdata<-data.frame(pair,Treat,Phenotype) Preethy On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com> wrote: > Hi Preethy, > > You likely want: > > design=model.matrix(~pair+Treat:Phenotype, data=pdata) > > If that still yields the error, then you'll need to share "pdata" or > "design". Also, please don't crosspost on both this list and biostars ( > https://www.biostars.org/p/98907/), it duplicates the community effort. > > Devon > > > -- > Devon Ryan, Ph.D. > Email: dpryan@dpryan.com > Laboratory for Molecular and Cellular Cognition > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn > Germany > <devon.ryan@dzne.de> > > > On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest@bioconductor.org>wrote: > >> >> Hi All, >> >> I had been trying to do DE analysis of my RNAseq experiment using edgeR >> and am having some isssues. The details of the Experiment and the R code I >> tried below: >> >> (a) Paired experimental design with 45 pairs >> (b) Treatment: "Before" and "After" >> (c) Phenotype: 1 & 2 >> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >> into account the paired design >> >> The R code tried: >> >> library(edgeR) >> counts<-read.delim(file="counts.dat",header=T) >> pair=factor(pdata$pair) >> Treat=factor( pdata$treat) >> Phenotype=factor(pdata$pheno) >> group<-paste(Treat,Phenotype,sep=".") >> design=model.matrix(~pair+Treat:Phenotype, data=counts) >> counts.DGEList<-DGEList(counts, group=group) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> Error message I get: >> >> Error in glmFit.default(y, design = design, dispersion = dispersion, >> offset = offset, : >> Design matrix not of full rank. The following coefficients not >> estimable: >> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >> >> >> Any idea to solve this out? >> >> Thanks, >> Preethy >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] edgeR_3.4.2 limma_3.18.13 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
ADD REPLYlink written 5.3 years ago by Preethy Venkat Ram30
Hi Preethy, This experiment is very similar to the example in part 3.5 of the edgeR User's guide, starting on page 31. Best, Jim On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > Hi Devon, > > Thanks for the replies both on biostars and here. Sorry for > crossposting. Both > mailing lists and discussion have been very helpful to me. > > But, I rarely see replies from BioC package maintainers at biostars. > > All the samples are paired and I have them all correct - I mean none of > them are empty > I tried different designs. But ending up with the same error message. > What I want to do: Getting DE genes between > (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2.After) > > pdata here: > > pair=rep(c(1:45), each=2) > Treat=rep(c("Before", "After"),45) > Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, > 2), each=2) > pdata<-data.frame(pair,Treat,Phenotype) > > > Preethy > > > > On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan at="" dpryan.com=""> wrote: > >> Hi Preethy, >> >> You likely want: >> >> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >> >> If that still yields the error, then you'll need to share "pdata" or >> "design". Also, please don't crosspost on both this list and biostars ( >> https://www.biostars.org/p/98907/), it duplicates the community effort. >> >> Devon >> >> >> -- >> Devon Ryan, Ph.D. >> Email: dpryan at dpryan.com >> Laboratory for Molecular and Cellular Cognition >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn >> Germany >> <devon.ryan at="" dzne.de=""> >> >> >> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest at="" bioconductor.org="">wrote: >> >>> Hi All, >>> >>> I had been trying to do DE analysis of my RNAseq experiment using edgeR >>> and am having some isssues. The details of the Experiment and the R code I >>> tried below: >>> >>> (a) Paired experimental design with 45 pairs >>> (b) Treatment: "Before" and "After" >>> (c) Phenotype: 1 & 2 >>> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >>> into account the paired design >>> >>> The R code tried: >>> >>> library(edgeR) >>> counts<-read.delim(file="counts.dat",header=T) >>> pair=factor(pdata$pair) >>> Treat=factor( pdata$treat) >>> Phenotype=factor(pdata$pheno) >>> group<-paste(Treat,Phenotype,sep=".") >>> design=model.matrix(~pair+Treat:Phenotype, data=counts) >>> counts.DGEList<-DGEList(counts, group=group) >>> y<-calcNormFactors(counts.DGEList) >>> y<-estimateCommonDisp(y, design) >>> y<-estimateGLMTrendedDisp(y, design) >>> >>> >>> Error message I get: >>> >>> Error in glmFit.default(y, design = design, dispersion = dispersion, >>> offset = offset, : >>> Design matrix not of full rank. The following coefficients not >>> estimable: >>> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >>> >>> >>> Any idea to solve this out? >>> >>> Thanks, >>> Preethy >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >>> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] edgeR_3.4.2 limma_3.18.13 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLYlink written 5.3 years ago by James W. MacDonald50k
Hi Jim, Thanks for the reply. I've tried that - with a slighlty modified code. I am sorry. But, I'm getting error again. Here: pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", "After"),45), levels=c("Before", "After")) Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2),levels=c("1","2")) design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) colnames(design) counts.DGEList<-DGEList(counts, group=Treat) y<-calcNormFactors(counts.DGEList) y<-estimateCommonDisp(y, design) y<-estimateGLMTrendedDisp(y, design) The error message here: > y<-estimateGLMTrendedDisp(y, design) Error in return(NA, ntags) : multi-argument returns are not permitted In addition: Warning message: In estimateGLMTrendedDisp.default(y = y$counts, design = design, : No residual df: cannot estimate dispersion One reason I can see is that the number of columns in my design matrix is "92" as they are more replicates/patients than in the 3.5 example. And there are only "90" columns in my "count" matrix. But, do you have any idea how can I solve this ? Thanks, Preethy On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Preethy, > > This experiment is very similar to the example in part 3.5 of the edgeR > User's guide, starting on page 31. > > Best, > > Jim > > > > On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > >> Hi Devon, >> >> Thanks for the replies both on biostars and here. Sorry for >> crossposting. Both >> mailing lists and discussion have been very helpful to me. >> >> But, I rarely see replies from BioC package maintainers at biostars. >> >> All the samples are paired and I have them all correct - I mean none of >> them are empty >> I tried different designs. But ending up with the same error message. >> What I want to do: Getting DE genes between >> (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2. >> After) >> >> pdata here: >> >> pair=rep(c(1:45), each=2) >> Treat=rep(c("Before", "After"),45) >> Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, >> 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, >> 2, >> 2), each=2) >> pdata<-data.frame(pair,Treat,Phenotype) >> >> >> Preethy >> >> >> >> On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com> wrote: >> >> Hi Preethy, >>> >>> You likely want: >>> >>> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >>> >>> If that still yields the error, then you'll need to share "pdata" or >>> "design". Also, please don't crosspost on both this list and biostars ( >>> https://www.biostars.org/p/98907/), it duplicates the community effort. >>> >>> Devon >>> >>> >>> -- >>> Devon Ryan, Ph.D. >>> Email: dpryan@dpryan.com >>> Laboratory for Molecular and Cellular Cognition >>> German Centre for Neurodegenerative Diseases (DZNE) >>> Ludwig-Erhard-Allee 2 >>> 53175 Bonn >>> Germany >>> <devon.ryan@dzne.de> >>> >>> >>> >>> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] < >>> guest@bioconductor.org>wrote: >>> >>> Hi All, >>>> >>>> I had been trying to do DE analysis of my RNAseq experiment using edgeR >>>> and am having some isssues. The details of the Experiment and the R >>>> code I >>>> tried below: >>>> >>>> (a) Paired experimental design with 45 pairs >>>> (b) Treatment: "Before" and "After" >>>> (c) Phenotype: 1 & 2 >>>> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >>>> into account the paired design >>>> >>>> The R code tried: >>>> >>>> library(edgeR) >>>> counts<-read.delim(file="counts.dat",header=T) >>>> pair=factor(pdata$pair) >>>> Treat=factor( pdata$treat) >>>> Phenotype=factor(pdata$pheno) >>>> group<-paste(Treat,Phenotype,sep=".") >>>> design=model.matrix(~pair+Treat:Phenotype, data=counts) >>>> counts.DGEList<-DGEList(counts, group=group) >>>> y<-calcNormFactors(counts.DGEList) >>>> y<-estimateCommonDisp(y, design) >>>> y<-estimateGLMTrendedDisp(y, design) >>>> >>>> >>>> Error message I get: >>>> >>>> Error in glmFit.default(y, design = design, dispersion = dispersion, >>>> offset = offset, : >>>> Design matrix not of full rank. The following coefficients not >>>> estimable: >>>> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >>>> >>>> >>>> Any idea to solve this out? >>>> >>>> Thanks, >>>> Preethy >>>> >>>> -- output of sessionInfo(): >>>> >>>> R version 3.1.0 (2014-04-10) >>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >>>> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >>>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] edgeR_3.4.2 limma_3.18.13 >>>> >>>> >>>> -- >>>> Sent via the guest posting facility at bioconductor.org. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]
ADD REPLYlink written 5.3 years ago by Preethy Venkat Ram30
Hi Preethy, You need to re-read that section of edgeR, and in particular look at the patient column of the revised targets frame. Best, Jim On 4/25/2014 12:46 PM, Preethy Venkat Ram wrote: > Hi Jim, > > Thanks for the reply. I've tried that - with a slighlty modified > code. I am sorry. But, I'm getting error again. > > Here: > > pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", > "After"),45), levels=c("Before", "After")) > Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, > 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, > 2, 1, 2, 2, 1, 2, 2), each=2),levels=c("1","2")) > design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) > colnames(design) > counts.DGEList<-DGEList(counts, group=Treat) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > The error message here: > > > y<-estimateGLMTrendedDisp(y, design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y = y$counts, design = design, : > No residual df: cannot estimate dispersion > > > One reason I can see is that the number of columns in my design matrix > is "92" as they are more replicates/patients than in the 3.5 > example. And there are only "90" columns in my "count" matrix. > > But, do you have any idea how can I solve this ? > > Thanks, > Preethy > > > > > > On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Preethy, > > This experiment is very similar to the example in part 3.5 of the > edgeR User's guide, starting on page 31. > > Best, > > Jim > > > > On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > > Hi Devon, > > Thanks for the replies both on biostars and here. Sorry for > crossposting. Both > mailing lists and discussion have been very helpful to me. > > But, I rarely see replies from BioC package maintainers at > biostars. > > All the samples are paired and I have them all correct - I > mean none of > them are empty > I tried different designs. But ending up with the same error > message. > What I want to do: Getting DE genes between > (Phenotype1.Before-Phenotype1.After) & > (Phenotype2.Before-Phenotype2.After) > > pdata here: > > pair=rep(c(1:45), each=2) > Treat=rep(c("Before", "After"),45) > Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, > 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, > 1, 2, 2, 1, 2, > 2), each=2) > pdata<-data.frame(pair,Treat,Phenotype) > > > Preethy > > > > On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan at="" dpryan.com=""> <mailto:dpryan at="" dpryan.com="">> wrote: > > Hi Preethy, > > You likely want: > > design=model.matrix(~pair+Treat:Phenotype, data=pdata) > > If that still yields the error, then you'll need to share > "pdata" or > "design". Also, please don't crosspost on both this list > and biostars ( > https://www.biostars.org/p/98907/), it duplicates the > community effort. > > Devon > > > -- > Devon Ryan, Ph.D. > Email: dpryan at dpryan.com <mailto:dpryan at="" dpryan.com=""> > Laboratory for Molecular and Cellular Cognition > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn > Germany > <devon.ryan at="" dzne.de="" <mailto:devon.ryan="" at="" dzne.de="">> > > > > On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] > <guest at="" bioconductor.org="" <mailto:guest="" at="" bioconductor.org="">>wrote: > > Hi All, > > I had been trying to do DE analysis of my RNAseq > experiment using edgeR > and am having some isssues. The details of the > Experiment and the R code I > tried below: > > (a) Paired experimental design with 45 pairs > (b) Treatment: "Before" and "After" > (c) Phenotype: 1 & 2 > Aim: Look for DE genes between Phenotype 1 and 2 upon > treatment taking > into account the paired design > > The R code tried: > > library(edgeR) > counts<-read.delim(file="counts.dat",header=T) > pair=factor(pdata$pair) > Treat=factor( pdata$treat) > Phenotype=factor(pdata$pheno) > group<-paste(Treat,Phenotype,sep=".") > design=model.matrix(~pair+Treat:Phenotype, data=counts) > counts.DGEList<-DGEList(counts, group=group) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > Error message I get: > > Error in glmFit.default(y, design = design, dispersion > = dispersion, > offset = offset, : > Design matrix not of full rank. The following > coefficients not > estimable: > TreatBefore:Phenotype1 TreatBefore:Phenotype2 > > > Any idea to solve this out? > > Thanks, > Preethy > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > > -- > Sent via the guest posting facility at > bioconductor.org <http: bioconductor.org="">. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLYlink written 5.3 years ago by James W. MacDonald50k
Hi Jim and All, Thanks really a lot for bringing that to my attention. It didn't work this way as well as I had different number of samples for each Phenotype unlike the edgeR example which had 3 patients for each "Disease" category. But, I took off the empty columns from the design matrix and it worked. Is the analysis done correctly this way ? I have one more doubt - Would it be possible to account for batch effect too with this type of design matrix ? Like this: design=model.matrix(~Phenotype+Phenotype:pair+ Phenotype:Treat+Phenotype:batch) Thanks, Preethy On Fri, Apr 25, 2014 at 8:00 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Preethy, > > You need to re-read that section of edgeR, and in particular look at the > patient column of the revised targets frame. > > Best, > > Jim > > > > On 4/25/2014 12:46 PM, Preethy Venkat Ram wrote: > >> Hi Jim, >> >> Thanks for the reply. I've tried that - with a slighlty modified code. I >> am sorry. But, I'm getting error again. >> >> Here: >> >> pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", >> "After"),45), levels=c("Before", "After")) >> Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, >> 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, >> 2, 1, 2, 2), each=2),levels=c("1","2")) >> design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) >> colnames(design) >> counts.DGEList<-DGEList(counts, group=Treat) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> The error message here: >> >> > y<-estimateGLMTrendedDisp(y, design) >> Error in return(NA, ntags) : multi-argument returns are not permitted >> In addition: Warning message: >> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >> No residual df: cannot estimate dispersion >> >> >> One reason I can see is that the number of columns in my design matrix >> is "92" as they are more replicates/patients than in the 3.5 example. And >> there are only "90" columns in my "count" matrix. >> >> But, do you have any idea how can I solve this ? >> >> Thanks, >> Preethy >> >> >> >> >> >> On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Preethy, >> >> This experiment is very similar to the example in part 3.5 of the >> edgeR User's guide, starting on page 31. >> >> Best, >> >> Jim >> >> >> >> On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: >> >> Hi Devon, >> >> Thanks for the replies both on biostars and here. Sorry for >> crossposting. Both >> mailing lists and discussion have been very helpful to me. >> >> But, I rarely see replies from BioC package maintainers at >> biostars. >> >> All the samples are paired and I have them all correct - I >> mean none of >> them are empty >> I tried different designs. But ending up with the same error >> message. >> What I want to do: Getting DE genes between >> (Phenotype1.Before-Phenotype1.After) & >> (Phenotype2.Before-Phenotype2.After) >> >> pdata here: >> >> pair=rep(c(1:45), each=2) >> Treat=rep(c("Before", "After"),45) >> Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, >> 2, 2, 2, 2, >> 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, >> 1, 2, 2, 1, 2, >> 2), each=2) >> pdata<-data.frame(pair,Treat,Phenotype) >> >> >> Preethy >> >> >> >> On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com>> <mailto:dpryan@dpryan.com>> wrote: >> >> Hi Preethy, >> >> You likely want: >> >> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >> >> If that still yields the error, then you'll need to share >> "pdata" or >> "design". Also, please don't crosspost on both this list >> and biostars ( >> https://www.biostars.org/p/98907/), it duplicates the >> community effort. >> >> Devon >> >> >> -- >> Devon Ryan, Ph.D. >> Email: dpryan@dpryan.com <mailto:dpryan@dpryan.com> >> >> Laboratory for Molecular and Cellular Cognition >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn >> Germany >> <devon.ryan@dzne.de <mailto:devon.ryan@dzne.de="">> >> >> >> >> >> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] >> <guest@bioconductor.org <mailto:guest@bioconductor.org="">> >>wrote: >> >> >> Hi All, >> >> I had been trying to do DE analysis of my RNAseq >> experiment using edgeR >> and am having some isssues. The details of the >> Experiment and the R code I >> tried below: >> >> (a) Paired experimental design with 45 pairs >> (b) Treatment: "Before" and "After" >> (c) Phenotype: 1 & 2 >> Aim: Look for DE genes between Phenotype 1 and 2 upon >> treatment taking >> into account the paired design >> >> The R code tried: >> >> library(edgeR) >> counts<-read.delim(file="counts.dat",header=T) >> pair=factor(pdata$pair) >> Treat=factor( pdata$treat) >> Phenotype=factor(pdata$pheno) >> group<-paste(Treat,Phenotype,sep=".") >> design=model.matrix(~pair+Treat:Phenotype, data=counts) >> counts.DGEList<-DGEList(counts, group=group) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> Error message I get: >> >> Error in glmFit.default(y, design = design, dispersion >> = dispersion, >> offset = offset, : >> Design matrix not of full rank. The following >> coefficients not >> estimable: >> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >> >> >> Any idea to solve this out? >> >> Thanks, >> Preethy >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] edgeR_3.4.2 limma_3.18.13 >> >> >> -- >> Sent via the guest posting facility at >> bioconductor.org <http: bioconductor.org="">. >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics. >> conductor >> >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]
ADD REPLYlink written 5.3 years ago by Preethy Venkat Ram30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 279 users visited in the last hour