edgeR - paired samples with multifactorial design - errors
1
1
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Hi All, I had been trying to do DE analysis of my RNAseq experiment using edgeR and am having some isssues. The details of the Experiment and the R code I tried below: (a) Paired experimental design with 45 pairs (b) Treatment: "Before" and "After" (c) Phenotype: 1 & 2 Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking into account the paired design The R code tried: library(edgeR) counts<-read.delim(file="counts.dat",header=T) pair=factor(pdata$pair) Treat=factor( pdata$treat) Phenotype=factor(pdata$pheno) group<-paste(Treat,Phenotype,sep=".") design=model.matrix(~pair+Treat:Phenotype, data=counts) counts.DGEList<-DGEList(counts, group=group) y<-calcNormFactors(counts.DGEList) y<-estimateCommonDisp(y, design) y<-estimateGLMTrendedDisp(y, design) Error message I get: Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, : Design matrix not of full rank. The following coefficients not estimable: TreatBefore:Phenotype1 TreatBefore:Phenotype2 Any idea to solve this out? Thanks, Preethy -- output of sessionInfo(): R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.4.2 limma_3.18.13 -- Sent via the guest posting facility at bioconductor.org.
RNASeq RNASeq • 2.4k views
ADD COMMENT
0
Entering edit mode
Devon Ryan ▴ 200
@devon-ryan-6054
Last seen 9.0 years ago
Germany
Hi Preethy, You likely want: design=model.matrix(~pair+Treat:Phenotype, data=pdata) If that still yields the error, then you'll need to share "pdata" or "design". Also, please don't crosspost on both this list and biostars ( https://www.biostars.org/p/98907/), it duplicates the community effort. Devon -- Devon Ryan, Ph.D. Email: dpryan@dpryan.com Laboratory for Molecular and Cellular Cognition German Centre for Neurodegenerative Diseases (DZNE) Ludwig-Erhard-Allee 2 53175 Bonn Germany <devon.ryan@dzne.de> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest@bioconductor.org>wrote: > > Hi All, > > I had been trying to do DE analysis of my RNAseq experiment using edgeR > and am having some isssues. The details of the Experiment and the R code I > tried below: > > (a) Paired experimental design with 45 pairs > (b) Treatment: "Before" and "After" > (c) Phenotype: 1 & 2 > Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking > into account the paired design > > The R code tried: > > library(edgeR) > counts<-read.delim(file="counts.dat",header=T) > pair=factor(pdata$pair) > Treat=factor( pdata$treat) > Phenotype=factor(pdata$pheno) > group<-paste(Treat,Phenotype,sep=".") > design=model.matrix(~pair+Treat:Phenotype, data=counts) > counts.DGEList<-DGEList(counts, group=group) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > Error message I get: > > Error in glmFit.default(y, design = design, dispersion = dispersion, > offset = offset, : > Design matrix not of full rank. The following coefficients not > estimable: > TreatBefore:Phenotype1 TreatBefore:Phenotype2 > > > Any idea to solve this out? > > Thanks, > Preethy > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Devon, Thanks for the replies both on biostars and here. Sorry for crossposting. Both mailing lists and discussion have been very helpful to me. But, I rarely see replies from BioC package maintainers at biostars. All the samples are paired and I have them all correct - I mean none of them are empty I tried different designs. But ending up with the same error message. What I want to do: Getting DE genes between (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2.After) pdata here: pair=rep(c(1:45), each=2) Treat=rep(c("Before", "After"),45) Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2) pdata<-data.frame(pair,Treat,Phenotype) Preethy On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com> wrote: > Hi Preethy, > > You likely want: > > design=model.matrix(~pair+Treat:Phenotype, data=pdata) > > If that still yields the error, then you'll need to share "pdata" or > "design". Also, please don't crosspost on both this list and biostars ( > https://www.biostars.org/p/98907/), it duplicates the community effort. > > Devon > > > -- > Devon Ryan, Ph.D. > Email: dpryan@dpryan.com > Laboratory for Molecular and Cellular Cognition > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn > Germany > <devon.ryan@dzne.de> > > > On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest@bioconductor.org>wrote: > >> >> Hi All, >> >> I had been trying to do DE analysis of my RNAseq experiment using edgeR >> and am having some isssues. The details of the Experiment and the R code I >> tried below: >> >> (a) Paired experimental design with 45 pairs >> (b) Treatment: "Before" and "After" >> (c) Phenotype: 1 & 2 >> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >> into account the paired design >> >> The R code tried: >> >> library(edgeR) >> counts<-read.delim(file="counts.dat",header=T) >> pair=factor(pdata$pair) >> Treat=factor( pdata$treat) >> Phenotype=factor(pdata$pheno) >> group<-paste(Treat,Phenotype,sep=".") >> design=model.matrix(~pair+Treat:Phenotype, data=counts) >> counts.DGEList<-DGEList(counts, group=group) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> Error message I get: >> >> Error in glmFit.default(y, design = design, dispersion = dispersion, >> offset = offset, : >> Design matrix not of full rank. The following coefficients not >> estimable: >> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >> >> >> Any idea to solve this out? >> >> Thanks, >> Preethy >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] edgeR_3.4.2 limma_3.18.13 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Preethy, This experiment is very similar to the example in part 3.5 of the edgeR User's guide, starting on page 31. Best, Jim On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > Hi Devon, > > Thanks for the replies both on biostars and here. Sorry for > crossposting. Both > mailing lists and discussion have been very helpful to me. > > But, I rarely see replies from BioC package maintainers at biostars. > > All the samples are paired and I have them all correct - I mean none of > them are empty > I tried different designs. But ending up with the same error message. > What I want to do: Getting DE genes between > (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2.After) > > pdata here: > > pair=rep(c(1:45), each=2) > Treat=rep(c("Before", "After"),45) > Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, > 2), each=2) > pdata<-data.frame(pair,Treat,Phenotype) > > > Preethy > > > > On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan at="" dpryan.com=""> wrote: > >> Hi Preethy, >> >> You likely want: >> >> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >> >> If that still yields the error, then you'll need to share "pdata" or >> "design". Also, please don't crosspost on both this list and biostars ( >> https://www.biostars.org/p/98907/), it duplicates the community effort. >> >> Devon >> >> >> -- >> Devon Ryan, Ph.D. >> Email: dpryan at dpryan.com >> Laboratory for Molecular and Cellular Cognition >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn >> Germany >> <devon.ryan at="" dzne.de=""> >> >> >> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] <guest at="" bioconductor.org="">wrote: >> >>> Hi All, >>> >>> I had been trying to do DE analysis of my RNAseq experiment using edgeR >>> and am having some isssues. The details of the Experiment and the R code I >>> tried below: >>> >>> (a) Paired experimental design with 45 pairs >>> (b) Treatment: "Before" and "After" >>> (c) Phenotype: 1 & 2 >>> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >>> into account the paired design >>> >>> The R code tried: >>> >>> library(edgeR) >>> counts<-read.delim(file="counts.dat",header=T) >>> pair=factor(pdata$pair) >>> Treat=factor( pdata$treat) >>> Phenotype=factor(pdata$pheno) >>> group<-paste(Treat,Phenotype,sep=".") >>> design=model.matrix(~pair+Treat:Phenotype, data=counts) >>> counts.DGEList<-DGEList(counts, group=group) >>> y<-calcNormFactors(counts.DGEList) >>> y<-estimateCommonDisp(y, design) >>> y<-estimateGLMTrendedDisp(y, design) >>> >>> >>> Error message I get: >>> >>> Error in glmFit.default(y, design = design, dispersion = dispersion, >>> offset = offset, : >>> Design matrix not of full rank. The following coefficients not >>> estimable: >>> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >>> >>> >>> Any idea to solve this out? >>> >>> Thanks, >>> Preethy >>> >>> -- output of sessionInfo(): >>> >>> R version 3.1.0 (2014-04-10) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >>> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] edgeR_3.4.2 limma_3.18.13 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Hi Jim, Thanks for the reply. I've tried that - with a slighlty modified code. I am sorry. But, I'm getting error again. Here: pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", "After"),45), levels=c("Before", "After")) Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2), each=2),levels=c("1","2")) design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) colnames(design) counts.DGEList<-DGEList(counts, group=Treat) y<-calcNormFactors(counts.DGEList) y<-estimateCommonDisp(y, design) y<-estimateGLMTrendedDisp(y, design) The error message here: > y<-estimateGLMTrendedDisp(y, design) Error in return(NA, ntags) : multi-argument returns are not permitted In addition: Warning message: In estimateGLMTrendedDisp.default(y = y$counts, design = design, : No residual df: cannot estimate dispersion One reason I can see is that the number of columns in my design matrix is "92" as they are more replicates/patients than in the 3.5 example. And there are only "90" columns in my "count" matrix. But, do you have any idea how can I solve this ? Thanks, Preethy On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Preethy, > > This experiment is very similar to the example in part 3.5 of the edgeR > User's guide, starting on page 31. > > Best, > > Jim > > > > On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > >> Hi Devon, >> >> Thanks for the replies both on biostars and here. Sorry for >> crossposting. Both >> mailing lists and discussion have been very helpful to me. >> >> But, I rarely see replies from BioC package maintainers at biostars. >> >> All the samples are paired and I have them all correct - I mean none of >> them are empty >> I tried different designs. But ending up with the same error message. >> What I want to do: Getting DE genes between >> (Phenotype1.Before-Phenotype1.After) & (Phenotype2.Before- Phenotype2. >> After) >> >> pdata here: >> >> pair=rep(c(1:45), each=2) >> Treat=rep(c("Before", "After"),45) >> Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, >> 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, >> 2, >> 2), each=2) >> pdata<-data.frame(pair,Treat,Phenotype) >> >> >> Preethy >> >> >> >> On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com> wrote: >> >> Hi Preethy, >>> >>> You likely want: >>> >>> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >>> >>> If that still yields the error, then you'll need to share "pdata" or >>> "design". Also, please don't crosspost on both this list and biostars ( >>> https://www.biostars.org/p/98907/), it duplicates the community effort. >>> >>> Devon >>> >>> >>> -- >>> Devon Ryan, Ph.D. >>> Email: dpryan@dpryan.com >>> Laboratory for Molecular and Cellular Cognition >>> German Centre for Neurodegenerative Diseases (DZNE) >>> Ludwig-Erhard-Allee 2 >>> 53175 Bonn >>> Germany >>> <devon.ryan@dzne.de> >>> >>> >>> >>> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] < >>> guest@bioconductor.org>wrote: >>> >>> Hi All, >>>> >>>> I had been trying to do DE analysis of my RNAseq experiment using edgeR >>>> and am having some isssues. The details of the Experiment and the R >>>> code I >>>> tried below: >>>> >>>> (a) Paired experimental design with 45 pairs >>>> (b) Treatment: "Before" and "After" >>>> (c) Phenotype: 1 & 2 >>>> Aim: Look for DE genes between Phenotype 1 and 2 upon treatment taking >>>> into account the paired design >>>> >>>> The R code tried: >>>> >>>> library(edgeR) >>>> counts<-read.delim(file="counts.dat",header=T) >>>> pair=factor(pdata$pair) >>>> Treat=factor( pdata$treat) >>>> Phenotype=factor(pdata$pheno) >>>> group<-paste(Treat,Phenotype,sep=".") >>>> design=model.matrix(~pair+Treat:Phenotype, data=counts) >>>> counts.DGEList<-DGEList(counts, group=group) >>>> y<-calcNormFactors(counts.DGEList) >>>> y<-estimateCommonDisp(y, design) >>>> y<-estimateGLMTrendedDisp(y, design) >>>> >>>> >>>> Error message I get: >>>> >>>> Error in glmFit.default(y, design = design, dispersion = dispersion, >>>> offset = offset, : >>>> Design matrix not of full rank. The following coefficients not >>>> estimable: >>>> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >>>> >>>> >>>> Any idea to solve this out? >>>> >>>> Thanks, >>>> Preethy >>>> >>>> -- output of sessionInfo(): >>>> >>>> R version 3.1.0 (2014-04-10) >>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >>>> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >>>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] edgeR_3.4.2 limma_3.18.13 >>>> >>>> >>>> -- >>>> Sent via the guest posting facility at bioconductor.org. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Preethy, You need to re-read that section of edgeR, and in particular look at the patient column of the revised targets frame. Best, Jim On 4/25/2014 12:46 PM, Preethy Venkat Ram wrote: > Hi Jim, > > Thanks for the reply. I've tried that - with a slighlty modified > code. I am sorry. But, I'm getting error again. > > Here: > > pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", > "After"),45), levels=c("Before", "After")) > Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, > 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, > 2, 1, 2, 2, 1, 2, 2), each=2),levels=c("1","2")) > design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) > colnames(design) > counts.DGEList<-DGEList(counts, group=Treat) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > The error message here: > > > y<-estimateGLMTrendedDisp(y, design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y = y$counts, design = design, : > No residual df: cannot estimate dispersion > > > One reason I can see is that the number of columns in my design matrix > is "92" as they are more replicates/patients than in the 3.5 > example. And there are only "90" columns in my "count" matrix. > > But, do you have any idea how can I solve this ? > > Thanks, > Preethy > > > > > > On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Preethy, > > This experiment is very similar to the example in part 3.5 of the > edgeR User's guide, starting on page 31. > > Best, > > Jim > > > > On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: > > Hi Devon, > > Thanks for the replies both on biostars and here. Sorry for > crossposting. Both > mailing lists and discussion have been very helpful to me. > > But, I rarely see replies from BioC package maintainers at > biostars. > > All the samples are paired and I have them all correct - I > mean none of > them are empty > I tried different designs. But ending up with the same error > message. > What I want to do: Getting DE genes between > (Phenotype1.Before-Phenotype1.After) & > (Phenotype2.Before-Phenotype2.After) > > pdata here: > > pair=rep(c(1:45), each=2) > Treat=rep(c("Before", "After"),45) > Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, > 2, 2, 2, 2, > 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, > 1, 2, 2, 1, 2, > 2), each=2) > pdata<-data.frame(pair,Treat,Phenotype) > > > Preethy > > > > On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan at="" dpryan.com=""> <mailto:dpryan at="" dpryan.com="">> wrote: > > Hi Preethy, > > You likely want: > > design=model.matrix(~pair+Treat:Phenotype, data=pdata) > > If that still yields the error, then you'll need to share > "pdata" or > "design". Also, please don't crosspost on both this list > and biostars ( > https://www.biostars.org/p/98907/), it duplicates the > community effort. > > Devon > > > -- > Devon Ryan, Ph.D. > Email: dpryan at dpryan.com <mailto:dpryan at="" dpryan.com=""> > Laboratory for Molecular and Cellular Cognition > German Centre for Neurodegenerative Diseases (DZNE) > Ludwig-Erhard-Allee 2 > 53175 Bonn > Germany > <devon.ryan at="" dzne.de="" <mailto:devon.ryan="" at="" dzne.de="">> > > > > On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] > <guest at="" bioconductor.org="" <mailto:guest="" at="" bioconductor.org="">>wrote: > > Hi All, > > I had been trying to do DE analysis of my RNAseq > experiment using edgeR > and am having some isssues. The details of the > Experiment and the R code I > tried below: > > (a) Paired experimental design with 45 pairs > (b) Treatment: "Before" and "After" > (c) Phenotype: 1 & 2 > Aim: Look for DE genes between Phenotype 1 and 2 upon > treatment taking > into account the paired design > > The R code tried: > > library(edgeR) > counts<-read.delim(file="counts.dat",header=T) > pair=factor(pdata$pair) > Treat=factor( pdata$treat) > Phenotype=factor(pdata$pheno) > group<-paste(Treat,Phenotype,sep=".") > design=model.matrix(~pair+Treat:Phenotype, data=counts) > counts.DGEList<-DGEList(counts, group=group) > y<-calcNormFactors(counts.DGEList) > y<-estimateCommonDisp(y, design) > y<-estimateGLMTrendedDisp(y, design) > > > Error message I get: > > Error in glmFit.default(y, design = design, dispersion > = dispersion, > offset = offset, : > Design matrix not of full rank. The following > coefficients not > estimable: > TreatBefore:Phenotype1 TreatBefore:Phenotype2 > > > Any idea to solve this out? > > Thanks, > Preethy > > -- output of sessionInfo(): > > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > > -- > Sent via the guest posting facility at > bioconductor.org <http: bioconductor.org="">. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Hi Jim and All, Thanks really a lot for bringing that to my attention. It didn't work this way as well as I had different number of samples for each Phenotype unlike the edgeR example which had 3 patients for each "Disease" category. But, I took off the empty columns from the design matrix and it worked. Is the analysis done correctly this way ? I have one more doubt - Would it be possible to account for batch effect too with this type of design matrix ? Like this: design=model.matrix(~Phenotype+Phenotype:pair+ Phenotype:Treat+Phenotype:batch) Thanks, Preethy On Fri, Apr 25, 2014 at 8:00 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Preethy, > > You need to re-read that section of edgeR, and in particular look at the > patient column of the revised targets frame. > > Best, > > Jim > > > > On 4/25/2014 12:46 PM, Preethy Venkat Ram wrote: > >> Hi Jim, >> >> Thanks for the reply. I've tried that - with a slighlty modified code. I >> am sorry. But, I'm getting error again. >> >> Here: >> >> pair=factor(rep(c(1:45), each=2)).Treat=factor( rep(c("Before", >> "After"),45), levels=c("Before", "After")) >> Phenotype=factor(rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, >> 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, >> 2, 1, 2, 2), each=2),levels=c("1","2")) >> design=model.matrix(~Phenotype+Phenotype:pair+Phenotype:Treat) >> colnames(design) >> counts.DGEList<-DGEList(counts, group=Treat) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> The error message here: >> >> > y<-estimateGLMTrendedDisp(y, design) >> Error in return(NA, ntags) : multi-argument returns are not permitted >> In addition: Warning message: >> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >> No residual df: cannot estimate dispersion >> >> >> One reason I can see is that the number of columns in my design matrix >> is "92" as they are more replicates/patients than in the 3.5 example. And >> there are only "90" columns in my "count" matrix. >> >> But, do you have any idea how can I solve this ? >> >> Thanks, >> Preethy >> >> >> >> >> >> On Fri, Apr 25, 2014 at 6:45 PM, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Preethy, >> >> This experiment is very similar to the example in part 3.5 of the >> edgeR User's guide, starting on page 31. >> >> Best, >> >> Jim >> >> >> >> On 4/25/2014 11:29 AM, Preethy Venkat Ram wrote: >> >> Hi Devon, >> >> Thanks for the replies both on biostars and here. Sorry for >> crossposting. Both >> mailing lists and discussion have been very helpful to me. >> >> But, I rarely see replies from BioC package maintainers at >> biostars. >> >> All the samples are paired and I have them all correct - I >> mean none of >> them are empty >> I tried different designs. But ending up with the same error >> message. >> What I want to do: Getting DE genes between >> (Phenotype1.Before-Phenotype1.After) & >> (Phenotype2.Before-Phenotype2.After) >> >> pdata here: >> >> pair=rep(c(1:45), each=2) >> Treat=rep(c("Before", "After"),45) >> Phenotype=rep(c(1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, >> 2, 2, 2, 2, >> 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, >> 1, 2, 2, 1, 2, >> 2), each=2) >> pdata<-data.frame(pair,Treat,Phenotype) >> >> >> Preethy >> >> >> >> On Fri, Apr 25, 2014 at 4:53 PM, Devon Ryan <dpryan@dpryan.com>> <mailto:dpryan@dpryan.com>> wrote: >> >> Hi Preethy, >> >> You likely want: >> >> design=model.matrix(~pair+Treat:Phenotype, data=pdata) >> >> If that still yields the error, then you'll need to share >> "pdata" or >> "design". Also, please don't crosspost on both this list >> and biostars ( >> https://www.biostars.org/p/98907/), it duplicates the >> community effort. >> >> Devon >> >> >> -- >> Devon Ryan, Ph.D. >> Email: dpryan@dpryan.com <mailto:dpryan@dpryan.com> >> >> Laboratory for Molecular and Cellular Cognition >> German Centre for Neurodegenerative Diseases (DZNE) >> Ludwig-Erhard-Allee 2 >> 53175 Bonn >> Germany >> <devon.ryan@dzne.de <mailto:devon.ryan@dzne.de="">> >> >> >> >> >> On Fri, Apr 25, 2014 at 11:15 AM, Preethy [guest] >> <guest@bioconductor.org <mailto:guest@bioconductor.org="">> >>wrote: >> >> >> Hi All, >> >> I had been trying to do DE analysis of my RNAseq >> experiment using edgeR >> and am having some isssues. The details of the >> Experiment and the R code I >> tried below: >> >> (a) Paired experimental design with 45 pairs >> (b) Treatment: "Before" and "After" >> (c) Phenotype: 1 & 2 >> Aim: Look for DE genes between Phenotype 1 and 2 upon >> treatment taking >> into account the paired design >> >> The R code tried: >> >> library(edgeR) >> counts<-read.delim(file="counts.dat",header=T) >> pair=factor(pdata$pair) >> Treat=factor( pdata$treat) >> Phenotype=factor(pdata$pheno) >> group<-paste(Treat,Phenotype,sep=".") >> design=model.matrix(~pair+Treat:Phenotype, data=counts) >> counts.DGEList<-DGEList(counts, group=group) >> y<-calcNormFactors(counts.DGEList) >> y<-estimateCommonDisp(y, design) >> y<-estimateGLMTrendedDisp(y, design) >> >> >> Error message I get: >> >> Error in glmFit.default(y, design = design, dispersion >> = dispersion, >> offset = offset, : >> Design matrix not of full rank. The following >> coefficients not >> estimable: >> TreatBefore:Phenotype1 TreatBefore:Phenotype2 >> >> >> Any idea to solve this out? >> >> Thanks, >> Preethy >> >> -- output of sessionInfo(): >> >> R version 3.1.0 (2014-04-10) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 >> [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_GB.UTF-8 >> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] edgeR_3.4.2 limma_3.18.13 >> >> >> -- >> Sent via the guest posting facility at >> bioconductor.org <http: bioconductor.org="">. >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics. >> conductor >> >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 695 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6