Question

edgeR: design matrix for different condition

0

Entering edit mode

KJ Lim ▴ 420

@kj-lim-5288

Last seen 5.3 years ago

Finland

Dear Prof Gordon, Good day. Thanks for your time to update the edgeR User's Guide. It is useful. I combined all my experiment factors into one combined factor and described the experiment matrix like the example in the User's guide: > Group <- factor(paste(targets$treat, targets$tree,sep=".")) > cbind(targets,Group=Group) > hl.design <- model.matrix(~0+Group) But, I encountered an error when I performed the estimate dispersion for my data > hl <- estimateGLMCommonDisp(hl, hl.design) Warning message: In estimateGLMCommonDisp.default(y = y$counts, design = design, : No residual df: setting dispersion to NA I tried to figure out what I have done wrong, unfortunately, I have no luck on that. Could you or the community kindly please light me for this matter? Thank you very much for your time. > sessionInfo() R version 2.15.1 (2012-06-22) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.6.10 limma_3.12.1 loaded via a namespace (and not attached): [1] tcltk_2.15.1 tools_2.15.1 Best regards, KJ Lim On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear KH Kim, > > No, you have not understood me correctly. I did not suggest that you change > from edgeR to limma. I suggested that you read the limma documentation > because the design matrix is the same for edgeR as it is for limma, so the > limma documentation would help you create the design matrix for your edgeR > analysis. > > I thought from your last email that you had already done this and that you > had completed the edgeR analysis satisfactorily. > > I wrote more documentation for the edgeR User's Guide a couple of days ago, > trying to give advice for experiments such as yours. You might find that > Section 3.3 of > > http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/ edgeRUsersGuide.pdf > > gives more explanation. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > Tel: (03) 9345 2326, Fax (03) 9347 0852, > http://www.statsci.org/smyth > > > On Mon, 30 Jul 2012, KJ Lim wrote: > >> Dear Prof Gordon, >> >> Good day. >> >> I have read the section 8.5 of the Limma manual as you suggested in >> previous emails. Thanks. >> >> If I understand you correctly, you would suggest me to carry out my DE >> analysis with limma package if I would like to learn the which genes >> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >> >> May I ask how can I generate the EList, "eset", in order to fit in the >> linear model as mentioned in the limma manual >> >> fit <- lmFit(eset, design) >> fit <- eBayes(fit) >> >> Please correct me if I'm wrong. >> >> Thank you very much for your time and help. >> >> Best regards, >> KJ Lim >> >> >> >> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> Dear KJ Lim, >>> >>> Thanks for the rephrasing, which is clearer. I would have liked you to >>> mention however whether you read the documentation that I refered you do. >>> >>> >>> On Thu, 26 Jul 2012, KJ Lim wrote: >>> >>>> Dear Prof Gordon, >>>> >>>> Good day. Thanks for your prompt replied. >>>> >>>> Please allow me to rephrase my previous question. >>>> >>>> This model, ~tree+treat, is assumed effect of the time of treatment >>>> will be the same irrespective of genotype. >>> >>> >>> >>> Yes, this model makes that assumption. >>> >>> >>>> Thus, test for the coef=3 will gives me the differential expression at >>>> 96H >>>> irrespective of genotype. >>> >>> >>> >>> Actually coef=3 refers to 24H, according to the design matrix in your >>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>> irrespective of genotype. But only if the assumption mentioned above is >>> true, which is unlikely given the rest of your email. >>> >>> >>>> I would like to learn the differential expression of treeHS vs treeLS >>>> at specific time points i.e. 24H or if possible across the whole time >>>> points. Should I fit my model as >>>> >>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>> >>> >>> >>> These models are equivalent, so it is just a matter of convenience which >>> one >>> you use, as I tried to explain in the limma documentation I refered you >>> to. >>> >>> >>>> The design matrix columns of model B give: >>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>> "treeLS:treat96H" >>>> >>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>> learn the differential expression of treeHS vs treeLS at specific time >>>> points i.e. 03H? Does this test could tells me which set of genes >>>> up/down-regulated in treeLS or treeHS? >>> >>> >>> >>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. >>> So >>> testing for both coefficients coef=3:4 tests for a 3H effect in either >>> treeLS or treeHS. >>> >>> Best wishes >>> Gordon >>> >>> >>>> I'm not good in statistic and I'm learning, kindly please correct me >>>> if I'm wrong. >>>> >>>> Thank you very much for your time and suggestion. >>>> >>>> Best regards, >>>> KJ Lim >>>> >>>> >>>> >>>> >>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>> >>>>> >>>>> >>>>> Dear KJ Lim, >>>>> >>>>> I don't quite understand your question, because you seem to be asking >>>>> for >>>>> something that isn't a test for differential expression, which is what >>>>> edgeR >>>>> does. You question "I would like to learn the genes are express in >>>>> treeHS >>>>> but not treeLS and vice versa" seems to be about absolute expression >>>>> levels. >>>>> edgeR doesn't test for genes that are not expressed in particular >>>>> conditions. >>>>> >>>>> I'll refer you to the limma section on interaction models in case that >>>>> helps, see Section 8.5 of: >>>>> >>>>> >>>>> >>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/ doc/usersguide.pdf >>>>> >>>>> Setting up a design matrix is the same for edgeR as it is for limma. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>> >>>>>> Dear the edgeR community, >>>>>> >>>>>> Good day. >>>>>> >>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>>>> 24H, 96H). >>>>>> >>>>>> I first assumed that the treatment effect will be the same for each >>>>>> genotype and I have the design matrix as: >>>>>> >>>>>> design <- model.matrix(~tree+treat) >>>>>> >>>>>>> design >>>>>> >>>>>> >>>>>> >>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>> 1 1 0 0 0 0 >>>>>> 2 1 0 0 0 0 >>>>>> 3 1 1 0 0 0 >>>>>> 4 1 1 0 0 0 >>>>>> 5 1 0 1 0 0 >>>>>> 6 1 0 1 0 0 >>>>>> 7 1 0 0 1 0 >>>>>> 8 1 0 0 1 0 >>>>>> 9 1 0 0 0 1 >>>>>> 10 1 0 0 0 1 >>>>>> 11 1 1 0 0 1 >>>>>> 12 1 1 0 0 1 >>>>>> 13 1 0 1 0 1 >>>>>> 14 1 0 1 0 1 >>>>>> 15 1 0 0 1 1 >>>>>> 16 1 0 0 1 1 >>>>>> >>>>>> I used coef=4 to test for the differential expressions between treeHS >>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>> >>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>>>> across the whole time of treatment, should I have the design matrix as >>>>>> below or more steps I need to do? >>>>>> >>>>>> design <- model.matrix(~tree*treat) >>>>>> >>>>>> Kindly please light me on this. I appreciate very much for your help >>>>>> and >>>>>> time. >>>>>> >>>>>> Have a nice day. >>>>>> >>>>>> Best regards, >>>>>> KJ Lim >>> >>> >>> >>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for the >>> addressee. >>> You must not disclose, forward, print or use it without the permission of >>> the sender. >>> ______________________________________________________________________ >> >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

limma edgeR limma edgeR • 1.7k views

ADD COMMENT • link updated 13.4 years ago by Gordon Smyth 53k • written 13.4 years ago by KJ Lim ▴ 420

score 0 · Answer 1 · 2012-08-06

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 22 hours ago

WEHI, Melbourne, Australia

The error message is pretty self-explanatory. Apparently your design matrix has as many columns as there are libraries, so there are no degrees of freedom left from which to estimate variability. According to the code and output you have given, this message should be impossible, so you must have changed the data in some way that I cannot see. If you had shown the output from cbind(targets,Group=Group), then I might have been able to say something useful. Best wishes Gordon On Mon, 6 Aug 2012, KJ Lim wrote: > Dear Prof Gordon, > > Good day. > > Thanks for your time to update the edgeR User's Guide. It is useful. > > I combined all my experiment factors into one combined factor and > described the experiment matrix like the example in the User's guide: > > > Group <- factor(paste(targets$treat, targets$tree,sep=".")) > > cbind(targets,Group=Group) > > hl.design <- model.matrix(~0+Group) > > But, I encountered an error when I performed the estimate dispersion > for my data > > > hl <- estimateGLMCommonDisp(hl, hl.design) > Warning message: > In estimateGLMCommonDisp.default(y = y$counts, design = design, : > No residual df: setting dispersion to NA > > I tried to figure out what I have done wrong, unfortunately, I have no > luck on that. Could you or the community kindly please light me for > this matter? > > Thank you very much for your time. > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: i486-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_2.6.10 limma_3.12.1 > > loaded via a namespace (and not attached): > [1] tcltk_2.15.1 tools_2.15.1 > > Best regards, > KJ Lim > > > > On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> Dear KH Kim, >> >> No, you have not understood me correctly. I did not suggest that you change >> from edgeR to limma. I suggested that you read the limma documentation >> because the design matrix is the same for edgeR as it is for limma, so the >> limma documentation would help you create the design matrix for your edgeR >> analysis. >> >> I thought from your last email that you had already done this and that you >> had completed the edgeR analysis satisfactorily. >> >> I wrote more documentation for the edgeR User's Guide a couple of days ago, >> trying to give advice for experiments such as yours. You might find that >> Section 3.3 of >> >> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc /edgeRUsersGuide.pdf >> >> gives more explanation. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> Tel: (03) 9345 2326, Fax (03) 9347 0852, >> http://www.statsci.org/smyth >> >> >> On Mon, 30 Jul 2012, KJ Lim wrote: >> >>> Dear Prof Gordon, >>> >>> Good day. >>> >>> I have read the section 8.5 of the Limma manual as you suggested in >>> previous emails. Thanks. >>> >>> If I understand you correctly, you would suggest me to carry out my DE >>> analysis with limma package if I would like to learn the which genes >>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >>> >>> May I ask how can I generate the EList, "eset", in order to fit in the >>> linear model as mentioned in the limma manual >>> >>> fit <- lmFit(eset, design) >>> fit <- eBayes(fit) >>> >>> Please correct me if I'm wrong. >>> >>> Thank you very much for your time and help. >>> >>> Best regards, >>> KJ Lim >>> >>> >>> >>> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>> >>>> Dear KJ Lim, >>>> >>>> Thanks for the rephrasing, which is clearer. I would have liked you to >>>> mention however whether you read the documentation that I refered you do. >>>> >>>> >>>> On Thu, 26 Jul 2012, KJ Lim wrote: >>>> >>>>> Dear Prof Gordon, >>>>> >>>>> Good day. Thanks for your prompt replied. >>>>> >>>>> Please allow me to rephrase my previous question. >>>>> >>>>> This model, ~tree+treat, is assumed effect of the time of treatment >>>>> will be the same irrespective of genotype. >>>> >>>> >>>> >>>> Yes, this model makes that assumption. >>>> >>>> >>>>> Thus, test for the coef=3 will gives me the differential expression at >>>>> 96H >>>>> irrespective of genotype. >>>> >>>> >>>> >>>> Actually coef=3 refers to 24H, according to the design matrix in your >>>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>>> irrespective of genotype. But only if the assumption mentioned above is >>>> true, which is unlikely given the rest of your email. >>>> >>>> >>>>> I would like to learn the differential expression of treeHS vs treeLS >>>>> at specific time points i.e. 24H or if possible across the whole time >>>>> points. Should I fit my model as >>>>> >>>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>>> >>>> >>>> >>>> These models are equivalent, so it is just a matter of convenience which >>>> one >>>> you use, as I tried to explain in the limma documentation I refered you >>>> to. >>>> >>>> >>>>> The design matrix columns of model B give: >>>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>>> "treeLS:treat96H" >>>>> >>>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>>> learn the differential expression of treeHS vs treeLS at specific time >>>>> points i.e. 03H? Does this test could tells me which set of genes >>>>> up/down-regulated in treeLS or treeHS? >>>> >>>> >>>> >>>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. >>>> So >>>> testing for both coefficients coef=3:4 tests for a 3H effect in either >>>> treeLS or treeHS. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> >>>>> I'm not good in statistic and I'm learning, kindly please correct me >>>>> if I'm wrong. >>>>> >>>>> Thank you very much for your time and suggestion. >>>>> >>>>> Best regards, >>>>> KJ Lim >>>>> >>>>> >>>>> >>>>> >>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>> >>>>>> >>>>>> >>>>>> Dear KJ Lim, >>>>>> >>>>>> I don't quite understand your question, because you seem to be asking >>>>>> for >>>>>> something that isn't a test for differential expression, which is what >>>>>> edgeR >>>>>> does. You question "I would like to learn the genes are express in >>>>>> treeHS >>>>>> but not treeLS and vice versa" seems to be about absolute expression >>>>>> levels. >>>>>> edgeR doesn't test for genes that are not expressed in particular >>>>>> conditions. >>>>>> >>>>>> I'll refer you to the limma section on interaction models in case that >>>>>> helps, see Section 8.5 of: >>>>>> >>>>>> >>>>>> >>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst /doc/usersguide.pdf >>>>>> >>>>>> Setting up a design matrix is the same for edgeR as it is for limma. >>>>>> >>>>>> Best wishes >>>>>> Gordon >>>>>> >>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>>> >>>>>>> Dear the edgeR community, >>>>>>> >>>>>>> Good day. >>>>>>> >>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>>>>> 24H, 96H). >>>>>>> >>>>>>> I first assumed that the treatment effect will be the same for each >>>>>>> genotype and I have the design matrix as: >>>>>>> >>>>>>> design <- model.matrix(~tree+treat) >>>>>>> >>>>>>>> design >>>>>>> >>>>>>> >>>>>>> >>>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>>> 1 1 0 0 0 0 >>>>>>> 2 1 0 0 0 0 >>>>>>> 3 1 1 0 0 0 >>>>>>> 4 1 1 0 0 0 >>>>>>> 5 1 0 1 0 0 >>>>>>> 6 1 0 1 0 0 >>>>>>> 7 1 0 0 1 0 >>>>>>> 8 1 0 0 1 0 >>>>>>> 9 1 0 0 0 1 >>>>>>> 10 1 0 0 0 1 >>>>>>> 11 1 1 0 0 1 >>>>>>> 12 1 1 0 0 1 >>>>>>> 13 1 0 1 0 1 >>>>>>> 14 1 0 1 0 1 >>>>>>> 15 1 0 0 1 1 >>>>>>> 16 1 0 0 1 1 >>>>>>> >>>>>>> I used coef=4 to test for the differential expressions between treeHS >>>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>>> >>>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>>>>> across the whole time of treatment, should I have the design matrix as >>>>>>> below or more steps I need to do? >>>>>>> >>>>>>> design <- model.matrix(~tree*treat) >>>>>>> >>>>>>> Kindly please light me on this. I appreciate very much for your help >>>>>>> and >>>>>>> time. >>>>>>> >>>>>>> Have a nice day. >>>>>>> >>>>>>> Best regards, >>>>>>> KJ Lim ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD COMMENT • link 13.4 years ago Gordon Smyth 53k

0

Entering edit mode

Dear Prof Gordon, Good day. Thanks for you prompt replied. Below is the output of cbind(targets,Group=Group) > cbind(targets,Group=Group) files treat tree X Group 1 ./rawData/HS0H01.txt 00H H1 NA 00H.H1 2 ./rawData/HS0H02.txt 00H H2 NA 00H.H2 3 ./rawData/HS3H01.txt 03H H3 NA 03H.H3 4 ./rawData/HS3H02.txt 03H H4 NA 03H.H4 5 ./rawData/HS1D01.txt 24H H5 NA 24H.H5 6 ./rawData/HS1D02.txt 24H H6 NA 24H.H6 7 ./rawData/HS4D01.txt 96H H7 NA 96H.H7 8 ./rawData/HS4D02.txt 96H H8 NA 96H.H8 9 ./rawData/LS0H01.txt 00H L1 NA 00H.L1 10 ./rawData/LS0H02.txt 00H L2 NA 00H.L2 11 ./rawData/LS3H01.txt 03H L3 NA 03H.L3 12 ./rawData/LS3H02.txt 03H L4 NA 03H.L4 13 ./rawData/LS1D01.txt 24H L5 NA 24H.L5 14 ./rawData/LS1D02.txt 24H L6 NA 24H.L6 15 ./rawData/LS4D01.txt 96H L7 NA 96H.L7 16 ./rawData/LS4D02.txt 96H L8 NA 96H.L8 Thank you for your time. Best regards, KJ Lim On 6 August 2012 07:33, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > The error message is pretty self-explanatory. Apparently your design matrix > has as many columns as there are libraries, so there are no degrees of > freedom left from which to estimate variability. According to the code and > output you have given, this message should be impossible, so you must have > changed the data in some way that I cannot see. > > If you had shown the output from cbind(targets,Group=Group), then I might > have been able to say something useful. > > Best wishes > Gordon > > > On Mon, 6 Aug 2012, KJ Lim wrote: > >> Dear Prof Gordon, >> >> Good day. >> >> Thanks for your time to update the edgeR User's Guide. It is useful. >> >> I combined all my experiment factors into one combined factor and >> described the experiment matrix like the example in the User's guide: >> >> > Group <- factor(paste(targets$treat, targets$tree,sep=".")) >> > cbind(targets,Group=Group) >> > hl.design <- model.matrix(~0+Group) >> >> But, I encountered an error when I performed the estimate dispersion >> for my data >> >> > hl <- estimateGLMCommonDisp(hl, hl.design) >> Warning message: >> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >> No residual df: setting dispersion to NA >> >> I tried to figure out what I have done wrong, unfortunately, I have no >> luck on that. Could you or the community kindly please light me for >> this matter? >> >> Thank you very much for your time. >> >>> sessionInfo() >> >> R version 2.15.1 (2012-06-22) >> Platform: i486-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] edgeR_2.6.10 limma_3.12.1 >> >> loaded via a namespace (and not attached): >> [1] tcltk_2.15.1 tools_2.15.1 >> >> Best regards, >> KJ Lim >> >> >> >> On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> Dear KH Kim, >>> >>> No, you have not understood me correctly. I did not suggest that you >>> change >>> from edgeR to limma. I suggested that you read the limma documentation >>> because the design matrix is the same for edgeR as it is for limma, so >>> the >>> limma documentation would help you create the design matrix for your >>> edgeR >>> analysis. >>> >>> I thought from your last email that you had already done this and that >>> you >>> had completed the edgeR analysis satisfactorily. >>> >>> I wrote more documentation for the edgeR User's Guide a couple of days >>> ago, >>> trying to give advice for experiments such as yours. You might find that >>> Section 3.3 of >>> >>> >>> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/do c/edgeRUsersGuide.pdf >>> >>> gives more explanation. >>> >>> Best wishes >>> Gordon >>> >>> --------------------------------------------- >>> Professor Gordon K Smyth, >>> Bioinformatics Division, >>> Walter and Eliza Hall Institute of Medical Research, >>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>> Tel: (03) 9345 2326, Fax (03) 9347 0852, >>> http://www.statsci.org/smyth >>> >>> >>> On Mon, 30 Jul 2012, KJ Lim wrote: >>> >>>> Dear Prof Gordon, >>>> >>>> Good day. >>>> >>>> I have read the section 8.5 of the Limma manual as you suggested in >>>> previous emails. Thanks. >>>> >>>> If I understand you correctly, you would suggest me to carry out my DE >>>> analysis with limma package if I would like to learn the which genes >>>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >>>> >>>> May I ask how can I generate the EList, "eset", in order to fit in the >>>> linear model as mentioned in the limma manual >>>> >>>> fit <- lmFit(eset, design) >>>> fit <- eBayes(fit) >>>> >>>> Please correct me if I'm wrong. >>>> >>>> Thank you very much for your time and help. >>>> >>>> Best regards, >>>> KJ Lim >>>> >>>> >>>> >>>> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>> >>>>> >>>>> Dear KJ Lim, >>>>> >>>>> Thanks for the rephrasing, which is clearer. I would have liked you to >>>>> mention however whether you read the documentation that I refered you >>>>> do. >>>>> >>>>> >>>>> On Thu, 26 Jul 2012, KJ Lim wrote: >>>>> >>>>>> Dear Prof Gordon, >>>>>> >>>>>> Good day. Thanks for your prompt replied. >>>>>> >>>>>> Please allow me to rephrase my previous question. >>>>>> >>>>>> This model, ~tree+treat, is assumed effect of the time of treatment >>>>>> will be the same irrespective of genotype. >>>>> >>>>> >>>>> >>>>> >>>>> Yes, this model makes that assumption. >>>>> >>>>> >>>>>> Thus, test for the coef=3 will gives me the differential expression at >>>>>> 96H >>>>>> irrespective of genotype. >>>>> >>>>> >>>>> >>>>> >>>>> Actually coef=3 refers to 24H, according to the design matrix in your >>>>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>>>> irrespective of genotype. But only if the assumption mentioned above >>>>> is >>>>> true, which is unlikely given the rest of your email. >>>>> >>>>> >>>>>> I would like to learn the differential expression of treeHS vs treeLS >>>>>> at specific time points i.e. 24H or if possible across the whole time >>>>>> points. Should I fit my model as >>>>>> >>>>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>>>> >>>>> >>>>> >>>>> >>>>> These models are equivalent, so it is just a matter of convenience >>>>> which >>>>> one >>>>> you use, as I tried to explain in the limma documentation I refered you >>>>> to. >>>>> >>>>> >>>>>> The design matrix columns of model B give: >>>>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>>>> "treeLS:treat96H" >>>>>> >>>>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>>>> learn the differential expression of treeHS vs treeLS at specific time >>>>>> points i.e. 03H? Does this test could tells me which set of genes >>>>>> up/down-regulated in treeLS or treeHS? >>>>> >>>>> >>>>> >>>>> >>>>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. >>>>> So >>>>> testing for both coefficients coef=3:4 tests for a 3H effect in either >>>>> treeLS or treeHS. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>> >>>>>> I'm not good in statistic and I'm learning, kindly please correct me >>>>>> if I'm wrong. >>>>>> >>>>>> Thank you very much for your time and suggestion. >>>>>> >>>>>> Best regards, >>>>>> KJ Lim >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Dear KJ Lim, >>>>>>> >>>>>>> I don't quite understand your question, because you seem to be asking >>>>>>> for >>>>>>> something that isn't a test for differential expression, which is >>>>>>> what >>>>>>> edgeR >>>>>>> does. You question "I would like to learn the genes are express in >>>>>>> treeHS >>>>>>> but not treeLS and vice versa" seems to be about absolute expression >>>>>>> levels. >>>>>>> edgeR doesn't test for genes that are not expressed in particular >>>>>>> conditions. >>>>>>> >>>>>>> I'll refer you to the limma section on interaction models in case >>>>>>> that >>>>>>> helps, see Section 8.5 of: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/ins t/doc/usersguide.pdf >>>>>>> >>>>>>> Setting up a design matrix is the same for edgeR as it is for limma. >>>>>>> >>>>>>> Best wishes >>>>>>> Gordon >>>>>>> >>>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>>>> >>>>>>>> Dear the edgeR community, >>>>>>>> >>>>>>>> Good day. >>>>>>>> >>>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>>>>>> 24H, 96H). >>>>>>>> >>>>>>>> I first assumed that the treatment effect will be the same for each >>>>>>>> genotype and I have the design matrix as: >>>>>>>> >>>>>>>> design <- model.matrix(~tree+treat) >>>>>>>> >>>>>>>>> design >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>>>> 1 1 0 0 0 0 >>>>>>>> 2 1 0 0 0 0 >>>>>>>> 3 1 1 0 0 0 >>>>>>>> 4 1 1 0 0 0 >>>>>>>> 5 1 0 1 0 0 >>>>>>>> 6 1 0 1 0 0 >>>>>>>> 7 1 0 0 1 0 >>>>>>>> 8 1 0 0 1 0 >>>>>>>> 9 1 0 0 0 1 >>>>>>>> 10 1 0 0 0 1 >>>>>>>> 11 1 1 0 0 1 >>>>>>>> 12 1 1 0 0 1 >>>>>>>> 13 1 0 1 0 1 >>>>>>>> 14 1 0 1 0 1 >>>>>>>> 15 1 0 0 1 1 >>>>>>>> 16 1 0 0 1 1 >>>>>>>> >>>>>>>> I used coef=4 to test for the differential expressions between >>>>>>>> treeHS >>>>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>>>> >>>>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>>>>>> across the whole time of treatment, should I have the design matrix >>>>>>>> as >>>>>>>> below or more steps I need to do? >>>>>>>> >>>>>>>> design <- model.matrix(~tree*treat) >>>>>>>> >>>>>>>> Kindly please light me on this. I appreciate very much for your help >>>>>>>> and >>>>>>>> time. >>>>>>>> >>>>>>>> Have a nice day. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> KJ Lim > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 13.4 years ago KJ Lim ▴ 420

0

Entering edit mode

Dear KJ Lim, Well, your "tree" column has 16 different entries, a different entry for every library. Naturally this causes a problem. So either you have no replication of any experimental condition, or you have mistakingly used different labels for the same condition. In your earlier email below, you claimed to have just two genotypes (tree=HS and tree=LS), but in your latest targets file you have 16 different genotypes. I'm guessing that the genotypes are actually H and L, so the entries in the targets file should be just H and L. That's as much advice as I have time to give you, and I hope that others will help you further. I feel that I've given you good general advice that is applicable to your experiment. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, http://www.statsci.org/smyth On Mon, 6 Aug 2012, KJ Lim wrote: > Dear Prof Gordon, > > Good day. Thanks for you prompt replied. > > Below is the output of cbind(targets,Group=Group) > >> cbind(targets,Group=Group) > files treat tree X Group > 1 ./rawData/HS0H01.txt 00H H1 NA 00H.H1 > 2 ./rawData/HS0H02.txt 00H H2 NA 00H.H2 > 3 ./rawData/HS3H01.txt 03H H3 NA 03H.H3 > 4 ./rawData/HS3H02.txt 03H H4 NA 03H.H4 > 5 ./rawData/HS1D01.txt 24H H5 NA 24H.H5 > 6 ./rawData/HS1D02.txt 24H H6 NA 24H.H6 > 7 ./rawData/HS4D01.txt 96H H7 NA 96H.H7 > 8 ./rawData/HS4D02.txt 96H H8 NA 96H.H8 > 9 ./rawData/LS0H01.txt 00H L1 NA 00H.L1 > 10 ./rawData/LS0H02.txt 00H L2 NA 00H.L2 > 11 ./rawData/LS3H01.txt 03H L3 NA 03H.L3 > 12 ./rawData/LS3H02.txt 03H L4 NA 03H.L4 > 13 ./rawData/LS1D01.txt 24H L5 NA 24H.L5 > 14 ./rawData/LS1D02.txt 24H L6 NA 24H.L6 > 15 ./rawData/LS4D01.txt 96H L7 NA 96H.L7 > 16 ./rawData/LS4D02.txt 96H L8 NA 96H.L8 > > Thank you for your time. > > Best regards, > KJ Lim > > > On 6 August 2012 07:33, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> The error message is pretty self-explanatory. Apparently your design matrix >> has as many columns as there are libraries, so there are no degrees of >> freedom left from which to estimate variability. According to the code and >> output you have given, this message should be impossible, so you must have >> changed the data in some way that I cannot see. >> >> If you had shown the output from cbind(targets,Group=Group), then I might >> have been able to say something useful. >> >> Best wishes >> Gordon >> >> >> On Mon, 6 Aug 2012, KJ Lim wrote: >> >>> Dear Prof Gordon, >>> >>> Good day. >>> >>> Thanks for your time to update the edgeR User's Guide. It is useful. >>> >>> I combined all my experiment factors into one combined factor and >>> described the experiment matrix like the example in the User's guide: >>> >>> > Group <- factor(paste(targets$treat, targets$tree,sep=".")) >>> > cbind(targets,Group=Group) >>> > hl.design <- model.matrix(~0+Group) >>> >>> But, I encountered an error when I performed the estimate dispersion >>> for my data >>> >>> > hl <- estimateGLMCommonDisp(hl, hl.design) >>> Warning message: >>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>> No residual df: setting dispersion to NA >>> >>> I tried to figure out what I have done wrong, unfortunately, I have no >>> luck on that. Could you or the community kindly please light me for >>> this matter? >>> >>> Thank you very much for your time. >>> >>>> sessionInfo() >>> >>> R version 2.15.1 (2012-06-22) >>> Platform: i486-pc-linux-gnu (32-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] edgeR_2.6.10 limma_3.12.1 >>> >>> loaded via a namespace (and not attached): >>> [1] tcltk_2.15.1 tools_2.15.1 >>> >>> Best regards, >>> KJ Lim >>> >>> >>> >>> On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>> >>>> Dear KH Kim, >>>> >>>> No, you have not understood me correctly. I did not suggest that you >>>> change >>>> from edgeR to limma. I suggested that you read the limma documentation >>>> because the design matrix is the same for edgeR as it is for limma, so >>>> the >>>> limma documentation would help you create the design matrix for your >>>> edgeR >>>> analysis. >>>> >>>> I thought from your last email that you had already done this and that >>>> you >>>> had completed the edgeR analysis satisfactorily. >>>> >>>> I wrote more documentation for the edgeR User's Guide a couple of days >>>> ago, >>>> trying to give advice for experiments such as yours. You might find that >>>> Section 3.3 of >>>> >>>> >>>> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/d oc/edgeRUsersGuide.pdf >>>> >>>> gives more explanation. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> --------------------------------------------- >>>> Professor Gordon K Smyth, >>>> Bioinformatics Division, >>>> Walter and Eliza Hall Institute of Medical Research, >>>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>>> Tel: (03) 9345 2326, Fax (03) 9347 0852, >>>> http://www.statsci.org/smyth >>>> >>>> >>>> On Mon, 30 Jul 2012, KJ Lim wrote: >>>> >>>>> Dear Prof Gordon, >>>>> >>>>> Good day. >>>>> >>>>> I have read the section 8.5 of the Limma manual as you suggested in >>>>> previous emails. Thanks. >>>>> >>>>> If I understand you correctly, you would suggest me to carry out my DE >>>>> analysis with limma package if I would like to learn the which genes >>>>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >>>>> >>>>> May I ask how can I generate the EList, "eset", in order to fit in the >>>>> linear model as mentioned in the limma manual >>>>> >>>>> fit <- lmFit(eset, design) >>>>> fit <- eBayes(fit) >>>>> >>>>> Please correct me if I'm wrong. >>>>> >>>>> Thank you very much for your time and help. >>>>> >>>>> Best regards, >>>>> KJ Lim >>>>> >>>>> >>>>> >>>>> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>> >>>>>> >>>>>> Dear KJ Lim, >>>>>> >>>>>> Thanks for the rephrasing, which is clearer. I would have liked you to >>>>>> mention however whether you read the documentation that I refered you >>>>>> do. >>>>>> >>>>>> >>>>>> On Thu, 26 Jul 2012, KJ Lim wrote: >>>>>> >>>>>>> Dear Prof Gordon, >>>>>>> >>>>>>> Good day. Thanks for your prompt replied. >>>>>>> >>>>>>> Please allow me to rephrase my previous question. >>>>>>> >>>>>>> This model, ~tree+treat, is assumed effect of the time of treatment >>>>>>> will be the same irrespective of genotype. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Yes, this model makes that assumption. >>>>>> >>>>>> >>>>>>> Thus, test for the coef=3 will gives me the differential expression at >>>>>>> 96H >>>>>>> irrespective of genotype. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Actually coef=3 refers to 24H, according to the design matrix in your >>>>>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>>>>> irrespective of genotype. But only if the assumption mentioned above >>>>>> is >>>>>> true, which is unlikely given the rest of your email. >>>>>> >>>>>> >>>>>>> I would like to learn the differential expression of treeHS vs treeLS >>>>>>> at specific time points i.e. 24H or if possible across the whole time >>>>>>> points. Should I fit my model as >>>>>>> >>>>>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> These models are equivalent, so it is just a matter of convenience >>>>>> which >>>>>> one >>>>>> you use, as I tried to explain in the limma documentation I refered you >>>>>> to. >>>>>> >>>>>> >>>>>>> The design matrix columns of model B give: >>>>>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>>>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>>>>> "treeLS:treat96H" >>>>>>> >>>>>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>>>>> learn the differential expression of treeHS vs treeLS at specific time >>>>>>> points i.e. 03H? Does this test could tells me which set of genes >>>>>>> up/down-regulated in treeLS or treeHS? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in LS. >>>>>> So >>>>>> testing for both coefficients coef=3:4 tests for a 3H effect in either >>>>>> treeLS or treeHS. >>>>>> >>>>>> Best wishes >>>>>> Gordon >>>>>> >>>>>> >>>>>>> I'm not good in statistic and I'm learning, kindly please correct me >>>>>>> if I'm wrong. >>>>>>> >>>>>>> Thank you very much for your time and suggestion. >>>>>>> >>>>>>> Best regards, >>>>>>> KJ Lim >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Dear KJ Lim, >>>>>>>> >>>>>>>> I don't quite understand your question, because you seem to be asking >>>>>>>> for >>>>>>>> something that isn't a test for differential expression, which is >>>>>>>> what >>>>>>>> edgeR >>>>>>>> does. You question "I would like to learn the genes are express in >>>>>>>> treeHS >>>>>>>> but not treeLS and vice versa" seems to be about absolute expression >>>>>>>> levels. >>>>>>>> edgeR doesn't test for genes that are not expressed in particular >>>>>>>> conditions. >>>>>>>> >>>>>>>> I'll refer you to the limma section on interaction models in case >>>>>>>> that >>>>>>>> helps, see Section 8.5 of: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/in st/doc/usersguide.pdf >>>>>>>> >>>>>>>> Setting up a design matrix is the same for edgeR as it is for limma. >>>>>>>> >>>>>>>> Best wishes >>>>>>>> Gordon >>>>>>>> >>>>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>>>>> >>>>>>>>> Dear the edgeR community, >>>>>>>>> >>>>>>>>> Good day. >>>>>>>>> >>>>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, >>>>>>>>> 24H, 96H). >>>>>>>>> >>>>>>>>> I first assumed that the treatment effect will be the same for each >>>>>>>>> genotype and I have the design matrix as: >>>>>>>>> >>>>>>>>> design <- model.matrix(~tree+treat) >>>>>>>>> >>>>>>>>>> design >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>>>>> 1 1 0 0 0 0 >>>>>>>>> 2 1 0 0 0 0 >>>>>>>>> 3 1 1 0 0 0 >>>>>>>>> 4 1 1 0 0 0 >>>>>>>>> 5 1 0 1 0 0 >>>>>>>>> 6 1 0 1 0 0 >>>>>>>>> 7 1 0 0 1 0 >>>>>>>>> 8 1 0 0 1 0 >>>>>>>>> 9 1 0 0 0 1 >>>>>>>>> 10 1 0 0 0 1 >>>>>>>>> 11 1 1 0 0 1 >>>>>>>>> 12 1 1 0 0 1 >>>>>>>>> 13 1 0 1 0 1 >>>>>>>>> 14 1 0 1 0 1 >>>>>>>>> 15 1 0 0 1 1 >>>>>>>>> 16 1 0 0 1 1 >>>>>>>>> >>>>>>>>> I used coef=4 to test for the differential expressions between >>>>>>>>> treeHS >>>>>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>>>>> >>>>>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>>>>> treeLS and vice versa at certain time of treatment let's say 24H or >>>>>>>>> across the whole time of treatment, should I have the design matrix >>>>>>>>> as >>>>>>>>> below or more steps I need to do? >>>>>>>>> >>>>>>>>> design <- model.matrix(~tree*treat) >>>>>>>>> >>>>>>>>> Kindly please light me on this. I appreciate very much for your help >>>>>>>>> and >>>>>>>>> time. >>>>>>>>> >>>>>>>>> Have a nice day. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> KJ Lim ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 13.4 years ago Gordon Smyth 53k

0

Entering edit mode

Dear Prof Gordon, Thank you very much for your prompt replied. You are right about the genotype, I have two genotypes for my experiment. In this case, the mistake I made was label them differently. I will correct them and continue with the estimate dispersion as soon as I back to office. Thanks a lot for your advice, your time as well as your help. I appreciate that very much. Have a nice day. Best regards, KJ Lim On 6 August 2012 10:11, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear KJ Lim, > > Well, your "tree" column has 16 different entries, a different entry for > every library. Naturally this causes a problem. > > So either you have no replication of any experimental condition, or you have > mistakingly used different labels for the same condition. > > In your earlier email below, you claimed to have just two genotypes (tree=HS > and tree=LS), but in your latest targets file you have 16 different > genotypes. I'm guessing that the genotypes are actually H and L, so the > entries in the targets file should be just H and L. > > That's as much advice as I have time to give you, and I hope that others > will help you further. I feel that I've given you good general advice that > is applicable to your experiment. > > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > Tel: (03) 9345 2326, Fax (03) 9347 0852, > http://www.statsci.org/smyth > > On Mon, 6 Aug 2012, KJ Lim wrote: > >> Dear Prof Gordon, >> >> Good day. Thanks for you prompt replied. >> >> Below is the output of cbind(targets,Group=Group) >> >>> cbind(targets,Group=Group) >> >> files treat tree X Group >> 1 ./rawData/HS0H01.txt 00H H1 NA 00H.H1 >> 2 ./rawData/HS0H02.txt 00H H2 NA 00H.H2 >> 3 ./rawData/HS3H01.txt 03H H3 NA 03H.H3 >> 4 ./rawData/HS3H02.txt 03H H4 NA 03H.H4 >> 5 ./rawData/HS1D01.txt 24H H5 NA 24H.H5 >> 6 ./rawData/HS1D02.txt 24H H6 NA 24H.H6 >> 7 ./rawData/HS4D01.txt 96H H7 NA 96H.H7 >> 8 ./rawData/HS4D02.txt 96H H8 NA 96H.H8 >> 9 ./rawData/LS0H01.txt 00H L1 NA 00H.L1 >> 10 ./rawData/LS0H02.txt 00H L2 NA 00H.L2 >> 11 ./rawData/LS3H01.txt 03H L3 NA 03H.L3 >> 12 ./rawData/LS3H02.txt 03H L4 NA 03H.L4 >> 13 ./rawData/LS1D01.txt 24H L5 NA 24H.L5 >> 14 ./rawData/LS1D02.txt 24H L6 NA 24H.L6 >> 15 ./rawData/LS4D01.txt 96H L7 NA 96H.L7 >> 16 ./rawData/LS4D02.txt 96H L8 NA 96H.L8 >> >> Thank you for your time. >> >> Best regards, >> KJ Lim >> >> >> On 6 August 2012 07:33, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> The error message is pretty self-explanatory. Apparently your design >>> matrix >>> has as many columns as there are libraries, so there are no degrees of >>> freedom left from which to estimate variability. According to the code >>> and >>> output you have given, this message should be impossible, so you must >>> have >>> changed the data in some way that I cannot see. >>> >>> If you had shown the output from cbind(targets,Group=Group), then I might >>> have been able to say something useful. >>> >>> Best wishes >>> Gordon >>> >>> >>> On Mon, 6 Aug 2012, KJ Lim wrote: >>> >>>> Dear Prof Gordon, >>>> >>>> Good day. >>>> >>>> Thanks for your time to update the edgeR User's Guide. It is useful. >>>> >>>> I combined all my experiment factors into one combined factor and >>>> described the experiment matrix like the example in the User's guide: >>>> >>>> > Group <- factor(paste(targets$treat, targets$tree,sep=".")) >>>> > cbind(targets,Group=Group) >>>> > hl.design <- model.matrix(~0+Group) >>>> >>>> But, I encountered an error when I performed the estimate dispersion >>>> for my data >>>> >>>> > hl <- estimateGLMCommonDisp(hl, hl.design) >>>> Warning message: >>>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>>> No residual df: setting dispersion to NA >>>> >>>> I tried to figure out what I have done wrong, unfortunately, I have no >>>> luck on that. Could you or the community kindly please light me for >>>> this matter? >>>> >>>> Thank you very much for your time. >>>> >>>>> sessionInfo() >>>> >>>> >>>> R version 2.15.1 (2012-06-22) >>>> Platform: i486-pc-linux-gnu (32-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >>>> [7] LC_PAPER=C LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] edgeR_2.6.10 limma_3.12.1 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] tcltk_2.15.1 tools_2.15.1 >>>> >>>> Best regards, >>>> KJ Lim >>>> >>>> >>>> >>>> On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>> >>>>> >>>>> Dear KH Kim, >>>>> >>>>> No, you have not understood me correctly. I did not suggest that you >>>>> change >>>>> from edgeR to limma. I suggested that you read the limma documentation >>>>> because the design matrix is the same for edgeR as it is for limma, so >>>>> the >>>>> limma documentation would help you create the design matrix for your >>>>> edgeR >>>>> analysis. >>>>> >>>>> I thought from your last email that you had already done this and that >>>>> you >>>>> had completed the edgeR analysis satisfactorily. >>>>> >>>>> I wrote more documentation for the edgeR User's Guide a couple of days >>>>> ago, >>>>> trying to give advice for experiments such as yours. You might find >>>>> that >>>>> Section 3.3 of >>>>> >>>>> >>>>> >>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/ doc/edgeRUsersGuide.pdf >>>>> >>>>> gives more explanation. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>> --------------------------------------------- >>>>> Professor Gordon K Smyth, >>>>> Bioinformatics Division, >>>>> Walter and Eliza Hall Institute of Medical Research, >>>>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>>>> Tel: (03) 9345 2326, Fax (03) 9347 0852, >>>>> http://www.statsci.org/smyth >>>>> >>>>> >>>>> On Mon, 30 Jul 2012, KJ Lim wrote: >>>>> >>>>>> Dear Prof Gordon, >>>>>> >>>>>> Good day. >>>>>> >>>>>> I have read the section 8.5 of the Limma manual as you suggested in >>>>>> previous emails. Thanks. >>>>>> >>>>>> If I understand you correctly, you would suggest me to carry out my DE >>>>>> analysis with limma package if I would like to learn the which genes >>>>>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >>>>>> >>>>>> May I ask how can I generate the EList, "eset", in order to fit in the >>>>>> linear model as mentioned in the limma manual >>>>>> >>>>>> fit <- lmFit(eset, design) >>>>>> fit <- eBayes(fit) >>>>>> >>>>>> Please correct me if I'm wrong. >>>>>> >>>>>> Thank you very much for your time and help. >>>>>> >>>>>> Best regards, >>>>>> KJ Lim >>>>>> >>>>>> >>>>>> >>>>>> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Dear KJ Lim, >>>>>>> >>>>>>> Thanks for the rephrasing, which is clearer. I would have liked you >>>>>>> to >>>>>>> mention however whether you read the documentation that I refered you >>>>>>> do. >>>>>>> >>>>>>> >>>>>>> On Thu, 26 Jul 2012, KJ Lim wrote: >>>>>>> >>>>>>>> Dear Prof Gordon, >>>>>>>> >>>>>>>> Good day. Thanks for your prompt replied. >>>>>>>> >>>>>>>> Please allow me to rephrase my previous question. >>>>>>>> >>>>>>>> This model, ~tree+treat, is assumed effect of the time of >>>>>>>> treatment >>>>>>>> will be the same irrespective of genotype. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Yes, this model makes that assumption. >>>>>>> >>>>>>> >>>>>>>> Thus, test for the coef=3 will gives me the differential expression >>>>>>>> at >>>>>>>> 96H >>>>>>>> irrespective of genotype. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Actually coef=3 refers to 24H, according to the design matrix in your >>>>>>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>>>>>> irrespective of genotype. But only if the assumption mentioned above >>>>>>> is >>>>>>> true, which is unlikely given the rest of your email. >>>>>>> >>>>>>> >>>>>>>> I would like to learn the differential expression of treeHS vs >>>>>>>> treeLS >>>>>>>> at specific time points i.e. 24H or if possible across the whole >>>>>>>> time >>>>>>>> points. Should I fit my model as >>>>>>>> >>>>>>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> These models are equivalent, so it is just a matter of convenience >>>>>>> which >>>>>>> one >>>>>>> you use, as I tried to explain in the limma documentation I refered >>>>>>> you >>>>>>> to. >>>>>>> >>>>>>> >>>>>>>> The design matrix columns of model B give: >>>>>>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>>>>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>>>>>> "treeLS:treat96H" >>>>>>>> >>>>>>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>>>>>> learn the differential expression of treeHS vs treeLS at specific >>>>>>>> time >>>>>>>> points i.e. 03H? Does this test could tells me which set of genes >>>>>>>> up/down-regulated in treeLS or treeHS? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in >>>>>>> LS. >>>>>>> So >>>>>>> testing for both coefficients coef=3:4 tests for a 3H effect in >>>>>>> either >>>>>>> treeLS or treeHS. >>>>>>> >>>>>>> Best wishes >>>>>>> Gordon >>>>>>> >>>>>>> >>>>>>>> I'm not good in statistic and I'm learning, kindly please correct >>>>>>>> me >>>>>>>> if I'm wrong. >>>>>>>> >>>>>>>> Thank you very much for your time and suggestion. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> KJ Lim >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Dear KJ Lim, >>>>>>>>> >>>>>>>>> I don't quite understand your question, because you seem to be >>>>>>>>> asking >>>>>>>>> for >>>>>>>>> something that isn't a test for differential expression, which is >>>>>>>>> what >>>>>>>>> edgeR >>>>>>>>> does. You question "I would like to learn the genes are express in >>>>>>>>> treeHS >>>>>>>>> but not treeLS and vice versa" seems to be about absolute >>>>>>>>> expression >>>>>>>>> levels. >>>>>>>>> edgeR doesn't test for genes that are not expressed in particular >>>>>>>>> conditions. >>>>>>>>> >>>>>>>>> I'll refer you to the limma section on interaction models in case >>>>>>>>> that >>>>>>>>> helps, see Section 8.5 of: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/i nst/doc/usersguide.pdf >>>>>>>>> >>>>>>>>> Setting up a design matrix is the same for edgeR as it is for >>>>>>>>> limma. >>>>>>>>> >>>>>>>>> Best wishes >>>>>>>>> Gordon >>>>>>>>> >>>>>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>>>>>> >>>>>>>>>> Dear the edgeR community, >>>>>>>>>> >>>>>>>>>> Good day. >>>>>>>>>> >>>>>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, >>>>>>>>>> 3H, >>>>>>>>>> 24H, 96H). >>>>>>>>>> >>>>>>>>>> I first assumed that the treatment effect will be the same for >>>>>>>>>> each >>>>>>>>>> genotype and I have the design matrix as: >>>>>>>>>> >>>>>>>>>> design <- model.matrix(~tree+treat) >>>>>>>>>> >>>>>>>>>>> design >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>>>>>> 1 1 0 0 0 0 >>>>>>>>>> 2 1 0 0 0 0 >>>>>>>>>> 3 1 1 0 0 0 >>>>>>>>>> 4 1 1 0 0 0 >>>>>>>>>> 5 1 0 1 0 0 >>>>>>>>>> 6 1 0 1 0 0 >>>>>>>>>> 7 1 0 0 1 0 >>>>>>>>>> 8 1 0 0 1 0 >>>>>>>>>> 9 1 0 0 0 1 >>>>>>>>>> 10 1 0 0 0 1 >>>>>>>>>> 11 1 1 0 0 1 >>>>>>>>>> 12 1 1 0 0 1 >>>>>>>>>> 13 1 0 1 0 1 >>>>>>>>>> 14 1 0 1 0 1 >>>>>>>>>> 15 1 0 0 1 1 >>>>>>>>>> 16 1 0 0 1 1 >>>>>>>>>> >>>>>>>>>> I used coef=4 to test for the differential expressions between >>>>>>>>>> treeHS >>>>>>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>>>>>> >>>>>>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>>>>>> treeLS and vice versa at certain time of treatment let's say 24H >>>>>>>>>> or >>>>>>>>>> across the whole time of treatment, should I have the design >>>>>>>>>> matrix >>>>>>>>>> as >>>>>>>>>> below or more steps I need to do? >>>>>>>>>> >>>>>>>>>> design <- model.matrix(~tree*treat) >>>>>>>>>> >>>>>>>>>> Kindly please light me on this. I appreciate very much for your >>>>>>>>>> help >>>>>>>>>> and >>>>>>>>>> time. >>>>>>>>>> >>>>>>>>>> Have a nice day. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> KJ Lim > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 13.4 years ago KJ Lim ▴ 420

0

Entering edit mode

Dear Prof Gordon, Thanks for the correction, you are right about the genotype label. I managed to carry out the estimate dispersion for my data. Thank you so much for your time and help. Best regards, KJ Lim On 6 August 2012 10:32, KJ Lim <jinkeanlim at="" gmail.com=""> wrote: > Dear Prof Gordon, > > Thank you very much for your prompt replied. > > You are right about the genotype, I have two genotypes for my > experiment. In this case, the mistake I made was label them > differently. I will correct them and continue with the estimate > dispersion as soon as I back to office. > > Thanks a lot for your advice, your time as well as your help. I > appreciate that very much. > > Have a nice day. > > Best regards, > KJ Lim > > > On 6 August 2012 10:11, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> Dear KJ Lim, >> >> Well, your "tree" column has 16 different entries, a different entry for >> every library. Naturally this causes a problem. >> >> So either you have no replication of any experimental condition, or you have >> mistakingly used different labels for the same condition. >> >> In your earlier email below, you claimed to have just two genotypes (tree=HS >> and tree=LS), but in your latest targets file you have 16 different >> genotypes. I'm guessing that the genotypes are actually H and L, so the >> entries in the targets file should be just H and L. >> >> That's as much advice as I have time to give you, and I hope that others >> will help you further. I feel that I've given you good general advice that >> is applicable to your experiment. >> >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> Tel: (03) 9345 2326, Fax (03) 9347 0852, >> http://www.statsci.org/smyth >> >> On Mon, 6 Aug 2012, KJ Lim wrote: >> >>> Dear Prof Gordon, >>> >>> Good day. Thanks for you prompt replied. >>> >>> Below is the output of cbind(targets,Group=Group) >>> >>>> cbind(targets,Group=Group) >>> >>> files treat tree X Group >>> 1 ./rawData/HS0H01.txt 00H H1 NA 00H.H1 >>> 2 ./rawData/HS0H02.txt 00H H2 NA 00H.H2 >>> 3 ./rawData/HS3H01.txt 03H H3 NA 03H.H3 >>> 4 ./rawData/HS3H02.txt 03H H4 NA 03H.H4 >>> 5 ./rawData/HS1D01.txt 24H H5 NA 24H.H5 >>> 6 ./rawData/HS1D02.txt 24H H6 NA 24H.H6 >>> 7 ./rawData/HS4D01.txt 96H H7 NA 96H.H7 >>> 8 ./rawData/HS4D02.txt 96H H8 NA 96H.H8 >>> 9 ./rawData/LS0H01.txt 00H L1 NA 00H.L1 >>> 10 ./rawData/LS0H02.txt 00H L2 NA 00H.L2 >>> 11 ./rawData/LS3H01.txt 03H L3 NA 03H.L3 >>> 12 ./rawData/LS3H02.txt 03H L4 NA 03H.L4 >>> 13 ./rawData/LS1D01.txt 24H L5 NA 24H.L5 >>> 14 ./rawData/LS1D02.txt 24H L6 NA 24H.L6 >>> 15 ./rawData/LS4D01.txt 96H L7 NA 96H.L7 >>> 16 ./rawData/LS4D02.txt 96H L8 NA 96H.L8 >>> >>> Thank you for your time. >>> >>> Best regards, >>> KJ Lim >>> >>> >>> On 6 August 2012 07:33, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>> >>>> The error message is pretty self-explanatory. Apparently your design >>>> matrix >>>> has as many columns as there are libraries, so there are no degrees of >>>> freedom left from which to estimate variability. According to the code >>>> and >>>> output you have given, this message should be impossible, so you must >>>> have >>>> changed the data in some way that I cannot see. >>>> >>>> If you had shown the output from cbind(targets,Group=Group), then I might >>>> have been able to say something useful. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> >>>> On Mon, 6 Aug 2012, KJ Lim wrote: >>>> >>>>> Dear Prof Gordon, >>>>> >>>>> Good day. >>>>> >>>>> Thanks for your time to update the edgeR User's Guide. It is useful. >>>>> >>>>> I combined all my experiment factors into one combined factor and >>>>> described the experiment matrix like the example in the User's guide: >>>>> >>>>> > Group <- factor(paste(targets$treat, targets$tree,sep=".")) >>>>> > cbind(targets,Group=Group) >>>>> > hl.design <- model.matrix(~0+Group) >>>>> >>>>> But, I encountered an error when I performed the estimate dispersion >>>>> for my data >>>>> >>>>> > hl <- estimateGLMCommonDisp(hl, hl.design) >>>>> Warning message: >>>>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>>>> No residual df: setting dispersion to NA >>>>> >>>>> I tried to figure out what I have done wrong, unfortunately, I have no >>>>> luck on that. Could you or the community kindly please light me for >>>>> this matter? >>>>> >>>>> Thank you very much for your time. >>>>> >>>>>> sessionInfo() >>>>> >>>>> >>>>> R version 2.15.1 (2012-06-22) >>>>> Platform: i486-pc-linux-gnu (32-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >>>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >>>>> [7] LC_PAPER=C LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] edgeR_2.6.10 limma_3.12.1 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] tcltk_2.15.1 tools_2.15.1 >>>>> >>>>> Best regards, >>>>> KJ Lim >>>>> >>>>> >>>>> >>>>> On 31 July 2012 02:24, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>> >>>>>> >>>>>> Dear KH Kim, >>>>>> >>>>>> No, you have not understood me correctly. I did not suggest that you >>>>>> change >>>>>> from edgeR to limma. I suggested that you read the limma documentation >>>>>> because the design matrix is the same for edgeR as it is for limma, so >>>>>> the >>>>>> limma documentation would help you create the design matrix for your >>>>>> edgeR >>>>>> analysis. >>>>>> >>>>>> I thought from your last email that you had already done this and that >>>>>> you >>>>>> had completed the edgeR analysis satisfactorily. >>>>>> >>>>>> I wrote more documentation for the edgeR User's Guide a couple of days >>>>>> ago, >>>>>> trying to give advice for experiments such as yours. You might find >>>>>> that >>>>>> Section 3.3 of >>>>>> >>>>>> >>>>>> >>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst /doc/edgeRUsersGuide.pdf >>>>>> >>>>>> gives more explanation. >>>>>> >>>>>> Best wishes >>>>>> Gordon >>>>>> >>>>>> --------------------------------------------- >>>>>> Professor Gordon K Smyth, >>>>>> Bioinformatics Division, >>>>>> Walter and Eliza Hall Institute of Medical Research, >>>>>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>>>>> Tel: (03) 9345 2326, Fax (03) 9347 0852, >>>>>> http://www.statsci.org/smyth >>>>>> >>>>>> >>>>>> On Mon, 30 Jul 2012, KJ Lim wrote: >>>>>> >>>>>>> Dear Prof Gordon, >>>>>>> >>>>>>> Good day. >>>>>>> >>>>>>> I have read the section 8.5 of the Limma manual as you suggested in >>>>>>> previous emails. Thanks. >>>>>>> >>>>>>> If I understand you correctly, you would suggest me to carry out my DE >>>>>>> analysis with limma package if I would like to learn the which genes >>>>>>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H. >>>>>>> >>>>>>> May I ask how can I generate the EList, "eset", in order to fit in the >>>>>>> linear model as mentioned in the limma manual >>>>>>> >>>>>>> fit <- lmFit(eset, design) >>>>>>> fit <- eBayes(fit) >>>>>>> >>>>>>> Please correct me if I'm wrong. >>>>>>> >>>>>>> Thank you very much for your time and help. >>>>>>> >>>>>>> Best regards, >>>>>>> KJ Lim >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 27 July 2012 02:31, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Dear KJ Lim, >>>>>>>> >>>>>>>> Thanks for the rephrasing, which is clearer. I would have liked you >>>>>>>> to >>>>>>>> mention however whether you read the documentation that I refered you >>>>>>>> do. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, 26 Jul 2012, KJ Lim wrote: >>>>>>>> >>>>>>>>> Dear Prof Gordon, >>>>>>>>> >>>>>>>>> Good day. Thanks for your prompt replied. >>>>>>>>> >>>>>>>>> Please allow me to rephrase my previous question. >>>>>>>>> >>>>>>>>> This model, ~tree+treat, is assumed effect of the time of >>>>>>>>> treatment >>>>>>>>> will be the same irrespective of genotype. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Yes, this model makes that assumption. >>>>>>>> >>>>>>>> >>>>>>>>> Thus, test for the coef=3 will gives me the differential expression >>>>>>>>> at >>>>>>>>> 96H >>>>>>>>> irrespective of genotype. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Actually coef=3 refers to 24H, according to the design matrix in your >>>>>>>> original email. A test for coef=3 would test for DE at 24H vs 0H, >>>>>>>> irrespective of genotype. But only if the assumption mentioned above >>>>>>>> is >>>>>>>> true, which is unlikely given the rest of your email. >>>>>>>> >>>>>>>> >>>>>>>>> I would like to learn the differential expression of treeHS vs >>>>>>>>> treeLS >>>>>>>>> at specific time points i.e. 24H or if possible across the whole >>>>>>>>> time >>>>>>>>> points. Should I fit my model as >>>>>>>>> >>>>>>>>> model A: ~tree*treat OR model B: ~tree+tree:treat ? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> These models are equivalent, so it is just a matter of convenience >>>>>>>> which >>>>>>>> one >>>>>>>> you use, as I tried to explain in the limma documentation I refered >>>>>>>> you >>>>>>>> to. >>>>>>>> >>>>>>>> >>>>>>>>> The design matrix columns of model B give: >>>>>>>>> "(Intercept)" "treeLS" "treeHS:treat03H" "treeLS:treat03H" >>>>>>>>> "treeHS:treat24H" "treeLS:treat24H" "treeHS:treat96H" >>>>>>>>> "treeLS:treat96H" >>>>>>>>> >>>>>>>>> Am I doing right if I fit the model B and test for the coef=3:4 to >>>>>>>>> learn the differential expression of treeHS vs treeLS at specific >>>>>>>>> time >>>>>>>>> points i.e. 03H? Does this test could tells me which set of genes >>>>>>>>> up/down-regulated in treeLS or treeHS? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Yes. Coef 3 tests for a 3H effect in HS. Coef 4 tests for a 3H in >>>>>>>> LS. >>>>>>>> So >>>>>>>> testing for both coefficients coef=3:4 tests for a 3H effect in >>>>>>>> either >>>>>>>> treeLS or treeHS. >>>>>>>> >>>>>>>> Best wishes >>>>>>>> Gordon >>>>>>>> >>>>>>>> >>>>>>>>> I'm not good in statistic and I'm learning, kindly please correct >>>>>>>>> me >>>>>>>>> if I'm wrong. >>>>>>>>> >>>>>>>>> Thank you very much for your time and suggestion. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> KJ Lim >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Dear KJ Lim, >>>>>>>>>> >>>>>>>>>> I don't quite understand your question, because you seem to be >>>>>>>>>> asking >>>>>>>>>> for >>>>>>>>>> something that isn't a test for differential expression, which is >>>>>>>>>> what >>>>>>>>>> edgeR >>>>>>>>>> does. You question "I would like to learn the genes are express in >>>>>>>>>> treeHS >>>>>>>>>> but not treeLS and vice versa" seems to be about absolute >>>>>>>>>> expression >>>>>>>>>> levels. >>>>>>>>>> edgeR doesn't test for genes that are not expressed in particular >>>>>>>>>> conditions. >>>>>>>>>> >>>>>>>>>> I'll refer you to the limma section on interaction models in case >>>>>>>>>> that >>>>>>>>>> helps, see Section 8.5 of: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/ inst/doc/usersguide.pdf >>>>>>>>>> >>>>>>>>>> Setting up a design matrix is the same for edgeR as it is for >>>>>>>>>> limma. >>>>>>>>>> >>>>>>>>>> Best wishes >>>>>>>>>> Gordon >>>>>>>>>> >>>>>>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300 >>>>>>>>>>> From: KJ Lim <jinkeanlim at="" gmail.com=""> >>>>>>>>>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>>>>>>>>> Subject: [BioC] edgeR: design matrix for different condition >>>>>>>>>>> >>>>>>>>>>> Dear the edgeR community, >>>>>>>>>>> >>>>>>>>>>> Good day. >>>>>>>>>>> >>>>>>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2 >>>>>>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, >>>>>>>>>>> 3H, >>>>>>>>>>> 24H, 96H). >>>>>>>>>>> >>>>>>>>>>> I first assumed that the treatment effect will be the same for >>>>>>>>>>> each >>>>>>>>>>> genotype and I have the design matrix as: >>>>>>>>>>> >>>>>>>>>>> design <- model.matrix(~tree+treat) >>>>>>>>>>> >>>>>>>>>>>> design >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> (Intercept) treat03H treat24H treat96H treeLS >>>>>>>>>>> 1 1 0 0 0 0 >>>>>>>>>>> 2 1 0 0 0 0 >>>>>>>>>>> 3 1 1 0 0 0 >>>>>>>>>>> 4 1 1 0 0 0 >>>>>>>>>>> 5 1 0 1 0 0 >>>>>>>>>>> 6 1 0 1 0 0 >>>>>>>>>>> 7 1 0 0 1 0 >>>>>>>>>>> 8 1 0 0 1 0 >>>>>>>>>>> 9 1 0 0 0 1 >>>>>>>>>>> 10 1 0 0 0 1 >>>>>>>>>>> 11 1 1 0 0 1 >>>>>>>>>>> 12 1 1 0 0 1 >>>>>>>>>>> 13 1 0 1 0 1 >>>>>>>>>>> 14 1 0 1 0 1 >>>>>>>>>>> 15 1 0 0 1 1 >>>>>>>>>>> 16 1 0 0 1 1 >>>>>>>>>>> >>>>>>>>>>> I used coef=4 to test for the differential expressions between >>>>>>>>>>> treeHS >>>>>>>>>>> and treeLS within the time of treatment, coef=3 to learn the >>>>>>>>>>> differential expressions in 2 genotypes at time of treatment 96H. >>>>>>>>>>> >>>>>>>>>>> ** I would like to learn the genes are express in treeHS but not >>>>>>>>>>> treeLS and vice versa at certain time of treatment let's say 24H >>>>>>>>>>> or >>>>>>>>>>> across the whole time of treatment, should I have the design >>>>>>>>>>> matrix >>>>>>>>>>> as >>>>>>>>>>> below or more steps I need to do? >>>>>>>>>>> >>>>>>>>>>> design <- model.matrix(~tree*treat) >>>>>>>>>>> >>>>>>>>>>> Kindly please light me on this. I appreciate very much for your >>>>>>>>>>> help >>>>>>>>>>> and >>>>>>>>>>> time. >>>>>>>>>>> >>>>>>>>>>> Have a nice day. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> KJ Lim >> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________

ADD REPLY • link 13.4 years ago KJ Lim ▴ 420

0

Entering edit mode

Dear KJ Lim, The problem is that you are following an example from the User's Guide that goes with the developmental version of edgeR, a newer version of egeR which has not yet been released. The syntax for the glmLRT() function has been changed compared with the official release version of edgeR that you are using. For your version of edgeR you need glmLRT(y, hl.fit, contrast=hl.contrasts[,"HvsL.00"]) instead of glmLRT(hl.fit, contrast=hl.contrasts[,"HvsL.00"]) In other words, you can follow the examples from the developmental User's Guide, but you will need to add an extra argument to each glmLRT call. The first argument of glmLRT() needs to be the data object used to create the fit object for glmLRT() in your version of edgeR. Don't forget that you need to look at the help pages for the functions you are using, not just the User's Guide, because the help pages explain the correct syntax for each function. Best wishes Gordon ----------------- original message ------------------ [BioC] edgeR: Likelihood ratio test error KJ Lim jinkeanlim at gmail.com Tue Aug 7 15:58:10 CEST 2012 Dear edgeR users community, Good day. I'm analysing my RNA-Seq data which has 2 different genotypes (treeHS, treeLS) and time of treatment (0H, 3H, 24H, 96H) with edgeR. I would like to learn the differential expression of treeHS vs treeLS at specific time points i.e. 24H. Thanks to Prof Gordon and his colleagues, the latest edgeR user's guide (Chapter 3) is very useful for my case. I encountered an error message when I carried out the likelihood ratio test with either makeContrasts or coefficients approach. The error message: > hl.contrasts <- makeContrasts(HvsL.00 = H.00H-L.00H, HvsL.03 = (H.03H-H.00H)-(L.03H-L.00H),LvsH.00 = L.00H-H.00H,LvsH.03 = (L.03H-L.00H)-(H.03H-H.00H), levels=hl.design) > lrt.HvsL00 <- glmLRT(hl.fit, contrast=hl.contrasts[,"HvsL.00"]) Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : dims [product 0] do not match the length of object [11] ---------------------------------------------------------------------- ----------------- > colnames(hl.fit) [1] "H.00H" "H.03H" "H.24H" "H.96H" "L.00H" "L.03H" "L.24H" "L.96H" > lrt.HvsL00 <- glmLRT(hl.fit,contrast=c(1,0,0,0,-1,0,0,0)) Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : dims [product 0] do not match the length of object [11] Could the community kindly please light me on this matter? What correction I should make in this case? Thank you very much for your help. Best regards, KJ Lim > sessionInfo() R version 2.15.1 (2012-06-22) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] edgeR_2.6.10 limma_3.12.1 loaded via a namespace (and not attached): [1] tools_2.15.1 On 6 August 2012 10:11, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > Dear KJ Lim, > > Well, your "tree" column has 16 different entries, a different entry for > every library. Naturally this causes a problem. > > So either you have no replication of any experimental condition, or you > have mistakingly used different labels for the same condition. > > In your earlier email below, you claimed to have just two genotypes > (tree=HS and tree=LS), but in your latest targets file you have 16 > different genotypes. I'm guessing that the genotypes are actually H and > L, so the entries in the targets file should be just H and L. > > That's as much advice as I have time to give you, and I hope that others > will help you further. I feel that I've given you good general advice > that is applicable to your experiment. > > > Best wishes Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > Tel: (03) 9345 2326, Fax (03) 9347 0852, > http://www.statsci.org/smyth > > On Mon, 6 Aug 2012, KJ Lim wrote: > >> Dear Prof Gordon, >> >> Good day. Thanks for you prompt replied. >> >> Below is the output of cbind(targets,Group=Group) >> >>> cbind(targets,Group=Group) >> >> files treat tree X Group >> 1 ./rawData/HS0H01.txt 00H H1 NA 00H.H1 >> 2 ./rawData/HS0H02.txt 00H H2 NA 00H.H2 >> 3 ./rawData/HS3H01.txt 03H H3 NA 03H.H3 >> 4 ./rawData/HS3H02.txt 03H H4 NA 03H.H4 >> 5 ./rawData/HS1D01.txt 24H H5 NA 24H.H5 >> 6 ./rawData/HS1D02.txt 24H H6 NA 24H.H6 >> 7 ./rawData/HS4D01.txt 96H H7 NA 96H.H7 >> 8 ./rawData/HS4D02.txt 96H H8 NA 96H.H8 >> 9 ./rawData/LS0H01.txt 00H L1 NA 00H.L1 >> 10 ./rawData/LS0H02.txt 00H L2 NA 00H.L2 >> 11 ./rawData/LS3H01.txt 03H L3 NA 03H.L3 >> 12 ./rawData/LS3H02.txt 03H L4 NA 03H.L4 >> 13 ./rawData/LS1D01.txt 24H L5 NA 24H.L5 >> 14 ./rawData/LS1D02.txt 24H L6 NA 24H.L6 >> 15 ./rawData/LS4D01.txt 96H L7 NA 96H.L7 >> 16 ./rawData/LS4D02.txt 96H L8 NA 96H.L8 >> >> Thank you for your time. >> >> Best regards, >> KJ Lim >> >> >> On 6 August 2012 07:33, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> The error message is pretty self-explanatory. Apparently your design >>> matrix has as many columns as there are libraries, so there are no >>> degrees of freedom left from which to estimate variability. >>> According to the code and output you have given, this message should >>> be impossible, so you must have changed the data in some way that I >>> cannot see. >>> >>> If you had shown the output from cbind(targets,Group=Group), then I >>> might have been able to say something useful. >>> >>> Best wishes >>> Gordon >>> >>> >>> On Mon, 6 Aug 2012, KJ Lim wrote: >>> >>>> Dear Prof Gordon, >>>> >>>> Good day. >>>> >>>> Thanks for your time to update the edgeR User's Guide. It is useful. >>>> >>>> I combined all my experiment factors into one combined factor and >>>> described the experiment matrix like the example in the User's guide: >>>> >>>> > Group <- factor(paste(targets$treat, targets$tree,sep=".")) >>>> > cbind(targets,Group=Group) >>>> > hl.design <- model.matrix(~0+Group) >>>> >>>> But, I encountered an error when I performed the estimate dispersion >>>> for my data >>>> >>>> > hl <- estimateGLMCommonDisp(hl, hl.design) >>>> Warning message: >>>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>>> No residual df: setting dispersion to NA >>>> >>>> I tried to figure out what I have done wrong, unfortunately, I have >>>> no luck on that. Could you or the community kindly please light me >>>> for this matter? >>>> >>>> Thank you very much for your time. >>>> >>>>> sessionInfo() >>>> >>>> >>>> R version 2.15.1 (2012-06-22) >>>> Platform: i486-pc-linux-gnu (32-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >>>> [7] LC_PAPER=C LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] edgeR_2.6.10 limma_3.12.1 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] tcltk_2.15.1 tools_2.15.1 >>>> >>>> Best regards, >>>> KJ Lim ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

ADD REPLY • link 13.4 years ago Gordon Smyth 53k

0

Entering edit mode

Dear Prof Gordon, Good day. Thanks for your prompt replied. > Don't forget that you need to look at the help pages for the functions you are using, not just the User's Guide, because the help pages explain the correct syntax for each function. Thanks for pointing out this. I will keep that in mind. Thanks for your time and advice. Have a nice day. Best regards, KJ Lim Don't forget that you need to look at the help pages for the functions you are using, not just the User's Guide, because the help pages explain the correct syntax for each function. On 8 August 2012 10:13, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > > Dear KJ Lim, > > The problem is that you are following an example from the User's Guide that goes with the developmental version of edgeR, a newer version of egeR which has not yet been released. The syntax for the glmLRT() function has been changed compared with the official release version of edgeR that you are using. > > For your version of edgeR you need > > glmLRT(y, hl.fit, contrast=hl.contrasts[,"HvsL.00"]) > > instead of > > > glmLRT(hl.fit, contrast=hl.contrasts[,"HvsL.00"]) > > In other words, you can follow the examples from the developmental User's Guide, but you will need to add an extra argument to each glmLRT call. The first argument of glmLRT() needs to be the data object used to create the fit object for glmLRT() in your version of edgeR. > > Don't forget that you need to look at the help pages for the functions you are using, not just the User's Guide, because the help pages explain the correct syntax for each function. > > Best wishes > Gordon > > ----------------- original message ------------------ > [BioC] edgeR: Likelihood ratio test error > KJ Lim jinkeanlim at gmail.com > Tue Aug 7 15:58:10 CEST 2012 > > > Dear edgeR users community, > > Good day. > > I'm analysing my RNA-Seq data which has 2 different genotypes (treeHS, > treeLS) and time of treatment (0H, 3H, > 24H, 96H) with edgeR. I would like to learn the differential > expression of treeHS vs treeLS at specific time points i.e. 24H. > > Thanks to Prof Gordon and his colleagues, the latest edgeR user's > guide (Chapter 3) is very useful for my case. > > I encountered an error message when I carried out the likelihood ratio > test with either makeContrasts or coefficients approach. The error > message: > > > hl.contrasts <- makeContrasts(HvsL.00 = H.00H-L.00H, HvsL.03 = > (H.03H-H.00H)-(L.03H-L.00H),LvsH.00 = L.00H-H.00H,LvsH.03 = > (L.03H-L.00H)-(H.03H-H.00H), levels=hl.design) > > > lrt.HvsL00 <- glmLRT(hl.fit, contrast=hl.contrasts[,"HvsL.00"]) > Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : > dims [product 0] do not match the length of object [11] > > -------------------------------------------------------------------- ------------------- > > colnames(hl.fit) > [1] "H.00H" "H.03H" "H.24H" "H.96H" "L.00H" "L.03H" "L.24H" "L.96H" > > > lrt.HvsL00 <- glmLRT(hl.fit,contrast=c(1,0,0,0,-1,0,0,0)) > Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : > dims [product 0] do not match the length of object [11] > > > Could the community kindly please light me on this matter? What > correction I should make in this case? > > Thank you very much for your help. > > Best regards, > KJ Lim > >> sessionInfo() > > R version 2.15.1 (2012-06-22) > Platform: i486-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] edgeR_2.6.10 limma_3.12.1 > > loaded via a namespace (and not attached): > [1] tools_2.15.1 > > > > On 6 August 2012 10:11, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >> >> Dear KJ Lim, >> >> Well, your "tree" column has 16 different entries, a different entry for >> every library. Naturally this causes a problem. >> >> So either you have no replication of any experimental condition, or you have mistakingly used different labels for the same condition. >> >> In your earlier email below, you claimed to have just two genotypes (tree=HS and tree=LS), but in your latest targets file you have 16 different genotypes. I'm guessing that the genotypes are actually H and L, so the entries in the targets file should be just H and L. >> >> That's as much advice as I have time to give you, and I hope that others will help you further. I feel that I've given you good general advice that is applicable to your experiment. >> >> >> Best wishes Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> Tel: (03) 9345 2326, Fax (03) 9347 0852, >> http://www.statsci.org/smyth >> >> On Mon, 6 Aug 2012, KJ Lim wrote: >> >>> Dear Prof Gordon, >>> >>> Good day. Thanks for you prompt replied. >>> >>> Below is the output of cbind(targets,Group=Group) >>> >>>> cbind(targets,Group=Group) >>> >>> >>> files treat tree X Group >>> 1 ./rawData/HS0H01.txt 00H H1 NA 00H.H1 >>> 2 ./rawData/HS0H02.txt 00H H2 NA 00H.H2 >>> 3 ./rawData/HS3H01.txt 03H H3 NA 03H.H3 >>> 4 ./rawData/HS3H02.txt 03H H4 NA 03H.H4 >>> 5 ./rawData/HS1D01.txt 24H H5 NA 24H.H5 >>> 6 ./rawData/HS1D02.txt 24H H6 NA 24H.H6 >>> 7 ./rawData/HS4D01.txt 96H H7 NA 96H.H7 >>> 8 ./rawData/HS4D02.txt 96H H8 NA 96H.H8 >>> 9 ./rawData/LS0H01.txt 00H L1 NA 00H.L1 >>> 10 ./rawData/LS0H02.txt 00H L2 NA 00H.L2 >>> 11 ./rawData/LS3H01.txt 03H L3 NA 03H.L3 >>> 12 ./rawData/LS3H02.txt 03H L4 NA 03H.L4 >>> 13 ./rawData/LS1D01.txt 24H L5 NA 24H.L5 >>> 14 ./rawData/LS1D02.txt 24H L6 NA 24H.L6 >>> 15 ./rawData/LS4D01.txt 96H L7 NA 96H.L7 >>> 16 ./rawData/LS4D02.txt 96H L8 NA 96H.L8 >>> >>> Thank you for your time. >>> >>> Best regards, >>> KJ Lim >>> >>> >>> On 6 August 2012 07:33, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>>> >>>> >>>> The error message is pretty self-explanatory. Apparently your design matrix has as many columns as there are libraries, so there are no degrees of freedom left from which to estimate variability. According to the code and output you have given, this message should be impossible, so you must have changed the data in some way that I cannot see. >>>> >>>> If you had shown the output from cbind(targets,Group=Group), then I might have been able to say something useful. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> >>>> On Mon, 6 Aug 2012, KJ Lim wrote: >>>> >>>>> Dear Prof Gordon, >>>>> >>>>> Good day. >>>>> >>>>> Thanks for your time to update the edgeR User's Guide. It is useful. >>>>> >>>>> I combined all my experiment factors into one combined factor and >>>>> described the experiment matrix like the example in the User's guide: >>>>> >>>>> > Group <- factor(paste(targets$treat, targets$tree,sep=".")) >>>>> > cbind(targets,Group=Group) >>>>> > hl.design <- model.matrix(~0+Group) >>>>> >>>>> But, I encountered an error when I performed the estimate dispersion >>>>> for my data >>>>> >>>>> > hl <- estimateGLMCommonDisp(hl, hl.design) >>>>> Warning message: >>>>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>>>> No residual df: setting dispersion to NA >>>>> >>>>> I tried to figure out what I have done wrong, unfortunately, I have no luck on that. Could you or the community kindly please light me for this matter? >>>>> >>>>> Thank you very much for your time. >>>>> >>>>>> sessionInfo() >>>>> >>>>> >>>>> >>>>> R version 2.15.1 (2012-06-22) >>>>> Platform: i486-pc-linux-gnu (32-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >>>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >>>>> [7] LC_PAPER=C LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] edgeR_2.6.10 limma_3.12.1 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] tcltk_2.15.1 tools_2.15.1 >>>>> >>>>> Best regards, >>>>> KJ Lim > > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

ADD REPLY • link 13.4 years ago KJ Lim ▴ 420