Question: edgeR: paired samples together with independent samples
Maria Keays wrote (6 Nov 2012):
Hello,

I read this thread and the related User's Guide material with interest because I am working with a very similar data set with paired samples. However, I'm having trouble that I think stems from my data being unbalanced. I have four patients with the disease and three without, and some patients have replicates while others do not. I've created a design matrix as described on p. 32 of the 27 October 2012 edgeR User's Guide, but when I try to estimate the common dispersion using estimateGLMCommonDisp(), it tells me:

    Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset) :
      Design matrix not of full rank. The following coefficients not estimable:
     DiseaseHealthy:Patient4

I guess this is because I have 4 patients in the diseased set and only 3 in the healthy set? If I remove Patient4 and try again, I can continue the analysis successfully, but I'd obviously like to include all the data. Is that possible? If so, could you explain how to do it?
The original annotations for my data are below:

    Disease   Patient  Treatment
    disease1  1        control
    disease1  1        control
    disease1  1        control
    disease1  2        control
    disease1  3        control
    disease1  3        control
    disease1  4        control
    disease1  1        treat
    disease1  1        treat
    disease1  1        treat
    disease1  2        treat
    disease1  3        treat
    disease1  3        treat
    disease1  4        treat
    healthy   5        control
    healthy   6        control
    healthy   6        control
    healthy   6        control
    healthy   7        control
    healthy   7        control
    healthy   5        treat
    healthy   6        treat
    healthy   6        treat
    healthy   6        treat
    healthy   7        treat
    healthy   7        treat

As I was following the User's Guide, I amended the Patient labels so that they looked like this when I created the design matrix:

    Disease   Patient  Treatment
    disease1  1        control
    disease1  1        control
    disease1  1        control
    disease1  2        control
    disease1  3        control
    disease1  3        control
    disease1  4        control
    disease1  1        treat
    disease1  1        treat
    disease1  1        treat
    disease1  2        treat
    disease1  3        treat
    disease1  3        treat
    disease1  4        treat
    healthy   1        control
    healthy   2        control
    healthy   2        control
    healthy   2        control
    healthy   3        control
    healthy   3        control
    healthy   1        treat
    healthy   2        treat
    healthy   2        treat
    healthy   2        treat
    healthy   3        treat
    healthy   3        treat

Thanks!
Maria

On 25/10/2012 06:18, Gordon K Smyth wrote:
> Dear Anna,
>
> You are right to recognise that the analysis of this sort of design is more complex than many other experiments, because it includes comparisons both within and between patients. I have included a new section in the edgeR User's Guide based on your experiment that describes the analysis. This will appear in the official release of edgeR in a couple of days. In the meantime, see pages 31-33 of:
>
> http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>
> Best wishes
> Gordon
>
>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>> From: "anna [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>> Subject: [BioC] EdgeR: paired samples together with independant samples
>>
>> Hello,
>> I am using EdgeR to analyse my RNAseq data.
>>
>> I have:
>>
>> cells from 3 healthy patients, either treated or not with a hormone.
>>
>> cells from 3 patients with disease D1, either treated or not with the hormone
>>
>> cells from 3 patients with disease D2, either treated or not with the hormone.
>>
>> I would like to know what is wrong in the response to the hormone in patients with disease D1 and D2.
>>
>> I don't know how to combine paired comparisons, with pairwise comparisons, in a unique glm analysis.
>>
>> thank you very much,
>> anna
>>
>> -- output of sessionInfo():
>>
>> R version 2.15.1 (2012-06-22)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>> [5] LC_TIME=French_France.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.1
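Maria's relabelling (healthy patients 5, 6 and 7 renumbered 1, 2 and 3) can also be done in code rather than by hand. The sketch below uses base R only; the variable names are hypothetical, and it assumes the rows are in the order of her table. Nested numbering matters because a Disease:Patient term creates columns for every non-reference patient number in both groups: with the original IDs 1 to 7 that would mean Patient2 to Patient7 columns for each disease, several of them all zero, whereas nested IDs leave only patient numbers 2 to 4.

```r
# Hypothetical reconstruction of the sample sheet from the post:
# 26 libraries, original patient IDs 1-7.
disease <- rep(c("disease1", "healthy"), times = c(14, 12))
treatment <- c(rep(c("control", "treat"), each = 7),
               rep(c("control", "treat"), each = 6))
patient_orig <- c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                  5, 6, 6, 6, 7, 7,     5, 6, 6, 6, 7, 7)

# Renumber patients from 1 within each disease group, in order of first
# appearance, so healthy patients 5, 6, 7 become 1, 2, 3.
patient_nested <- ave(patient_orig, disease,
                      FUN = function(p) match(p, unique(p)))

# Cross-tabulate to confirm each nested ID has both treatments.
print(table(paste(disease, patient_nested), treatment))
```

Only the healthy IDs change here; the disease1 IDs 1 to 4 are already consecutive.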
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote (7 Nov 2012):
Dear Maria,

Thanks for the specific reference to the documentation that you've followed.

Yes, you are correct: the error arises because there is no 4th patient in the healthy group. If you look at your design matrix, you will see that there is a column called DiseaseHealthy:Patient4 that consists entirely of zeros. It should be column 8, but check:

    design[,8]

The easiest way to proceed is simply to remove that column manually from the design matrix:

    design2 <- design[,-8]

Your experiment has another issue, in that you have repeat samples on several of the patients. Are these biological replicates? If they are just technical replicates, then they should be collapsed into one library before analysis.

Best wishes
Gordon
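Gordon's fix can be checked without edgeR. The sketch below rebuilds the design with base R's model.matrix(), assuming the formula ~Disease + Disease:Patient + Disease:Treatment from the User's Guide section he cites and Healthy as the reference level of Disease (as the column names in Maria's later message suggest). It locates the all-zero column by content rather than by a hard-coded index: under these assumptions the column is the seventh, not the eighth, which is exactly why checking design[,8] first matters.

```r
# Hypothetical sample sheet matching the post, with nested patient IDs and
# Healthy as the reference level of Disease.
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))
Treatment <- factor(c(rep(c("Control", "Treat"), each = 7),
                      rep(c("Control", "Treat"), each = 6)),
                    levels = c("Control", "Treat"))
Patient <- factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                    1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3))

# Formula assumed from the User's Guide section on within/between-patient designs.
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# The matrix is rank deficient: more columns than its rank.
cat("columns:", ncol(design), " rank:", qr(design)$rank, "\n")

# Find all-zero columns by content instead of hard-coding an index.
is_zero <- colSums(design != 0) == 0
print(colnames(design)[is_zero])   # "DiseaseHealthy:Patient4"
print(which(is_zero))              # position depends on factor level order

design2 <- design[, !is_zero]
stopifnot(qr(design2)$rank == ncol(design2))  # full rank after dropping it
```

design2 would then take the place of design in estimateGLMCommonDisp() and glmFit(). limma's nonEstimable() reports the names of non-estimable coefficients in a similar way.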
Maria Keays wrote (7 Nov 2012):

Dear Gordon,

Thanks very much for the helpful advice. I'm treating them as biological replicates: they are cell cultures, and I simply have several separately treated/untreated pairs of cultures from some patients but only one treated/untreated pair from others. So although some cultures came from the same patient, each was treated separately and RNA was extracted from each culture individually. Would you say that's the right thing to do?

Thanks and best wishes,
Maria
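Gordon's advice to collapse technical replicates into one library applies only if the repeats are technical; Maria's are separately treated cultures, so she keeps them apart. For the technical case, a minimal base R sketch (the toy counts and sample IDs are hypothetical) sums the counts of libraries that come from the same RNA sample:

```r
# Toy counts: 3 genes x 4 libraries (hypothetical numbers).
# "L1a" and "L1b" stand for two sequencing runs of the same RNA sample.
counts <- matrix(c(10, 0, 5,
                   12, 1, 4,
                   30, 2, 9,
                    7, 3, 1),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3),
                                 c("L1a", "L1b", "L2", "L3")))

# Biological sample each library came from (hypothetical IDs).
sample_id <- c("S1", "S1", "S2", "S3")

# rowsum() adds up rows that share a group label, so transpose the
# genes-by-libraries matrix, sum by sample, and transpose back.
collapsed <- t(rowsum(t(counts), group = sample_id))
print(collapsed)
```

Counts are summed rather than averaged so that the collapsed library is still count data. Later edgeR releases also provide sumTechReps() for this step.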
Gordon Smyth wrote (7 Nov 2012):

Dear Maria,

From what you say, it sounds OK not to collapse the libraries. However, if the three treated cultures and three untreated cultures for one patient are truly three pairs, then this pairing should be reflected in the analysis. You can handle this by numbering the samples by paired culture, from 1 to 7, instead of by patient.

An MDS plot could guide you in judging whether there are baseline differences between the different pairs for one patient, and hence whether your pairing should be by culture instead of by patient.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
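Gordon's suggestion of pairing by culture can be sketched by replacing Patient with a Pair factor numbered within each disease group: 1 to 7 for disease1, which has seven treated/untreated pairs, and 1 to 6 for healthy. This base R sketch assumes, hypothetically, that in Maria's table the k-th control culture and the k-th treated culture of each group belong to the same pair; that assumption must be checked against the real sample records.

```r
# Hypothetical sample sheet as in the post (Healthy as reference level).
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))
Treatment <- factor(c(rep(c("Control", "Treat"), each = 7),
                      rep(c("Control", "Treat"), each = 6)),
                    levels = c("Control", "Treat"))

# ASSUMPTION: within each disease group, the k-th control culture and the
# k-th treated culture listed form one treated/untreated pair.
Pair <- factor(ave(seq_along(Disease), Disease, Treatment, FUN = seq_along))

# Pair replaces Patient: pairs are nested within patients, so the pair
# baselines already absorb the patient baselines.
design <- model.matrix(~ Disease + Disease:Pair + Disease:Treatment)

# Healthy has only 6 pairs, so DiseaseHealthy:Pair7 is all zero; drop it.
design <- design[, colSums(design != 0) > 0]
cat("coefficients:", ncol(design), " rank:", qr(design)$rank,
    " residual df:", nrow(design) - ncol(design), "\n")
```

As with Patient, the healthy group's missing seventh pair leaves one all-zero column to drop. Choosing between Pair and Patient is the judgement Gordon suggests making from an MDS plot, for example edgeR's plotMDS() on the DGEList with the samples labelled by Pair.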
Maria Keays wrote (12 Nov 2012):

Dear Gordon,

I have another question about this analysis. Previously I analysed the same data without incorporating patient effects. My design matrix had the columns "Disease1.Treat", "Disease1.Control", "Healthy.Treat" and "Healthy.Control", and I tested for genes showing a significant interaction between disease and treatment using the contrast (Disease1.Treat - Disease1.Control) - (Healthy.Treat - Healthy.Control). I think this is what is explained on pages 25-26 of the edgeR User's Guide (27 Oct 2012 version).

Now I want to take patient effects into account as well, so my design matrix has these columns:

    [1] "(Intercept)"              "DiseaseDisease1"
    [3] "DiseaseHealthy:Patient2"  "DiseaseDisease1:Patient2"
    [5] "DiseaseHealthy:Patient3"  "DiseaseDisease1:Patient3"
    [7] "DiseaseDisease1:Patient4" "DiseaseHealthy:TreatmentTreat"
    [9] "DiseaseDisease1:TreatmentTreat"

Reading the explanation on pages 32-33 of the User's Guide, to do the equivalent test for genes showing a significant interaction between disease and treatment, should I simply use:

    lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,-1,1))

I think this is what the guide is saying, but I just want to make sure.

Thanks and best wishes,
Maria
Reply written 5.9 years ago by Maria Keays
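To illustrate where the "Design matrix not of full rank" error comes from, the base-R sketch below transcribes Maria's original annotation table into factors, renumbers patients within each disease group (healthy patients 5, 6, 7 become 1, 2, 3, as she did by hand), and builds the nested design with `model.matrix(~ Disease + Disease:Patient + Disease:Treatment)`. That formula is an assumption, inferred from the column names she reports rather than copied from the user's guide, and the variable names and the `ave()`-based renumbering are illustrative choices. No edgeR installation is needed.

```r
# Sketch: Maria's original annotation (26 libraries) as R vectors.
disease <- rep(c("disease1", "healthy"), times = c(14, 12))
treatment <- c(rep(c("control", "treat"), each = 7),
               rep(c("control", "treat"), each = 6))
patient <- c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,   # disease1
             5, 6, 6, 6, 7, 7,     5, 6, 6, 6, 7, 7)      # healthy

# Renumber patients 1, 2, ... within each disease group, so healthy
# patients 5, 6, 7 become 1, 2, 3 (the relabelling done by hand above).
patient_nested <- ave(patient, disease,
                      FUN = function(p) as.integer(factor(p)))

Disease <- factor(disease, levels = c("healthy", "disease1"),
                  labels = c("Healthy", "Disease1"))
Treatment <- factor(treatment, levels = c("control", "treat"),
                    labels = c("Control", "Treat"))
Patient <- factor(patient_nested)

# Assumed nested design: patient within disease, treatment within disease.
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# The healthy group has no fourth patient, so that column is all zeros and
# the matrix has one more column than its rank.
all_zero <- colSums(design != 0) == 0
print(colnames(design)[all_zero])
print(c(columns = ncol(design), rank = qr(design)$rank))
```

Run as written, this should report `DiseaseHealthy:Patient4` as the only all-zero column and a rank of 9 for 10 columns, which is the coefficient that `glmFit()` names as not estimable.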
On Mon, 12 Nov 2012, Maria Keays wrote:

> Dear Gordon,
>
> I have another question about this analysis. Previously I performed an
> analysis on the same data but without incorporating effects of patient. My
> design matrix had columns "Disease1.Treat", "Disease1.Control",
> "Healthy.Treat" and "Healthy.Control", and I then tested for genes showing a
> significant interaction between disease and treatment using the contrast
> ((Disease1.Treat - Disease1.Control) - (Healthy.Treat - Healthy.Control)). I
> think this is what is explained on pages 25-26 of the edgeR user's guide
> (Oct 27 2012 version).
>
> Now I want to take patient effects into account as well, so my design
> matrix has columns:
> [1] "(Intercept)"                    "DiseaseDisease1"
> [3] "DiseaseHealthy:Patient2"        "DiseaseDisease1:Patient2"
> [5] "DiseaseHealthy:Patient3"        "DiseaseDisease1:Patient3"
> [7] "DiseaseDisease1:Patient4"       "DiseaseHealthy:TreatmentTreat"
> [9] "DiseaseDisease1:TreatmentTreat"
>
> Reading the explanation on pages 32-33 of the user's guide, to do the
> equivalent contrast to find genes showing a significant interaction between
> disease and treatment, should I simply use:
>
>    lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,-1,1))
>
> ?

Yes.

Gordon

> I think this is what the guide is saying, but I just want to make sure...
>
> Thanks and best wishes,
> Maria
>
>
> On 07/11/2012 22:55, Gordon K Smyth wrote:
>> Dear Maria,
>>
>> Sounds ok from what you say not to collapse libraries. However, if the
>> three treated cultures and three untreated cultures for one patient are
>> truly three pairs, then this pairing should be reflected in the analysis.
>> You can handle this by numbering the samples by paired culture from 1 to 7
>> instead of numbering by patient.
>>
>> An MDS plot could guide you in judging whether there are baseline
>> differences between the different pairs for one patient, and hence whether
>> your pairing should be by culture instead of by patient.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>> On Wed, 7 Nov 2012, Maria Keays wrote:
>>
>>> Dear Gordon,
>>>
>>> Thanks very much for the helpful advice. I'm treating them as biological
>>> replicates -- they are cell cultures and it's just that I have multiple
>>> separately treated/untreated pairs of cultures from some patients and only
>>> one treated/untreated pair for others. So although some cultures came from
>>> the same patient, they were all treated separately and then RNA was
>>> extracted from each culture. Would you say that's the right thing to do?
>>>
>>> Thanks and best wishes,
>>> Maria
>>>
>>>
>>> On 07/11/2012 00:01, Gordon K Smyth wrote:
>>>> Dear Maria,
>>>>
>>>> Thanks for the specific reference to the documentation that you've
>>>> followed.
>>>>
>>>> Yes, you are correct, the error is arising because there is no 4th
>>>> patient in the healthy group. If you have a look at your design matrix,
>>>> you will see that there is a column called DiseaseHealthy:Patient4 that
>>>> consists entirely of zeros. It should be column 8, but check:
>>>>
>>>>    design[,8]
>>>>
>>>> The easiest way to proceed is simply to remove that column manually from
>>>> the design matrix:
>>>>
>>>>    design2 <- design[,-8]
>>>>
>>>> Your experiment has another issue, in that you have repeat samples on
>>>> several of the patients. Are these biological replicates? If not, if
>>>> they are just technical replicates, then they should be collapsed into
>>>> one library before analysis.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>>>>> From: Maria Keays <mkeays at ebi.ac.uk>
>>>>> To: bioconductor at r-project.org
>>>>> Subject: Re: [BioC] EdgeR: paired samples together with independent samples
>>>>>
>>>>> Hello,
>>>>>
>>>>> I read this thread and related user guide material with interest because
>>>>> I am working with a very similar data set with paired samples. However,
>>>>> I'm having trouble that I think stems from my data being unbalanced. I
>>>>> have four patients with a disease and three without, and within that, for
>>>>> some patients I have replicates but for others I do not. I've created a
>>>>> design matrix as described on p32 of the 27 October 2012 edgeR user's
>>>>> guide, but when I try to estimate the common dispersion using
>>>>> estimateGLMCommonDisp() it tells me:
>>>>>
>>>>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>>>>>   offset = offset) :
>>>>>   Design matrix not of full rank. The following coefficients not
>>>>>   estimable:
>>>>>   DiseaseHealthy:Patient4"
>>>>>
>>>>> I guess this is because I have 4 patients in the diseased set and only 3
>>>>> in the healthy set? If I remove Patient4 and try again, I'm able to
>>>>> continue the analysis successfully, but I'd obviously like to be able to
>>>>> include all the data -- is that possible? If so, could you explain how to
>>>>> do it?
Reply written 5.9 years ago by Gordon Smyth
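The fix discussed in this exchange, dropping the all-zero column and then testing the disease-by-treatment interaction, can be sketched in base R as below. This is a sketch under stated assumptions: the sample sheet is transcribed from Maria's relabelled table; `drop_zero_cols()` and `interaction_contrast()` are hypothetical helpers, not edgeR functions; and the `Culture` factor for pairing by culture assumes that the k-th control library and the k-th treated library within each disease group come from the same culture pair, which the thread does not state. Finding the zero column by testing for zeros and building the contrast by column name avoids relying on column positions, which depend on factor level order. With these factor levels the zero column comes out seventh, consistent with the nine columns Maria lists afterwards, and the name-based contrast reproduces the vector Gordon confirmed.

```r
# Sketch only: sample sheet transcribed from Maria's relabelled annotation
# table (26 libraries). Helper names below are hypothetical, not edgeR API.
Disease <- factor(rep(c("Disease1", "Healthy"), times = c(14, 12)),
                  levels = c("Healthy", "Disease1"))
Treatment <- factor(c(rep(c("Control", "Treat"), each = 7),
                      rep(c("Control", "Treat"), each = 6)),
                    levels = c("Control", "Treat"))
# Patients numbered within disease group, as in the relabelled table.
Patient <- factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                    1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3))
# ASSUMPTION: the k-th control and k-th treated library within each disease
# group come from the same culture pair (pairs 1-7 and 1-6).
Culture <- factor(c(1:7, 1:7, 1:6, 1:6))

# Hypothetical helper: keep only columns that are non-zero for some library.
drop_zero_cols <- function(design) {
  design[, colSums(design != 0) > 0, drop = FALSE]
}

# Hypothetical helper: (Disease1 treatment effect) - (Healthy treatment effect),
# built by column name so it does not depend on column positions.
interaction_contrast <- function(design) {
  wanted <- c("DiseaseDisease1:TreatmentTreat", "DiseaseHealthy:TreatmentTreat")
  stopifnot(all(wanted %in% colnames(design)))
  con <- setNames(numeric(ncol(design)), colnames(design))
  con[wanted] <- c(1, -1)
  con
}

full <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)
print(which(colSums(full != 0) == 0))  # name and position of the zero column

design_patient <- drop_zero_cols(full)
design_culture <- drop_zero_cols(
  model.matrix(~ Disease + Disease:Culture + Disease:Treatment))

# Both designs should be of full rank once the zero column is gone.
for (d in list(design_patient, design_culture)) {
  stopifnot(qr(d)$rank == ncol(d))
}

con_patient <- interaction_contrast(design_patient)
print(unname(con_patient))
# With edgeR, given a DGEList `y` whose dispersions are already estimated:
#   fit <- glmFit(y, design_patient)
#   lrt <- glmLRT(fit, contrast = con_patient)
```

The edgeR calls are left as comments because they need real counts; `glmFit()` and `glmLRT()` are edgeR functions, while `y` stands for a hypothetical DGEList. Whether `design_patient` or `design_culture` is appropriate is the judgement Gordon suggests making from an MDS plot.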