Search
Question: edgeR estimateGLMCommonDisp Error on RNA analysis
0
gravatar for Justin Jeyakani GIS
4.5 years ago by
Justin Jeyakani GIS60 wrote:
Hello Sir, I'm doing differential gene exp. by the edgeR package. My data seems to suitable to analyse by glm method (As per edgeR userguide section 3.5 Comparisons Both Between and Within Subjects). I have two different matrix to design. Matrix One: I have 12 dataset 3 samples from different healthy individuals (Healthy) of two different stages (npc,ne) and 3 samples from different individuals (Disease1) of two different stages (npc,ne), so 6 data from each healthy(6) and Disese1 (6) below is my code and works well. But for my another design matrix it's not! Matrix Two: But I want to comapre the All the Healthy (6data from 3 indiauals) with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do individual test of 3 disease1 with Healthy but the I'm getting error in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. I have the same problem with my another matrix also. Can guide me to solve this issue highly appreciated ... below is my code for the above mentiond two matrix...do I need to define anthing in the generate factor Level "gl" further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set causing dispersion error. Matrix1.txt: my input (works fine) Disease Patient Treatment 1 Healthy 1 npc 2 Healthy 1 ne 3 Healthy 2 npc 4 Healthy 2 ne 5 Healthy 3 npc 6 Healthy 3 ne 7 Disease1 4 npc 8 Disease1 4 ne 9 Disease1 5 npc 10 Disease1 5 ne 11 Disease1 6 npc 12 Disease1 6 ne SUPT1<-read.table("matrix1.txt",header=TRUE) summary(SUPT1) Patient<-gl(3,2,length=12) Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) data.frame(Disease,Patient,Treatment) Disease Patient Treatment 1 Healthy 1 npc 2 Healthy 1 ne 3 Healthy 2 npc 4 Healthy 2 ne 5 Healthy 3 npc 6 Healthy 3 ne 7 Disease1 1 npc 8 Disease1 1 ne 9 Disease1 2 npc 10 Disease1 2 ne 11 Disease1 3 npc 12 Disease1 3 ne design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) colnames(design) [1] "(Intercept)" "DiseaseDisease1" [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" library(edgeR) count<-read.table("RHN035-46_s1.txt",header=TRUE) head (count) colnames(count) samplename=colnames(count) cds01<-DGEList(count,group=samplename) head(cds01) cds01 summary(cds01) dim(cds01) keep<-rowSums(cpm(cds01)>2)>=4 cds01<-cds01[keep,] dim(cds01) cds01$sample$lib.size<-colSums(cds01$counts) y <- estimateGLMCommonDisp(cds01,design) y <- estimateGLMTrendedDisp(y,design) y <- estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") topTags(lrt) detags <- rownames(topTags(lrt)$table) cpm(y)[detags, order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") topTags(lrt) detags <- rownames(topTags(lrt)$table) cpm(y)[detags, order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) topTags(lrt) detags <- rownames(topTags(lrt)$table) cpm(y)[detags, order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) Disease Patient Treatment 1 Healthy 1 npc 2 Healthy 1 ne 3 Healthy 2 npc 4 Healthy 2 ne 5 Healthy 3 npc 6 Healthy 3 ne 7 Disease1 4 npc 8 Disease1 4 ne 9 Disease2 5 npc 10 Disease2 5 ne 11 Disease3 6 npc 12 Disease3 6 ne SUPT1<-read.table("matrix2.txt",header=TRUE) summary(SUPT1) Patient<-gl(3,2,length=12) Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1","Disease2","Disease3")) Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) data.frame(Disease,Patient,Treatment) Disease Patient Treatment 1 Healthy 1 npc 2 Healthy 1 ne 3 Healthy 2 npc 4 Healthy 2 ne 5 Healthy 3 npc 6 Healthy 3 ne 7 Disease1 1 npc 8 Disease1 1 ne 9 Disease2 2 npc 10 Disease2 2 ne 11 Disease3 3 npc 12 Disease3 3 ne design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) colnames(design) [1] "(Intercept)" "DiseaseDisease1" [3] "DiseaseDisease2" "DiseaseDisease3" [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" library(edgeR) count<-read.table("RHN035-46_s1.txt",header=TRUE) head (count) colnames(count) samplename=colnames(count) cds01<-DGEList(count,group=samplename) head(cds01) cds01 summary(cds01) dim(cds01) keep<-rowSums(cpm(cds01)>2)>=4 cds01<-cds01[keep,] dim(cds01) cds01$sample$lib.size<-colSums(cds01$counts) y <- estimateGLMCommonDisp(cds01,design) Warning message: In estimateGLMCommonDisp.default(y = y$counts, design = design, : No residual df: setting dispersion to N Matri3.txt- my input (gives Error) Disease Patient Treatment 1 Healthy 1 npc 2 Healthy 1 ne 3 Healthy 2 npc 4 Healthy 2 ne 5 Healthy 3 npc 6 Healthy 3 ne 7 Healthy 4 npc 8 Healthy 4 ne 9 Disease1 5 npc 10 Disease1 5 ne 11 Disease2 6 npc 12 Disease2 6 ne 13 Disease2 7 npc 14 Disease2 7 ne 15 Disease3 8 npc 16 Disease3 8 ne SUPT1<-read.table("matrix2.txt",header=TRUE) summary(SUPT1) Patient<-gl(4,2,length=16) Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1","Disease2","Disease3")) Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) data.frame(Disease,Patient,Treatment) Disease Patient Treatment 1 Healthy 1 npc 2 Healthy 1 ne 3 Healthy 2 npc 4 Healthy 2 ne 5 Healthy 3 npc 6 Healthy 3 ne 7 Healthy 4 npc 8 Healthy 4 ne 9 Disease1 1 npc 10 Disease1 1 ne 11 Disease2 2 npc 12 Disease2 2 ne 13 Disease2 3 npc 14 Disease2 3 ne 15 Disease3 4 npc 16 Disease3 4 ne design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) colnames(design) [1] "(Intercept)" "DiseaseDisease1" [3] "DiseaseDisease2" "DiseaseDisease3" [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" library(edgeR) count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.txt ",header=TRUE) head (count) colnames(count) samplename=colnames(count) cds01<-DGEList(count,group=samplename) head(cds01) cds01 summary(cds01) dim(cds01) keep<-rowSums(cpm(cds01)>1)>=4 cds01<-cds01[keep,] dim(cds01) cds01$sample$lib.size<-colSums(cds01$counts) y <- estimateGLMCommonDisp(cds01,design) Error in return(NA, ntags) : multi-argument returns are not permitted In addition: Warning message: In estimateGLMTrendedDisp.default(y = y$counts, design = design, : No residual df: cannot estimate dispersion Thanks and regards, Justin ------------------------------- This e-mail and any attachments are only for the use of the intended recipient and may be confidential and/or privileged. If you are not the recipient, please delete it or notify the sender immediately. Please do not copy or use it for any purpose or disclose the contents to any other person as it may be an offence under the Official Secrets Act. ------------------------------- [[alternative HTML version deleted]]
ADD COMMENTlink modified 4.5 years ago by Gordon Smyth35k • written 4.5 years ago by Justin Jeyakani GIS60
0
gravatar for Gordon Smyth
4.5 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:
Dear Justin, You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: > samples Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Disease1 D4 npc 8 Disease1 D4 ne 9 Disease2 D5 npc 10 Disease2 D5 ne 11 Disease3 D6 npc 12 Disease3 D6 ne > design1 <- model.matrix(~Patient,data=samples) > design2 <- model.matrix(~Disease*Treatment,data=samples) > design <- cbind(design1,design2[,5:8]) > colnames(design) [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" Then you can test for coefficients 8, 9 and 10. Best wishes Gordon > Date: Fri, 21 Mar 2014 17:09:37 +0800 > From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis > > Hello Sir, > > I'm doing differential gene exp. by the edgeR package. My data seems to > suitable to analyse by glm method (As per edgeR userguide section 3.5 > Comparisons Both Between and Within Subjects). > > I have two different matrix to design. > Matrix One: > I have 12 dataset 3 samples from different healthy individuals (Healthy) > of two different stages (npc,ne) and 3 samples from different > individuals (Disease1) of two different stages (npc,ne), so 6 data from > each healthy(6) and Disese1 (6) below is my code and works well. But for > my another design matrix it's not! > > Matrix Two: > But I want to comapre the All the Healthy (6data from 3 indiauals) with > each of the Disease1 individual (npc,ne). B'ze I'm getting 125 diff.exp > genes btn healthyVsDisease1 and I'm expecting more, If I do individual > test of 3 disease1 with Healthy but the I'm getting error in the > "estimateGLMCommonDisp" steps. I couldn't solve the problem. I have the > same problem with my another matrix also. Can guide me to solve this > issue highly appreciated ... below is my code for the above mentiond two > matrix...do I need to define anthing in the generate factor Level "gl" > further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set > causing dispersion error. > > Matrix1.txt: my input (works fine) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 4 npc > 8 Disease1 4 ne > 9 Disease1 5 npc > 10 Disease1 5 ne > 11 Disease1 6 npc > 12 Disease1 6 ne > > SUPT1<-read.table("matrix1.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(3,2,length=12) > Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 1 npc > 8 Disease1 1 ne > 9 Disease1 2 npc > 10 Disease1 2 ne > 11 Disease1 3 npc > 12 Disease1 3 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46_s1.txt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > > keep<-rowSums(cpm(cds01)>2)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > y <- estimateGLMCommonDisp(cds01,design) > y <- estimateGLMTrendedDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y, design) > > lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") > topTags(lrt) > detags <- rownames(topTags(lrt)$table) > cpm(y)[detags, order(y$samples$group)] > summary(de <- decideTestsDGE(lrt)) > > lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") > topTags(lrt) > detags <- rownames(topTags(lrt)$table) > cpm(y)[detags, order(y$samples$group)] > summary(de <- decideTestsDGE(lrt)) > > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) > topTags(lrt) > detags <- rownames(topTags(lrt)$table) > cpm(y)[detags, order(y$samples$group)] > summary(de <- decideTestsDGE(lrt)) > Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 4 npc > 8 Disease1 4 ne > 9 Disease2 5 npc > 10 Disease2 5 ne > 11 Disease3 6 npc > 12 Disease3 6 ne > > SUPT1<-read.table("matrix2.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(3,2,length=12) > Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1","Disease2","Disease3")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 1 npc > 8 Disease1 1 ne > 9 Disease2 2 npc > 10 Disease2 2 ne > 11 Disease3 3 npc > 12 Disease3 3 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseDisease2" "DiseaseDisease3" > [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" > [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" > [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46_s1.txt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > > keep<-rowSums(cpm(cds01)>2)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > > y <- estimateGLMCommonDisp(cds01,design) > Warning message: > In estimateGLMCommonDisp.default(y = y$counts, design = design, : > No residual df: setting dispersion to N > Matri3.txt- my input (gives Error) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Healthy 4 npc > 8 Healthy 4 ne > 9 Disease1 5 npc > 10 Disease1 5 ne > 11 Disease2 6 npc > 12 Disease2 6 ne > 13 Disease2 7 npc > 14 Disease2 7 ne > 15 Disease3 8 npc > 16 Disease3 8 ne > > SUPT1<-read.table("matrix2.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(4,2,length=16) > Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1","Disease2","Disease3")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Healthy 4 npc > 8 Healthy 4 ne > 9 Disease1 1 npc > 10 Disease1 1 ne > 11 Disease2 2 npc > 12 Disease2 2 ne > 13 Disease2 3 npc > 14 Disease2 3 ne > 15 Disease3 4 npc > 16 Disease3 4 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseDisease2" "DiseaseDisease3" > [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" > [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" > [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" > [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" > [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.t xt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > keep<-rowSums(cpm(cds01)>1)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > y <- estimateGLMCommonDisp(cds01,design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y = y$counts, design = design, : > No residual df: cannot estimate dispersion > > Thanks and regards, > Justin ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENTlink written 4.5 years ago by Gordon Smyth35k
Dear Gordon, Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. I'm getting slightly different matrix compare to what you have sent using the same code... I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of "Treatmentnpc" is genes responding to the npc in healthy patients only? Which one gives the genes respond to the npc in healthy? > samples <-read.table("matrix1.txt",header=TRUE) > samples Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Disease1 D4 npc 8 Disease1 D4 ne 9 Disease2 D5 npc 10 Disease2 D5 ne 11 Disease3 D6 npc 12 Disease3 D6 ne > design1 <- model.matrix(~Patient,data=samples) > design2 <- model.matrix(~Disease*Treatment,data=samples) > design <- cbind(design1,design2[,5:8]) > colnames(design) [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" Thank you very much. Justin -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Monday, March 24, 2014 12:00 PM To: Justin Jeyakani (GIS) Cc: Bioconductor mailing list Subject: edgeR estimateGLMCommonDisp Error on RNA analysis Dear Justin, You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: > samples Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Disease1 D4 npc 8 Disease1 D4 ne 9 Disease2 D5 npc 10 Disease2 D5 ne 11 Disease3 D6 npc 12 Disease3 D6 ne > design1 <- model.matrix(~Patient,data=samples) > design2 <- model.matrix(~Disease*Treatment,data=samples) > design <- cbind(design1,design2[,5:8]) > colnames(design) [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" Then you can test for coefficients 8, 9 and 10. Best wishes Gordon > Date: Fri, 21 Mar 2014 17:09:37 +0800 > From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis > > Hello Sir, > > I'm doing differential gene exp. by the edgeR package. My data seems > to suitable to analyse by glm method (As per edgeR userguide section > 3.5 Comparisons Both Between and Within Subjects). > > I have two different matrix to design. > Matrix One: > I have 12 dataset 3 samples from different healthy individuals > (Healthy) of two different stages (npc,ne) and 3 samples from > different individuals (Disease1) of two different stages (npc,ne), so > 6 data from each healthy(6) and Disese1 (6) below is my code and works > well. But for my another design matrix it's not! > > Matrix Two: > But I want to comapre the All the Healthy (6data from 3 indiauals) > with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 > diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do > individual test of 3 disease1 with Healthy but the I'm getting error > in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. I > have the same problem with my another matrix also. Can guide me to > solve this issue highly appreciated ... below is my code for the above > mentiond two matrix...do I need to define anthing in the generate factor Level "gl" > further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set > causing dispersion error. > > Matrix1.txt: my input (works fine) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 4 npc > 8 Disease1 4 ne > 9 Disease1 5 npc > 10 Disease1 5 ne > 11 Disease1 6 npc > 12 Disease1 6 ne > > SUPT1<-read.table("matrix1.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(3,2,length=12) > Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 1 npc > 8 Disease1 1 ne > 9 Disease1 2 npc > 10 Disease1 2 ne > 11 Disease1 3 npc > 12 Disease1 3 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46_s1.txt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > > keep<-rowSums(cpm(cds01)>2)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > y <- estimateGLMCommonDisp(cds01,design) > y <- estimateGLMTrendedDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y, design) > > lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") > topTags(lrt) > detags <- rownames(topTags(lrt)$table) cpm(y)[detags, > order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) > > lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") > topTags(lrt) > detags <- rownames(topTags(lrt)$table) cpm(y)[detags, > order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) > > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) > topTags(lrt) > detags <- rownames(topTags(lrt)$table) cpm(y)[detags, > order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) > Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 4 npc > 8 Disease1 4 ne > 9 Disease2 5 npc > 10 Disease2 5 ne > 11 Disease3 6 npc > 12 Disease3 6 ne > > SUPT1<-read.table("matrix2.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(3,2,length=12) > Disease <- factor(SUPT1$Disease, > levels=c("Healthy","Disease1","Disease2","Disease3")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 1 npc > 8 Disease1 1 ne > 9 Disease2 2 npc > 10 Disease2 2 ne > 11 Disease3 3 npc > 12 Disease3 3 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseDisease2" "DiseaseDisease3" > [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" > [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" > [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46_s1.txt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > > keep<-rowSums(cpm(cds01)>2)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > > y <- estimateGLMCommonDisp(cds01,design) > Warning message: > In estimateGLMCommonDisp.default(y = y$counts, design = design, : > No residual df: setting dispersion to N > Matri3.txt- my input (gives Error) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Healthy 4 npc > 8 Healthy 4 ne > 9 Disease1 5 npc > 10 Disease1 5 ne > 11 Disease2 6 npc > 12 Disease2 6 ne > 13 Disease2 7 npc > 14 Disease2 7 ne > 15 Disease3 8 npc > 16 Disease3 8 ne > > SUPT1<-read.table("matrix2.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(4,2,length=16) > Disease <- factor(SUPT1$Disease, > levels=c("Healthy","Disease1","Disease2","Disease3")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Healthy 4 npc > 8 Healthy 4 ne > 9 Disease1 1 npc > 10 Disease1 1 ne > 11 Disease2 2 npc > 12 Disease2 2 ne > 13 Disease2 3 npc > 14 Disease2 3 ne > 15 Disease3 4 npc > 16 Disease3 4 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseDisease2" "DiseaseDisease3" > [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" > [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" > [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" > [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" > [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.txt > ",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > keep<-rowSums(cpm(cds01)>1)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > y <- estimateGLMCommonDisp(cds01,design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y = y$counts, design = design, : > No residual df: cannot estimate dispersion > > Thanks and regards, > Justin ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:14}}
ADD REPLYlink written 4.5 years ago by Justin Jeyakani GIS60
I assumed that "Healthy" is set as the first level of the factor 'Disease'. I assumed that because that is how you set it up in the original code you posted. You can make this so by typing samples$Disease <- relevel(samples$Disease, ref="Healthy") The coefficient "Treatmentnpc" gives the effect of npc in healthy patients. Gordon On Mon, 24 Mar 2014, Justin Jeyakani (GIS) wrote: > Dear Gordon, > > Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. > > I'm getting slightly different matrix compare to what you have sent > using the same code... > I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get > "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of > "Treatmentnpc" is genes responding to the npc in healthy patients only? > Which one gives the genes respond to the npc in healthy? > >> samples <-read.table("matrix1.txt",header=TRUE) >> samples > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Disease1 D4 npc > 8 Disease1 D4 ne > 9 Disease2 D5 npc > 10 Disease2 D5 ne > 11 Disease3 D6 npc > 12 Disease3 D6 ne >> design1 <- model.matrix(~Patient,data=samples) >> design2 <- model.matrix(~Disease*Treatment,data=samples) >> design <- cbind(design1,design2[,5:8]) >> colnames(design) > [1] "(Intercept)" "PatientD5" > [3] "PatientD6" "PatientH1" > [5] "PatientH2" "PatientH3" > [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" > [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" > > Thank you very much. > > Justin > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Monday, March 24, 2014 12:00 PM > To: Justin Jeyakani (GIS) > Cc: Bioconductor mailing list > Subject: edgeR estimateGLMCommonDisp Error on RNA analysis > > Dear Justin, > > You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. > > You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: > > > samples > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Disease1 D4 npc > 8 Disease1 D4 ne > 9 Disease2 D5 npc > 10 Disease2 D5 ne > 11 Disease3 D6 npc > 12 Disease3 D6 ne > > design1 <- model.matrix(~Patient,data=samples) > > design2 <- model.matrix(~Disease*Treatment,data=samples) > > design <- cbind(design1,design2[,5:8]) > > colnames(design) > [1] "(Intercept)" "PatientD5" > [3] "PatientD6" "PatientH1" > [5] "PatientH2" "PatientH3" > [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" > [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" > > Then you can test for coefficients 8, 9 and 10. > > Best wishes > Gordon > > >> Date: Fri, 21 Mar 2014 17:09:37 +0800 >> From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> >> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis >> >> Hello Sir, >> >> I'm doing differential gene exp. by the edgeR package. My data seems >> to suitable to analyse by glm method (As per edgeR userguide section >> 3.5 Comparisons Both Between and Within Subjects). >> >> I have two different matrix to design. >> Matrix One: > >> I have 12 dataset 3 samples from different healthy individuals >> (Healthy) of two different stages (npc,ne) and 3 samples from >> different individuals (Disease1) of two different stages (npc,ne), so >> 6 data from each healthy(6) and Disese1 (6) below is my code and works >> well. But for my another design matrix it's not! >> >> Matrix Two: > >> But I want to comapre the All the Healthy (6data from 3 indiauals) >> with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 >> diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do >> individual test of 3 disease1 with Healthy but the I'm getting error >> in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. I >> have the same problem with my another matrix also. Can guide me to >> solve this issue highly appreciated ... below is my code for the above >> mentiond two matrix...do I need to define anthing in the generate factor Level "gl" >> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set >> causing dispersion error. >> >> Matrix1.txt: my input (works fine) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 4 npc >> 8 Disease1 4 ne >> 9 Disease1 5 npc >> 10 Disease1 5 ne >> 11 Disease1 6 npc >> 12 Disease1 6 ne >> >> SUPT1<-read.table("matrix1.txt",header=TRUE) >> summary(SUPT1) >> Patient<-gl(3,2,length=12) >> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) >> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >> >> data.frame(Disease,Patient,Treatment) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 1 npc >> 8 Disease1 1 ne >> 9 Disease1 2 npc >> 10 Disease1 2 ne >> 11 Disease1 3 npc >> 12 Disease1 3 ne >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> colnames(design) >> [1] "(Intercept)" "DiseaseDisease1" >> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >> [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >> >> library(edgeR) >> count<-read.table("RHN035-46_s1.txt",header=TRUE) >> head (count) >> colnames(count) >> samplename=colnames(count) >> cds01<-DGEList(count,group=samplename) >> head(cds01) >> cds01 >> summary(cds01) >> dim(cds01) >> >> >> keep<-rowSums(cpm(cds01)>2)>=4 >> cds01<-cds01[keep,] >> dim(cds01) >> >> cds01$sample$lib.size<-colSums(cds01$counts) >> y <- estimateGLMCommonDisp(cds01,design) >> y <- estimateGLMTrendedDisp(y,design) >> y <- estimateGLMTagwiseDisp(y,design) >> fit <- glmFit(y, design) >> >> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") >> topTags(lrt) >> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >> >> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") >> topTags(lrt) >> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >> >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) >> topTags(lrt) >> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 4 npc >> 8 Disease1 4 ne >> 9 Disease2 5 npc >> 10 Disease2 5 ne >> 11 Disease3 6 npc >> 12 Disease3 6 ne >> >> SUPT1<-read.table("matrix2.txt",header=TRUE) >> summary(SUPT1) >> Patient<-gl(3,2,length=12) >> Disease <- factor(SUPT1$Disease, >> levels=c("Healthy","Disease1","Disease2","Disease3")) >> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >> >> data.frame(Disease,Patient,Treatment) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 1 npc >> 8 Disease1 1 ne >> 9 Disease2 2 npc >> 10 Disease2 2 ne >> 11 Disease3 3 npc >> 12 Disease3 3 ne >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> colnames(design) >> [1] "(Intercept)" "DiseaseDisease1" >> [3] "DiseaseDisease2" "DiseaseDisease3" >> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >> [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >> >> library(edgeR) >> count<-read.table("RHN035-46_s1.txt",header=TRUE) >> head (count) >> colnames(count) >> samplename=colnames(count) >> cds01<-DGEList(count,group=samplename) >> head(cds01) >> cds01 >> summary(cds01) >> dim(cds01) >> >> >> keep<-rowSums(cpm(cds01)>2)>=4 >> cds01<-cds01[keep,] >> dim(cds01) >> >> cds01$sample$lib.size<-colSums(cds01$counts) >> >> y <- estimateGLMCommonDisp(cds01,design) >> Warning message: >> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >> No residual df: setting dispersion to N >> Matri3.txt- my input (gives Error) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Healthy 4 npc >> 8 Healthy 4 ne >> 9 Disease1 5 npc >> 10 Disease1 5 ne >> 11 Disease2 6 npc >> 12 Disease2 6 ne >> 13 Disease2 7 npc >> 14 Disease2 7 ne >> 15 Disease3 8 npc >> 16 Disease3 8 ne >> >> SUPT1<-read.table("matrix2.txt",header=TRUE) >> summary(SUPT1) >> Patient<-gl(4,2,length=16) >> Disease <- factor(SUPT1$Disease, >> levels=c("Healthy","Disease1","Disease2","Disease3")) >> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >> >> data.frame(Disease,Patient,Treatment) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Healthy 4 npc >> 8 Healthy 4 ne >> 9 Disease1 1 npc >> 10 Disease1 1 ne >> 11 Disease2 2 npc >> 12 Disease2 2 ne >> 13 Disease2 3 npc >> 14 Disease2 3 ne >> 15 Disease3 4 npc >> 16 Disease3 4 ne >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> colnames(design) >> [1] "(Intercept)" "DiseaseDisease1" >> [3] "DiseaseDisease2" "DiseaseDisease3" >> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >> [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" >> [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" >> [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >> >> library(edgeR) >> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.txt >> ",header=TRUE) >> head (count) >> colnames(count) >> samplename=colnames(count) >> cds01<-DGEList(count,group=samplename) >> head(cds01) >> cds01 >> summary(cds01) >> dim(cds01) >> >> keep<-rowSums(cpm(cds01)>1)>=4 >> cds01<-cds01[keep,] >> dim(cds01) >> >> cds01$sample$lib.size<-colSums(cds01$counts) >> y <- estimateGLMCommonDisp(cds01,design) >> Error in return(NA, ntags) : multi-argument returns are not permitted >> In addition: Warning message: >> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >> No residual df: cannot estimate dispersion >> >> Thanks and regards, >> Justin > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:14}}
ADD REPLYlink written 4.5 years ago by Gordon Smyth35k
Dear Gordon, Thanks you very much for your support. I got my results now. I have one more query about my another matrix... plz see the below is the matrix... which one of the matrix is right to use? I have 4 Healthy individuals (each individual have 2 stages npc and ne) and Disease1 one individual, Disease2 two individual and Disease3 one individual. Matrix1: Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Healthy H4 npc 8 Healthy H4 ne 9 Disease1 D5 npc 10 Disease1 D5 ne 11 Disease2 D6 npc 12 Disease2 D6 ne 13 Disease2 D6 npc 14 Disease2 D6 ne 15 Disease3 D7 npc 16 Disease3 D7 ne OR Matrix2: Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Healthy H4 npc 8 Healthy H4 ne 9 Disease1 D5 npc 10 Disease1 D5 ne 11 Disease2 D6 npc 12 Disease2 D6 ne 13 Disease2 D7 npc 14 Disease2 D7 ne 15 Disease3 D8 npc 16 Disease3 D8 ne Thank you very much. Justin -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Tuesday, March 25, 2014 9:34 AM To: Justin Jeyakani (GIS) Cc: Bioconductor mailing list Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis I assumed that "Healthy" is set as the first level of the factor 'Disease'. I assumed that because that is how you set it up in the original code you posted. You can make this so by typing samples$Disease <- relevel(samples$Disease, ref="Healthy") The coefficient "Treatmentnpc" gives the effect of npc in healthy patients. Gordon On Mon, 24 Mar 2014, Justin Jeyakani (GIS) wrote: > Dear Gordon, > > Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. > > I'm getting slightly different matrix compare to what you have sent > using the same code... > I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get > "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of > "Treatmentnpc" is genes responding to the npc in healthy patients only? > Which one gives the genes respond to the npc in healthy? > >> samples <-read.table("matrix1.txt",header=TRUE) >> samples > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Disease1 D4 npc > 8 Disease1 D4 ne > 9 Disease2 D5 npc > 10 Disease2 D5 ne > 11 Disease3 D6 npc > 12 Disease3 D6 ne >> design1 <- model.matrix(~Patient,data=samples) >> design2 <- model.matrix(~Disease*Treatment,data=samples) >> design <- cbind(design1,design2[,5:8]) >> colnames(design) > [1] "(Intercept)" "PatientD5" > [3] "PatientD6" "PatientH1" > [5] "PatientH2" "PatientH3" > [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" > [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" > > Thank you very much. > > Justin > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Monday, March 24, 2014 12:00 PM > To: Justin Jeyakani (GIS) > Cc: Bioconductor mailing list > Subject: edgeR estimateGLMCommonDisp Error on RNA analysis > > Dear Justin, > > You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. > > You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: > > > samples > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Disease1 D4 npc > 8 Disease1 D4 ne > 9 Disease2 D5 npc > 10 Disease2 D5 ne > 11 Disease3 D6 npc > 12 Disease3 D6 ne > > design1 <- model.matrix(~Patient,data=samples) > > design2 <- model.matrix(~Disease*Treatment,data=samples) > > design <- cbind(design1,design2[,5:8]) > colnames(design) > [1] "(Intercept)" "PatientD5" > [3] "PatientD6" "PatientH1" > [5] "PatientH2" "PatientH3" > [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" > [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" > > Then you can test for coefficients 8, 9 and 10. > > Best wishes > Gordon > > >> Date: Fri, 21 Mar 2014 17:09:37 +0800 >> From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> >> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis >> >> Hello Sir, >> >> I'm doing differential gene exp. by the edgeR package. My data seems >> to suitable to analyse by glm method (As per edgeR userguide section >> 3.5 Comparisons Both Between and Within Subjects). >> >> I have two different matrix to design. >> Matrix One: > >> I have 12 dataset 3 samples from different healthy individuals >> (Healthy) of two different stages (npc,ne) and 3 samples from >> different individuals (Disease1) of two different stages (npc,ne), so >> 6 data from each healthy(6) and Disese1 (6) below is my code and >> works well. But for my another design matrix it's not! >> >> Matrix Two: > >> But I want to comapre the All the Healthy (6data from 3 indiauals) >> with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 >> diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do >> individual test of 3 disease1 with Healthy but the I'm getting error >> in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. I >> have the same problem with my another matrix also. Can guide me to >> solve this issue highly appreciated ... below is my code for the >> above mentiond two matrix...do I need to define anthing in the generate factor Level "gl" >> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set >> causing dispersion error. >> >> Matrix1.txt: my input (works fine) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 4 npc >> 8 Disease1 4 ne >> 9 Disease1 5 npc >> 10 Disease1 5 ne >> 11 Disease1 6 npc >> 12 Disease1 6 ne >> >> SUPT1<-read.table("matrix1.txt",header=TRUE) >> summary(SUPT1) >> Patient<-gl(3,2,length=12) >> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) >> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >> >> data.frame(Disease,Patient,Treatment) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 1 npc >> 8 Disease1 1 ne >> 9 Disease1 2 npc >> 10 Disease1 2 ne >> 11 Disease1 3 npc >> 12 Disease1 3 ne >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> colnames(design) >> [1] "(Intercept)" "DiseaseDisease1" >> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >> [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >> >> library(edgeR) >> count<-read.table("RHN035-46_s1.txt",header=TRUE) >> head (count) >> colnames(count) >> samplename=colnames(count) >> cds01<-DGEList(count,group=samplename) >> head(cds01) >> cds01 >> summary(cds01) >> dim(cds01) >> >> >> keep<-rowSums(cpm(cds01)>2)>=4 >> cds01<-cds01[keep,] >> dim(cds01) >> >> cds01$sample$lib.size<-colSums(cds01$counts) >> y <- estimateGLMCommonDisp(cds01,design) >> y <- estimateGLMTrendedDisp(y,design) y <- >> estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) >> >> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") >> topTags(lrt) >> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >> >> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") >> topTags(lrt) >> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >> >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) >> topTags(lrt) >> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 4 npc >> 8 Disease1 4 ne >> 9 Disease2 5 npc >> 10 Disease2 5 ne >> 11 Disease3 6 npc >> 12 Disease3 6 ne >> >> SUPT1<-read.table("matrix2.txt",header=TRUE) >> summary(SUPT1) >> Patient<-gl(3,2,length=12) >> Disease <- factor(SUPT1$Disease, >> levels=c("Healthy","Disease1","Disease2","Disease3")) >> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >> >> data.frame(Disease,Patient,Treatment) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Disease1 1 npc >> 8 Disease1 1 ne >> 9 Disease2 2 npc >> 10 Disease2 2 ne >> 11 Disease3 3 npc >> 12 Disease3 3 ne >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> colnames(design) >> [1] "(Intercept)" "DiseaseDisease1" >> [3] "DiseaseDisease2" "DiseaseDisease3" >> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >> [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >> >> library(edgeR) >> count<-read.table("RHN035-46_s1.txt",header=TRUE) >> head (count) >> colnames(count) >> samplename=colnames(count) >> cds01<-DGEList(count,group=samplename) >> head(cds01) >> cds01 >> summary(cds01) >> dim(cds01) >> >> >> keep<-rowSums(cpm(cds01)>2)>=4 >> cds01<-cds01[keep,] >> dim(cds01) >> >> cds01$sample$lib.size<-colSums(cds01$counts) >> >> y <- estimateGLMCommonDisp(cds01,design) >> Warning message: >> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >> No residual df: setting dispersion to N >> Matri3.txt- my input (gives Error) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Healthy 4 npc >> 8 Healthy 4 ne >> 9 Disease1 5 npc >> 10 Disease1 5 ne >> 11 Disease2 6 npc >> 12 Disease2 6 ne >> 13 Disease2 7 npc >> 14 Disease2 7 ne >> 15 Disease3 8 npc >> 16 Disease3 8 ne >> >> SUPT1<-read.table("matrix2.txt",header=TRUE) >> summary(SUPT1) >> Patient<-gl(4,2,length=16) >> Disease <- factor(SUPT1$Disease, >> levels=c("Healthy","Disease1","Disease2","Disease3")) >> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >> >> data.frame(Disease,Patient,Treatment) >> Disease Patient Treatment >> 1 Healthy 1 npc >> 2 Healthy 1 ne >> 3 Healthy 2 npc >> 4 Healthy 2 ne >> 5 Healthy 3 npc >> 6 Healthy 3 ne >> 7 Healthy 4 npc >> 8 Healthy 4 ne >> 9 Disease1 1 npc >> 10 Disease1 1 ne >> 11 Disease2 2 npc >> 12 Disease2 2 ne >> 13 Disease2 3 npc >> 14 Disease2 3 ne >> 15 Disease3 4 npc >> 16 Disease3 4 ne >> >> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >> colnames(design) >> [1] "(Intercept)" "DiseaseDisease1" >> [3] "DiseaseDisease2" "DiseaseDisease3" >> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >> [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" >> [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" >> [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >> >> library(edgeR) >> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.tx >> t >> ",header=TRUE) >> head (count) >> colnames(count) >> samplename=colnames(count) >> cds01<-DGEList(count,group=samplename) >> head(cds01) >> cds01 >> summary(cds01) >> dim(cds01) >> >> keep<-rowSums(cpm(cds01)>1)>=4 >> cds01<-cds01[keep,] >> dim(cds01) >> >> cds01$sample$lib.size<-colSums(cds01$counts) >> y <- estimateGLMCommonDisp(cds01,design) >> Error in return(NA, ntags) : multi-argument returns are not permitted >> In addition: Warning message: >> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >> No residual df: cannot estimate dispersion >> >> Thanks and regards, >> Justin > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:30}}
ADD REPLYlink written 4.5 years ago by Justin Jeyakani GIS60
Dear Justin, Not sure why you ask this question --- the correct matrix to use is always the one containing correct information. Matrix1 incorrectly identifies two different individuals with Disease2 as the same individual, whereas Matrix2 gives correct information. Best wishes Gordon On Tue, 25 Mar 2014, Justin Jeyakani (GIS) wrote: > Dear Gordon, > > Thanks you very much for your support. I got my results now. > I have one more query about my another matrix... plz see the below is the matrix... > which one of the matrix is right to use? I have 4 Healthy individuals (each individual have 2 stages npc and ne) and Disease1 one individual, Disease2 two individual and Disease3 one individual. > > Matrix1: > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Healthy H4 npc > 8 Healthy H4 ne > 9 Disease1 D5 npc > 10 Disease1 D5 ne > 11 Disease2 D6 npc > 12 Disease2 D6 ne > 13 Disease2 D6 npc > 14 Disease2 D6 ne > 15 Disease3 D7 npc > 16 Disease3 D7 ne > > OR > > Matrix2: > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Healthy H4 npc > 8 Healthy H4 ne > 9 Disease1 D5 npc > 10 Disease1 D5 ne > 11 Disease2 D6 npc > 12 Disease2 D6 ne > 13 Disease2 D7 npc > 14 Disease2 D7 ne > 15 Disease3 D8 npc > 16 Disease3 D8 ne > > Thank you very much. > > Justin > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Tuesday, March 25, 2014 9:34 AM > To: Justin Jeyakani (GIS) > Cc: Bioconductor mailing list > Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis > > I assumed that "Healthy" is set as the first level of the factor 'Disease'. I assumed that because that is how you set it up in the original code you posted. > > You can make this so by typing > > samples$Disease <- relevel(samples$Disease, ref="Healthy") > > The coefficient "Treatmentnpc" gives the effect of npc in healthy patients. > > Gordon > > On Mon, 24 Mar 2014, Justin Jeyakani (GIS) wrote: > >> Dear Gordon, >> >> Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. >> >> I'm getting slightly different matrix compare to what you have sent >> using the same code... > >> I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get >> "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of >> "Treatmentnpc" is genes responding to the npc in healthy patients only? >> Which one gives the genes respond to the npc in healthy? >> >>> samples <-read.table("matrix1.txt",header=TRUE) >>> samples >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Disease1 D4 npc >> 8 Disease1 D4 ne >> 9 Disease2 D5 npc >> 10 Disease2 D5 ne >> 11 Disease3 D6 npc >> 12 Disease3 D6 ne >>> design1 <- model.matrix(~Patient,data=samples) >>> design2 <- model.matrix(~Disease*Treatment,data=samples) >>> design <- cbind(design1,design2[,5:8]) >>> colnames(design) >> [1] "(Intercept)" "PatientD5" >> [3] "PatientD6" "PatientH1" >> [5] "PatientH2" "PatientH3" >> [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" >> [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" >> >> Thank you very much. >> >> Justin >> >> -----Original Message----- >> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] >> Sent: Monday, March 24, 2014 12:00 PM >> To: Justin Jeyakani (GIS) >> Cc: Bioconductor mailing list >> Subject: edgeR estimateGLMCommonDisp Error on RNA analysis >> >> Dear Justin, >> >> You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. >> >> You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: >> >> > samples >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Disease1 D4 npc >> 8 Disease1 D4 ne >> 9 Disease2 D5 npc >> 10 Disease2 D5 ne >> 11 Disease3 D6 npc >> 12 Disease3 D6 ne >> > design1 <- model.matrix(~Patient,data=samples) >> > design2 <- model.matrix(~Disease*Treatment,data=samples) >> > design <- cbind(design1,design2[,5:8]) > colnames(design) >> [1] "(Intercept)" "PatientD5" >> [3] "PatientD6" "PatientH1" >> [5] "PatientH2" "PatientH3" >> [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" >> [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" >> >> Then you can test for coefficients 8, 9 and 10. >> >> Best wishes >> Gordon >> >> >>> Date: Fri, 21 Mar 2014 17:09:37 +0800 >>> From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> >>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis >>> >>> Hello Sir, >>> >>> I'm doing differential gene exp. by the edgeR package. My data seems >>> to suitable to analyse by glm method (As per edgeR userguide section >>> 3.5 Comparisons Both Between and Within Subjects). >>> >>> I have two different matrix to design. >>> Matrix One: >> >>> I have 12 dataset 3 samples from different healthy individuals >>> (Healthy) of two different stages (npc,ne) and 3 samples from >>> different individuals (Disease1) of two different stages (npc,ne), so >>> 6 data from each healthy(6) and Disese1 (6) below is my code and >>> works well. But for my another design matrix it's not! >>> >>> Matrix Two: >> >>> But I want to comapre the All the Healthy (6data from 3 indiauals) >>> with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 >>> diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do >>> individual test of 3 disease1 with Healthy but the I'm getting error >>> in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. I >>> have the same problem with my another matrix also. Can guide me to >>> solve this issue highly appreciated ... below is my code for the >>> above mentiond two matrix...do I need to define anthing in the generate factor Level "gl" >>> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set >>> causing dispersion error. >>> >>> Matrix1.txt: my input (works fine) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 4 npc >>> 8 Disease1 4 ne >>> 9 Disease1 5 npc >>> 10 Disease1 5 ne >>> 11 Disease1 6 npc >>> 12 Disease1 6 ne >>> >>> SUPT1<-read.table("matrix1.txt",header=TRUE) >>> summary(SUPT1) >>> Patient<-gl(3,2,length=12) >>> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) >>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>> >>> data.frame(Disease,Patient,Treatment) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 1 npc >>> 8 Disease1 1 ne >>> 9 Disease1 2 npc >>> 10 Disease1 2 ne >>> 11 Disease1 3 npc >>> 12 Disease1 3 ne >>> >>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>> colnames(design) >>> [1] "(Intercept)" "DiseaseDisease1" >>> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>> [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>> >>> library(edgeR) >>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>> head (count) >>> colnames(count) >>> samplename=colnames(count) >>> cds01<-DGEList(count,group=samplename) >>> head(cds01) >>> cds01 >>> summary(cds01) >>> dim(cds01) >>> >>> >>> keep<-rowSums(cpm(cds01)>2)>=4 >>> cds01<-cds01[keep,] >>> dim(cds01) >>> >>> cds01$sample$lib.size<-colSums(cds01$counts) >>> y <- estimateGLMCommonDisp(cds01,design) >>> y <- estimateGLMTrendedDisp(y,design) y <- >>> estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) >>> >>> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") >>> topTags(lrt) >>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>> >>> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") >>> topTags(lrt) >>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>> >>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) >>> topTags(lrt) >>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 4 npc >>> 8 Disease1 4 ne >>> 9 Disease2 5 npc >>> 10 Disease2 5 ne >>> 11 Disease3 6 npc >>> 12 Disease3 6 ne >>> >>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>> summary(SUPT1) >>> Patient<-gl(3,2,length=12) >>> Disease <- factor(SUPT1$Disease, >>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>> >>> data.frame(Disease,Patient,Treatment) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 1 npc >>> 8 Disease1 1 ne >>> 9 Disease2 2 npc >>> 10 Disease2 2 ne >>> 11 Disease3 3 npc >>> 12 Disease3 3 ne >>> >>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>> colnames(design) >>> [1] "(Intercept)" "DiseaseDisease1" >>> [3] "DiseaseDisease2" "DiseaseDisease3" >>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>> [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>> >>> library(edgeR) >>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>> head (count) >>> colnames(count) >>> samplename=colnames(count) >>> cds01<-DGEList(count,group=samplename) >>> head(cds01) >>> cds01 >>> summary(cds01) >>> dim(cds01) >>> >>> >>> keep<-rowSums(cpm(cds01)>2)>=4 >>> cds01<-cds01[keep,] >>> dim(cds01) >>> >>> cds01$sample$lib.size<-colSums(cds01$counts) >>> >>> y <- estimateGLMCommonDisp(cds01,design) >>> Warning message: >>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>> No residual df: setting dispersion to N >>> Matri3.txt- my input (gives Error) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Healthy 4 npc >>> 8 Healthy 4 ne >>> 9 Disease1 5 npc >>> 10 Disease1 5 ne >>> 11 Disease2 6 npc >>> 12 Disease2 6 ne >>> 13 Disease2 7 npc >>> 14 Disease2 7 ne >>> 15 Disease3 8 npc >>> 16 Disease3 8 ne >>> >>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>> summary(SUPT1) >>> Patient<-gl(4,2,length=16) >>> Disease <- factor(SUPT1$Disease, >>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>> >>> data.frame(Disease,Patient,Treatment) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Healthy 4 npc >>> 8 Healthy 4 ne >>> 9 Disease1 1 npc >>> 10 Disease1 1 ne >>> 11 Disease2 2 npc >>> 12 Disease2 2 ne >>> 13 Disease2 3 npc >>> 14 Disease2 3 ne >>> 15 Disease3 4 npc >>> 16 Disease3 4 ne >>> >>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>> colnames(design) >>> [1] "(Intercept)" "DiseaseDisease1" >>> [3] "DiseaseDisease2" "DiseaseDisease3" >>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>> [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" >>> [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" >>> [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>> >>> library(edgeR) >>> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.tx >>> t >>> ",header=TRUE) >>> head (count) >>> colnames(count) >>> samplename=colnames(count) >>> cds01<-DGEList(count,group=samplename) >>> head(cds01) >>> cds01 >>> summary(cds01) >>> dim(cds01) >>> >>> keep<-rowSums(cpm(cds01)>1)>=4 >>> cds01<-cds01[keep,] >>> dim(cds01) >>> >>> cds01$sample$lib.size<-colSums(cds01$counts) >>> y <- estimateGLMCommonDisp(cds01,design) >>> Error in return(NA, ntags) : multi-argument returns are not permitted >>> In addition: Warning message: >>> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >>> No residual df: cannot estimate dispersion >>> >>> Thanks and regards, >>> Justin ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLYlink written 4.5 years ago by Gordon Smyth35k
Dear Gordon, Yes you are right. Thank you much for all my queries. Highly appreciated your help. Justin -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Wednesday, March 26, 2014 7:32 AM To: Justin Jeyakani (GIS) Cc: Bioconductor mailing list Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis Dear Justin, Not sure why you ask this question --- the correct matrix to use is always the one containing correct information. Matrix1 incorrectly identifies two different individuals with Disease2 as the same individual, whereas Matrix2 gives correct information. Best wishes Gordon On Tue, 25 Mar 2014, Justin Jeyakani (GIS) wrote: > Dear Gordon, > > Thanks you very much for your support. I got my results now. > I have one more query about my another matrix... plz see the below is the matrix... > which one of the matrix is right to use? I have 4 Healthy individuals (each individual have 2 stages npc and ne) and Disease1 one individual, Disease2 two individual and Disease3 one individual. > > Matrix1: > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Healthy H4 npc > 8 Healthy H4 ne > 9 Disease1 D5 npc > 10 Disease1 D5 ne > 11 Disease2 D6 npc > 12 Disease2 D6 ne > 13 Disease2 D6 npc > 14 Disease2 D6 ne > 15 Disease3 D7 npc > 16 Disease3 D7 ne > > OR > > Matrix2: > Disease Patient Treatment > 1 Healthy H1 npc > 2 Healthy H1 ne > 3 Healthy H2 npc > 4 Healthy H2 ne > 5 Healthy H3 npc > 6 Healthy H3 ne > 7 Healthy H4 npc > 8 Healthy H4 ne > 9 Disease1 D5 npc > 10 Disease1 D5 ne > 11 Disease2 D6 npc > 12 Disease2 D6 ne > 13 Disease2 D7 npc > 14 Disease2 D7 ne > 15 Disease3 D8 npc > 16 Disease3 D8 ne > > Thank you very much. > > Justin > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Tuesday, March 25, 2014 9:34 AM > To: Justin Jeyakani (GIS) > Cc: Bioconductor mailing list > Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis > > I assumed that "Healthy" is set as the first level of the factor 'Disease'. I assumed that because that is how you set it up in the original code you posted. > > You can make this so by typing > > samples$Disease <- relevel(samples$Disease, ref="Healthy") > > The coefficient "Treatmentnpc" gives the effect of npc in healthy patients. > > Gordon > > On Mon, 24 Mar 2014, Justin Jeyakani (GIS) wrote: > >> Dear Gordon, >> >> Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. >> >> I'm getting slightly different matrix compare to what you have sent >> using the same code... > >> I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get >> "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of >> "Treatmentnpc" is genes responding to the npc in healthy patients only? >> Which one gives the genes respond to the npc in healthy? >> >>> samples <-read.table("matrix1.txt",header=TRUE) >>> samples >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Disease1 D4 npc >> 8 Disease1 D4 ne >> 9 Disease2 D5 npc >> 10 Disease2 D5 ne >> 11 Disease3 D6 npc >> 12 Disease3 D6 ne >>> design1 <- model.matrix(~Patient,data=samples) >>> design2 <- model.matrix(~Disease*Treatment,data=samples) >>> design <- cbind(design1,design2[,5:8]) >>> colnames(design) >> [1] "(Intercept)" "PatientD5" >> [3] "PatientD6" "PatientH1" >> [5] "PatientH2" "PatientH3" >> [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" >> [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" >> >> Thank you very much. >> >> Justin >> >> -----Original Message----- >> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] >> Sent: Monday, March 24, 2014 12:00 PM >> To: Justin Jeyakani (GIS) >> Cc: Bioconductor mailing list >> Subject: edgeR estimateGLMCommonDisp Error on RNA analysis >> >> Dear Justin, >> >> You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. >> >> You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: >> >> > samples >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Disease1 D4 npc >> 8 Disease1 D4 ne >> 9 Disease2 D5 npc >> 10 Disease2 D5 ne >> 11 Disease3 D6 npc >> 12 Disease3 D6 ne >> > design1 <- model.matrix(~Patient,data=samples) >> > design2 <- model.matrix(~Disease*Treatment,data=samples) >> > design <- cbind(design1,design2[,5:8]) > colnames(design) >> [1] "(Intercept)" "PatientD5" >> [3] "PatientD6" "PatientH1" >> [5] "PatientH2" "PatientH3" >> [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" >> [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" >> >> Then you can test for coefficients 8, 9 and 10. >> >> Best wishes >> Gordon >> >> >>> Date: Fri, 21 Mar 2014 17:09:37 +0800 >>> From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> >>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis >>> >>> Hello Sir, >>> >>> I'm doing differential gene exp. by the edgeR package. My data seems >>> to suitable to analyse by glm method (As per edgeR userguide section >>> 3.5 Comparisons Both Between and Within Subjects). >>> >>> I have two different matrix to design. >>> Matrix One: >> >>> I have 12 dataset 3 samples from different healthy individuals >>> (Healthy) of two different stages (npc,ne) and 3 samples from >>> different individuals (Disease1) of two different stages (npc,ne), >>> so >>> 6 data from each healthy(6) and Disese1 (6) below is my code and >>> works well. But for my another design matrix it's not! >>> >>> Matrix Two: >> >>> But I want to comapre the All the Healthy (6data from 3 indiauals) >>> with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 >>> diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do >>> individual test of 3 disease1 with Healthy but the I'm getting error >>> in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. >>> I have the same problem with my another matrix also. Can guide me to >>> solve this issue highly appreciated ... below is my code for the >>> above mentiond two matrix...do I need to define anthing in the generate factor Level "gl" >>> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set >>> causing dispersion error. >>> >>> Matrix1.txt: my input (works fine) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 4 npc >>> 8 Disease1 4 ne >>> 9 Disease1 5 npc >>> 10 Disease1 5 ne >>> 11 Disease1 6 npc >>> 12 Disease1 6 ne >>> >>> SUPT1<-read.table("matrix1.txt",header=TRUE) >>> summary(SUPT1) >>> Patient<-gl(3,2,length=12) >>> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) >>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>> >>> data.frame(Disease,Patient,Treatment) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 1 npc >>> 8 Disease1 1 ne >>> 9 Disease1 2 npc >>> 10 Disease1 2 ne >>> 11 Disease1 3 npc >>> 12 Disease1 3 ne >>> >>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>> colnames(design) >>> [1] "(Intercept)" "DiseaseDisease1" >>> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>> [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>> >>> library(edgeR) >>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>> head (count) >>> colnames(count) >>> samplename=colnames(count) >>> cds01<-DGEList(count,group=samplename) >>> head(cds01) >>> cds01 >>> summary(cds01) >>> dim(cds01) >>> >>> >>> keep<-rowSums(cpm(cds01)>2)>=4 >>> cds01<-cds01[keep,] >>> dim(cds01) >>> >>> cds01$sample$lib.size<-colSums(cds01$counts) >>> y <- estimateGLMCommonDisp(cds01,design) >>> y <- estimateGLMTrendedDisp(y,design) y <- >>> estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) >>> >>> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") >>> topTags(lrt) >>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>> >>> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") >>> topTags(lrt) >>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>> >>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) >>> topTags(lrt) >>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 4 npc >>> 8 Disease1 4 ne >>> 9 Disease2 5 npc >>> 10 Disease2 5 ne >>> 11 Disease3 6 npc >>> 12 Disease3 6 ne >>> >>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>> summary(SUPT1) >>> Patient<-gl(3,2,length=12) >>> Disease <- factor(SUPT1$Disease, >>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>> >>> data.frame(Disease,Patient,Treatment) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Disease1 1 npc >>> 8 Disease1 1 ne >>> 9 Disease2 2 npc >>> 10 Disease2 2 ne >>> 11 Disease3 3 npc >>> 12 Disease3 3 ne >>> >>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>> colnames(design) >>> [1] "(Intercept)" "DiseaseDisease1" >>> [3] "DiseaseDisease2" "DiseaseDisease3" >>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>> [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>> >>> library(edgeR) >>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>> head (count) >>> colnames(count) >>> samplename=colnames(count) >>> cds01<-DGEList(count,group=samplename) >>> head(cds01) >>> cds01 >>> summary(cds01) >>> dim(cds01) >>> >>> >>> keep<-rowSums(cpm(cds01)>2)>=4 >>> cds01<-cds01[keep,] >>> dim(cds01) >>> >>> cds01$sample$lib.size<-colSums(cds01$counts) >>> >>> y <- estimateGLMCommonDisp(cds01,design) >>> Warning message: >>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>> No residual df: setting dispersion to N >>> Matri3.txt- my input (gives Error) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Healthy 4 npc >>> 8 Healthy 4 ne >>> 9 Disease1 5 npc >>> 10 Disease1 5 ne >>> 11 Disease2 6 npc >>> 12 Disease2 6 ne >>> 13 Disease2 7 npc >>> 14 Disease2 7 ne >>> 15 Disease3 8 npc >>> 16 Disease3 8 ne >>> >>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>> summary(SUPT1) >>> Patient<-gl(4,2,length=16) >>> Disease <- factor(SUPT1$Disease, >>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>> >>> data.frame(Disease,Patient,Treatment) >>> Disease Patient Treatment >>> 1 Healthy 1 npc >>> 2 Healthy 1 ne >>> 3 Healthy 2 npc >>> 4 Healthy 2 ne >>> 5 Healthy 3 npc >>> 6 Healthy 3 ne >>> 7 Healthy 4 npc >>> 8 Healthy 4 ne >>> 9 Disease1 1 npc >>> 10 Disease1 1 ne >>> 11 Disease2 2 npc >>> 12 Disease2 2 ne >>> 13 Disease2 3 npc >>> 14 Disease2 3 ne >>> 15 Disease3 4 npc >>> 16 Disease3 4 ne >>> >>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>> colnames(design) >>> [1] "(Intercept)" "DiseaseDisease1" >>> [3] "DiseaseDisease2" "DiseaseDisease3" >>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>> [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" >>> [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" >>> [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>> >>> library(edgeR) >>> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.t >>> x >>> t >>> ",header=TRUE) >>> head (count) >>> colnames(count) >>> samplename=colnames(count) >>> cds01<-DGEList(count,group=samplename) >>> head(cds01) >>> cds01 >>> summary(cds01) >>> dim(cds01) >>> >>> keep<-rowSums(cpm(cds01)>1)>=4 >>> cds01<-cds01[keep,] >>> dim(cds01) >>> >>> cds01$sample$lib.size<-colSums(cds01$counts) >>> y <- estimateGLMCommonDisp(cds01,design) >>> Error in return(NA, ntags) : multi-argument returns are not >>> permitted In addition: Warning message: >>> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >>> No residual df: cannot estimate dispersion >>> >>> Thanks and regards, >>> Justin ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:14}}
ADD REPLYlink written 4.5 years ago by Justin Jeyakani GIS60
0
gravatar for Justin Jeyakani GIS
4.5 years ago by
Justin Jeyakani GIS60 wrote:
Dear Gordon, One more query of the followup. Sorry to bugg you again? Just for case how to call the "ne" instead of "npc" like.. [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentne" "DiseaseDisease1:Treatmentne" [9] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" Instead [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" Thanks, Justin -----Original Message----- From: Justin Jeyakani (GIS) Sent: Monday, March 24, 2014 5:23 PM To: 'Gordon K Smyth' Cc: Bioconductor mailing list Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis Dear Gordon, Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. I'm getting slightly different matrix compare to what you have sent using the same code... I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of "Treatmentnpc" is genes responding to the npc in healthy patients only? Which one gives the genes respond to the npc in healthy? > samples <-read.table("matrix1.txt",header=TRUE) > samples Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Disease1 D4 npc 8 Disease1 D4 ne 9 Disease2 D5 npc 10 Disease2 D5 ne 11 Disease3 D6 npc 12 Disease3 D6 ne > design1 <- model.matrix(~Patient,data=samples) > design2 <- model.matrix(~Disease*Treatment,data=samples) > design <- cbind(design1,design2[,5:8]) > colnames(design) [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" Thank you very much. Justin -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Monday, March 24, 2014 12:00 PM To: Justin Jeyakani (GIS) Cc: Bioconductor mailing list Subject: edgeR estimateGLMCommonDisp Error on RNA analysis Dear Justin, You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: > samples Disease Patient Treatment 1 Healthy H1 npc 2 Healthy H1 ne 3 Healthy H2 npc 4 Healthy H2 ne 5 Healthy H3 npc 6 Healthy H3 ne 7 Disease1 D4 npc 8 Disease1 D4 ne 9 Disease2 D5 npc 10 Disease2 D5 ne 11 Disease3 D6 npc 12 Disease3 D6 ne > design1 <- model.matrix(~Patient,data=samples) > design2 <- model.matrix(~Disease*Treatment,data=samples) > design <- cbind(design1,design2[,5:8]) > colnames(design) [1] "(Intercept)" "PatientD5" [3] "PatientD6" "PatientH1" [5] "PatientH2" "PatientH3" [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" Then you can test for coefficients 8, 9 and 10. Best wishes Gordon > Date: Fri, 21 Mar 2014 17:09:37 +0800 > From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis > > Hello Sir, > > I'm doing differential gene exp. by the edgeR package. My data seems > to suitable to analyse by glm method (As per edgeR userguide section > 3.5 Comparisons Both Between and Within Subjects). > > I have two different matrix to design. > Matrix One: > I have 12 dataset 3 samples from different healthy individuals > (Healthy) of two different stages (npc,ne) and 3 samples from > different individuals (Disease1) of two different stages (npc,ne), so > 6 data from each healthy(6) and Disese1 (6) below is my code and works > well. But for my another design matrix it's not! > > Matrix Two: > But I want to comapre the All the Healthy (6data from 3 indiauals) > with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 > diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do > individual test of 3 disease1 with Healthy but the I'm getting error > in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. I > have the same problem with my another matrix also. Can guide me to > solve this issue highly appreciated ... below is my code for the above > mentiond two matrix...do I need to define anthing in the generate factor Level "gl" > further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set > causing dispersion error. > > Matrix1.txt: my input (works fine) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 4 npc > 8 Disease1 4 ne > 9 Disease1 5 npc > 10 Disease1 5 ne > 11 Disease1 6 npc > 12 Disease1 6 ne > > SUPT1<-read.table("matrix1.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(3,2,length=12) > Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 1 npc > 8 Disease1 1 ne > 9 Disease1 2 npc > 10 Disease1 2 ne > 11 Disease1 3 npc > 12 Disease1 3 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46_s1.txt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > > keep<-rowSums(cpm(cds01)>2)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > y <- estimateGLMCommonDisp(cds01,design) > y <- estimateGLMTrendedDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y, design) > > lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") > topTags(lrt) > detags <- rownames(topTags(lrt)$table) cpm(y)[detags, > order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) > > lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") > topTags(lrt) > detags <- rownames(topTags(lrt)$table) cpm(y)[detags, > order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) > > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) > topTags(lrt) > detags <- rownames(topTags(lrt)$table) cpm(y)[detags, > order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) > Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 4 npc > 8 Disease1 4 ne > 9 Disease2 5 npc > 10 Disease2 5 ne > 11 Disease3 6 npc > 12 Disease3 6 ne > > SUPT1<-read.table("matrix2.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(3,2,length=12) > Disease <- factor(SUPT1$Disease, > levels=c("Healthy","Disease1","Disease2","Disease3")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Disease1 1 npc > 8 Disease1 1 ne > 9 Disease2 2 npc > 10 Disease2 2 ne > 11 Disease3 3 npc > 12 Disease3 3 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseDisease2" "DiseaseDisease3" > [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" > [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" > [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46_s1.txt",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > > keep<-rowSums(cpm(cds01)>2)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > > y <- estimateGLMCommonDisp(cds01,design) > Warning message: > In estimateGLMCommonDisp.default(y = y$counts, design = design, : > No residual df: setting dispersion to N > Matri3.txt- my input (gives Error) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Healthy 4 npc > 8 Healthy 4 ne > 9 Disease1 5 npc > 10 Disease1 5 ne > 11 Disease2 6 npc > 12 Disease2 6 ne > 13 Disease2 7 npc > 14 Disease2 7 ne > 15 Disease3 8 npc > 16 Disease3 8 ne > > SUPT1<-read.table("matrix2.txt",header=TRUE) > summary(SUPT1) > Patient<-gl(4,2,length=16) > Disease <- factor(SUPT1$Disease, > levels=c("Healthy","Disease1","Disease2","Disease3")) > Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) > > data.frame(Disease,Patient,Treatment) > Disease Patient Treatment > 1 Healthy 1 npc > 2 Healthy 1 ne > 3 Healthy 2 npc > 4 Healthy 2 ne > 5 Healthy 3 npc > 6 Healthy 3 ne > 7 Healthy 4 npc > 8 Healthy 4 ne > 9 Disease1 1 npc > 10 Disease1 1 ne > 11 Disease2 2 npc > 12 Disease2 2 ne > 13 Disease2 3 npc > 14 Disease2 3 ne > 15 Disease3 4 npc > 16 Disease3 4 ne > > design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) > colnames(design) > [1] "(Intercept)" "DiseaseDisease1" > [3] "DiseaseDisease2" "DiseaseDisease3" > [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" > [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" > [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" > [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" > [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" > [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" > [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" > [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > library(edgeR) > count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.txt > ",header=TRUE) > head (count) > colnames(count) > samplename=colnames(count) > cds01<-DGEList(count,group=samplename) > head(cds01) > cds01 > summary(cds01) > dim(cds01) > > keep<-rowSums(cpm(cds01)>1)>=4 > cds01<-cds01[keep,] > dim(cds01) > > cds01$sample$lib.size<-colSums(cds01$counts) > y <- estimateGLMCommonDisp(cds01,design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y = y$counts, design = design, : > No residual df: cannot estimate dispersion > > Thanks and regards, > Justin ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:14}}
ADD COMMENTlink written 4.5 years ago by Justin Jeyakani GIS60
0
gravatar for Gordon Smyth
4.5 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:
Justin, By default, model.matrix() in R always creates the design matrix in such a way that each level of a factor is compared back to the first level. So type levels(samples$Treatment) to see which level is first and which is second. You will find that you have "ne" first and "npc" second, which is alphabetical order. This means that genes up are up/down in "npc" relative to "ne". Have a think about this, and experiment with some simple matrices, and the answers to your questions will become clear. A couple of posts ago, I told you a way to change the level which comes first for a factor. Gordon On Thu, 27 Mar 2014, Justin Jeyakani (GIS) wrote: > Dear Gordon, > > > If I undestand correctly the following > "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" > gives genes responding to the Treatmentnpc in patients and you confirmed that in the previous mail. > > For the above, genes are up/down regulated in "ne" compared to "npc" right? > > If it's "npc" comapared to "ne" then what changes needs to be done in the code to get "ne" instead of calling "npc", i.e. like this, > "Treatmentne" "DiseaseDisease1:Treatmentne" "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > My code in the first thread of this mail gives "Treatmentne" "DiseaseDisease1:Treatmentne" "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne". > Because the first stage is npc (neural progenitor cells) and the next stage is ne (neurons). > > > Thanks you. > Justin > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Wednesday, March 26, 2014 7:32 AM > To: Justin Jeyakani (GIS) > Cc: Bioconductor mailing list > Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis > > Dear Justin, > > Not sure why you ask this question --- the correct matrix to use is always the one containing correct information. > > Matrix1 incorrectly identifies two different individuals with Disease2 as the same individual, whereas Matrix2 gives correct information. > > Best wishes > Gordon > > On Tue, 25 Mar 2014, Justin Jeyakani (GIS) wrote: > >> Dear Gordon, >> >> Thanks you very much for your support. I got my results now. >> I have one more query about my another matrix... plz see the below is the matrix... >> which one of the matrix is right to use? I have 4 Healthy individuals (each individual have 2 stages npc and ne) and Disease1 one individual, Disease2 two individual and Disease3 one individual. >> >> Matrix1: >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Healthy H4 npc >> 8 Healthy H4 ne >> 9 Disease1 D5 npc >> 10 Disease1 D5 ne >> 11 Disease2 D6 npc >> 12 Disease2 D6 ne >> 13 Disease2 D6 npc >> 14 Disease2 D6 ne >> 15 Disease3 D7 npc >> 16 Disease3 D7 ne >> >> OR >> >> Matrix2: >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Healthy H4 npc >> 8 Healthy H4 ne >> 9 Disease1 D5 npc >> 10 Disease1 D5 ne >> 11 Disease2 D6 npc >> 12 Disease2 D6 ne >> 13 Disease2 D7 npc >> 14 Disease2 D7 ne >> 15 Disease3 D8 npc >> 16 Disease3 D8 ne >> >> Thank you very much. >> >> Justin >> >> -----Original Message----- >> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] >> Sent: Tuesday, March 25, 2014 9:34 AM >> To: Justin Jeyakani (GIS) >> Cc: Bioconductor mailing list >> Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis >> >> I assumed that "Healthy" is set as the first level of the factor 'Disease'. I assumed that because that is how you set it up in the original code you posted. >> >> You can make this so by typing >> >> samples$Disease <- relevel(samples$Disease, ref="Healthy") >> >> The coefficient "Treatmentnpc" gives the effect of npc in healthy patients. >> >> Gordon >> >> On Mon, 24 Mar 2014, Justin Jeyakani (GIS) wrote: >> >>> Dear Gordon, >>> >>> Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. >>> >>> I'm getting slightly different matrix compare to what you have sent >>> using the same code... >> >>> I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get >>> "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of >>> "Treatmentnpc" is genes responding to the npc in healthy patients only? >>> Which one gives the genes respond to the npc in healthy? >>> >>>> samples <-read.table("matrix1.txt",header=TRUE) >>>> samples >>> Disease Patient Treatment >>> 1 Healthy H1 npc >>> 2 Healthy H1 ne >>> 3 Healthy H2 npc >>> 4 Healthy H2 ne >>> 5 Healthy H3 npc >>> 6 Healthy H3 ne >>> 7 Disease1 D4 npc >>> 8 Disease1 D4 ne >>> 9 Disease2 D5 npc >>> 10 Disease2 D5 ne >>> 11 Disease3 D6 npc >>> 12 Disease3 D6 ne >>>> design1 <- model.matrix(~Patient,data=samples) >>>> design2 <- model.matrix(~Disease*Treatment,data=samples) >>>> design <- cbind(design1,design2[,5:8]) >>>> colnames(design) >>> [1] "(Intercept)" "PatientD5" >>> [3] "PatientD6" "PatientH1" >>> [5] "PatientH2" "PatientH3" >>> [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" >>> [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" >>> >>> Thank you very much. >>> >>> Justin >>> >>> -----Original Message----- >>> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] >>> Sent: Monday, March 24, 2014 12:00 PM >>> To: Justin Jeyakani (GIS) >>> Cc: Bioconductor mailing list >>> Subject: edgeR estimateGLMCommonDisp Error on RNA analysis >>> >>> Dear Justin, >>> >>> You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. >>> >>> You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: >>> >>>> samples >>> Disease Patient Treatment >>> 1 Healthy H1 npc >>> 2 Healthy H1 ne >>> 3 Healthy H2 npc >>> 4 Healthy H2 ne >>> 5 Healthy H3 npc >>> 6 Healthy H3 ne >>> 7 Disease1 D4 npc >>> 8 Disease1 D4 ne >>> 9 Disease2 D5 npc >>> 10 Disease2 D5 ne >>> 11 Disease3 D6 npc >>> 12 Disease3 D6 ne >>>> design1 <- model.matrix(~Patient,data=samples) >>>> design2 <- model.matrix(~Disease*Treatment,data=samples) >>>> design <- cbind(design1,design2[,5:8]) > colnames(design) >>> [1] "(Intercept)" "PatientD5" >>> [3] "PatientD6" "PatientH1" >>> [5] "PatientH2" "PatientH3" >>> [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" >>> [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" >>> >>> Then you can test for coefficients 8, 9 and 10. >>> >>> Best wishes >>> Gordon >>> >>> >>>> Date: Fri, 21 Mar 2014 17:09:37 +0800 >>>> From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> >>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis >>>> >>>> Hello Sir, >>>> >>>> I'm doing differential gene exp. by the edgeR package. My data seems >>>> to suitable to analyse by glm method (As per edgeR userguide section >>>> 3.5 Comparisons Both Between and Within Subjects). >>>> >>>> I have two different matrix to design. >>>> Matrix One: >>> >>>> I have 12 dataset 3 samples from different healthy individuals >>>> (Healthy) of two different stages (npc,ne) and 3 samples from >>>> different individuals (Disease1) of two different stages (npc,ne), >>>> so >>>> 6 data from each healthy(6) and Disese1 (6) below is my code and >>>> works well. But for my another design matrix it's not! >>>> >>>> Matrix Two: >>> >>>> But I want to comapre the All the Healthy (6data from 3 indiauals) >>>> with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 >>>> diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I do >>>> individual test of 3 disease1 with Healthy but the I'm getting error >>>> in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. >>>> I have the same problem with my another matrix also. Can guide me to >>>> solve this issue highly appreciated ... below is my code for the >>>> above mentiond two matrix...do I need to define anthing in the generate factor Level "gl" >>>> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set >>>> causing dispersion error. >>>> >>>> Matrix1.txt: my input (works fine) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 4 npc >>>> 8 Disease1 4 ne >>>> 9 Disease1 5 npc >>>> 10 Disease1 5 ne >>>> 11 Disease1 6 npc >>>> 12 Disease1 6 ne >>>> >>>> SUPT1<-read.table("matrix1.txt",header=TRUE) >>>> summary(SUPT1) >>>> Patient<-gl(3,2,length=12) >>>> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) >>>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>>> >>>> data.frame(Disease,Patient,Treatment) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 1 npc >>>> 8 Disease1 1 ne >>>> 9 Disease1 2 npc >>>> 10 Disease1 2 ne >>>> 11 Disease1 3 npc >>>> 12 Disease1 3 ne >>>> >>>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>>> colnames(design) >>>> [1] "(Intercept)" "DiseaseDisease1" >>>> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>>> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>>> [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>>> >>>> library(edgeR) >>>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>>> head (count) >>>> colnames(count) >>>> samplename=colnames(count) >>>> cds01<-DGEList(count,group=samplename) >>>> head(cds01) >>>> cds01 >>>> summary(cds01) >>>> dim(cds01) >>>> >>>> >>>> keep<-rowSums(cpm(cds01)>2)>=4 >>>> cds01<-cds01[keep,] >>>> dim(cds01) >>>> >>>> cds01$sample$lib.size<-colSums(cds01$counts) >>>> y <- estimateGLMCommonDisp(cds01,design) >>>> y <- estimateGLMTrendedDisp(y,design) y <- >>>> estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) >>>> >>>> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") >>>> topTags(lrt) >>>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>>> >>>> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") >>>> topTags(lrt) >>>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>>> >>>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) >>>> topTags(lrt) >>>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>>> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 4 npc >>>> 8 Disease1 4 ne >>>> 9 Disease2 5 npc >>>> 10 Disease2 5 ne >>>> 11 Disease3 6 npc >>>> 12 Disease3 6 ne >>>> >>>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>>> summary(SUPT1) >>>> Patient<-gl(3,2,length=12) >>>> Disease <- factor(SUPT1$Disease, >>>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>>> >>>> data.frame(Disease,Patient,Treatment) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 1 npc >>>> 8 Disease1 1 ne >>>> 9 Disease2 2 npc >>>> 10 Disease2 2 ne >>>> 11 Disease3 3 npc >>>> 12 Disease3 3 ne >>>> >>>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>>> colnames(design) >>>> [1] "(Intercept)" "DiseaseDisease1" >>>> [3] "DiseaseDisease2" "DiseaseDisease3" >>>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>>> [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>>> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>>> >>>> library(edgeR) >>>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>>> head (count) >>>> colnames(count) >>>> samplename=colnames(count) >>>> cds01<-DGEList(count,group=samplename) >>>> head(cds01) >>>> cds01 >>>> summary(cds01) >>>> dim(cds01) >>>> >>>> >>>> keep<-rowSums(cpm(cds01)>2)>=4 >>>> cds01<-cds01[keep,] >>>> dim(cds01) >>>> >>>> cds01$sample$lib.size<-colSums(cds01$counts) >>>> >>>> y <- estimateGLMCommonDisp(cds01,design) >>>> Warning message: >>>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>>> No residual df: setting dispersion to N >>>> Matri3.txt- my input (gives Error) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Healthy 4 npc >>>> 8 Healthy 4 ne >>>> 9 Disease1 5 npc >>>> 10 Disease1 5 ne >>>> 11 Disease2 6 npc >>>> 12 Disease2 6 ne >>>> 13 Disease2 7 npc >>>> 14 Disease2 7 ne >>>> 15 Disease3 8 npc >>>> 16 Disease3 8 ne >>>> >>>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>>> summary(SUPT1) >>>> Patient<-gl(4,2,length=16) >>>> Disease <- factor(SUPT1$Disease, >>>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>>> >>>> data.frame(Disease,Patient,Treatment) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Healthy 4 npc >>>> 8 Healthy 4 ne >>>> 9 Disease1 1 npc >>>> 10 Disease1 1 ne >>>> 11 Disease2 2 npc >>>> 12 Disease2 2 ne >>>> 13 Disease2 3 npc >>>> 14 Disease2 3 ne >>>> 15 Disease3 4 npc >>>> 16 Disease3 4 ne >>>> >>>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>>> colnames(design) >>>> [1] "(Intercept)" "DiseaseDisease1" >>>> [3] "DiseaseDisease2" "DiseaseDisease3" >>>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>>> [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" >>>> [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" >>>> [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>>> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>>> >>>> library(edgeR) >>>> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.t >>>> x >>>> t >>>> ",header=TRUE) >>>> head (count) >>>> colnames(count) >>>> samplename=colnames(count) >>>> cds01<-DGEList(count,group=samplename) >>>> head(cds01) >>>> cds01 >>>> summary(cds01) >>>> dim(cds01) >>>> >>>> keep<-rowSums(cpm(cds01)>1)>=4 >>>> cds01<-cds01[keep,] >>>> dim(cds01) >>>> >>>> cds01$sample$lib.size<-colSums(cds01$counts) >>>> y <- estimateGLMCommonDisp(cds01,design) >>>> Error in return(NA, ntags) : multi-argument returns are not >>>> permitted In addition: Warning message: >>>> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >>>> No residual df: cannot estimate dispersion >>>> >>>> Thanks and regards, >>>> Justin > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:14}}
ADD COMMENTlink written 4.5 years ago by Gordon Smyth35k
Dear Gordon, Thank you for patiently answering all my queries. I yes should have use the level and try relevel or reordering the input also the another way to call ne compare to npc. Actually I wanted to confim whether npc means compare with ne or the otherway. Once again thanks for your support. Justin -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Friday, March 28, 2014 3:34 PM To: Justin Jeyakani (GIS) Cc: Bioconductor mailing list Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis Justin, By default, model.matrix() in R always creates the design matrix in such a way that each level of a factor is compared back to the first level. So type levels(samples$Treatment) to see which level is first and which is second. You will find that you have "ne" first and "npc" second, which is alphabetical order. This means that genes up are up/down in "npc" relative to "ne". Have a think about this, and experiment with some simple matrices, and the answers to your questions will become clear. A couple of posts ago, I told you a way to change the level which comes first for a factor. Gordon On Thu, 27 Mar 2014, Justin Jeyakani (GIS) wrote: > Dear Gordon, > > > If I undestand correctly the following "Treatmentnpc" > "DiseaseDisease1:Treatmentnpc" "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" > gives genes responding to the Treatmentnpc in patients and you confirmed that in the previous mail. > > For the above, genes are up/down regulated in "ne" compared to "npc" right? > > If it's "npc" comapared to "ne" then what changes needs to be done in > the code to get "ne" instead of calling "npc", i.e. like this, "Treatmentne" "DiseaseDisease1:Treatmentne" "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" > > My code in the first thread of this mail gives "Treatmentne" "DiseaseDisease1:Treatmentne" "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne". > Because the first stage is npc (neural progenitor cells) and the next stage is ne (neurons). > > > Thanks you. > Justin > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Wednesday, March 26, 2014 7:32 AM > To: Justin Jeyakani (GIS) > Cc: Bioconductor mailing list > Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis > > Dear Justin, > > Not sure why you ask this question --- the correct matrix to use is always the one containing correct information. > > Matrix1 incorrectly identifies two different individuals with Disease2 as the same individual, whereas Matrix2 gives correct information. > > Best wishes > Gordon > > On Tue, 25 Mar 2014, Justin Jeyakani (GIS) wrote: > >> Dear Gordon, >> >> Thanks you very much for your support. I got my results now. >> I have one more query about my another matrix... plz see the below is the matrix... >> which one of the matrix is right to use? I have 4 Healthy individuals (each individual have 2 stages npc and ne) and Disease1 one individual, Disease2 two individual and Disease3 one individual. >> >> Matrix1: >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Healthy H4 npc >> 8 Healthy H4 ne >> 9 Disease1 D5 npc >> 10 Disease1 D5 ne >> 11 Disease2 D6 npc >> 12 Disease2 D6 ne >> 13 Disease2 D6 npc >> 14 Disease2 D6 ne >> 15 Disease3 D7 npc >> 16 Disease3 D7 ne >> >> OR >> >> Matrix2: >> Disease Patient Treatment >> 1 Healthy H1 npc >> 2 Healthy H1 ne >> 3 Healthy H2 npc >> 4 Healthy H2 ne >> 5 Healthy H3 npc >> 6 Healthy H3 ne >> 7 Healthy H4 npc >> 8 Healthy H4 ne >> 9 Disease1 D5 npc >> 10 Disease1 D5 ne >> 11 Disease2 D6 npc >> 12 Disease2 D6 ne >> 13 Disease2 D7 npc >> 14 Disease2 D7 ne >> 15 Disease3 D8 npc >> 16 Disease3 D8 ne >> >> Thank you very much. >> >> Justin >> >> -----Original Message----- >> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] >> Sent: Tuesday, March 25, 2014 9:34 AM >> To: Justin Jeyakani (GIS) >> Cc: Bioconductor mailing list >> Subject: RE: edgeR estimateGLMCommonDisp Error on RNA analysis >> >> I assumed that "Healthy" is set as the first level of the factor 'Disease'. I assumed that because that is how you set it up in the original code you posted. >> >> You can make this so by typing >> >> samples$Disease <- relevel(samples$Disease, ref="Healthy") >> >> The coefficient "Treatmentnpc" gives the effect of npc in healthy patients. >> >> Gordon >> >> On Mon, 24 Mar 2014, Justin Jeyakani (GIS) wrote: >> >>> Dear Gordon, >>> >>> Thanks for the code and the clarrification. I'm new to the edgeR your reply helps a lot. I have an issue to execute your code. >>> >>> I'm getting slightly different matrix compare to what you have sent >>> using the same code... >> >>> I'm getting the "DiseaseHealthy:Treatmentnpc" instead what you get >>> "DiseaseDisease1:Treatmentnpc" also can I know that coefficients of >>> "Treatmentnpc" is genes responding to the npc in healthy patients only? >>> Which one gives the genes respond to the npc in healthy? >>> >>>> samples <-read.table("matrix1.txt",header=TRUE) >>>> samples >>> Disease Patient Treatment >>> 1 Healthy H1 npc >>> 2 Healthy H1 ne >>> 3 Healthy H2 npc >>> 4 Healthy H2 ne >>> 5 Healthy H3 npc >>> 6 Healthy H3 ne >>> 7 Disease1 D4 npc >>> 8 Disease1 D4 ne >>> 9 Disease2 D5 npc >>> 10 Disease2 D5 ne >>> 11 Disease3 D6 npc >>> 12 Disease3 D6 ne >>>> design1 <- model.matrix(~Patient,data=samples) >>>> design2 <- model.matrix(~Disease*Treatment,data=samples) >>>> design <- cbind(design1,design2[,5:8]) >>>> colnames(design) >>> [1] "(Intercept)" "PatientD5" >>> [3] "PatientD6" "PatientH1" >>> [5] "PatientH2" "PatientH3" >>> [7] "Treatmentnpc" "DiseaseDisease2:Treatmentnpc" >>> [9] "DiseaseDisease3:Treatmentnpc" "DiseaseHealthy:Treatmentnpc" >>> >>> Thank you very much. >>> >>> Justin >>> >>> -----Original Message----- >>> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] >>> Sent: Monday, March 24, 2014 12:00 PM >>> To: Justin Jeyakani (GIS) >>> Cc: Bioconductor mailing list >>> Subject: edgeR estimateGLMCommonDisp Error on RNA analysis >>> >>> Dear Justin, >>> >>> You are creating design matrices that have more columns than you have samples. So it's not surprising that edgeR tells you there are no residual df available to estimate the dispersion. >>> >>> You want to test the treatment effect for each individual diseased patient vs the average of the treatment effect for healthy patients. Making design matrices for hypotheses like this isn't automatic in R. Here is one way to do it: >>> >>>> samples >>> Disease Patient Treatment >>> 1 Healthy H1 npc >>> 2 Healthy H1 ne >>> 3 Healthy H2 npc >>> 4 Healthy H2 ne >>> 5 Healthy H3 npc >>> 6 Healthy H3 ne >>> 7 Disease1 D4 npc >>> 8 Disease1 D4 ne >>> 9 Disease2 D5 npc >>> 10 Disease2 D5 ne >>> 11 Disease3 D6 npc >>> 12 Disease3 D6 ne >>>> design1 <- model.matrix(~Patient,data=samples) >>>> design2 <- model.matrix(~Disease*Treatment,data=samples) >>>> design <- cbind(design1,design2[,5:8]) > colnames(design) >>> [1] "(Intercept)" "PatientD5" >>> [3] "PatientD6" "PatientH1" >>> [5] "PatientH2" "PatientH3" >>> [7] "Treatmentnpc" "DiseaseDisease1:Treatmentnpc" >>> [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc" >>> >>> Then you can test for coefficients 8, 9 and 10. >>> >>> Best wishes >>> Gordon >>> >>> >>>> Date: Fri, 21 Mar 2014 17:09:37 +0800 >>>> From: "Justin Jeyakani (GIS)" <gnanakkan at="" gis.a-star.edu.sg=""> >>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis >>>> >>>> Hello Sir, >>>> >>>> I'm doing differential gene exp. by the edgeR package. My data >>>> seems to suitable to analyse by glm method (As per edgeR userguide >>>> section >>>> 3.5 Comparisons Both Between and Within Subjects). >>>> >>>> I have two different matrix to design. >>>> Matrix One: >>> >>>> I have 12 dataset 3 samples from different healthy individuals >>>> (Healthy) of two different stages (npc,ne) and 3 samples from >>>> different individuals (Disease1) of two different stages (npc,ne), >>>> so >>>> 6 data from each healthy(6) and Disese1 (6) below is my code and >>>> works well. But for my another design matrix it's not! >>>> >>>> Matrix Two: >>> >>>> But I want to comapre the All the Healthy (6data from 3 indiauals) >>>> with each of the Disease1 individual (npc,ne). B'ze I'm getting 125 >>>> diff.exp genes btn healthyVsDisease1 and I'm expecting more, If I >>>> do individual test of 3 disease1 with Healthy but the I'm getting >>>> error in the "estimateGLMCommonDisp" steps. I couldn't solve the problem. >>>> I have the same problem with my another matrix also. Can guide me >>>> to solve this issue highly appreciated ... below is my code for the >>>> above mentiond two matrix...do I need to define anthing in the generate factor Level "gl" >>>> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set >>>> causing dispersion error. >>>> >>>> Matrix1.txt: my input (works fine) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 4 npc >>>> 8 Disease1 4 ne >>>> 9 Disease1 5 npc >>>> 10 Disease1 5 ne >>>> 11 Disease1 6 npc >>>> 12 Disease1 6 ne >>>> >>>> SUPT1<-read.table("matrix1.txt",header=TRUE) >>>> summary(SUPT1) >>>> Patient<-gl(3,2,length=12) >>>> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1")) >>>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>>> >>>> data.frame(Disease,Patient,Treatment) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 1 npc >>>> 8 Disease1 1 ne >>>> 9 Disease1 2 npc >>>> 10 Disease1 2 ne >>>> 11 Disease1 3 npc >>>> 12 Disease1 3 ne >>>> >>>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>>> colnames(design) >>>> [1] "(Intercept)" "DiseaseDisease1" >>>> [3] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>>> [5] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>>> [7] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>>> >>>> library(edgeR) >>>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>>> head (count) >>>> colnames(count) >>>> samplename=colnames(count) >>>> cds01<-DGEList(count,group=samplename) >>>> head(cds01) >>>> cds01 >>>> summary(cds01) >>>> dim(cds01) >>>> >>>> >>>> keep<-rowSums(cpm(cds01)>2)>=4 >>>> cds01<-cds01[keep,] >>>> dim(cds01) >>>> >>>> cds01$sample$lib.size<-colSums(cds01$counts) >>>> y <- estimateGLMCommonDisp(cds01,design) >>>> y <- estimateGLMTrendedDisp(y,design) y <- >>>> estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) >>>> >>>> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne") >>>> topTags(lrt) >>>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>>> >>>> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne") >>>> topTags(lrt) >>>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>>> >>>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1)) >>>> topTags(lrt) >>>> detags <- rownames(topTags(lrt)$table) cpm(y)[detags, >>>> order(y$samples$group)] summary(de <- decideTestsDGE(lrt)) >>>> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 4 npc >>>> 8 Disease1 4 ne >>>> 9 Disease2 5 npc >>>> 10 Disease2 5 ne >>>> 11 Disease3 6 npc >>>> 12 Disease3 6 ne >>>> >>>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>>> summary(SUPT1) >>>> Patient<-gl(3,2,length=12) >>>> Disease <- factor(SUPT1$Disease, >>>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>>> >>>> data.frame(Disease,Patient,Treatment) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Disease1 1 npc >>>> 8 Disease1 1 ne >>>> 9 Disease2 2 npc >>>> 10 Disease2 2 ne >>>> 11 Disease3 3 npc >>>> 12 Disease3 3 ne >>>> >>>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>>> colnames(design) >>>> [1] "(Intercept)" "DiseaseDisease1" >>>> [3] "DiseaseDisease2" "DiseaseDisease3" >>>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>>> [13] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>>> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>>> >>>> library(edgeR) >>>> count<-read.table("RHN035-46_s1.txt",header=TRUE) >>>> head (count) >>>> colnames(count) >>>> samplename=colnames(count) >>>> cds01<-DGEList(count,group=samplename) >>>> head(cds01) >>>> cds01 >>>> summary(cds01) >>>> dim(cds01) >>>> >>>> >>>> keep<-rowSums(cpm(cds01)>2)>=4 >>>> cds01<-cds01[keep,] >>>> dim(cds01) >>>> >>>> cds01$sample$lib.size<-colSums(cds01$counts) >>>> >>>> y <- estimateGLMCommonDisp(cds01,design) >>>> Warning message: >>>> In estimateGLMCommonDisp.default(y = y$counts, design = design, : >>>> No residual df: setting dispersion to N >>>> Matri3.txt- my input (gives Error) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Healthy 4 npc >>>> 8 Healthy 4 ne >>>> 9 Disease1 5 npc >>>> 10 Disease1 5 ne >>>> 11 Disease2 6 npc >>>> 12 Disease2 6 ne >>>> 13 Disease2 7 npc >>>> 14 Disease2 7 ne >>>> 15 Disease3 8 npc >>>> 16 Disease3 8 ne >>>> >>>> SUPT1<-read.table("matrix2.txt",header=TRUE) >>>> summary(SUPT1) >>>> Patient<-gl(4,2,length=16) >>>> Disease <- factor(SUPT1$Disease, >>>> levels=c("Healthy","Disease1","Disease2","Disease3")) >>>> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne")) >>>> >>>> data.frame(Disease,Patient,Treatment) >>>> Disease Patient Treatment >>>> 1 Healthy 1 npc >>>> 2 Healthy 1 ne >>>> 3 Healthy 2 npc >>>> 4 Healthy 2 ne >>>> 5 Healthy 3 npc >>>> 6 Healthy 3 ne >>>> 7 Healthy 4 npc >>>> 8 Healthy 4 ne >>>> 9 Disease1 1 npc >>>> 10 Disease1 1 ne >>>> 11 Disease2 2 npc >>>> 12 Disease2 2 ne >>>> 13 Disease2 3 npc >>>> 14 Disease2 3 ne >>>> 15 Disease3 4 npc >>>> 16 Disease3 4 ne >>>> >>>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) >>>> colnames(design) >>>> [1] "(Intercept)" "DiseaseDisease1" >>>> [3] "DiseaseDisease2" "DiseaseDisease3" >>>> [5] "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2" >>>> [7] "DiseaseDisease2:Patient2" "DiseaseDisease3:Patient2" >>>> [9] "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" >>>> [11] "DiseaseDisease2:Patient3" "DiseaseDisease3:Patient3" >>>> [13] "DiseaseHealthy:Patient4" "DiseaseDisease1:Patient4" >>>> [15] "DiseaseDisease2:Patient4" "DiseaseDisease3:Patient4" >>>> [17] "DiseaseHealthy:Treatmentne" "DiseaseDisease1:Treatmentne" >>>> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne" >>>> >>>> library(edgeR) >>>> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count. >>>> t >>>> x >>>> t >>>> ",header=TRUE) >>>> head (count) >>>> colnames(count) >>>> samplename=colnames(count) >>>> cds01<-DGEList(count,group=samplename) >>>> head(cds01) >>>> cds01 >>>> summary(cds01) >>>> dim(cds01) >>>> >>>> keep<-rowSums(cpm(cds01)>1)>=4 >>>> cds01<-cds01[keep,] >>>> dim(cds01) >>>> >>>> cds01$sample$lib.size<-colSums(cds01$counts) >>>> y <- estimateGLMCommonDisp(cds01,design) >>>> Error in return(NA, ntags) : multi-argument returns are not >>>> permitted In addition: Warning message: >>>> In estimateGLMTrendedDisp.default(y = y$counts, design = design, : >>>> No residual df: cannot estimate dispersion >>>> >>>> Thanks and regards, >>>> Justin > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:30}}
ADD REPLYlink written 4.5 years ago by Justin Jeyakani GIS60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 451 users visited in the last hour