Contrast Problem
1
0
Entering edit mode
@aditi-rambani-5366
Last seen 9.6 years ago
Thanks for your input. We defined a single experiment with 8 levels and fitted model as a one way layout using attached script. All our contrasts gave us output except one : lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) We get an error message : Error in beta[k, ] <- betaj[decr, ] :  NAs are not allowed in subscripted assignments Calls: glmLRT ... glmFit.DGEList -> glmFit -> glmFit.default -> mglmLS Can you please look into this. Thanks THE SCRIPT : library("edgeR") counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) design <- model.matrix(~groups) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,0,1,1,-1,-1,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,0,1,1,1,1,-2,-2)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(-3,-3,1,1,1,1,1,1)) lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) lrt.Mx <- glmLRT(dge, fit, contrast=c(0,0,1,-1,0,0,0,0)) lrt.Tx <- glmLRT(dge, fit, contrast=c(0,0,0,0,1,-1,0,0)) lrt.Tom <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,0,1,-1)) write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@r-project.org> Sent: Friday, June 29, 2012 6:19 PM Subject: Re: Contrast Problem Dear Aditi, The reason you are getting unexpected results is that your contrasts are incorrect.  The meaning of a contrast is that it makes a comparison between the coefficients of the linear model that you have fitted.  Here are the column names of your design matrix: > colnames(design2) "(Intercept)" "accessions2" "accessions3" "accessions4" "genomes2" "accessions2:genomes2" "accessions3:genomes2" "accessions4:genomes2" So when you take the contrast c(1,-1,0,0,0,0,0,0) you are comparing the first coefficient, which is the intercept term, with the second coefficient, which is a difference between accession2 and accession1 for the first genome.  It is not a comparison of any interest. The interaction model that you have fitted is inappropriate for the questions that you want to answer.  I suggest that you instead redefine a single experiment factor with 8 levels, corresponding to all combinations of accession and genome, so that you can fit your model as a one way layout.  I think that will produce coefficients that you will find easier to understand and to take contrasts of. I will not be able to provide more help for some days. Best wishes Gordon On Fri, 29 Jun 2012, Aditi Rambani wrote: Hello, Thanks for your reply, sorry for not being clear enough. > When you say different expression (DE) between genomes for all the accessions, do you mean DE between genomes for each of the accessions separately?  That is DE genes for A.F1 vs D.F1, for A.Tom vs D.Tom, for A.Tx vs D.Tx and for A.Mx vs D.Mx? Yes, that is what i meant. We do a contrast for A.F1 vs D.F1 like this : [contrast=c(1,-1,0,0,0,0,0,0)]. Our problem is that even when contrast is between A.F1(0 rpkm)  vs D. F1 (0 rpkm) it has significant FDR, I dont understand how it can contrast two columns with zero values and show a significant differential expression. Also, it sometimes ignores genes with significant bias but that number is not very high. > When you say DE between accessions, is that for each genome separately? In other words, do you expect the differences between the accessions to be relatively the same for each genome, or will the differences between accessions be genome-specific? We are not absolutely certain about what differences to expect from genomes between accessions. We do know that some accessions are more closely related than others and we can assume they will have a similar differential expression pattern. For accessions we did our contrasts like this : acc1= Mx vs Tx  [lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0))] acc2 = Tom vs Mx/Tx [lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2))] acc3= F1 vs Mx/Tx/Tom [lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1))] I have no way to test accuracy of these results because i dont know what is expected. I was hoping if we implemented the package correctly we could trust the results. > Your comments about zero expression and not detecting significantly DE genes don't make sense to me.  I won't try to respond to those comments because I think sorting out the above questions will probably solve other perceived problems as well. - I will appreciate your input on this matter and thanks for looking into this. Aditi > Date: Thu, 28 Jun 2012 12:01:50 -0700 > To: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch> > Subject: [BioC] Contrast Problem > > Hello, > > I am a graduate student at Brigham Young University working on polyploid cotton RNA seq data. Our study design has two explanatory variables, one is 'accession' with four levels (F1,Tom,Tx,Mx) and other one is 'genome' with two levels (A genome or D genome). We want to detect differential expression of genes between 'genomes' from all the accessions and also find genes that are differentially expressed between accessions. We built a (Accession*Genome) model and did a contrast for two levels of 'genomes'. In contrast results we see that many genes with zero expression (0 RPKM) have significant FDRs and some significantly differentially expressed genes are not detected. We are not sure why this is happening, any help will be greatly appreciated. > > Thanks > > Aditi > > We are using following script to do our analysis but > > > library("edgeR") > > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > > > design2 <- model.matrix(~accessions*genomes) > > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge2 <- estimateGLMCommonDisp(dge, design2) > dge2 <- estimateGLMTrendedDisp(dge2, design2) > dge2 <- estimateGLMTagwiseDisp(dge2, design2) > > fit2 <- glmFit(dge2, design2) > > lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0)) > lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2)) > lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1)) > lrt.F1 <- glmLRT(dge2, fit2, contrast=c(1,-1,0,0,0,0,0,0)) > lrt.Mx <- glmLRT(dge2, fit2, contrast=c(0,0,1,-1,0,0,0,0)) > lrt.Tx <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,1,-1,0,0)) > lrt.Tom <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,0,0,1,-1)) > > > write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) > write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:8}}
genomes genomes • 1.6k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia
The contrasts you are using do not make sense unless you use design <- model.matrix(~0+groups) Gordon On Mon, 9 Jul 2012, Aditi Rambani wrote: Thanks for your input. We defined a single experiment with 8 levels and fitted model as a one way layout using attached script. All our contrasts gave us output except one :?lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) We get an error message : Error in beta[k, ] <- betaj[decr, ] : ?NAs are not allowed in subscripted assignments Calls: glmLRT ... glmFit.DGEList -> glmFit -> glmFit.default -> mglmLS Can you please look into this. Thanks THE SCRIPT : library("edgeR") counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) design <- model.matrix(~groups) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,0,1,1,-1,-1,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,0,1,1,1,1,-2,-2)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(-3,-3,1,1,1,1,1,1)) lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) lrt.Mx <- glmLRT(dge, fit, contrast=c(0,0,1,-1,0,0,0,0)) lrt.Tx <- glmLRT(dge, fit, contrast=c(0,0,0,0,1,-1,0,0)) lrt.Tom <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,0,1,-1)) write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth at="" wehi.edu.au=""> To: Aditi Rambani <aditi_rambani at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Sent: Friday, June 29, 2012 6:19 PM Subject: Re: Contrast Problem Dear Aditi, The reason you are getting unexpected results is that your contrasts are incorrect.? The meaning of a contrast is that it makes a comparison between the coefficients of the linear model that you have fitted.? Here are the column names of your design matrix: > colnames(design2) "(Intercept)" "accessions2" "accessions3" "accessions4" "genomes2" "accessions2:genomes2" "accessions3:genomes2" "accessions4:genomes2" So when you take the contrast c(1,-1,0,0,0,0,0,0) you are comparing the first coefficient, which is the intercept term, with the second coefficient, which is a difference between accession2 and accession1 for the first genome.? It is not a comparison of any interest. The interaction model that you have fitted is inappropriate for the questions that you want to answer.? I suggest that you instead redefine a single experiment factor with 8 levels, corresponding to all combinations of accession and genome, so that you can fit your model as a one way layout.? I think that will produce coefficients that you will find easier to understand and to take contrasts of. I will not be able to provide more help for some days. Best wishes Gordon On Fri, 29 Jun 2012, Aditi Rambani wrote: Hello, Thanks for your reply, sorry for not being clear enough. > When you say different expression (DE) between genomes for all the accessions, do you mean DE between genomes for each of the accessions separately?? That is DE genes for A.F1 vs D.F1, for A.Tom vs D.Tom, for A.Tx vs D.Tx and for A.Mx vs D.Mx? Yes, that is what i meant. We do a contrast for A.F1 vs D.F1 like this : [contrast=c(1,-1,0,0,0,0,0,0)].? Our problem is that even when contrast is between A.F1(0 rpkm) ?vs D. F1 (0 rpkm) it has significant FDR, I dont understand how it can contrast two columns with zero values and show a significant differential expression. Also, it sometimes ignores genes with?significant?bias but that number is not very high. > When you say DE between accessions, is that for each genome separately? In other words, do you expect the differences between the accessions to be relatively the same for each genome, or will the differences between accessions be genome-specific? We are not absolutely certain about what differences to expect from genomes between accessions. We do know that some accessions are more closely related than others and we can assume they will have a similar differential expression pattern. For accessions we did our contrasts like this : acc1= Mx vs Tx ?[lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0))] acc2 = Tom vs Mx/Tx [lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2))] acc3= F1 vs Mx/Tx/Tom [lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1))] I have no way to test accuracy of these results because i dont know what is expected. I was hoping if we implemented the package correctly we could trust the results. > Your comments about zero expression and not detecting significantly DE genes don't make sense to me.? I won't try to respond to those comments because I think sorting out the above questions will probably solve other perceived problems as well. - I will appreciate your input on this matter and thanks for looking into this.?? Aditi > Date: Thu, 28 Jun 2012 12:01:50 -0700 > From: Aditi Rambani <aditi_rambani at="" yahoo.com=""> > To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] Contrast Problem > ? > Hello,? > ? > I am a graduate student at Brigham Young University working on polyploid cotton RNA seq data.?Our study design has two explanatory variables, one is 'accession' with four levels (F1,Tom,Tx,Mx) and other one is 'genome' with two levels (A genome or D genome). We want to detect differential expression of genes between 'genomes' from all the accessions and also find genes that are differentially expressed between accessions. We built a (Accession*Genome) model and did a contrast for two levels of 'genomes'. In contrast results we see that many genes with zero expression (0 RPKM) have significant FDRs and some significantly differentially expressed genes are not detected. We are not sure why this is happening, any help will be greatly appreciated. > ? > Thanks > ? > Aditi? > ? > We are using following script to do our analysis but? > ? > ? > library("edgeR") > ? > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > ? > ? > design2 <- model.matrix(~accessions*genomes) > ? > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge2 <- estimateGLMCommonDisp(dge, design2) > dge2 <- estimateGLMTrendedDisp(dge2, design2) > dge2 <- estimateGLMTagwiseDisp(dge2, design2) > ? > fit2 <- glmFit(dge2, design2) > ? > lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0)) > lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2)) > lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1)) > lrt.F1 <- glmLRT(dge2, fit2, contrast=c(1,-1,0,0,0,0,0,0)) > lrt.Mx <- glmLRT(dge2, fit2, contrast=c(0,0,1,-1,0,0,0,0)) > lrt.Tx <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,1,-1,0,0)) > lrt.Tom <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,0,0,1,-1)) > ? > ? > write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) > write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD COMMENT
0
Entering edit mode
Thanks, we this fixed our error.� ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@r-project.org> Sent: Monday, July 9, 2012 11:08 PM Subject: Re: Contrast Problem The contrasts you are using do not make sense unless you use � design <- model.matrix(~0+groups) Gordon On Mon, 9 Jul 2012, Aditi Rambani wrote: Thanks for your input. We defined a single experiment with 8 levels and fitted model as a one way layout using attached script. All our contrasts gave us output except one :�lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) We get an error message : Error in beta[k, ] <- betaj[decr, ] : �NAs are not allowed in subscripted assignments Calls: glmLRT ... glmFit.DGEList -> glmFit -> glmFit.default -> mglmLS Can you please look into this. Thanks THE SCRIPT : library("edgeR") counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) design <- model.matrix(~groups) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,0,1,1,-1,-1,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,0,1,1,1,1,-2,-2)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(-3,-3,1,1,1,1,1,1)) lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) lrt.Mx <- glmLRT(dge, fit, contrast=c(0,0,1,-1,0,0,0,0)) lrt.Tx <- glmLRT(dge, fit, contrast=c(0,0,0,0,1,-1,0,0)) lrt.Tom <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,0,1,-1)) write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> bioconductor@r-project.org> Sent: Friday, June 29, 2012 6:19 PM Subject: Re: Contrast Problem Dear Aditi, The reason you are getting unexpected results is that your contrasts are incorrect.� The meaning of a contrast is that it makes a comparison between the coefficients of the linear model that you have fitted.� Here are the column names of your design matrix: > colnames(design2) "(Intercept)" "accessions2" "accessions3" "accessions4" "genomes2" "accessions2:genomes2" "accessions3:genomes2" "accessions4:genomes2" So when you take the contrast c(1,-1,0,0,0,0,0,0) you are comparing the first coefficient, which is the intercept term, with the second coefficient, which is a difference between accession2 and accession1 for the first genome.� It is not a comparison of any interest. The interaction model that you have fitted is inappropriate for the questions that you want to answer.� I suggest that you instead redefine a single experiment factor with 8 levels, corresponding to all combinations of accession and genome, so that you can fit your model as a one way layout.� I think that will produce coefficients that you will find easier to understand and to take contrasts of. I will not be able to provide more help for some days. Best wishes Gordon On Fri, 29 Jun 2012, Aditi Rambani wrote: Hello, Thanks for your reply, sorry for not being clear enough. > When you say different expression (DE) between genomes for all the accessions, do you mean DE between genomes for each of the accessions separately?� That is DE genes for A.F1 vs D.F1, for A.Tom vs D.Tom, for A.Tx vs D.Tx and for A.Mx vs D.Mx? Yes, that is what i meant. We do a contrast for A.F1 vs D.F1 like this : [contrast=c(1,-1,0,0,0,0,0,0)].� Our problem is that even when contrast is between A.F1(0 rpkm) �vs D. F1 (0 rpkm) it has significant FDR, I dont understand how it can contrast two columns with zero values and show a significant differential expression. Also, it sometimes ignores genes with�significant�bias but that number is not very high. > When you say DE between accessions, is that for each genome separately? In other words, do you expect the differences between the accessions to be relatively the same for each genome, or will the differences between accessions be genome-specific? We are not absolutely certain about what differences to expect from genomes between accessions. We do know that some accessions are more closely related than others and we can assume they will have a similar differential expression pattern. For accessions we did our contrasts like this : acc1= Mx vs Tx �[lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0))] acc2 = Tom vs Mx/Tx [lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2))] acc3= F1 vs Mx/Tx/Tom [lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1))] I have no way to test accuracy of these results because i dont know what is expected. I was hoping if we implemented the package correctly we could trust the results. > Your comments about zero expression and not detecting significantly DE genes don't make sense to me.� I won't try to respond to those comments because I think sorting out the above questions will probably solve other perceived problems as well. - I will appreciate your input on this matter and thanks for looking into this.�� Aditi > Date: Thu, 28 Jun 2012 12:01:50 -0700 > To: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch> > Subject: [BioC] Contrast Problem > � > Hello,� > � > I am a graduate student at Brigham Young University working on polyploid cotton RNA seq data.�Our study design has two explanatory variables, one is 'accession' with four levels (F1,Tom,Tx,Mx) and other one is 'genome' with two levels (A genome or D genome). We want to detect differential expression of genes between 'genomes' from all the accessions and also find genes that are differentially expressed between accessions. We built a (Accession*Genome) model and did a contrast for two levels of 'genomes'. In contrast results we see that many genes with zero expression (0 RPKM) have significant FDRs and some significantly differentially expressed genes are not detected. We are not sure why this is happening, any help will be greatly appreciated. > � > Thanks > � > Aditi� > � > We are using following script to do our analysis but� > � > � > library("edgeR") > � > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > � > � > design2 <- model.matrix(~accessions*genomes) > � > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge2 <- estimateGLMCommonDisp(dge, design2) > dge2 <- estimateGLMTrendedDisp(dge2, design2) > dge2 <- estimateGLMTagwiseDisp(dge2, design2) > � > fit2 <- glmFit(dge2, design2) > � > lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0)) > lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2)) > lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1)) > lrt.F1 <- glmLRT(dge2, fit2, contrast=c(1,-1,0,0,0,0,0,0)) > lrt.Mx <- glmLRT(dge2, fit2, contrast=c(0,0,1,-1,0,0,0,0)) > lrt.Tx <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,1,-1,0,0)) > lrt.Tom <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,0,0,1,-1)) > � > � > write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) > write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:8}}
ADD REPLY
0
Entering edit mode
Hello, Thanks for recommending single experiment approach for our analysis, it made the contrasts easier. My committee still thinks we should not ignore the 'interaction' between the accessions and genomes. I tried to implement accession*genome model using attached script. There is no way for me to sanity check these results, I can only compare it with my single experiment results. I want to make sure I am making the right contrasts. acc1 = Mx vs Tx acc2 = Mx,Tx vs Tom acc3 = Mx,Tx,Tom vs F1 rat1 = Mx:D vs Tx:D rat2 = Mx:D,Tx:D vs Tom:D rat3 =Mx:D,Tx:D,Tom:D vs F1:D I will look forward to your response. Thanks INFILE="analysis/combined.txt" counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) design <- model.matrix(~accessions*genomes) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,1,0,-1,0,0,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,1,-2,1,0,0,0,0)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(0,4,4,4,0,0,0,0)) lrt.rat1 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,1,0,-1)) lrt.rat2 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,1,-2,1)) lrt.rat3 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,4,4,4)) write.table(topTags(lrt.acc1, n=15000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=15000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=15000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.rat1, n=15000), file="rat1.results", sep="\t", quote=F) write.table(topTags(lrt.rat2, n=15000), file="rat2.results", sep="\t", quote=F) write.table(topTags(lrt.rat3, n=15000), file="rat3.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@r-project.org> Sent: Monday, July 9, 2012 11:08 PM Subject: Re: Contrast Problem The contrasts you are using do not make sense unless you use   design <- model.matrix(~0+groups) Gordon On Mon, 9 Jul 2012, Aditi Rambani wrote: Thanks for your input. We defined a single experiment with 8 levels and fitted model as a one way layout using attached script. All our contrasts gave us output except one : lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) We get an error message : Error in beta[k, ] <- betaj[decr, ] :  NAs are not allowed in subscripted assignments Calls: glmLRT ... glmFit.DGEList -> glmFit -> glmFit.default -> mglmLS Can you please look into this. Thanks THE SCRIPT : library("edgeR") counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) design <- model.matrix(~groups) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,0,1,1,-1,-1,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,0,1,1,1,1,-2,-2)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(-3,-3,1,1,1,1,1,1)) lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) lrt.Mx <- glmLRT(dge, fit, contrast=c(0,0,1,-1,0,0,0,0)) lrt.Tx <- glmLRT(dge, fit, contrast=c(0,0,0,0,1,-1,0,0)) lrt.Tom <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,0,1,-1)) write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> bioconductor@r-project.org> Sent: Friday, June 29, 2012 6:19 PM Subject: Re: Contrast Problem Dear Aditi, The reason you are getting unexpected results is that your contrasts are incorrect.  The meaning of a contrast is that it makes a comparison between the coefficients of the linear model that you have fitted.  Here are the column names of your design matrix: > colnames(design2) "(Intercept)" "accessions2" "accessions3" "accessions4" "genomes2" "accessions2:genomes2" "accessions3:genomes2" "accessions4:genomes2" So when you take the contrast c(1,-1,0,0,0,0,0,0) you are comparing the first coefficient, which is the intercept term, with the second coefficient, which is a difference between accession2 and accession1 for the first genome.  It is not a comparison of any interest. The interaction model that you have fitted is inappropriate for the questions that you want to answer.  I suggest that you instead redefine a single experiment factor with 8 levels, corresponding to all combinations of accession and genome, so that you can fit your model as a one way layout.  I think that will produce coefficients that you will find easier to understand and to take contrasts of. I will not be able to provide more help for some days. Best wishes Gordon On Fri, 29 Jun 2012, Aditi Rambani wrote: Hello, Thanks for your reply, sorry for not being clear enough. > When you say different expression (DE) between genomes for all the accessions, do you mean DE between genomes for each of the accessions separately?  That is DE genes for A.F1 vs D.F1, for A.Tom vs D.Tom, for A.Tx vs D.Tx and for A.Mx vs D.Mx? Yes, that is what i meant. We do a contrast for A.F1 vs D.F1 like this : [contrast=c(1,-1,0,0,0,0,0,0)]. Our problem is that even when contrast is between A.F1(0 rpkm)  vs D. F1 (0 rpkm) it has significant FDR, I dont understand how it can contrast two columns with zero values and show a significant differential expression. Also, it sometimes ignores genes with significant bias but that number is not very high. > When you say DE between accessions, is that for each genome separately? In other words, do you expect the differences between the accessions to be relatively the same for each genome, or will the differences between accessions be genome-specific? We are not absolutely certain about what differences to expect from genomes between accessions. We do know that some accessions are more closely related than others and we can assume they will have a similar differential expression pattern. For accessions we did our contrasts like this : acc1= Mx vs Tx  [lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0))] acc2 = Tom vs Mx/Tx [lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2))] acc3= F1 vs Mx/Tx/Tom [lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1))] I have no way to test accuracy of these results because i dont know what is expected. I was hoping if we implemented the package correctly we could trust the results. > Your comments about zero expression and not detecting significantly DE genes don't make sense to me.  I won't try to respond to those comments because I think sorting out the above questions will probably solve other perceived problems as well. - I will appreciate your input on this matter and thanks for looking into this. Aditi > Date: Thu, 28 Jun 2012 12:01:50 -0700 > To: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch> > Subject: [BioC] Contrast Problem > > Hello, > > I am a graduate student at Brigham Young University working on polyploid cotton RNA seq data. Our study design has two explanatory variables, one is 'accession' with four levels (F1,Tom,Tx,Mx) and other one is 'genome' with two levels (A genome or D genome). We want to detect differential expression of genes between 'genomes' from all the accessions and also find genes that are differentially expressed between accessions. We built a (Accession*Genome) model and did a contrast for two levels of 'genomes'. In contrast results we see that many genes with zero expression (0 RPKM) have significant FDRs and some significantly differentially expressed genes are not detected. We are not sure why this is happening, any help will be greatly appreciated. > > Thanks > > Aditi > > We are using following script to do our analysis but > > > library("edgeR") > > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > > > design2 <- model.matrix(~accessions*genomes) > > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge2 <- estimateGLMCommonDisp(dge, design2) > dge2 <- estimateGLMTrendedDisp(dge2, design2) > dge2 <- estimateGLMTagwiseDisp(dge2, design2) > > fit2 <- glmFit(dge2, design2) > > lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0)) > lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2)) > lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1)) > lrt.F1 <- glmLRT(dge2, fit2, contrast=c(1,-1,0,0,0,0,0,0)) > lrt.Mx <- glmLRT(dge2, fit2, contrast=c(0,0,1,-1,0,0,0,0)) > lrt.Tx <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,1,-1,0,0)) > lrt.Tom <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,0,0,1,-1)) > > > write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) > write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:8}}
ADD REPLY
0
Entering edit mode
Dear Aditi, Testing for interactions is very simple using the single experimental factor approach that I advised you to try. I am not willing to give you any more advice unless you: 1. Try following my advice, 2. Explain what scientific questions you are trying to answer, and 3. Explain why you chose the contrasts you have. So far, you're not doing any of these things. Best wishes Gordon On Wed, 11 Jul 2012, Aditi Rambani wrote: > Hello,? > Thanks for recommending single experiment approach for our analysis, it > made the contrasts easier. My?committee still thinks we should not > ignore the 'interaction' between the accessions and genomes. I tried to > implement accession*genome model using attached script. There is no way > for me to sanity check these results, I can only compare it with my > single experiment results. I want to make sure I am making the right > contrasts. > acc1 = Mx vs Tx > acc2 = Mx,Tx vs Tom > acc3 = Mx,Tx,Tom vs F1 > rat1 = Mx:D vs Tx:D > rat2 = Mx:D,Tx:D vs Tom:D > rat3 =Mx:D,Tx:D,Tom:D vs F1:D > I will look forward to your response. Thanks > INFILE="analysis/combined.txt" > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > design <- model.matrix(~accessions*genomes) > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge <- estimateGLMCommonDisp(dge, design) > dge <- estimateGLMTrendedDisp(dge, design) > dge <- estimateGLMTagwiseDisp(dge, design) > fit <- glmFit(dge, design) > lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,1,0,-1,0,0,0,0)) > lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,1,-2,1,0,0,0,0)) > lrt.acc3 <- glmLRT(dge, fit, contrast=c(0,4,4,4,0,0,0,0)) > lrt.rat1 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,1,0,-1)) > lrt.rat2 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,1,-2,1)) > lrt.rat3 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,4,4,4)) > write.table(topTags(lrt.acc1, n=15000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=15000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=15000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.rat1, n=15000), file="rat1.results", sep="\t", quote=F) > write.table(topTags(lrt.rat2, n=15000), file="rat2.results", sep="\t", quote=F) > write.table(topTags(lrt.rat3, n=15000), file="rat3.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth at="" wehi.edu.au=""> To: Aditi Rambani <aditi_rambani at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Sent: Monday, July 9, 2012 11:08 PM Subject: Re: Contrast Problem The contrasts you are using do not make sense unless you use ? design <- model.matrix(~0+groups) Gordon On Mon, 9 Jul 2012, Aditi Rambani wrote: Thanks for your input. We defined a single experiment with 8 levels and fitted model as a one way layout using attached script. All our contrasts gave us output except one :?lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) We get an error message : Error in beta[k, ] <- betaj[decr, ] : ?NAs are not allowed in subscripted assignments Calls: glmLRT ... glmFit.DGEList -> glmFit -> glmFit.default -> mglmLS Can you please look into this. Thanks THE SCRIPT : library("edgeR") counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) design <- model.matrix(~groups) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,0,1,1,-1,-1,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,0,1,1,1,1,-2,-2)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(-3,-3,1,1,1,1,1,1)) lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) lrt.Mx <- glmLRT(dge, fit, contrast=c(0,0,1,-1,0,0,0,0)) lrt.Tx <- glmLRT(dge, fit, contrast=c(0,0,0,0,1,-1,0,0)) lrt.Tom <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,0,1,-1)) write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> To: Aditi Rambani <aditi_rambani at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Sent: Friday, June 29, 2012 6:19 PM Subject: Re: Contrast Problem Dear Aditi, The reason you are getting unexpected results is that your contrasts are incorrect.? The meaning of a contrast is that it makes a comparison between the coefficients of the linear model that you have fitted.? Here are the column names of your design matrix: > colnames(design2) "(Intercept)" "accessions2" "accessions3" "accessions4" "genomes2" "accessions2:genomes2" "accessions3:genomes2" "accessions4:genomes2" So when you take the contrast c(1,-1,0,0,0,0,0,0) you are comparing the first coefficient, which is the intercept term, with the second coefficient, which is a difference between accession2 and accession1 for the first genome.? It is not a comparison of any interest. The interaction model that you have fitted is inappropriate for the questions that you want to answer.? I suggest that you instead redefine a single experiment factor with 8 levels, corresponding to all combinations of accession and genome, so that you can fit your model as a one way layout.? I think that will produce coefficients that you will find easier to understand and to take contrasts of. I will not be able to provide more help for some days. Best wishes Gordon On Fri, 29 Jun 2012, Aditi Rambani wrote: Hello, Thanks for your reply, sorry for not being clear enough. > When you say different expression (DE) between genomes for all the accessions, do you mean DE between genomes for each of the accessions separately?? That is DE genes for A.F1 vs D.F1, for A.Tom vs D.Tom, for A.Tx vs D.Tx and for A.Mx vs D.Mx? Yes, that is what i meant. We do a contrast for A.F1 vs D.F1 like this : [contrast=c(1,-1,0,0,0,0,0,0)].? Our problem is that even when contrast is between A.F1(0 rpkm) ?vs D. F1 (0 rpkm) it has significant FDR, I dont understand how it can contrast two columns with zero values and show a significant differential expression. Also, it sometimes ignores genes with?significant?bias but that number is not very high. > When you say DE between accessions, is that for each genome separately? In other words, do you expect the differences between the accessions to be relatively the same for each genome, or will the differences between accessions be genome-specific? We are not absolutely certain about what differences to expect from genomes between accessions. We do know that some accessions are more closely related than others and we can assume they will have a similar differential expression pattern. For accessions we did our contrasts like this : acc1= Mx vs Tx ?[lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0))] acc2 = Tom vs Mx/Tx [lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2))] acc3= F1 vs Mx/Tx/Tom [lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1))] I have no way to test accuracy of these results because i dont know what is expected. I was hoping if we implemented the package correctly we could trust the results. > Your comments about zero expression and not detecting significantly DE genes don't make sense to me.? I won't try to respond to those comments because I think sorting out the above questions will probably solve other perceived problems as well. - I will appreciate your input on this matter and thanks for looking into this.?? Aditi > Date: Thu, 28 Jun 2012 12:01:50 -0700 > From: Aditi Rambani <aditi_rambani at="" yahoo.com=""> > To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] Contrast Problem > ? > Hello,? > ? > I am a graduate student at Brigham Young University working on polyploid cotton RNA seq data.?Our study design has two explanatory variables, one is 'accession' with four levels (F1,Tom,Tx,Mx) and other one is 'genome' with two levels (A genome or D genome). We want to detect differential expression of genes between 'genomes' from all the accessions and also find genes that are differentially expressed between accessions. We built a (Accession*Genome) model and did a contrast for two levels of 'genomes'. In contrast results we see that many genes with zero expression (0 RPKM) have significant FDRs and some significantly differentially expressed genes are not detected. We are not sure why this is happening, any help will be greatly appreciated. > ? > Thanks > ? > Aditi? > ? > We are using following script to do our analysis but? > ? > ? > library("edgeR") > ? > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > ? > ? > design2 <- model.matrix(~accessions*genomes) > ? > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge2 <- estimateGLMCommonDisp(dge, design2) > dge2 <- estimateGLMTrendedDisp(dge2, design2) > dge2 <- estimateGLMTagwiseDisp(dge2, design2) > ? > fit2 <- glmFit(dge2, design2) > ? > lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0)) > lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2)) > lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1)) > lrt.F1 <- glmLRT(dge2, fit2, contrast=c(1,-1,0,0,0,0,0,0)) > lrt.Mx <- glmLRT(dge2, fit2, contrast=c(0,0,1,-1,0,0,0,0)) > lrt.Tx <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,1,-1,0,0)) > lrt.Tom <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,0,0,1,-1)) > ? > ? > write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) > write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD REPLY
0
Entering edit mode
Dear Aditi, I have written a new chapter for the edgeR User's Guide, trying to give advice for experiments like yours. You might find it helpful to look at Section 3.3 especially of: http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/ed geRUsersGuide.pdf Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. Tel: (03) 9345 2326, Fax (03) 9347 0852, http://www.statsci.org/smyth On Wed, 11 Jul 2012, Aditi Rambani wrote: > Hello,? Thanks for recommending single experiment approach for our analysis, it made the contrasts easier. My?committee still thinks we should not ignore the 'interaction' between the accessions and genomes. I tried to implement accession*genome model using attached script. There is no way for me to sanity check these results, I can only compare it with my single experiment results. I want to make sure I am making the right contrasts. acc1 = Mx vs Tx acc2 = Mx,Tx vs Tom acc3 = Mx,Tx,Tom vs F1 rat1 = Mx:D vs Tx:D rat2 = Mx:D,Tx:D vs Tom:D rat3 =Mx:D,Tx:D,Tom:D vs F1:D I will look forward to your response. Thanks INFILE="analysis/combined.txt" counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) design <- model.matrix(~accessions*genomes) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,1,0,-1,0,0,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,1,-2,1,0,0,0,0)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(0,4,4,4,0,0,0,0)) lrt.rat1 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,1,0,-1)) lrt.rat2 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,1,-2,1)) lrt.rat3 <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,4,4,4)) write.table(topTags(lrt.acc1, n=15000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=15000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=15000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.rat1, n=15000), file="rat1.results", sep="\t", quote=F) write.table(topTags(lrt.rat2, n=15000), file="rat2.results", sep="\t", quote=F) write.table(topTags(lrt.rat3, n=15000), file="rat3.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth at="" wehi.edu.au=""> To: Aditi Rambani <aditi_rambani at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Sent: Monday, July 9, 2012 11:08 PM Subject: Re: Contrast Problem The contrasts you are using do not make sense unless you use ? design <- model.matrix(~0+groups) Gordon On Mon, 9 Jul 2012, Aditi Rambani wrote: Thanks for your input. We defined a single experiment with 8 levels and fitted model as a one way layout using attached script. All our contrasts gave us output except one :?lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) We get an error message : Error in beta[k, ] <- betaj[decr, ] : ?NAs are not allowed in subscripted assignments Calls: glmLRT ... glmFit.DGEList -> glmFit -> glmFit.default -> mglmLS Can you please look into this. Thanks THE SCRIPT : library("edgeR") counts <- read.table(INFILE, header=T, row.names=1) groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) design <- model.matrix(~groups) dge <- DGEList(counts=counts, group=groups) dge <- calcNormFactors(dge) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) lrt.acc1 <- glmLRT(dge, fit, contrast=c(0,0,1,1,-1,-1,0,0)) lrt.acc2 <- glmLRT(dge, fit, contrast=c(0,0,1,1,1,1,-2,-2)) lrt.acc3 <- glmLRT(dge, fit, contrast=c(-3,-3,1,1,1,1,1,1)) lrt.F1 <- glmLRT(dge, fit, contrast=c(1,-1,0,0,0,0,0,0)) lrt.Mx <- glmLRT(dge, fit, contrast=c(0,0,1,-1,0,0,0,0)) lrt.Tx <- glmLRT(dge, fit, contrast=c(0,0,0,0,1,-1,0,0)) lrt.Tom <- glmLRT(dge, fit, contrast=c(0,0,0,0,0,0,1,-1)) write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> To: Aditi Rambani <aditi_rambani at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Sent: Friday, June 29, 2012 6:19 PM Subject: Re: Contrast Problem Dear Aditi, The reason you are getting unexpected results is that your contrasts are incorrect.? The meaning of a contrast is that it makes a comparison between the coefficients of the linear model that you have fitted.? Here are the column names of your design matrix: > colnames(design2) "(Intercept)" "accessions2" "accessions3" "accessions4" "genomes2" "accessions2:genomes2" "accessions3:genomes2" "accessions4:genomes2" So when you take the contrast c(1,-1,0,0,0,0,0,0) you are comparing the first coefficient, which is the intercept term, with the second coefficient, which is a difference between accession2 and accession1 for the first genome.? It is not a comparison of any interest. The interaction model that you have fitted is inappropriate for the questions that you want to answer.? I suggest that you instead redefine a single experiment factor with 8 levels, corresponding to all combinations of accession and genome, so that you can fit your model as a one way layout.? I think that will produce coefficients that you will find easier to understand and to take contrasts of. I will not be able to provide more help for some days. Best wishes Gordon On Fri, 29 Jun 2012, Aditi Rambani wrote: Hello, Thanks for your reply, sorry for not being clear enough. > When you say different expression (DE) between genomes for all the accessions, do you mean DE between genomes for each of the accessions separately?? That is DE genes for A.F1 vs D.F1, for A.Tom vs D.Tom, for A.Tx vs D.Tx and for A.Mx vs D.Mx? Yes, that is what i meant. We do a contrast for A.F1 vs D.F1 like this : [contrast=c(1,-1,0,0,0,0,0,0)].? Our problem is that even when contrast is between A.F1(0 rpkm) ?vs D. F1 (0 rpkm) it has significant FDR, I dont understand how it can contrast two columns with zero values and show a significant differential expression. Also, it sometimes ignores genes with?significant?bias but that number is not very high. > When you say DE between accessions, is that for each genome separately? In other words, do you expect the differences between the accessions to be relatively the same for each genome, or will the differences between accessions be genome-specific? We are not absolutely certain about what differences to expect from genomes between accessions. We do know that some accessions are more closely related than others and we can assume they will have a similar differential expression pattern. For accessions we did our contrasts like this : acc1= Mx vs Tx ?[lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0))] acc2 = Tom vs Mx/Tx [lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2))] acc3= F1 vs Mx/Tx/Tom [lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1))] I have no way to test accuracy of these results because i dont know what is expected. I was hoping if we implemented the package correctly we could trust the results. > Your comments about zero expression and not detecting significantly DE genes don't make sense to me.? I won't try to respond to those comments because I think sorting out the above questions will probably solve other perceived problems as well. - I will appreciate your input on this matter and thanks for looking into this.?? Aditi > Date: Thu, 28 Jun 2012 12:01:50 -0700 > From: Aditi Rambani <aditi_rambani at="" yahoo.com=""> > To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] Contrast Problem > ? > Hello,? > ? > I am a graduate student at Brigham Young University working on polyploid cotton RNA seq data.?Our study design has two explanatory variables, one is 'accession' with four levels (F1,Tom,Tx,Mx) and other one is 'genome' with two levels (A genome or D genome). We want to detect differential expression of genes between 'genomes' from all the accessions and also find genes that are differentially expressed between accessions. We built a (Accession*Genome) model and did a contrast for two levels of 'genomes'. In contrast results we see that many genes with zero expression (0 RPKM) have significant FDRs and some significantly differentially expressed genes are not detected. We are not sure why this is happening, any help will be greatly appreciated. > ? > Thanks > ? > Aditi? > ? > We are using following script to do our analysis but? > ? > ? > library("edgeR") > ? > counts <- read.table(INFILE, header=T, row.names=1) > groups <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8)) > accessions <- factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)) > genomes <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2)) > ? > ? > design2 <- model.matrix(~accessions*genomes) > ? > dge <- DGEList(counts=counts, group=groups) > dge <- calcNormFactors(dge) > dge2 <- estimateGLMCommonDisp(dge, design2) > dge2 <- estimateGLMTrendedDisp(dge2, design2) > dge2 <- estimateGLMTagwiseDisp(dge2, design2) > ? > fit2 <- glmFit(dge2, design2) > ? > lrt.acc1 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,-1,-1,0,0)) > lrt.acc2 <- glmLRT(dge2, fit2, contrast=c(0,0,1,1,1,1,-2,-2)) > lrt.acc3 <- glmLRT(dge2, fit2, contrast=c(-3,-3,1,1,1,1,1,1)) > lrt.F1 <- glmLRT(dge2, fit2, contrast=c(1,-1,0,0,0,0,0,0)) > lrt.Mx <- glmLRT(dge2, fit2, contrast=c(0,0,1,-1,0,0,0,0)) > lrt.Tx <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,1,-1,0,0)) > lrt.Tom <- glmLRT(dge2, fit2, contrast=c(0,0,0,0,0,0,1,-1)) > ? > ? > write.table(topTags(lrt.acc1, n=10000), file="acc1.results", sep="\t", quote=F) > write.table(topTags(lrt.acc2, n=10000), file="acc2.results", sep="\t", quote=F) > write.table(topTags(lrt.acc3, n=10000), file="acc3.results", sep="\t", quote=F) > write.table(topTags(lrt.F1, n=10000), file="F1.results", sep="\t", quote=F) > write.tabletopTagslrt.Mx, n=10000), file="Mx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tx, n=10000), file="Tx.results", sep="\t", quote=F) > write.table(topTags(lrt.Tom, n=10000), file="Tom.results", sep="\t", quote=F) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:10}}
ADD REPLY

Login before adding your answer.

Traffic: 978 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6