EdgeR multi-factor design questions

0

Entering edit mode

Morgan Mouchka ▴ 30

@morgan-mouchka-5848

Last seen 11.4 years ago

Hello all, Im currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual. More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). Im interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond? To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but Im uncertain as to the best way to set this up. Im using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment. Thanks so much, Morgan Morgan Mouchka PhD Candidate E231 Corson Hall Dept. Ecology and Evolutionary Biology Cornell University mep74@cornell.edu http://www.eeb.cornell.edu/mouchka > x <- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") > head (x) AC3 AC4 AC5 AC7 AT1 AT2 Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 AT4 AT5 SC1 SC2 SC3 SC5 Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 ST1 ST2 ST3 ST5 Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 > #assign samples to groups, A=AC, B=AT, C=SC, D=ST > group <- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D ","D","D","D")) > # Define data object (list-based, includes info for counts and info about samples) and will calculate library size > y <- DGEList (counts = x, group = group) Calculating library sizes from column totals. > #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) > keep <- rowSums (cpm(y)>1) >= 4 > y2 <- y[keep,] > dim(y2) [1] 17739 16 > y2$samples$lib.size <- colSums(y2$counts) > # Normalize for sample-specific effects > y3 <- calcNormFactors (y2) > y3$samples group lib.size norm.factors AC3 A 11095093 1.0237872 AC4 A 5604841 0.9912734 AC5 A 7517691 0.9824405 AC7 A 5918039 0.9919439 AT1 B 6004830 0.9609672 AT2 B 7692419 0.9543765 AT4 B 5197815 0.9575609 AT5 B 7759281 0.9643070 SC1 C 3542875 1.0693587 SC2 C 6720709 1.0488297 SC3 C 3881774 1.0022528 SC5 C 5109584 1.0429925 ST1 D 10429666 0.9815467 ST2 D 9662842 1.0103866 ST3 D 8267838 1.0198095 ST5 D 10698815 1.0069064 > #plot biological coefficient of variation (BVC)between samples > plotMDS(y3) > #check levels > levels (y3$samples$group) [1] "A" "B" "C" "D" > #design matrix to describe treatment conditions > design <- model.matrix(~0+group, data=y3$samples) > colnames(design) <- c("A", "B", "C", "D") > design A B C D AC3 1 0 0 0 AC4 1 0 0 0 AC5 1 0 0 0 AC7 1 0 0 0 AT1 0 1 0 0 AT2 0 1 0 0 AT4 0 1 0 0 AT5 0 1 0 0 SC1 0 0 1 0 SC2 0 0 1 0 SC3 0 0 1 0 SC5 0 0 1 0 ST1 0 0 0 1 ST2 0 0 0 1 ST3 0 0 0 1 ST5 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" > y4 <- estimateGLMCommonDisp(y3, design, verbose=TRUE) Disp = 0.02061 , BCV = 0.1436 > #estimate gene-wise dispersion > y5 <- estimateGLMTrendedDisp(y4, design) > y6 <- estimateGLMTagwiseDisp(y5,design) > plotBCV(y6) > fit <- glmFit(y6, design) > # make contrasts > my.contrasts <- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) > #pairwise comparison of B and A (apo treatmetn and apo control) > lrt.BvsA <- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) > # summary of model fitting funtions > summary(de<-decideTestsDGE(lrt.BvsA)) [,1] -1 562 0 16091 1 1086 > lrt.DvsC <- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) > summary(de<-decideTestsDGE(lrt.DvsC)) [,1] -1 496 0 16710 1 533 [[alternative HTML version deleted]]

RNASeq edgeR RNASeq edgeR • 1.8k views

ADD COMMENT • link updated 12.9 years ago by James W. MacDonald 68k • written 12.9 years ago by Morgan Mouchka ▴ 30

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 17 hours ago

United States

Hi Morgan, On 3/25/2013 3:15 PM, Morgan Mouchka wrote: > Hello all, > > I?m currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual. > > > More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). I?m interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond? > > > To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. > > Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but I?m uncertain as to the best way to set this up. This is definitely an interaction term. How you set it up is dependent on what parameterization makes the most sense to you. Following your method below, you want glmLRT(fit, contrast = c(1,-1,-1,1)) or you could do state <- factor(rep(1:2, each = 8)) treat <- factor(rep(1:2, each=4, times=2)) design <- model.matrix(~treat*state) and then glmLRT(fit, coef=4) will test the interaction, and coefficients 2 and 3 test the main effects for treatment and state, as you have done below. Best, Jim > > I?m using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment. > > Thanks so much, > > Morgan > > Morgan Mouchka > PhD Candidate > E231 Corson Hall > Dept. Ecology and Evolutionary Biology > Cornell University > mep74 at cornell.edu > http://www.eeb.cornell.edu/mouchka > >> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") >> head (x) > AC3 AC4 AC5 AC7 AT1 AT2 > Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 > Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 > Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 > Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 > Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 > Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 > AT4 AT5 SC1 SC2 SC3 SC5 > Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 > Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 > Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 > Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 > Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 > Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 > ST1 ST2 ST3 ST5 > Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 > Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 > Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 > Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 > Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 > Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 >> #assign samples to groups, A=AC, B=AT, C=SC, D=ST >> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D ","D","D","D")) >> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size >> y<- DGEList (counts = x, group = group) > Calculating library sizes from column totals. >> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) >> keep<- rowSums (cpm(y)>1)>= 4 >> y2<- y[keep,] >> dim(y2) > [1] 17739 16 >> y2$samples$lib.size<- colSums(y2$counts) >> # Normalize for sample-specific effects >> y3<- calcNormFactors (y2) >> y3$samples > group lib.size norm.factors > AC3 A 11095093 1.0237872 > AC4 A 5604841 0.9912734 > AC5 A 7517691 0.9824405 > AC7 A 5918039 0.9919439 > AT1 B 6004830 0.9609672 > AT2 B 7692419 0.9543765 > AT4 B 5197815 0.9575609 > AT5 B 7759281 0.9643070 > SC1 C 3542875 1.0693587 > SC2 C 6720709 1.0488297 > SC3 C 3881774 1.0022528 > SC5 C 5109584 1.0429925 > ST1 D 10429666 0.9815467 > ST2 D 9662842 1.0103866 > ST3 D 8267838 1.0198095 > ST5 D 10698815 1.0069064 >> #plot biological coefficient of variation (BVC)between samples >> plotMDS(y3) >> #check levels >> levels (y3$samples$group) > [1] "A" "B" "C" "D" >> #design matrix to describe treatment conditions >> design<- model.matrix(~0+group, data=y3$samples) >> colnames(design)<- c("A", "B", "C", "D") >> design > A B C D > AC3 1 0 0 0 > AC4 1 0 0 0 > AC5 1 0 0 0 > AC7 1 0 0 0 > AT1 0 1 0 0 > AT2 0 1 0 0 > AT4 0 1 0 0 > AT5 0 1 0 0 > SC1 0 0 1 0 > SC2 0 0 1 0 > SC3 0 0 1 0 > SC5 0 0 1 0 > ST1 0 0 0 1 > ST2 0 0 0 1 > ST3 0 0 0 1 > ST5 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$group > [1] "contr.treatment" > >> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE) > Disp = 0.02061 , BCV = 0.1436 >> #estimate gene-wise dispersion >> y5<- estimateGLMTrendedDisp(y4, design) >> y6<- estimateGLMTagwiseDisp(y5,design) >> plotBCV(y6) >> fit<- glmFit(y6, design) >> # make contrasts >> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) >> #pairwise comparison of B and A (apo treatmetn and apo control) >> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) >> # summary of model fitting funtions >> summary(de<-decideTestsDGE(lrt.BvsA)) > [,1] > -1 562 > 0 16091 > 1 1086 >> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) >> summary(de<-decideTestsDGE(lrt.DvsC)) > [,1] > -1 496 > 0 16710 > 1 533 > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 12.9 years ago James W. MacDonald 68k

0

Entering edit mode

Hi Jim, On Mar 25, 2013, at 4:01 PM, James W. MacDonald wrote: > Hi Morgan, > > On 3/25/2013 3:15 PM, Morgan Mouchka wrote: >> Hello all, >> >> Im currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual. >> >> >> More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). Im interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond? >> >> >> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. >> >> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but Im uncertain as to the best way to set this up. > > This is definitely an interaction term. How you set it up is dependent on what parameterization makes the most sense to you. Following your method below, you want > > glmLRT(fit, contrast = c(1,-1,-1,1)) I'm a bit confused, but wouldn't this model be combining the controls (apo and sym) and comparing it to the treatments (apo and sym) in an additive fashion? When I run this model, I get different results than when I run your second model below. Are they suppose to provide the same output or did I misunderstand your post? (Certainly likely, I'm very new to GLM and edgeR). > > or you could do > > state <- factor(rep(1:2, each = 8)) > treat <- factor(rep(1:2, each=4, times=2)) > > design <- model.matrix(~treat*state) > > and then > > glmLRT(fit, coef=4) > > will test the interaction, and coefficients 2 and 3 test the main effects for treatment and state, as you have done below. When I use coefficients 2 and 3 in the above model, I get different results from when I compare A and B (apo control to apo treatment) and C and D (sym control to sym treatment) in my original model. Should they be the same? I have posted my most recent session below using topTags to demonstrate the differing results. I ran my model first and then ran your recommended models, followed by your model with coefficients 2 and 3. Hopefully this all makes sense? > Thanks so much, Morgan > library(edgeR) > #reading the counts from a file > x <- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") > head (x) AC3 AC4 AC5 AC7 AT1 AT2 Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 AT4 AT5 SC1 SC2 SC3 SC5 Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 ST1 ST2 ST3 ST5 Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 #MY MODEL > #assign samples to groups, A=AC, B=AT, C=SC, D=ST > group <- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D ","D","D","D")) > # Define data object (list-based, includes info for counts and info about samples) and will calculate library size > y <- DGEList (counts = x, group = group) Calculating library sizes from column totals. > #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) > keep <- rowSums (cpm(y)>1) >= 4 > y2 <- y[keep,] > dim(y2) [1] 17739 16 > y2$samples$lib.size <- colSums(y2$counts) > # Normalize for sample-specific effects > y3 <- calcNormFactors (y2) > y3$samples group lib.size norm.factors AC3 A 11095093 1.0237872 AC4 A 5604841 0.9912734 AC5 A 7517691 0.9824405 AC7 A 5918039 0.9919439 AT1 B 6004830 0.9609672 AT2 B 7692419 0.9543765 AT4 B 5197815 0.9575609 AT5 B 7759281 0.9643070 SC1 C 3542875 1.0693587 SC2 C 6720709 1.0488297 SC3 C 3881774 1.0022528 SC5 C 5109584 1.0429925 ST1 D 10429666 0.9815467 ST2 D 9662842 1.0103866 ST3 D 8267838 1.0198095 ST5 D 10698815 1.0069064 > design <- model.matrix(~0+group, data=y3$samples) > colnames(design) <- c("A", "B", "C", "D") > design A B C D AC3 1 0 0 0 AC4 1 0 0 0 AC5 1 0 0 0 AC7 1 0 0 0 AT1 0 1 0 0 AT2 0 1 0 0 AT4 0 1 0 0 AT5 0 1 0 0 SC1 0 0 1 0 SC2 0 0 1 0 SC3 0 0 1 0 SC5 0 0 1 0 ST1 0 0 0 1 ST2 0 0 0 1 ST3 0 0 0 1 ST5 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" > y4 <- estimateGLMCommonDisp(y3, design, verbose=TRUE) Disp = 0.02061 , BCV = 0.1436 > #estimate gene-wise dispersion > y5 <- estimateGLMTrendedDisp(y4, design) Loading required package: splines > y6 <- estimateGLMTagwiseDisp(y5,design) > fit <- glmFit(y6, design) > my.contrasts <- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) > #pairwise comparison of B and A > lrt.BvsA <- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) > #table of top DE tags > topTags(lrt.BvsA) Coefficient: -1*A 1*B logFC logCPM LR Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.506598 8.226827 461.5764 Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 4.620617 7.708649 442.6438 Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 4.281604 4.374277 339.8684 Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 2.661941 7.444760 280.1675 Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 2.227577 6.807869 256.4828 Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 2.451201 9.549261 252.1390 Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 2.119708 7.827086 246.6841 Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.080116 6.409451 246.1767 Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 2.124257 8.493526 244.0451 Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 3.282939 6.540550 239.7896 PValue FDR Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.181936e-102 3.870537e-98 Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 2.877748e-98 2.552418e-94 Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 6.816117e-76 4.030370e-72 Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 6.903808e-63 3.061666e-59 Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 1.002773e-57 3.557638e-54 Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 8.874275e-57 2.623679e-53 Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 1.371971e-55 3.476771e-52 Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 1.770015e-55 3.924786e-52 Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 5.160772e-55 1.017188e-51 Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 4.371068e-54 7.753838e-51 > lrt.DvsC <- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) > topTags(lrt.DvsC) Coefficient: -1*C 1*D logFC logCPM LR Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.477714 8.226827 157.20112 Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 -1.986510 5.205879 131.37771 Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 -1.451476 6.950002 103.49253 Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 -1.574898 6.554798 88.60841 Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 -2.204284 3.908595 87.97649 Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.060784 3.603245 80.94718 Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 2.280971 4.374277 80.10507 Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 -2.016184 5.268778 80.01864 Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 -2.035882 3.040181 76.14863 Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 -1.538410 5.854172 72.23494 PValue FDR Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.625963e-36 8.205996e-32 Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 2.047045e-30 1.815627e-26 Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 2.613781e-24 1.545529e-20 Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 4.812378e-21 2.134169e-17 Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 6.623716e-21 2.349962e-17 Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.318323e-19 6.854121e-16 Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 3.550203e-19 8.224107e-16 Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 3.708938e-19 8.224107e-16 Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 2.630983e-18 5.185667e-15 Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 1.910427e-17 3.388906e-14 #JIM'S 1ST MODEL > #interactions effect to test if Sym Treat diff from Apo Treat > lrt.interaction <- glmLRT(fit, contrast = c(1,-1,-1,1)) > topTags(lrt.interaction) Coefficient: 1*A -1*B -1*C 1*D logFC logCPM LR Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 -4.134843 7.708649 220.54182 Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 -1.899300 7.444760 76.74320 Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 -1.867935 6.863793 71.58929 Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 -1.547852 6.409451 71.51138 Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 -1.489774 6.525675 64.83901 Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 -1.546709 7.548864 61.65883 Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 -1.360292 8.389845 59.29519 Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 -1.606681 9.549261 59.26803 Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 -1.274685 7.759520 58.79090 Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 -2.028884 8.226827 57.76618 PValue FDR Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 6.889670e-50 1.222159e-45 Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 1.946973e-18 1.726868e-14 Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 2.649905e-17 1.222497e-13 Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.756632e-17 1.222497e-13 Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 8.127420e-16 2.883446e-12 Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 4.084321e-15 1.207530e-11 Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 1.357077e-14 3.050973e-11 Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 1.375939e-14 3.050973e-11 Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 1.753507e-14 3.456161e-11 Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.952016e-14 5.236581e-11 #ALTERNATIVE > state <- factor(rep(1:2, each = 8)) > treat <- factor(rep(1:2, each=4, times=2)) > design <- model.matrix(~treat*state) > design (Intercept) treat2 state2 treat2:state2 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 1 0 0 0 5 1 1 0 0 6 1 1 0 0 7 1 1 0 0 8 1 1 0 0 9 1 0 1 0 10 1 0 1 0 11 1 0 1 0 12 1 0 1 0 13 1 1 1 1 14 1 1 1 1 15 1 1 1 1 16 1 1 1 1 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$treat [1] "contr.treatment" attr(,"contrasts")$state [1] "contr.treatment" > lrt.interaction2 <- glmLRT(fit, coef=4) > topTags(lrt.interaction2) Coefficient: D logFC logCPM LR Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -23.55939 -0.05067457 1620.221 Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 -22.84920 -0.18609176 1955.056 Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 -22.37722 -0.88447008 1923.027 Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -22.19649 -0.63260884 3087.708 Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 -22.02352 -0.80276576 2491.409 Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 -21.87997 -0.75239480 3251.146 Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.87383 -0.23507267 1632.186 Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 -21.62135 -0.13684358 1905.960 Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -21.40488 -0.62850673 2271.751 Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 -21.40385 -0.62656773 2492.533 PValue FDR Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0 Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 0 0 Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 0 0 Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0 Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 0 0 Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 0 0 Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0 Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 0 0 Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0 Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 0 0 #JIM'S 1ST MODEL: SAME AS C AND D COMPARISON (SYM CONTROL TO SYM TREATMENT)? > lrt.3 <- glmLRT(fit, coef=3) > topTags(lrt.3) Coefficient: C logFC logCPM LR Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -23.20368 -0.63260884 2954.522 Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -22.24145 -0.05067457 1529.706 Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 -22.23039 -0.57979449 2739.489 Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -22.22741 -0.62850673 2193.659 Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 -21.91166 -0.09022093 2767.658 Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 -21.91140 -0.57058617 1714.034 Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 -21.90846 -0.07570223 3706.103 Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 -21.90643 -0.74190749 2140.909 Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 -21.65265 -0.07156613 1494.994 Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.45320 -0.23507267 1558.686 PValue FDR Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0 Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0 Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 0 0 Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0 Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 0 0 Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 0 0 Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 0 0 Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 0 0 Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 0 0 Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0 #JIM'S 1ST MODEL: SAME AS A AND B COMPARISON (APO CONTROL AND APO TREATMENT)? > lrt.2 <- glmLRT(fit, coef=2) > topTags(lrt.2) Coefficient: B logFC logCPM LR Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 -27.77711 6.430503 3886.142 Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 -27.77711 3.945365 2525.716 Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 -27.77711 3.041338 2328.365 Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 -27.77711 6.168745 2973.233 Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 -27.77711 4.657887 2296.824 Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 -27.77711 3.300734 2269.282 Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 -27.77711 2.771320 2360.788 Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 -27.77711 4.930170 3599.471 Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 -27.77711 2.112580 1802.150 Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 -27.77711 6.014430 2939.592 PValue FDR Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 0 0 Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 0 0 Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 0 0 Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 0 0 Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 0 0 Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 0 0 Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 0 0 Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 0 0 Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 0 0 Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 0 0 > Best, > > Jim > > >> >> Im using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment. >> >> Thanks so much, >> >> Morgan >> >> Morgan Mouchka >> PhD Candidate >> E231 Corson Hall >> Dept. Ecology and Evolutionary Biology >> Cornell University >> mep74@cornell.edu >> http://www.eeb.cornell.edu/mouchka >> >>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") >>> head (x) >> AC3 AC4 AC5 AC7 AT1 AT2 >> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 >> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 >> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 >> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 >> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 >> AT4 AT5 SC1 SC2 SC3 SC5 >> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 >> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 >> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 >> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 >> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 >> ST1 ST2 ST3 ST5 >> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 >> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 >> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 >> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 >> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 >> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 >>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST >>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C"," D","D","D","D")) >>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size >>> y<- DGEList (counts = x, group = group) >> Calculating library sizes from column totals. >>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) >>> keep<- rowSums (cpm(y)>1)>= 4 >>> y2<- y[keep,] >>> dim(y2) >> [1] 17739 16 >>> y2$samples$lib.size<- colSums(y2$counts) >>> # Normalize for sample-specific effects >>> y3<- calcNormFactors (y2) >>> y3$samples >> group lib.size norm.factors >> AC3 A 11095093 1.0237872 >> AC4 A 5604841 0.9912734 >> AC5 A 7517691 0.9824405 >> AC7 A 5918039 0.9919439 >> AT1 B 6004830 0.9609672 >> AT2 B 7692419 0.9543765 >> AT4 B 5197815 0.9575609 >> AT5 B 7759281 0.9643070 >> SC1 C 3542875 1.0693587 >> SC2 C 6720709 1.0488297 >> SC3 C 3881774 1.0022528 >> SC5 C 5109584 1.0429925 >> ST1 D 10429666 0.9815467 >> ST2 D 9662842 1.0103866 >> ST3 D 8267838 1.0198095 >> ST5 D 10698815 1.0069064 >>> #plot biological coefficient of variation (BVC)between samples >>> plotMDS(y3) >>> #check levels >>> levels (y3$samples$group) >> [1] "A" "B" "C" "D" >>> #design matrix to describe treatment conditions >>> design<- model.matrix(~0+group, data=y3$samples) >>> colnames(design)<- c("A", "B", "C", "D") >>> design >> A B C D >> AC3 1 0 0 0 >> AC4 1 0 0 0 >> AC5 1 0 0 0 >> AC7 1 0 0 0 >> AT1 0 1 0 0 >> AT2 0 1 0 0 >> AT4 0 1 0 0 >> AT5 0 1 0 0 >> SC1 0 0 1 0 >> SC2 0 0 1 0 >> SC3 0 0 1 0 >> SC5 0 0 1 0 >> ST1 0 0 0 1 >> ST2 0 0 0 1 >> ST3 0 0 0 1 >> ST5 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$group >> [1] "contr.treatment" >> >>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE) >> Disp = 0.02061 , BCV = 0.1436 >>> #estimate gene-wise dispersion >>> y5<- estimateGLMTrendedDisp(y4, design) >>> y6<- estimateGLMTagwiseDisp(y5,design) >>> plotBCV(y6) >>> fit<- glmFit(y6, design) >>> # make contrasts >>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) >>> #pairwise comparison of B and A (apo treatmetn and apo control) >>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) >>> # summary of model fitting funtions >>> summary(de<-decideTestsDGE(lrt.BvsA)) >> [,1] >> -1 562 >> 0 16091 >> 1 1086 >>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) >>> summary(de<-decideTestsDGE(lrt.DvsC)) >> [,1] >> -1 496 >> 0 16710 >> 1 533 >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > Morgan Mouchka PhD Candidate E231 Corson Hall Dept. Ecology and Evolutionary Biology Cornell University mep74@cornell.edu http://www.eeb.cornell.edu/mouchka [[alternative HTML version deleted]]

ADD REPLY • link 12.9 years ago Morgan Mouchka ▴ 30

0

Entering edit mode

Hi Morgan, On 3/26/2013 4:40 PM, Morgan Mouchka wrote: > Hi Jim, > > On Mar 25, 2013, at 4:01 PM, James W. MacDonald wrote: > >> Hi Morgan, >> >> On 3/25/2013 3:15 PM, Morgan Mouchka wrote: >>> Hello all, >>> >>> I?m currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual. >>> >>> >>> More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). I?m interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond? >>> >>> >>> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. >>> >>> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but I?m uncertain as to the best way to set this up. >> This is definitely an interaction term. How you set it up is dependent on what parameterization makes the most sense to you. Following your method below, you want >> >> glmLRT(fit, contrast = c(1,-1,-1,1)) > I'm a bit confused, but wouldn't this model be combining the controls (apo and sym) and comparing it to the treatments (apo and sym) in an additive fashion? When I run this model, I get different results than when I run your second model below. Are they suppose to provide the same output or did I misunderstand your post? (Certainly likely, I'm very new to GLM and edgeR). This contrast is the same as (Sym trt - Sym ctrl) - (Apo trt - Apo ctrl), which tests to see if the difference between treatment and control varies according to state. This is the same as the coefficient treat2:state2 in the alternative parameterization below. But two things here. First, I had a brain cramp in my earlier email to you. The correct interpretations of the coefficients are Apo ctrl Apo trt - Apo ctrl Sym ctrl - Apo ctrl (Sym trt - Sym ctrl) - (Apo trt - Apo ctrl) So coef = 3 doesn't give you Sym trt - Sym ctrl. It is more difficult to get that contrast from this parameterization, so sticking with what you already had is the easiest way to go. Second, I don't see in your code below where you re-fit the model using the alternative parameterization, so I tried this with some data I have in hand. Note that you have to go all the way back to the beginning, recreating the dgeList object with the different design matrix. So, using your parameterization I get > topTags(glmLRT(fit, contrast = c(1,-1,-1,1))) symbol egids logFC logCPM LR PValue FDR 319317 Snhg11 319317 -8.012675 8.499244 333.5330 1.634236e-74 2.088554e-70 21938 Tnfrsf1b 21938 6.895792 4.632327 268.2860 2.682053e-60 1.713832e-56 16365 Irg1 16365 7.709183 4.697518 248.7724 4.809099e-56 2.048676e-52 18133 Nov 18133 6.798803 6.007596 237.2266 1.582884e-53 5.057313e-50 13058 Cybb 13058 7.395509 5.371628 233.3577 1.104324e-52 2.822652e-49 20302 Ccl3 20302 7.776920 4.644823 220.7970 6.060828e-50 1.290956e-46 22671 Rnf112 22671 -6.945110 6.711583 217.2831 3.540102e-49 6.463215e-46 13733 Emr1 13733 6.855894 5.131320 210.6611 9.854565e-48 1.574267e-44 20266 Scn1b 20266 -5.007392 7.965721 206.7374 7.074398e-47 1.004565e-43 110834 Chrna3 110834 6.344460 4.341593 206.1794 9.363937e-47 1.196711e-43 and then re-parameterizing > treat <- factor(rep(c(1,2,1,2), c(5,3,1,3))) > type <- factor(rep(c(1,2,1,2), c(3,2,4,3))) > design2 <- model.matrix(~treat*type) > dgelst2 <- estimateGLMCommonDisp(DGEList(counts=cnts, genes=tx.filt), design2) Calculating library sizes from column totals. > dgelst2 <- estimateGLMTrendedDisp(dgelst2, design2) > dgelst2 <- estimateGLMTagwiseDisp(dgelst2, design2) > fit2 <- glmFit(dgelst2, design2) > topTags(glmLRT(fit2, coef=4)) Coefficient: treat2:type2 symbol egids logFC logCPM LR PValue FDR 319317 Snhg11 319317 -7.869265 8.428033 388.4536 1.796893e-86 2.296429e-82 13058 Cybb 13058 7.824446 5.392286 319.0685 2.310988e-71 1.476721e-67 16365 Irg1 16365 7.773614 4.714316 249.1532 3.972389e-56 1.692238e-52 13733 Emr1 13733 7.444357 5.144043 246.3510 1.621694e-55 4.197947e-52 21938 Tnfrsf1b 21938 6.915307 4.651340 246.3258 1.642389e-55 4.197947e-52 20266 Scn1b 20266 -4.879057 7.909679 238.6288 7.828831e-54 1.667541e-50 18133 Nov 18133 6.871500 6.026197 237.6275 1.294259e-53 2.362947e-50 20302 Ccl3 20302 7.784824 4.663569 209.2621 1.990055e-47 3.179113e-44 100039795 Ildr2 100039795 4.813985 6.396443 199.6276 2.518196e-45 3.575838e-42 110834 Chrna3 110834 6.382514 4.361077 196.0642 1.509239e-44 1.928808e-41 So you can see that the likelihood ratios are the same - the only difference here is likely due to the fact that we are estimating the dispersions on different coefficients (e.g., the other coefficients in the model), so the denominator of the statistic will change, resulting in slightly different p-values and ordering. But all in all, very similar results. Best, Jim >> or you could do >> >> state<- factor(rep(1:2, each = 8)) >> treat<- factor(rep(1:2, each=4, times=2)) >> >> design<- model.matrix(~treat*state) >> >> and then >> >> glmLRT(fit, coef=4) >> >> will test the interaction, and coefficients 2 and 3 test the main effects for treatment and state, as you have done below. > When I use coefficients 2 and 3 in the above model, I get different results from when I compare A and B (apo control to apo treatment) and C and D (sym control to sym treatment) in my original model. Should they be the same? I have posted my most recent session below using topTags to demonstrate the differing results. I ran my model first and then ran your recommended models, followed by your model with coefficients 2 and 3. Hopefully this all makes sense? > Thanks so much, > Morgan > >> library(edgeR) >> #reading the counts from a file >> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") >> head (x) > AC3 AC4 AC5 AC7 AT1 AT2 > Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 > Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 > Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 > Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 > Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 > Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 > AT4 AT5 SC1 SC2 SC3 SC5 > Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 > Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 > Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 > Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 > Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 > Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 > ST1 ST2 ST3 ST5 > Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 > Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 > Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 > Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 > Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 > Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 > > #MY MODEL > >> #assign samples to groups, A=AC, B=AT, C=SC, D=ST >> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D ","D","D","D")) >> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size >> y<- DGEList (counts = x, group = group) > Calculating library sizes from column totals. >> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) >> keep<- rowSums (cpm(y)>1)>= 4 >> y2<- y[keep,] >> dim(y2) > [1] 17739 16 >> y2$samples$lib.size<- colSums(y2$counts) >> # Normalize for sample-specific effects >> y3<- calcNormFactors (y2) >> y3$samples > group lib.size norm.factors > AC3 A 11095093 1.0237872 > AC4 A 5604841 0.9912734 > AC5 A 7517691 0.9824405 > AC7 A 5918039 0.9919439 > AT1 B 6004830 0.9609672 > AT2 B 7692419 0.9543765 > AT4 B 5197815 0.9575609 > AT5 B 7759281 0.9643070 > SC1 C 3542875 1.0693587 > SC2 C 6720709 1.0488297 > SC3 C 3881774 1.0022528 > SC5 C 5109584 1.0429925 > ST1 D 10429666 0.9815467 > ST2 D 9662842 1.0103866 > ST3 D 8267838 1.0198095 > ST5 D 10698815 1.0069064 >> design<- model.matrix(~0+group, data=y3$samples) >> colnames(design)<- c("A", "B", "C", "D") >> design > A B C D > AC3 1 0 0 0 > AC4 1 0 0 0 > AC5 1 0 0 0 > AC7 1 0 0 0 > AT1 0 1 0 0 > AT2 0 1 0 0 > AT4 0 1 0 0 > AT5 0 1 0 0 > SC1 0 0 1 0 > SC2 0 0 1 0 > SC3 0 0 1 0 > SC5 0 0 1 0 > ST1 0 0 0 1 > ST2 0 0 0 1 > ST3 0 0 0 1 > ST5 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$group > [1] "contr.treatment" > >> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE) > Disp = 0.02061 , BCV = 0.1436 >> #estimate gene-wise dispersion >> y5<- estimateGLMTrendedDisp(y4, design) > Loading required package: splines >> y6<- estimateGLMTagwiseDisp(y5,design) >> fit<- glmFit(y6, design) >> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) >> #pairwise comparison of B and A >> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) >> #table of top DE tags >> topTags(lrt.BvsA) > Coefficient: -1*A 1*B > logFC logCPM LR > Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.506598 8.226827 461.5764 > Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 4.620617 7.708649 442.6438 > Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 4.281604 4.374277 339.8684 > Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 2.661941 7.444760 280.1675 > Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 2.227577 6.807869 256.4828 > Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 2.451201 9.549261 252.1390 > Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 2.119708 7.827086 246.6841 > Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.080116 6.409451 246.1767 > Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 2.124257 8.493526 244.0451 > Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 3.282939 6.540550 239.7896 > PValue FDR > Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.181936e-102 3.870537e-98 > Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 2.877748e-98 2.552418e-94 > Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 6.816117e-76 4.030370e-72 > Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 6.903808e-63 3.061666e-59 > Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 1.002773e-57 3.557638e-54 > Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 8.874275e-57 2.623679e-53 > Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 1.371971e-55 3.476771e-52 > Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 1.770015e-55 3.924786e-52 > Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 5.160772e-55 1.017188e-51 > Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 4.371068e-54 7.753838e-51 >> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) >> topTags(lrt.DvsC) > Coefficient: -1*C 1*D > logFC logCPM LR > Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.477714 8.226827 157.20112 > Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 -1.986510 5.205879 131.37771 > Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 -1.451476 6.950002 103.49253 > Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 -1.574898 6.554798 88.60841 > Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 -2.204284 3.908595 87.97649 > Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.060784 3.603245 80.94718 > Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 2.280971 4.374277 80.10507 > Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 -2.016184 5.268778 80.01864 > Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 -2.035882 3.040181 76.14863 > Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 -1.538410 5.854172 72.23494 > PValue FDR > Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.625963e-36 8.205996e-32 > Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 2.047045e-30 1.815627e-26 > Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 2.613781e-24 1.545529e-20 > Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 4.812378e-21 2.134169e-17 > Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 6.623716e-21 2.349962e-17 > Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.318323e-19 6.854121e-16 > Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 3.550203e-19 8.224107e-16 > Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 3.708938e-19 8.224107e-16 > Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 2.630983e-18 5.185667e-15 > Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 1.910427e-17 3.388906e-14 > > #JIM'S 1ST MODEL >> #interactions effect to test if Sym Treat diff from Apo Treat >> lrt.interaction<- glmLRT(fit, contrast = c(1,-1,-1,1)) >> topTags(lrt.interaction) > Coefficient: 1*A -1*B -1*C 1*D > logFC logCPM LR > Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 -4.134843 7.708649 220.54182 > Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 -1.899300 7.444760 76.74320 > Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 -1.867935 6.863793 71.58929 > Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 -1.547852 6.409451 71.51138 > Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 -1.489774 6.525675 64.83901 > Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 -1.546709 7.548864 61.65883 > Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 -1.360292 8.389845 59.29519 > Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 -1.606681 9.549261 59.26803 > Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 -1.274685 7.759520 58.79090 > Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 -2.028884 8.226827 57.76618 > PValue FDR > Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 6.889670e-50 1.222159e-45 > Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 1.946973e-18 1.726868e-14 > Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 2.649905e-17 1.222497e-13 > Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.756632e-17 1.222497e-13 > Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 8.127420e-16 2.883446e-12 > Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 4.084321e-15 1.207530e-11 > Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 1.357077e-14 3.050973e-11 > Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 1.375939e-14 3.050973e-11 > Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 1.753507e-14 3.456161e-11 > Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.952016e-14 5.236581e-11 > > #ALTERNATIVE > >> state<- factor(rep(1:2, each = 8)) >> treat<- factor(rep(1:2, each=4, times=2)) >> design<- model.matrix(~treat*state) >> design > (Intercept) treat2 state2 treat2:state2 > 1 1 0 0 0 > 2 1 0 0 0 > 3 1 0 0 0 > 4 1 0 0 0 > 5 1 1 0 0 > 6 1 1 0 0 > 7 1 1 0 0 > 8 1 1 0 0 > 9 1 0 1 0 > 10 1 0 1 0 > 11 1 0 1 0 > 12 1 0 1 0 > 13 1 1 1 1 > 14 1 1 1 1 > 15 1 1 1 1 > 16 1 1 1 1 > attr(,"assign") > [1] 0 1 2 3 > attr(,"contrasts") > attr(,"contrasts")$treat > [1] "contr.treatment" > > attr(,"contrasts")$state > [1] "contr.treatment" > >> lrt.interaction2<- glmLRT(fit, coef=4) >> topTags(lrt.interaction2) > Coefficient: D > logFC logCPM LR > Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -23.55939 -0.05067457 1620.221 > Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 -22.84920 -0.18609176 1955.056 > Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 -22.37722 -0.88447008 1923.027 > Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -22.19649 -0.63260884 3087.708 > Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 -22.02352 -0.80276576 2491.409 > Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 -21.87997 -0.75239480 3251.146 > Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.87383 -0.23507267 1632.186 > Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 -21.62135 -0.13684358 1905.960 > Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -21.40488 -0.62850673 2271.751 > Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 -21.40385 -0.62656773 2492.533 > PValue FDR > Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0 > Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 0 0 > Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 0 0 > Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0 > Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 0 0 > Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 0 0 > Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0 > Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 0 0 > Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0 > Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 0 0 > > #JIM'S 1ST MODEL: SAME AS C AND D COMPARISON (SYM CONTROL TO SYM TREATMENT)? >> lrt.3<- glmLRT(fit, coef=3) >> topTags(lrt.3) > Coefficient: C > logFC logCPM LR > Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -23.20368 -0.63260884 2954.522 > Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -22.24145 -0.05067457 1529.706 > Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 -22.23039 -0.57979449 2739.489 > Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -22.22741 -0.62850673 2193.659 > Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 -21.91166 -0.09022093 2767.658 > Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 -21.91140 -0.57058617 1714.034 > Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 -21.90846 -0.07570223 3706.103 > Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 -21.90643 -0.74190749 2140.909 > Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 -21.65265 -0.07156613 1494.994 > Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.45320 -0.23507267 1558.686 > PValue FDR > Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0 > Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0 > Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 0 0 > Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0 > Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 0 0 > Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 0 0 > Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 0 0 > Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 0 0 > Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 0 0 > Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0 > > #JIM'S 1ST MODEL: SAME AS A AND B COMPARISON (APO CONTROL AND APO TREATMENT)? >> lrt.2<- glmLRT(fit, coef=2) >> topTags(lrt.2) > Coefficient: B > logFC logCPM LR > Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 -27.77711 6.430503 3886.142 > Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 -27.77711 3.945365 2525.716 > Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 -27.77711 3.041338 2328.365 > Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 -27.77711 6.168745 2973.233 > Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 -27.77711 4.657887 2296.824 > Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 -27.77711 3.300734 2269.282 > Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 -27.77711 2.771320 2360.788 > Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 -27.77711 4.930170 3599.471 > Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 -27.77711 2.112580 1802.150 > Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 -27.77711 6.014430 2939.592 > PValue FDR > Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 0 0 > Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 0 0 > Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 0 0 > Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 0 0 > Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 0 0 > Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 0 0 > Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 0 0 > Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 0 0 > Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 0 0 > Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 0 0 >> Best, >> >> Jim >> >> >>> I?m using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment. >>> >>> Thanks so much, >>> >>> Morgan >>> >>> Morgan Mouchka >>> PhD Candidate >>> E231 Corson Hall >>> Dept. Ecology and Evolutionary Biology >>> Cornell University >>> mep74 at cornell.edu >>> http://www.eeb.cornell.edu/mouchka >>> >>>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") >>>> head (x) >>> AC3 AC4 AC5 AC7 AT1 AT2 >>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 >>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 >>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 >>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 >>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 >>> AT4 AT5 SC1 SC2 SC3 SC5 >>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 >>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 >>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 >>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 >>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 >>> ST1 ST2 ST3 ST5 >>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 >>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 >>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 >>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 >>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 >>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 >>>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST >>>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C", "D","D","D","D")) >>>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size >>>> y<- DGEList (counts = x, group = group) >>> Calculating library sizes from column totals. >>>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) >>>> keep<- rowSums (cpm(y)>1)>= 4 >>>> y2<- y[keep,] >>>> dim(y2) >>> [1] 17739 16 >>>> y2$samples$lib.size<- colSums(y2$counts) >>>> # Normalize for sample-specific effects >>>> y3<- calcNormFactors (y2) >>>> y3$samples >>> group lib.size norm.factors >>> AC3 A 11095093 1.0237872 >>> AC4 A 5604841 0.9912734 >>> AC5 A 7517691 0.9824405 >>> AC7 A 5918039 0.9919439 >>> AT1 B 6004830 0.9609672 >>> AT2 B 7692419 0.9543765 >>> AT4 B 5197815 0.9575609 >>> AT5 B 7759281 0.9643070 >>> SC1 C 3542875 1.0693587 >>> SC2 C 6720709 1.0488297 >>> SC3 C 3881774 1.0022528 >>> SC5 C 5109584 1.0429925 >>> ST1 D 10429666 0.9815467 >>> ST2 D 9662842 1.0103866 >>> ST3 D 8267838 1.0198095 >>> ST5 D 10698815 1.0069064 >>>> #plot biological coefficient of variation (BVC)between samples >>>> plotMDS(y3) >>>> #check levels >>>> levels (y3$samples$group) >>> [1] "A" "B" "C" "D" >>>> #design matrix to describe treatment conditions >>>> design<- model.matrix(~0+group, data=y3$samples) >>>> colnames(design)<- c("A", "B", "C", "D") >>>> design >>> A B C D >>> AC3 1 0 0 0 >>> AC4 1 0 0 0 >>> AC5 1 0 0 0 >>> AC7 1 0 0 0 >>> AT1 0 1 0 0 >>> AT2 0 1 0 0 >>> AT4 0 1 0 0 >>> AT5 0 1 0 0 >>> SC1 0 0 1 0 >>> SC2 0 0 1 0 >>> SC3 0 0 1 0 >>> SC5 0 0 1 0 >>> ST1 0 0 0 1 >>> ST2 0 0 0 1 >>> ST3 0 0 0 1 >>> ST5 0 0 0 1 >>> attr(,"assign") >>> [1] 1 1 1 1 >>> attr(,"contrasts") >>> attr(,"contrasts")$group >>> [1] "contr.treatment" >>> >>>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE) >>> Disp = 0.02061 , BCV = 0.1436 >>>> #estimate gene-wise dispersion >>>> y5<- estimateGLMTrendedDisp(y4, design) >>>> y6<- estimateGLMTagwiseDisp(y5,design) >>>> plotBCV(y6) >>>> fit<- glmFit(y6, design) >>>> # make contrasts >>>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) >>>> #pairwise comparison of B and A (apo treatmetn and apo control) >>>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) >>>> # summary of model fitting funtions >>>> summary(de<-decideTestsDGE(lrt.BvsA)) >>> [,1] >>> -1 562 >>> 0 16091 >>> 1 1086 >>>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) >>>> summary(de<-decideTestsDGE(lrt.DvsC)) >>> [,1] >>> -1 496 >>> 0 16710 >>> 1 533 >>> [[alternative HTML version deleted]] >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> > Morgan Mouchka > PhD Candidate > E231 Corson Hall > Dept. Ecology and Evolutionary Biology > Cornell University > mep74 at cornell.edu > http://www.eeb.cornell.edu/mouchka > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 12.9 years ago James W. MacDonald 68k

0

Entering edit mode

Hi Jim, Thank you for your patience and your willingness to demonstrate with my data. You are absolutely right, I did not refit the model from the beginning, which in hindsight seems like a rather obvious thing that needed to been done. Thanks again, Morgan On Mar 27, 2013, at 10:39 AM, James W. MacDonald wrote: > Hi Morgan, > > On 3/26/2013 4:40 PM, Morgan Mouchka wrote: >> Hi Jim, >> >> On Mar 25, 2013, at 4:01 PM, James W. MacDonald wrote: >> >>> Hi Morgan, >>> >>> On 3/25/2013 3:15 PM, Morgan Mouchka wrote: >>>> Hello all, >>>> >>>> Im currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual. >>>> >>>> >>>> More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). Im interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond? >>>> >>>> >>>> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. >>>> >>>> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but Im uncertain as to the best way to set this up. >>> This is definitely an interaction term. How you set it up is dependent on what parameterization makes the most sense to you. Following your method below, you want >>> >>> glmLRT(fit, contrast = c(1,-1,-1,1)) >> I'm a bit confused, but wouldn't this model be combining the controls (apo and sym) and comparing it to the treatments (apo and sym) in an additive fashion? When I run this model, I get different results than when I run your second model below. Are they suppose to provide the same output or did I misunderstand your post? (Certainly likely, I'm very new to GLM and edgeR). > > This contrast is the same as (Sym trt - Sym ctrl) - (Apo trt - Apo ctrl), which tests to see if the difference between treatment and control varies according to state. This is the same as the coefficient treat2:state2 in the alternative parameterization below. > > But two things here. First, I had a brain cramp in my earlier email to you. The correct interpretations of the coefficients are > > Apo ctrl > Apo trt - Apo ctrl > Sym ctrl - Apo ctrl > (Sym trt - Sym ctrl) - (Apo trt - Apo ctrl) > > So coef = 3 doesn't give you Sym trt - Sym ctrl. It is more difficult to get that contrast from this parameterization, so sticking with what you already had is the easiest way to go. > > Second, I don't see in your code below where you re-fit the model using the alternative parameterization, so I tried this with some data I have in hand. Note that you have to go all the way back to the beginning, recreating the dgeList object with the different design matrix. So, using your parameterization I get > > > topTags(glmLRT(fit, contrast = c(1,-1,-1,1))) > > symbol egids logFC logCPM LR PValue FDR > 319317 Snhg11 319317 -8.012675 8.499244 333.5330 1.634236e-74 2.088554e-70 > 21938 Tnfrsf1b 21938 6.895792 4.632327 268.2860 2.682053e-60 1.713832e-56 > 16365 Irg1 16365 7.709183 4.697518 248.7724 4.809099e-56 2.048676e-52 > 18133 Nov 18133 6.798803 6.007596 237.2266 1.582884e-53 5.057313e-50 > 13058 Cybb 13058 7.395509 5.371628 233.3577 1.104324e-52 2.822652e-49 > 20302 Ccl3 20302 7.776920 4.644823 220.7970 6.060828e-50 1.290956e-46 > 22671 Rnf112 22671 -6.945110 6.711583 217.2831 3.540102e-49 6.463215e-46 > 13733 Emr1 13733 6.855894 5.131320 210.6611 9.854565e-48 1.574267e-44 > 20266 Scn1b 20266 -5.007392 7.965721 206.7374 7.074398e-47 1.004565e-43 > 110834 Chrna3 110834 6.344460 4.341593 206.1794 9.363937e-47 1.196711e-43 > > and then re-parameterizing > > > treat <- factor(rep(c(1,2,1,2), c(5,3,1,3))) > > type <- factor(rep(c(1,2,1,2), c(3,2,4,3))) > > design2 <- model.matrix(~treat*type) > > dgelst2 <- estimateGLMCommonDisp(DGEList(counts=cnts, genes=tx.filt), design2) > Calculating library sizes from column totals. > > dgelst2 <- estimateGLMTrendedDisp(dgelst2, design2) > > dgelst2 <- estimateGLMTagwiseDisp(dgelst2, design2) > > fit2 <- glmFit(dgelst2, design2) > > topTags(glmLRT(fit2, coef=4)) > Coefficient: treat2:type2 > symbol egids logFC logCPM LR PValue FDR > 319317 Snhg11 319317 -7.869265 8.428033 388.4536 1.796893e-86 2.296429e-82 > 13058 Cybb 13058 7.824446 5.392286 319.0685 2.310988e-71 1.476721e-67 > 16365 Irg1 16365 7.773614 4.714316 249.1532 3.972389e-56 1.692238e-52 > 13733 Emr1 13733 7.444357 5.144043 246.3510 1.621694e-55 4.197947e-52 > 21938 Tnfrsf1b 21938 6.915307 4.651340 246.3258 1.642389e-55 4.197947e-52 > 20266 Scn1b 20266 -4.879057 7.909679 238.6288 7.828831e-54 1.667541e-50 > 18133 Nov 18133 6.871500 6.026197 237.6275 1.294259e-53 2.362947e-50 > 20302 Ccl3 20302 7.784824 4.663569 209.2621 1.990055e-47 3.179113e-44 > 100039795 Ildr2 100039795 4.813985 6.396443 199.6276 2.518196e-45 3.575838e-42 > 110834 Chrna3 110834 6.382514 4.361077 196.0642 1.509239e-44 1.928808e-41 > > So you can see that the likelihood ratios are the same - the only difference here is likely due to the fact that we are estimating the dispersions on different coefficients (e.g., the other coefficients in the model), so the denominator of the statistic will change, resulting in slightly different p-values and ordering. But all in all, very similar results. > > Best, > > Jim > > >>> or you could do >>> >>> state<- factor(rep(1:2, each = 8)) >>> treat<- factor(rep(1:2, each=4, times=2)) >>> >>> design<- model.matrix(~treat*state) >>> >>> and then >>> >>> glmLRT(fit, coef=4) >>> >>> will test the interaction, and coefficients 2 and 3 test the main effects for treatment and state, as you have done below. >> When I use coefficients 2 and 3 in the above model, I get different results from when I compare A and B (apo control to apo treatment) and C and D (sym control to sym treatment) in my original model. Should they be the same? I have posted my most recent session below using topTags to demonstrate the differing results. I ran my model first and then ran your recommended models, followed by your model with coefficients 2 and 3. Hopefully this all makes sense? >> Thanks so much, >> Morgan >> >>> library(edgeR) >>> #reading the counts from a file >>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") >>> head (x) >> AC3 AC4 AC5 AC7 AT1 AT2 >> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 >> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 >> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 >> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 >> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 >> AT4 AT5 SC1 SC2 SC3 SC5 >> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 >> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 >> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 >> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 >> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 >> ST1 ST2 ST3 ST5 >> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 >> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 >> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 >> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 >> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 >> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 >> >> #MY MODEL >> >>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST >>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C"," D","D","D","D")) >>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size >>> y<- DGEList (counts = x, group = group) >> Calculating library sizes from column totals. >>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) >>> keep<- rowSums (cpm(y)>1)>= 4 >>> y2<- y[keep,] >>> dim(y2) >> [1] 17739 16 >>> y2$samples$lib.size<- colSums(y2$counts) >>> # Normalize for sample-specific effects >>> y3<- calcNormFactors (y2) >>> y3$samples >> group lib.size norm.factors >> AC3 A 11095093 1.0237872 >> AC4 A 5604841 0.9912734 >> AC5 A 7517691 0.9824405 >> AC7 A 5918039 0.9919439 >> AT1 B 6004830 0.9609672 >> AT2 B 7692419 0.9543765 >> AT4 B 5197815 0.9575609 >> AT5 B 7759281 0.9643070 >> SC1 C 3542875 1.0693587 >> SC2 C 6720709 1.0488297 >> SC3 C 3881774 1.0022528 >> SC5 C 5109584 1.0429925 >> ST1 D 10429666 0.9815467 >> ST2 D 9662842 1.0103866 >> ST3 D 8267838 1.0198095 >> ST5 D 10698815 1.0069064 >>> design<- model.matrix(~0+group, data=y3$samples) >>> colnames(design)<- c("A", "B", "C", "D") >>> design >> A B C D >> AC3 1 0 0 0 >> AC4 1 0 0 0 >> AC5 1 0 0 0 >> AC7 1 0 0 0 >> AT1 0 1 0 0 >> AT2 0 1 0 0 >> AT4 0 1 0 0 >> AT5 0 1 0 0 >> SC1 0 0 1 0 >> SC2 0 0 1 0 >> SC3 0 0 1 0 >> SC5 0 0 1 0 >> ST1 0 0 0 1 >> ST2 0 0 0 1 >> ST3 0 0 0 1 >> ST5 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$group >> [1] "contr.treatment" >> >>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE) >> Disp = 0.02061 , BCV = 0.1436 >>> #estimate gene-wise dispersion >>> y5<- estimateGLMTrendedDisp(y4, design) >> Loading required package: splines >>> y6<- estimateGLMTagwiseDisp(y5,design) >>> fit<- glmFit(y6, design) >>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) >>> #pairwise comparison of B and A >>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) >>> #table of top DE tags >>> topTags(lrt.BvsA) >> Coefficient: -1*A 1*B >> logFC logCPM LR >> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.506598 8.226827 461.5764 >> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 4.620617 7.708649 442.6438 >> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 4.281604 4.374277 339.8684 >> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 2.661941 7.444760 280.1675 >> Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 2.227577 6.807869 256.4828 >> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 2.451201 9.549261 252.1390 >> Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 2.119708 7.827086 246.6841 >> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.080116 6.409451 246.1767 >> Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 2.124257 8.493526 244.0451 >> Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 3.282939 6.540550 239.7896 >> PValue FDR >> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.181936e-102 3.870537e-98 >> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 2.877748e-98 2.552418e-94 >> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 6.816117e-76 4.030370e-72 >> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 6.903808e-63 3.061666e-59 >> Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 1.002773e-57 3.557638e-54 >> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 8.874275e-57 2.623679e-53 >> Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 1.371971e-55 3.476771e-52 >> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 1.770015e-55 3.924786e-52 >> Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 5.160772e-55 1.017188e-51 >> Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 4.371068e-54 7.753838e-51 >>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) >>> topTags(lrt.DvsC) >> Coefficient: -1*C 1*D >> logFC logCPM LR >> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.477714 8.226827 157.20112 >> Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 -1.986510 5.205879 131.37771 >> Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 -1.451476 6.950002 103.49253 >> Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 -1.574898 6.554798 88.60841 >> Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 -2.204284 3.908595 87.97649 >> Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.060784 3.603245 80.94718 >> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 2.280971 4.374277 80.10507 >> Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 -2.016184 5.268778 80.01864 >> Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 -2.035882 3.040181 76.14863 >> Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 -1.538410 5.854172 72.23494 >> PValue FDR >> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.625963e-36 8.205996e-32 >> Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 2.047045e-30 1.815627e-26 >> Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 2.613781e-24 1.545529e-20 >> Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 4.812378e-21 2.134169e-17 >> Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 6.623716e-21 2.349962e-17 >> Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.318323e-19 6.854121e-16 >> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 3.550203e-19 8.224107e-16 >> Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 3.708938e-19 8.224107e-16 >> Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 2.630983e-18 5.185667e-15 >> Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 1.910427e-17 3.388906e-14 >> >> #JIM'S 1ST MODEL >>> #interactions effect to test if Sym Treat diff from Apo Treat >>> lrt.interaction<- glmLRT(fit, contrast = c(1,-1,-1,1)) >>> topTags(lrt.interaction) >> Coefficient: 1*A -1*B -1*C 1*D >> logFC logCPM LR >> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 -4.134843 7.708649 220.54182 >> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 -1.899300 7.444760 76.74320 >> Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 -1.867935 6.863793 71.58929 >> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 -1.547852 6.409451 71.51138 >> Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 -1.489774 6.525675 64.83901 >> Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 -1.546709 7.548864 61.65883 >> Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 -1.360292 8.389845 59.29519 >> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 -1.606681 9.549261 59.26803 >> Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 -1.274685 7.759520 58.79090 >> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 -2.028884 8.226827 57.76618 >> PValue FDR >> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 6.889670e-50 1.222159e-45 >> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 1.946973e-18 1.726868e-14 >> Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 2.649905e-17 1.222497e-13 >> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.756632e-17 1.222497e-13 >> Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 8.127420e-16 2.883446e-12 >> Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 4.084321e-15 1.207530e-11 >> Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 1.357077e-14 3.050973e-11 >> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 1.375939e-14 3.050973e-11 >> Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 1.753507e-14 3.456161e-11 >> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.952016e-14 5.236581e-11 >> >> #ALTERNATIVE >> >>> state<- factor(rep(1:2, each = 8)) >>> treat<- factor(rep(1:2, each=4, times=2)) >>> design<- model.matrix(~treat*state) >>> design >> (Intercept) treat2 state2 treat2:state2 >> 1 1 0 0 0 >> 2 1 0 0 0 >> 3 1 0 0 0 >> 4 1 0 0 0 >> 5 1 1 0 0 >> 6 1 1 0 0 >> 7 1 1 0 0 >> 8 1 1 0 0 >> 9 1 0 1 0 >> 10 1 0 1 0 >> 11 1 0 1 0 >> 12 1 0 1 0 >> 13 1 1 1 1 >> 14 1 1 1 1 >> 15 1 1 1 1 >> 16 1 1 1 1 >> attr(,"assign") >> [1] 0 1 2 3 >> attr(,"contrasts") >> attr(,"contrasts")$treat >> [1] "contr.treatment" >> >> attr(,"contrasts")$state >> [1] "contr.treatment" >> >>> lrt.interaction2<- glmLRT(fit, coef=4) >>> topTags(lrt.interaction2) >> Coefficient: D >> logFC logCPM LR >> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -23.55939 -0.05067457 1620.221 >> Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 -22.84920 -0.18609176 1955.056 >> Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 -22.37722 -0.88447008 1923.027 >> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -22.19649 -0.63260884 3087.708 >> Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 -22.02352 -0.80276576 2491.409 >> Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 -21.87997 -0.75239480 3251.146 >> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.87383 -0.23507267 1632.186 >> Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 -21.62135 -0.13684358 1905.960 >> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -21.40488 -0.62850673 2271.751 >> Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 -21.40385 -0.62656773 2492.533 >> PValue FDR >> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0 >> Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 0 0 >> Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 0 0 >> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0 >> Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 0 0 >> Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 0 0 >> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0 >> Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 0 0 >> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0 >> Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 0 0 >> >> #JIM'S 1ST MODEL: SAME AS C AND D COMPARISON (SYM CONTROL TO SYM TREATMENT)? >>> lrt.3<- glmLRT(fit, coef=3) >>> topTags(lrt.3) >> Coefficient: C >> logFC logCPM LR >> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -23.20368 -0.63260884 2954.522 >> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -22.24145 -0.05067457 1529.706 >> Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 -22.23039 -0.57979449 2739.489 >> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -22.22741 -0.62850673 2193.659 >> Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 -21.91166 -0.09022093 2767.658 >> Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 -21.91140 -0.57058617 1714.034 >> Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 -21.90846 -0.07570223 3706.103 >> Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 -21.90643 -0.74190749 2140.909 >> Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 -21.65265 -0.07156613 1494.994 >> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.45320 -0.23507267 1558.686 >> PValue FDR >> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0 >> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0 >> Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 0 0 >> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0 >> Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 0 0 >> Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 0 0 >> Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 0 0 >> Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 0 0 >> Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 0 0 >> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0 >> >> #JIM'S 1ST MODEL: SAME AS A AND B COMPARISON (APO CONTROL AND APO TREATMENT)? >>> lrt.2<- glmLRT(fit, coef=2) >>> topTags(lrt.2) >> Coefficient: B >> logFC logCPM LR >> Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 -27.77711 6.430503 3886.142 >> Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 -27.77711 3.945365 2525.716 >> Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 -27.77711 3.041338 2328.365 >> Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 -27.77711 6.168745 2973.233 >> Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 -27.77711 4.657887 2296.824 >> Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 -27.77711 3.300734 2269.282 >> Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 -27.77711 2.771320 2360.788 >> Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 -27.77711 4.930170 3599.471 >> Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 -27.77711 2.112580 1802.150 >> Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 -27.77711 6.014430 2939.592 >> PValue FDR >> Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 0 0 >> Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 0 0 >> Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 0 0 >> Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 0 0 >> Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 0 0 >> Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 0 0 >> Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 0 0 >> Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 0 0 >> Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 0 0 >> Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 0 0 >>> Best, >>> >>> Jim >>> >>> >>>> Im using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment. >>>> >>>> Thanks so much, >>>> >>>> Morgan >>>> >>>> Morgan Mouchka >>>> PhD Candidate >>>> E231 Corson Hall >>>> Dept. Ecology and Evolutionary Biology >>>> Cornell University >>>> mep74@cornell.edu >>>> http://www.eeb.cornell.edu/mouchka >>>> >>>>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig") >>>>> head (x) >>>> AC3 AC4 AC5 AC7 AT1 AT2 >>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55 >>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2 >>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961 >>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8 >>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470 >>>> AT4 AT5 SC1 SC2 SC3 SC5 >>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49 >>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2 >>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788 >>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5 >>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0 >>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194 >>>> ST1 ST2 ST3 ST5 >>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82 >>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3 >>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488 >>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8 >>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 >>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221 >>>>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST >>>>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C" ,"D","D","D","D")) >>>>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size >>>>> y<- DGEList (counts = x, group = group) >>>> Calculating library sizes from column totals. >>>>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case) >>>>> keep<- rowSums (cpm(y)>1)>= 4 >>>>> y2<- y[keep,] >>>>> dim(y2) >>>> [1] 17739 16 >>>>> y2$samples$lib.size<- colSums(y2$counts) >>>>> # Normalize for sample-specific effects >>>>> y3<- calcNormFactors (y2) >>>>> y3$samples >>>> group lib.size norm.factors >>>> AC3 A 11095093 1.0237872 >>>> AC4 A 5604841 0.9912734 >>>> AC5 A 7517691 0.9824405 >>>> AC7 A 5918039 0.9919439 >>>> AT1 B 6004830 0.9609672 >>>> AT2 B 7692419 0.9543765 >>>> AT4 B 5197815 0.9575609 >>>> AT5 B 7759281 0.9643070 >>>> SC1 C 3542875 1.0693587 >>>> SC2 C 6720709 1.0488297 >>>> SC3 C 3881774 1.0022528 >>>> SC5 C 5109584 1.0429925 >>>> ST1 D 10429666 0.9815467 >>>> ST2 D 9662842 1.0103866 >>>> ST3 D 8267838 1.0198095 >>>> ST5 D 10698815 1.0069064 >>>>> #plot biological coefficient of variation (BVC)between samples >>>>> plotMDS(y3) >>>>> #check levels >>>>> levels (y3$samples$group) >>>> [1] "A" "B" "C" "D" >>>>> #design matrix to describe treatment conditions >>>>> design<- model.matrix(~0+group, data=y3$samples) >>>>> colnames(design)<- c("A", "B", "C", "D") >>>>> design >>>> A B C D >>>> AC3 1 0 0 0 >>>> AC4 1 0 0 0 >>>> AC5 1 0 0 0 >>>> AC7 1 0 0 0 >>>> AT1 0 1 0 0 >>>> AT2 0 1 0 0 >>>> AT4 0 1 0 0 >>>> AT5 0 1 0 0 >>>> SC1 0 0 1 0 >>>> SC2 0 0 1 0 >>>> SC3 0 0 1 0 >>>> SC5 0 0 1 0 >>>> ST1 0 0 0 1 >>>> ST2 0 0 0 1 >>>> ST3 0 0 0 1 >>>> ST5 0 0 0 1 >>>> attr(,"assign") >>>> [1] 1 1 1 1 >>>> attr(,"contrasts") >>>> attr(,"contrasts")$group >>>> [1] "contr.treatment" >>>> >>>>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE) >>>> Disp = 0.02061 , BCV = 0.1436 >>>>> #estimate gene-wise dispersion >>>>> y5<- estimateGLMTrendedDisp(y4, design) >>>>> y6<- estimateGLMTagwiseDisp(y5,design) >>>>> plotBCV(y6) >>>>> fit<- glmFit(y6, design) >>>>> # make contrasts >>>>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design) >>>>> #pairwise comparison of B and A (apo treatmetn and apo control) >>>>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"]) >>>>> # summary of model fitting funtions >>>>> summary(de<-decideTestsDGE(lrt.BvsA)) >>>> [,1] >>>> -1 562 >>>> 0 16091 >>>> 1 1086 >>>>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"]) >>>>> summary(de<-decideTestsDGE(lrt.DvsC)) >>>> [,1] >>>> -1 496 >>>> 0 16710 >>>> 1 533 >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >> Morgan Mouchka >> PhD Candidate >> E231 Corson Hall >> Dept. Ecology and Evolutionary Biology >> Cornell University >> mep74@cornell.edu >> http://www.eeb.cornell.edu/mouchka >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > Morgan Mouchka PhD Candidate E231 Corson Hall Dept. Ecology and Evolutionary Biology Cornell University mep74@cornell.edu http://www.eeb.cornell.edu/mouchka [[alternative HTML version deleted]]

ADD REPLY • link 12.9 years ago Morgan Mouchka ▴ 30

Login before adding your answer.