Question: edgeR design matrix and contrast questions
0
gravatar for Findley Finseth
6.0 years ago by
Findley Finseth40 wrote:
Hello all -- I am using edgeR to analyze RNASeq data. Thank you so much for this software and the clear and straightforward user guide. I have a question regarding contrasts using the glm approach for a paired experimental design. I have three tissues (PG, Liv, and Test) from six subjects (1-6). One of my goals is to get a set of genes that are "enriched" for each tissue. To this end, I would like to compare each tissue to the average of the other two (e.g., PG - (liver + testes)/2), while accounting for the fact that my experiment is paired by subject. I understand how to build an appropriate design matrix and call contrasts to either compare each tissue to the average of the other two (e.g., edgeR section 3.4.3-4) or perform pairwise comparisons between tissues while including subject (e.g., edgeR section 3.4.1) . However, I start struggling when I try to combine these goals and compare one tissue to the average of the other two while pairing by subject. I have also considered the multi-level approach (e.g., limma section 8.7) but am not doing both independent and paired comparisons, so did not think that was an appropriate avenue. I am pasting my session output with preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. On a slightly different note, I am getting the following error when using topTags (also on session output below): Error in abs(object$table$logFC) : Non-numeric argument to mathematical function I noticed there was a previous post in December about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it. Thank you in advance, Findley Finseth PhD Candidate Dept Ecology and Evolutionary Biology Cornell University Ithaca, NY 14850 > library(edgeR) Loading required package: limma > > # making a matrix of factors called "targets", Tissue = tissue, Ind2 = subject > targets <-readTargets() > targets X Tissue Ind2 1 JQ_112_Liv_TG Liv 1 2 JQ_122-2_PG_A PG 2 3 JQ_179_Liv_AC Liv 3 4 JQ_191_Test_C Test 4 5 JQ_255_PG_AGT PG 5 6 JQ_107_Liv_CC Liv 6 7 JQ_107_PG_AGT PG 6 8 JQ_122-2_Test Test 2 9 JQ_191_Liv_AT Liv 4 10 JQ_107_Test_G Test 6 11 JQ_112_PG_CGA PG 1 12 JQ_112_Test_A Test 1 13 JQ_122-2_Liv_ Liv 2 14 JQ_191_PG_AGT PG 4 15 JQ_179_PG_AGT PG 3 16 JQ_179_Test_A Test 3 17 JQ_255_Liv_GA Liv 5 18 JQ_255_Test_G Test 5 > > # importing raw data > rawdata <- read.delim("controlonly_practiceEdgeR.txt") > head(rawdata) Gene sample2 sample6 sample7 sample9 sample10 sample12 sample13 sample17 1 comp326924_c0_seq1 23 7 3 6 8 16 5 6 2 comp434050_c0_seq1 2 0 0 26 0 0 0 34 3 comp28996_c0_seq1 344 161 191 284 144 354 114 338 4 comp1083897_c0_seq1 1 4 1 0 1 2 0 0 5 comp544783_c0_seq1 3 0 0 4 0 0 0 11 6 comp654539_c0_seq1 0 0 0 9 0 1 0 11 sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 sample39 1 6 20 4 8 10 2 1 4 32 2 0 49 0 10 0 0 0 19 1 3 154 815 196 163 252 51 74 168 352 4 0 1 4 0 0 2 0 0 4 5 1 19 0 2 2 0 0 3 0 6 1 4 0 5 0 0 0 3 0 sample40 1 2 2 18 3 182 4 0 5 2 6 3 > > # setting groups equal to tissue differences > group <- targets$Tissue > group <- as.factor(group) > group [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG PG Test Liv [18] Test Levels: Liv PG Test > > > # making my DGE list > y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group) Calculating library sizes from column totals. > > # filtering out lowly expressed genes; since the smallest group size is six, we keeping genes with at least one count per million in at least six samples > keep <- rowSums(cpm(y)>1) >= 6 > y <- y[keep,] > dim(y) # retained 30,140 genes [1] 30140 18 > > > #recomputing the library sizes: > y$samples$lib.size <- colSums(y$counts) > > #calculating normalization factors > y <- calcNormFactors(y) > y$samples # looks good group lib.size norm.factors sample2 Liv 13003017 1.2319091 sample6 PG 12388919 0.7524641 sample7 Liv 8045685 0.6490458 sample9 Test 9164420 1.5566202 sample10 PG 15292051 0.5119163 sample12 Liv 13042773 0.8596694 sample13 PG 11862830 0.6419963 sample17 Test 9472129 1.6400614 sample20 Liv 6514347 0.8206096 sample22 Test 25926605 1.4096284 sample24 PG 12334446 0.7888070 sample25 Test 6283137 1.6230035 sample29 Liv 19970487 0.8060990 sample30 PG 6159314 0.7825644 sample36 PG 3897369 0.9081739 sample37 Test 5778563 1.6550333 sample39 Liv 15619497 0.9618765 sample40 Test 5076345 1.7061590 > > > > # setting subject equal to individuals to reflect fact that samples are paired > subject <- factor(targets$Ind2) > > ############### building design matrix; could use advice here regarding appropriateness########## > design4 <- model.matrix(~subject + group) > design4 (Intercept) subject2 subject3 subject4 subject5 subject6 groupPG groupTest 1 1 0 0 0 0 0 0 0 2 1 1 0 0 0 0 1 0 3 1 0 1 0 0 0 0 0 4 1 0 0 1 0 0 0 1 5 1 0 0 0 1 0 1 0 6 1 0 0 0 0 1 0 0 7 1 0 0 0 0 1 1 0 8 1 1 0 0 0 0 0 1 9 1 0 0 1 0 0 0 0 10 1 0 0 0 0 1 0 1 11 1 0 0 0 0 0 1 0 12 1 0 0 0 0 0 0 1 13 1 1 0 0 0 0 0 0 14 1 0 0 1 0 0 1 0 15 1 0 1 0 0 0 1 0 16 1 0 1 0 0 0 0 1 17 1 0 0 0 1 0 0 0 18 1 0 0 0 1 0 0 1 attr(,"assign") [1] 0 1 1 1 1 1 2 2 attr(,"contrasts") attr(,"contrasts")$subject [1] "contr.treatment" attr(,"contrasts")$group [1] "contr.treatment" > > # estimating dispersions > y4 <- estimateGLMCommonDisp(y,design4) > y4 <- estimateGLMTrendedDisp(y4,design4) Loading required package: splines > y4 <- estimateGLMTagwiseDisp(y4,design4) > > # building contrasts > > ##############This is where things start to get a bit sticky for me; I am making some preliminary contrasts below, but am unsure of whether I've treated the intercept appropriately. I think it refers to groupLiv, though I have some uncertainty about this, given that subject is the first term in the model########## > > #########ultimately, I would like to compare (PG - (Liv+Test)/2) and so on for each tissue type########### > > my.contrast <- makeContrasts( + Liv=(Intercept)-(groupPG+groupTest)/2, + PG=groupPG-((Intercept)+groupTest)/2, + Test=groupTest-(groupPG+(Intercept))/2, + levels=design4) Warning message: In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = groupPG - : Renaming (Intercept) to Intercept > > > # fitting a glm > fit <- glmFit(y4,design4) > > > # likelihood ratio tests > > lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) > lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) > lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) > > #finding top tags; > topTags(lrt.Liv) Error in abs(object$table$logFC) : Non-numeric argument to mathematical function > topTagslrt.PG) Error in abs(object$table$logFC) : Non-numeric argument to mathematical function > topTags(lrt.Testes) Error in abs(object$table$logFC) : Non-numeric argument to mathematical function > > > #### Note: I am also getting an error after using topTags about passing a non-numeric argument to a mathermatical function. I noticed there was a previous post about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it. > > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.0.8 limma_3.14.4 loaded via a namespace (and not attached): [1] tools_2.15.2
rnaseq normalization edger • 1.4k views
ADD COMMENTlink modified 6.0 years ago by Gordon Smyth36k • written 6.0 years ago by Findley Finseth40
Answer: edgeR design matrix and contrast questions
0
gravatar for Ryan C. Thompson
6.0 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.2k wrote:
I would recommend creating a design matrix with no intercept term and putting the group factor first in the formula: ~0 + group + subject. This will yield one coefficient for each tissue. Also, if you use sum-to- zero contrasts for subject, then I believe the tissue coefficients are directly interpretable as fitted cpm values On Mar 5, 2013 11:16 PM, "Findley Finseth" <findleyransler@gmail.com> wrote: > Hello all -- > > > I am using edgeR to analyze RNASeq data. Thank you so much for this > software and the clear and straightforward user guide. I have a question > regarding contrasts using the glm approach for a paired experimental design. > > I have three tissues (PG, Liv, and Test) from six subjects (1-6). One of > my goals is to get a set of genes that are "enriched" for each tissue. To > this end, I would like to compare each tissue to the average of the other > two (e.g., PG - (liver + testes)/2), while accounting for the fact that my > experiment is paired by subject. I understand how to build an appropriate > design matrix and call contrasts to either compare each tissue to the > average of the other two (e.g., edgeR section 3.4.3-4) or perform pairwise > comparisons between tissues while including subject (e.g., edgeR section > 3.4.1) . However, I start struggling when I try to combine these goals and > compare one tissue to the average of the other two while pairing by > subject. I have also considered the multi-level approach (e.g., limma > section 8.7) but am not doing both independent and paired comparisons, so > did not think that was an appropriate avenue. > > I am pasting my session output with preliminary code below. Specifically, > I could use advice on whether 1) my design matrix and 2) the way I have > called my contrasts are appropriate for my question. > > > On a slightly different note, I am getting the following error when using > topTags (also on session output below): > > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > I noticed there was a previous post in December about similarly passing a > single contrast to a glmLRT and retrieving the same error, which was > corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has > more to do with how I have set up/called my contrasts, but thought I would > mention it. > > > > > Thank you in advance, > > Findley Finseth > > > PhD Candidate > > Dept Ecology and Evolutionary Biology > Cornell University > Ithaca, NY 14850 > > > > > > > library(edgeR) > Loading required package: limma > > > > # making a matrix of factors called "targets", Tissue = tissue, Ind2 = > subject > > targets <-readTargets() > > targets > X Tissue Ind2 > 1 JQ_112_Liv_TG Liv 1 > 2 JQ_122-2_PG_A PG 2 > 3 JQ_179_Liv_AC Liv 3 > 4 JQ_191_Test_C Test 4 > 5 JQ_255_PG_AGT PG 5 > 6 JQ_107_Liv_CC Liv 6 > 7 JQ_107_PG_AGT PG 6 > 8 JQ_122-2_Test Test 2 > 9 JQ_191_Liv_AT Liv 4 > 10 JQ_107_Test_G Test 6 > 11 JQ_112_PG_CGA PG 1 > 12 JQ_112_Test_A Test 1 > 13 JQ_122-2_Liv_ Liv 2 > 14 JQ_191_PG_AGT PG 4 > 15 JQ_179_PG_AGT PG 3 > 16 JQ_179_Test_A Test 3 > 17 JQ_255_Liv_GA Liv 5 > 18 JQ_255_Test_G Test 5 > > > > # importing raw data > > rawdata <- read.delim("controlonly_practiceEdgeR.txt") > > head(rawdata) > Gene sample2 sample6 sample7 sample9 sample10 sample12 > sample13 sample17 > 1 comp326924_c0_seq1 23 7 3 6 8 16 > 5 6 > 2 comp434050_c0_seq1 2 0 0 26 0 0 > 0 34 > 3 comp28996_c0_seq1 344 161 191 284 144 354 > 114 338 > 4 comp1083897_c0_seq1 1 4 1 0 1 2 > 0 0 > 5 comp544783_c0_seq1 3 0 0 4 0 0 > 0 11 > 6 comp654539_c0_seq1 0 0 0 9 0 1 > 0 11 > sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 > sample39 > 1 6 20 4 8 10 2 1 4 > 32 > 2 0 49 0 10 0 0 0 19 > 1 > 3 154 815 196 163 252 51 74 168 > 352 > 4 0 1 4 0 0 2 0 0 > 4 > 5 1 19 0 2 2 0 0 3 > 0 > 6 1 4 0 5 0 0 0 3 > 0 > sample40 > 1 2 > 2 18 > 3 182 > 4 0 > 5 2 > 6 3 > > > > # setting groups equal to tissue differences > > group <- targets$Tissue > > group <- as.factor(group) > > group > [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG > PG Test Liv > [18] Test > Levels: Liv PG Test > > > > > > # making my DGE list > > y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group) > Calculating library sizes from column totals. > > > > # filtering out lowly expressed genes; since the smallest group size is > six, we keeping genes with at least one count per million in at least six > samples > > keep <- rowSums(cpm(y)>1) >= 6 > > y <- y[keep,] > > dim(y) # retained 30,140 genes > [1] 30140 18 > > > > > > #recomputing the library sizes: > > y$samples$lib.size <- colSums(y$counts) > > > > #calculating normalization factors > > y <- calcNormFactors(y) > > y$samples # looks good > group lib.size norm.factors > sample2 Liv 13003017 1.2319091 > sample6 PG 12388919 0.7524641 > sample7 Liv 8045685 0.6490458 > sample9 Test 9164420 1.5566202 > sample10 PG 15292051 0.5119163 > sample12 Liv 13042773 0.8596694 > sample13 PG 11862830 0.6419963 > sample17 Test 9472129 1.6400614 > sample20 Liv 6514347 0.8206096 > sample22 Test 25926605 1.4096284 > sample24 PG 12334446 0.7888070 > sample25 Test 6283137 1.6230035 > sample29 Liv 19970487 0.8060990 > sample30 PG 6159314 0.7825644 > sample36 PG 3897369 0.9081739 > sample37 Test 5778563 1.6550333 > sample39 Liv 15619497 0.9618765 > sample40 Test 5076345 1.7061590 > > > > > > > > # setting subject equal to individuals to reflect fact that samples are > paired > > subject <- factor(targets$Ind2) > > > > ############### building design matrix; could use advice here regarding > appropriateness########## > > design4 <- model.matrix(~subject + group) > > design4 > (Intercept) subject2 subject3 subject4 subject5 subject6 groupPG > groupTest > 1 1 0 0 0 0 0 0 > 0 > 2 1 1 0 0 0 0 1 > 0 > 3 1 0 1 0 0 0 0 > 0 > 4 1 0 0 1 0 0 0 > 1 > 5 1 0 0 0 1 0 1 > 0 > 6 1 0 0 0 0 1 0 > 0 > 7 1 0 0 0 0 1 1 > 0 > 8 1 1 0 0 0 0 0 > 1 > 9 1 0 0 1 0 0 0 > 0 > 10 1 0 0 0 0 1 0 > 1 > 11 1 0 0 0 0 0 1 > 0 > 12 1 0 0 0 0 0 0 > 1 > 13 1 1 0 0 0 0 0 > 0 > 14 1 0 0 1 0 0 1 > 0 > 15 1 0 1 0 0 0 1 > 0 > 16 1 0 1 0 0 0 0 > 1 > 17 1 0 0 0 1 0 0 > 0 > 18 1 0 0 0 1 0 0 > 1 > attr(,"assign") > [1] 0 1 1 1 1 1 2 2 > attr(,"contrasts") > attr(,"contrasts")$subject > [1] "contr.treatment" > > attr(,"contrasts")$group > [1] "contr.treatment" > > > > > # estimating dispersions > > y4 <- estimateGLMCommonDisp(y,design4) > > y4 <- estimateGLMTrendedDisp(y4,design4) > Loading required package: splines > > y4 <- estimateGLMTagwiseDisp(y4,design4) > > > > # building contrasts > > > > ##############This is where things start to get a bit sticky for me; I > am making some preliminary contrasts below, but am unsure of whether I've > treated the intercept appropriately. I think it refers to groupLiv, though > I have some uncertainty about this, given that subject is the first term in > the model########## > > > > #########ultimately, I would like to compare (PG - (Liv+Test)/2) and so > on for each tissue type########### > > > > my.contrast <- makeContrasts( > + Liv=(Intercept)-(groupPG+groupTest)/2, > + PG=groupPG-((Intercept)+groupTest)/2, > + Test=groupTest-(groupPG+(Intercept))/2, > + levels=design4) > Warning message: > In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = groupPG > - : > Renaming (Intercept) to Intercept > > > > > > # fitting a glm > > fit <- glmFit(y4,design4) > > > > > > # likelihood ratio tests > > > > lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) > > lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) > > lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) > > > > #finding top tags; > > topTags(lrt.Liv) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > topTagslrt.PG) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > topTags(lrt.Testes) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > > > > > #### Note: I am also getting an error after using topTags about passing > a non-numeric argument to a mathermatical function. I noticed there was a > previous post about similarly passing a single contrast to a glmLRT and > retrieving the same error, which was corrected in edgeR_3.0.7. As I am > using 3.0.8, I am assuming the error has more to do with how I have set > up/called my contrasts, but thought I would mention it. > > > > sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] edgeR_3.0.8 limma_3.14.4 > > loaded via a namespace (and not attached): > [1] tools_2.15.2 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 6.0 years ago by Ryan C. Thompson7.2k
Hi Ryan -- Thanks so much for your helpful response. One more clarification - apologies, I am new to contrasts. I originally set up my matrix as you suggested, but was concerned that the order of the factors mattered in the formula (as I know it sometimes can). In other words, to properly reflect that my experiment is paired, I thought "subject" would need to be first (which is why I did my funky contrasts). As I understand from your response, this is not a concern and subject is fine to list second. Thanks again, Findley On Mar 6, 2013, at 10:15 AM, Ryan Thompson wrote: > I would recommend creating a design matrix with no intercept term and putting the group factor first in the formula: ~0 + group + subject. This will yield one coefficient for each tissue. Also, if you use sum-to-zero contrasts for subject, then I believe the tissue coefficients are directly interpretable as fitted cpm values > > On Mar 5, 2013 11:16 PM, "Findley Finseth" <findleyransler@gmail.com> wrote: > Hello all -- > > > I am using edgeR to analyze RNASeq data. Thank you so much for this software and the clear and straightforward user guide. I have a question regarding contrasts using the glm approach for a paired experimental design. > > I have three tissues (PG, Liv, and Test) from six subjects (1-6). One of my goals is to get a set of genes that are "enriched" for each tissue. To this end, I would like to compare each tissue to the average of the other two (e.g., PG - (liver + testes)/2), while accounting for the fact that my experiment is paired by subject. I understand how to build an appropriate design matrix and call contrasts to either compare each tissue to the average of the other two (e.g., edgeR section 3.4.3-4) or perform pairwise comparisons between tissues while including subject (e.g., edgeR section 3.4.1) . However, I start struggling when I try to combine these goals and compare one tissue to the average of the other two while pairing by subject. I have also considered the multi-level approach (e.g., limma section 8.7) but am not doing both independent and paired comparisons, so did not think that was an appropriate avenue. > > I am pasting my session output with preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question. > > > On a slightly different note, I am getting the following error when using topTags (also on session output below): > > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > I noticed there was a previous post in December about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it. > > > > > Thank you in advance, > > Findley Finseth > > > PhD Candidate > > Dept Ecology and Evolutionary Biology > Cornell University > Ithaca, NY 14850 > > > > > > > library(edgeR) > Loading required package: limma > > > > # making a matrix of factors called "targets", Tissue = tissue, Ind2 = subject > > targets <-readTargets() > > targets > X Tissue Ind2 > 1 JQ_112_Liv_TG Liv 1 > 2 JQ_122-2_PG_A PG 2 > 3 JQ_179_Liv_AC Liv 3 > 4 JQ_191_Test_C Test 4 > 5 JQ_255_PG_AGT PG 5 > 6 JQ_107_Liv_CC Liv 6 > 7 JQ_107_PG_AGT PG 6 > 8 JQ_122-2_Test Test 2 > 9 JQ_191_Liv_AT Liv 4 > 10 JQ_107_Test_G Test 6 > 11 JQ_112_PG_CGA PG 1 > 12 JQ_112_Test_A Test 1 > 13 JQ_122-2_Liv_ Liv 2 > 14 JQ_191_PG_AGT PG 4 > 15 JQ_179_PG_AGT PG 3 > 16 JQ_179_Test_A Test 3 > 17 JQ_255_Liv_GA Liv 5 > 18 JQ_255_Test_G Test 5 > > > > # importing raw data > > rawdata <- read.delim("controlonly_practiceEdgeR.txt") > > head(rawdata) > Gene sample2 sample6 sample7 sample9 sample10 sample12 sample13 sample17 > 1 comp326924_c0_seq1 23 7 3 6 8 16 5 6 > 2 comp434050_c0_seq1 2 0 0 26 0 0 0 34 > 3 comp28996_c0_seq1 344 161 191 284 144 354 114 338 > 4 comp1083897_c0_seq1 1 4 1 0 1 2 0 0 > 5 comp544783_c0_seq1 3 0 0 4 0 0 0 11 > 6 comp654539_c0_seq1 0 0 0 9 0 1 0 11 > sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 sample39 > 1 6 20 4 8 10 2 1 4 32 > 2 0 49 0 10 0 0 0 19 1 > 3 154 815 196 163 252 51 74 168 352 > 4 0 1 4 0 0 2 0 0 4 > 5 1 19 0 2 2 0 0 3 0 > 6 1 4 0 5 0 0 0 3 0 > sample40 > 1 2 > 2 18 > 3 182 > 4 0 > 5 2 > 6 3 > > > > # setting groups equal to tissue differences > > group <- targets$Tissue > > group <- as.factor(group) > > group > [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG PG Test Liv > [18] Test > Levels: Liv PG Test > > > > > > # making my DGE list > > y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group) > Calculating library sizes from column totals. > > > > # filtering out lowly expressed genes; since the smallest group size is six, we keeping genes with at least one count per million in at least six samples > > keep <- rowSums(cpm(y)>1) >= 6 > > y <- y[keep,] > > dim(y) # retained 30,140 genes > [1] 30140 18 > > > > > > #recomputing the library sizes: > > y$samples$lib.size <- colSums(y$counts) > > > > #calculating normalization factors > > y <- calcNormFactors(y) > > y$samples # looks good > group lib.size norm.factors > sample2 Liv 13003017 1.2319091 > sample6 PG 12388919 0.7524641 > sample7 Liv 8045685 0.6490458 > sample9 Test 9164420 1.5566202 > sample10 PG 15292051 0.5119163 > sample12 Liv 13042773 0.8596694 > sample13 PG 11862830 0.6419963 > sample17 Test 9472129 1.6400614 > sample20 Liv 6514347 0.8206096 > sample22 Test 25926605 1.4096284 > sample24 PG 12334446 0.7888070 > sample25 Test 6283137 1.6230035 > sample29 Liv 19970487 0.8060990 > sample30 PG 6159314 0.7825644 > sample36 PG 3897369 0.9081739 > sample37 Test 5778563 1.6550333 > sample39 Liv 15619497 0.9618765 > sample40 Test 5076345 1.7061590 > > > > > > > > # setting subject equal to individuals to reflect fact that samples are paired > > subject <- factor(targets$Ind2) > > > > ############### building design matrix; could use advice here regarding appropriateness########## > > design4 <- model.matrix(~subject + group) > > design4 > (Intercept) subject2 subject3 subject4 subject5 subject6 groupPG groupTest > 1 1 0 0 0 0 0 0 0 > 2 1 1 0 0 0 0 1 0 > 3 1 0 1 0 0 0 0 0 > 4 1 0 0 1 0 0 0 1 > 5 1 0 0 0 1 0 1 0 > 6 1 0 0 0 0 1 0 0 > 7 1 0 0 0 0 1 1 0 > 8 1 1 0 0 0 0 0 1 > 9 1 0 0 1 0 0 0 0 > 10 1 0 0 0 0 1 0 1 > 11 1 0 0 0 0 0 1 0 > 12 1 0 0 0 0 0 0 1 > 13 1 1 0 0 0 0 0 0 > 14 1 0 0 1 0 0 1 0 > 15 1 0 1 0 0 0 1 0 > 16 1 0 1 0 0 0 0 1 > 17 1 0 0 0 1 0 0 0 > 18 1 0 0 0 1 0 0 1 > attr(,"assign") > [1] 0 1 1 1 1 1 2 2 > attr(,"contrasts") > attr(,"contrasts")$subject > [1] "contr.treatment" > > attr(,"contrasts")$group > [1] "contr.treatment" > > > > > # estimating dispersions > > y4 <- estimateGLMCommonDisp(y,design4) > > y4 <- estimateGLMTrendedDisp(y4,design4) > Loading required package: splines > > y4 <- estimateGLMTagwiseDisp(y4,design4) > > > > # building contrasts > > > > ##############This is where things start to get a bit sticky for me; I am making some preliminary contrasts below, but am unsure of whether I've treated the intercept appropriately. I think it refers to groupLiv, though I have some uncertainty about this, given that subject is the first term in the model########## > > > > #########ultimately, I would like to compare (PG - (Liv+Test)/2) and so on for each tissue type########### > > > > my.contrast <- makeContrasts( > + Liv=(Intercept)-(groupPG+groupTest)/2, > + PG=groupPG-((Intercept)+groupTest)/2, > + Test=groupTest-(groupPG+(Intercept))/2, > + levels=design4) > Warning message: > In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = groupPG - : > Renaming (Intercept) to Intercept > > > > > > # fitting a glm > > fit <- glmFit(y4,design4) > > > > > > # likelihood ratio tests > > > > lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) > > lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) > > lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) > > > > #finding top tags; > > topTags(lrt.Liv) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > topTagslrt.PG) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > topTags(lrt.Testes) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > > > > > #### Note: I am also getting an error after using topTags about passing a non-numeric argument to a mathermatical function. I noticed there was a previous post about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it. > > > > sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.8 limma_3.14.4 > > loaded via a namespace (and not attached): > [1] tools_2.15.2 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD REPLYlink written 6.0 years ago by Findley Finseth40
Answer: edgeR design matrix and contrast questions
0
gravatar for Ryan C. Thompson
6.0 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.2k wrote:
As long as your design formula contains the same factors (or products of factors), it will produce an equivalent design. The only difference is in the parametrization. The only time the order matters is if you are using a no-intercept formula, in which case the first factor is coded directly into the design, while the remaining factors are coded as contrasts. In a design with an intercept, all factors are coded as contrasts, and the order of factors in the formula is irrelevant. On Wed 06 Mar 2013 10:37:12 AM PST, Findley Finseth wrote: > Hi Ryan -- > > Thanks so much for your helpful response. > > One more clarification - apologies, I am new to contrasts. I > originally set up my matrix as you suggested, but was concerned that > the order of the factors mattered in the formula (as I know it > sometimes can). In other words, to properly reflect that my > experiment is paired, I thought "subject" would need to be first > (which is why I did my funky contrasts). As I understand from your > response, this is not a concern and subject is fine to list second. > > Thanks again, > Findley > > > On Mar 6, 2013, at 10:15 AM, Ryan Thompson wrote: > >> I would recommend creating a design matrix with no intercept term and >> putting the group factor first in the formula: ~0 + group + subject. >> This will yield one coefficient for each tissue. Also, if you use >> sum-to-zero contrasts for subject, then I believe the tissue >> coefficients are directly interpretable as fitted cpm values >> >> On Mar 5, 2013 11:16 PM, "Findley Finseth" <findleyransler at="" gmail.com="">> <mailto:findleyransler at="" gmail.com="">> wrote: >> >> Hello all -- >> >> >> I am using edgeR to analyze RNASeq data. Thank you so much for >> this software and the clear and straightforward user guide. I >> have a question regarding contrasts using the glm approach for a >> paired experimental design. >> >> I have three tissues (PG, Liv, and Test) from six subjects (1-6). >> One of my goals is to get a set of genes that are "enriched" >> for each tissue. To this end, I would like to compare each >> tissue to the average of the other two (e.g., PG - (liver + >> testes)/2), while accounting for the fact that my experiment is >> paired by subject. I understand how to build an appropriate >> design matrix and call contrasts to either compare each tissue to >> the average of the other two (e.g., edgeR section 3.4.3-4) or >> perform pairwise comparisons between tissues while including >> subject (e.g., edgeR section 3.4.1) . However, I start >> struggling when I try to combine these goals and compare one >> tissue to the average of the other two while pairing by subject. >> I have also considered the multi-level approach (e.g., limma >> section 8.7) but am not doing both independent and paired >> comparisons, so did not think that was an appropriate avenue. >> >> I am pasting my session output with preliminary code below. >> Specifically, I could use advice on whether 1) my design matrix >> and 2) the way I have called my contrasts are appropriate for my >> question. >> >> >> On a slightly different note, I am getting the following error >> when using topTags (also on session output below): >> >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >> >> I noticed there was a previous post in December about similarly >> passing a single contrast to a glmLRT and retrieving the same >> error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, >> I am assuming the error has more to do with how I have set >> up/called my contrasts, but thought I would mention it. >> >> >> >> >> Thank you in advance, >> >> Findley Finseth >> >> >> PhD Candidate >> >> Dept Ecology and Evolutionary Biology >> Cornell University >> Ithaca, NY 14850 >> >> >> >> >> >> > library(edgeR) >> Loading required package: limma >> > >> > # making a matrix of factors called "targets", Tissue = tissue, >> Ind2 = subject >> > targets <-readTargets() >> > targets >> X Tissue Ind2 >> 1 JQ_112_Liv_TG Liv 1 >> 2 JQ_122-2_PG_A PG 2 >> 3 JQ_179_Liv_AC Liv 3 >> 4 JQ_191_Test_C Test 4 >> 5 JQ_255_PG_AGT PG 5 >> 6 JQ_107_Liv_CC Liv 6 >> 7 JQ_107_PG_AGT PG 6 >> 8 JQ_122-2_Test Test 2 >> 9 JQ_191_Liv_AT Liv 4 >> 10 JQ_107_Test_G Test 6 >> 11 JQ_112_PG_CGA PG 1 >> 12 JQ_112_Test_A Test 1 >> 13 JQ_122-2_Liv_ Liv 2 >> 14 JQ_191_PG_AGT PG 4 >> 15 JQ_179_PG_AGT PG 3 >> 16 JQ_179_Test_A Test 3 >> 17 JQ_255_Liv_GA Liv 5 >> 18 JQ_255_Test_G Test 5 >> > >> > # importing raw data >> > rawdata <- read.delim("controlonly_practiceEdgeR.txt") >> > head(rawdata) >> Gene sample2 sample6 sample7 sample9 sample10 >> sample12 sample13 sample17 >> 1 comp326924_c0_seq1 23 7 3 6 8 >> 16 5 6 >> 2 comp434050_c0_seq1 2 0 0 26 0 >> 0 0 34 >> 3 comp28996_c0_seq1 344 161 191 284 144 >> 354 114 338 >> 4 comp1083897_c0_seq1 1 4 1 0 1 >> 2 0 0 >> 5 comp544783_c0_seq1 3 0 0 4 0 >> 0 0 11 >> 6 comp654539_c0_seq1 0 0 0 9 0 >> 1 0 11 >> sample20 sample22 sample24 sample25 sample29 sample30 sample36 >> sample37 sample39 >> 1 6 20 4 8 10 2 1 >> 4 32 >> 2 0 49 0 10 0 0 0 >> 19 1 >> 3 154 815 196 163 252 51 74 >> 168 352 >> 4 0 1 4 0 0 2 0 >> 0 4 >> 5 1 19 0 2 2 0 0 >> 3 0 >> 6 1 4 0 5 0 0 0 >> 3 0 >> sample40 >> 1 2 >> 2 18 >> 3 182 >> 4 0 >> 5 2 >> 6 3 >> > >> > # setting groups equal to tissue differences >> > group <- targets$Tissue >> > group <- as.factor(group) >> > group >> [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test >> Liv PG PG Test Liv >> [18] Test >> Levels: Liv PG Test >> > >> > >> > # making my DGE list >> > y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group) >> Calculating library sizes from column totals. >> > >> > # filtering out lowly expressed genes; since the smallest group >> size is six, we keeping genes with at least one count per million >> in at least six samples >> > keep <- rowSums(cpm(y)>1) >= 6 >> > y <- y[keep,] >> > dim(y) # retained 30,140 genes >> [1] 30140 18 >> > >> > >> > #recomputing the library sizes: >> > y$samples$lib.size <- colSums(y$counts) >> > >> > #calculating normalization factors >> > y <- calcNormFactors(y) >> > y$samples # looks good >> group lib.size norm.factors >> sample2 Liv 13003017 1.2319091 >> sample6 PG 12388919 0.7524641 >> sample7 Liv 8045685 0.6490458 >> sample9 Test 9164420 1.5566202 >> sample10 PG 15292051 0.5119163 >> sample12 Liv 13042773 0.8596694 >> sample13 PG 11862830 0.6419963 >> sample17 Test 9472129 1.6400614 >> sample20 Liv 6514347 0.8206096 >> sample22 Test 25926605 1.4096284 >> sample24 PG 12334446 0.7888070 >> sample25 Test 6283137 1.6230035 >> sample29 Liv 19970487 0.8060990 >> sample30 PG 6159314 0.7825644 >> sample36 PG 3897369 0.9081739 >> sample37 Test 5778563 1.6550333 >> sample39 Liv 15619497 0.9618765 >> sample40 Test 5076345 1.7061590 >> > >> > >> > >> > # setting subject equal to individuals to reflect fact that >> samples are paired >> > subject <- factor(targets$Ind2) >> > >> > ############### building design matrix; could use advice here >> regarding appropriateness########## >> > design4 <- model.matrix(~subject + group) >> > design4 >> (Intercept) subject2 subject3 subject4 subject5 subject6 >> groupPG groupTest >> 1 1 0 0 0 0 0 >> 0 0 >> 2 1 1 0 0 0 0 >> 1 0 >> 3 1 0 1 0 0 0 >> 0 0 >> 4 1 0 0 1 0 0 >> 0 1 >> 5 1 0 0 0 1 0 >> 1 0 >> 6 1 0 0 0 0 1 >> 0 0 >> 7 1 0 0 0 0 1 >> 1 0 >> 8 1 1 0 0 0 0 >> 0 1 >> 9 1 0 0 1 0 0 >> 0 0 >> 10 1 0 0 0 0 1 >> 0 1 >> 11 1 0 0 0 0 0 >> 1 0 >> 12 1 0 0 0 0 0 >> 0 1 >> 13 1 1 0 0 0 0 >> 0 0 >> 14 1 0 0 1 0 0 >> 1 0 >> 15 1 0 1 0 0 0 >> 1 0 >> 16 1 0 1 0 0 0 >> 0 1 >> 17 1 0 0 0 1 0 >> 0 0 >> 18 1 0 0 0 1 0 >> 0 1 >> attr(,"assign") >> [1] 0 1 1 1 1 1 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$subject >> [1] "contr.treatment" >> >> attr(,"contrasts")$group >> [1] "contr.treatment" >> >> > >> > # estimating dispersions >> > y4 <- estimateGLMCommonDisp(y,design4) >> > y4 <- estimateGLMTrendedDisp(y4,design4) >> Loading required package: splines >> > y4 <- estimateGLMTagwiseDisp(y4,design4) >> > >> > # building contrasts >> > >> > ##############This is where things start to get a bit sticky >> for me; I am making some preliminary contrasts below, but am >> unsure of whether I've treated the intercept appropriately. I >> think it refers to groupLiv, though I have some uncertainty about >> this, given that subject is the first term in the model########## >> > >> > #########ultimately, I would like to compare (PG - >> (Liv+Test)/2) and so on for each tissue type########### >> > >> > my.contrast <- makeContrasts( >> + Liv=(Intercept)-(groupPG+groupTest)/2, >> + PG=groupPG-((Intercept)+groupTest)/2, >> + Test=groupTest-(groupPG+(Intercept))/2, >> + levels=design4) >> Warning message: >> In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG >> = groupPG - : >> Renaming (Intercept) to Intercept >> > >> > >> > # fitting a glm >> > fit <- glmFit(y4,design4) >> > >> > >> > # likelihood ratio tests >> > >> > lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) >> > lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) >> > lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) >> > >> > #finding top tags; >> > topTags(lrt.Liv) >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >> > topTagslrt.PG) >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >> > topTags(lrt.Testes) >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >> > >> > >> > #### Note: I am also getting an error after using topTags about >> passing a non-numeric argument to a mathermatical function. I >> noticed there was a previous post about similarly passing a >> single contrast to a glmLRT and retrieving the same error, which >> was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming >> the error has more to do with how I have set up/called my >> contrasts, but thought I would mention it. >> > >> > sessionInfo() >> R version 2.15.2 (2012-10-26) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] splines stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] edgeR_3.0.8 limma_3.14.4 >> >> loaded via a namespace (and not attached): >> [1] tools_2.15.2 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >
ADD COMMENTlink written 6.0 years ago by Ryan C. Thompson7.2k
Answer: edgeR design matrix and contrast questions
0
gravatar for Gordon Smyth
6.0 years ago by
Gordon Smyth36k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth36k wrote:
Dear Findley, The fact that you have paired samples actually doesn't change the way that contrasts are formed in edgeR. Assuming you use the glm code, the same contrast code works as if you had no pairing. The design matrix is fine: design4 <- model.matrix(~subject + group) however your contrasts are incorrect. One never needs to include the intercept term in a contrast. The coefficients of interest in your model are called groupPG and groupTest. Both of these are relative to the Liv group, so you should think of groupPG as representing PG-Liv and groupTest as representing Test-Liv. Hence to test Liv-(PG+Test)/2 you want the contrast -(groupPG-groupTest)/2. To test PG-(Liv+Test)/2 you want the contrast groupPG-groupLiv/2. To test Test-(PG+Liv)/2 you want the contrast groupTest-groupPG/2. Best wishes Gordon > Date: Tue, 5 Mar 2013 15:21:18 -0500 > From: Findley Finseth <findleyransler at="" gmail.com=""> > To: Bioconductor mailing list <bioconductor at="" r-project.org=""> > Subject: [BioC] edgeR design matrix and contrast questions > > Hello all -- > > > I am using edgeR to analyze RNASeq data. Thank you so much for this > software and the clear and straightforward user guide. I have a > question regarding contrasts using the glm approach for a paired > experimental design. > > I have three tissues (PG, Liv, and Test) from six subjects (1-6). One > of my goals is to get a set of genes that are "enriched" for each > tissue. To this end, I would like to compare each tissue to the average > of the other two (e.g., PG - (liver + testes)/2), while accounting for > the fact that my experiment is paired by subject. I understand how to > build an appropriate design matrix and call contrasts to either compare > each tissue to the average of the other two (e.g., edgeR section > 3.4.3-4) or perform pairwise comparisons between tissues while including > subject (e.g., edgeR section 3.4.1) . However, I start struggling when > I try to combine these goals and compare one tissue to the average of > the other two while pairing by subject. I have also considered the > multi-level approach (e.g., limma section 8.7) but am not doing both > independent and paired comparisons, so did not think that was an > appropriate avenue. > > I am pasting my session output with preliminary code below. > Specifically, I could use advice on whether 1) my design matrix and 2) > the way I have called my contrasts are appropriate for my question. > > > On a slightly different note, I am getting the following error when > using topTags (also on session output below): > > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function > > I noticed there was a previous post in December about similarly passing > a single contrast to a glmLRT and retrieving the same error, which was > corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error > has more to do with how I have set up/called my contrasts, but thought I > would mention it. > > > > > Thank you in advance, > > Findley Finseth > > > PhD Candidate > > Dept Ecology and Evolutionary Biology > Cornell University > Ithaca, NY 14850 > > > > > >> library(edgeR) > Loading required package: limma >> >> # making a matrix of factors called "targets", Tissue = tissue, Ind2 = subject >> targets <-readTargets() >> targets > X Tissue Ind2 > 1 JQ_112_Liv_TG Liv 1 > 2 JQ_122-2_PG_A PG 2 > 3 JQ_179_Liv_AC Liv 3 > 4 JQ_191_Test_C Test 4 > 5 JQ_255_PG_AGT PG 5 > 6 JQ_107_Liv_CC Liv 6 > 7 JQ_107_PG_AGT PG 6 > 8 JQ_122-2_Test Test 2 > 9 JQ_191_Liv_AT Liv 4 > 10 JQ_107_Test_G Test 6 > 11 JQ_112_PG_CGA PG 1 > 12 JQ_112_Test_A Test 1 > 13 JQ_122-2_Liv_ Liv 2 > 14 JQ_191_PG_AGT PG 4 > 15 JQ_179_PG_AGT PG 3 > 16 JQ_179_Test_A Test 3 > 17 JQ_255_Liv_GA Liv 5 > 18 JQ_255_Test_G Test 5 >> >> # importing raw data >> rawdata <- read.delim("controlonly_practiceEdgeR.txt") >> head(rawdata) > Gene sample2 sample6 sample7 sample9 sample10 sample12 sample13 sample17 > 1 comp326924_c0_seq1 23 7 3 6 8 16 5 6 > 2 comp434050_c0_seq1 2 0 0 26 0 0 0 34 > 3 comp28996_c0_seq1 344 161 191 284 144 354 114 338 > 4 comp1083897_c0_seq1 1 4 1 0 1 2 0 0 > 5 comp544783_c0_seq1 3 0 0 4 0 0 0 11 > 6 comp654539_c0_seq1 0 0 0 9 0 1 0 11 > sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 sample39 > 1 6 20 4 8 10 2 1 4 32 > 2 0 49 0 10 0 0 0 19 1 > 3 154 815 196 163 252 51 74 168 352 > 4 0 1 4 0 0 2 0 0 4 > 5 1 19 0 2 2 0 0 3 0 > 6 1 4 0 5 0 0 0 3 0 > sample40 > 1 2 > 2 18 > 3 182 > 4 0 > 5 2 > 6 3 >> >> # setting groups equal to tissue differences >> group <- targets$Tissue >> group <- as.factor(group) >> group > [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG PG Test Liv > [18] Test > Levels: Liv PG Test >> >> >> # making my DGE list >> y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group) > Calculating library sizes from column totals. >> >> # filtering out lowly expressed genes; since the smallest group size is six, we keeping genes with at least one count per million in at least six samples >> keep <- rowSums(cpm(y)>1) >= 6 >> y <- y[keep,] >> dim(y) # retained 30,140 genes > [1] 30140 18 >> >> >> #recomputing the library sizes: >> y$samples$lib.size <- colSums(y$counts) >> >> #calculating normalization factors >> y <- calcNormFactors(y) >> y$samples # looks good > group lib.size norm.factors > sample2 Liv 13003017 1.2319091 > sample6 PG 12388919 0.7524641 > sample7 Liv 8045685 0.6490458 > sample9 Test 9164420 1.5566202 > sample10 PG 15292051 0.5119163 > sample12 Liv 13042773 0.8596694 > sample13 PG 11862830 0.6419963 > sample17 Test 9472129 1.6400614 > sample20 Liv 6514347 0.8206096 > sample22 Test 25926605 1.4096284 > sample24 PG 12334446 0.7888070 > sample25 Test 6283137 1.6230035 > sample29 Liv 19970487 0.8060990 > sample30 PG 6159314 0.7825644 > sample36 PG 3897369 0.9081739 > sample37 Test 5778563 1.6550333 > sample39 Liv 15619497 0.9618765 > sample40 Test 5076345 1.7061590 >> >> >> >> # setting subject equal to individuals to reflect fact that samples are paired >> subject <- factor(targets$Ind2) >> >> ############### building design matrix; could use advice here regarding appropriateness########## >> design4 <- model.matrix(~subject + group) >> design4 > (Intercept) subject2 subject3 subject4 subject5 subject6 groupPG groupTest > 1 1 0 0 0 0 0 0 0 > 2 1 1 0 0 0 0 1 0 > 3 1 0 1 0 0 0 0 0 > 4 1 0 0 1 0 0 0 1 > 5 1 0 0 0 1 0 1 0 > 6 1 0 0 0 0 1 0 0 > 7 1 0 0 0 0 1 1 0 > 8 1 1 0 0 0 0 0 1 > 9 1 0 0 1 0 0 0 0 > 10 1 0 0 0 0 1 0 1 > 11 1 0 0 0 0 0 1 0 > 12 1 0 0 0 0 0 0 1 > 13 1 1 0 0 0 0 0 0 > 14 1 0 0 1 0 0 1 0 > 15 1 0 1 0 0 0 1 0 > 16 1 0 1 0 0 0 0 1 > 17 1 0 0 0 1 0 0 0 > 18 1 0 0 0 1 0 0 1 > attr(,"assign") > [1] 0 1 1 1 1 1 2 2 > attr(,"contrasts") > attr(,"contrasts")$subject > [1] "contr.treatment" > > attr(,"contrasts")$group > [1] "contr.treatment" > >> >> # estimating dispersions >> y4 <- estimateGLMCommonDisp(y,design4) >> y4 <- estimateGLMTrendedDisp(y4,design4) > Loading required package: splines >> y4 <- estimateGLMTagwiseDisp(y4,design4) >> >> # building contrasts >> >> ##############This is where things start to get a bit sticky for me; I am making some preliminary contrasts below, but am unsure of whether I've treated the intercept appropriately. I think it refers to groupLiv, though I have some uncertainty about this, given that subject is the first term in the model########## >> >> #########ultimately, I would like to compare (PG - (Liv+Test)/2) and so on for each tissue type########### >> >> my.contrast <- makeContrasts( > + Liv=(Intercept)-(groupPG+groupTest)/2, > + PG=groupPG-((Intercept)+groupTest)/2, > + Test=groupTest-(groupPG+(Intercept))/2, > + levels=design4) > Warning message: > In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = groupPG - : > Renaming (Intercept) to Intercept >> >> >> # fitting a glm >> fit <- glmFit(y4,design4) >> >> >> # likelihood ratio tests >> >> lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) >> lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) >> lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) >> >> #finding top tags; >> topTags(lrt.Liv) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function >> topTagslrt.PG) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function >> topTags(lrt.Testes) > Error in abs(object$table$logFC) : > Non-numeric argument to mathematical function >> >> >> #### Note: I am also getting an error after using topTags about passing a non-numeric argument to a mathermatical function. I noticed there was a previous post about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it. >> >> sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.8 limma_3.14.4 > > loaded via a namespace (and not attached): > [1] tools_2.15.2 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENTlink written 6.0 years ago by Gordon Smyth36k
I just noticed a typo: To test PG-(Liv+Test)/2 you need the contrast groupPG-groupTest/2, rather than what I wrote below. Gordon On Thu, 7 Mar 2013, Gordon K Smyth wrote: > Dear Findley, > > The fact that you have paired samples actually doesn't change the way that > contrasts are formed in edgeR. Assuming you use the glm code, the same > contrast code works as if you had no pairing. > > The design matrix is fine: > > design4 <- model.matrix(~subject + group) > > however your contrasts are incorrect. One never needs to include the > intercept term in a contrast. > > The coefficients of interest in your model are called groupPG and groupTest. > Both of these are relative to the Liv group, so you should think of groupPG > as representing PG-Liv and groupTest as representing Test-Liv. > > Hence to test Liv-(PG+Test)/2 you want the contrast -(groupPG- groupTest)/2. > To test PG-(Liv+Test)/2 you want the contrast groupPG-groupLiv/2. To test > Test-(PG+Liv)/2 you want the contrast groupTest-groupPG/2. > > Best wishes > Gordon > > >> Date: Tue, 5 Mar 2013 15:21:18 -0500 >> From: Findley Finseth <findleyransler at="" gmail.com=""> >> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >> Subject: [BioC] edgeR design matrix and contrast questions >> >> Hello all -- >> >> >> I am using edgeR to analyze RNASeq data. Thank you so much for this >> software and the clear and straightforward user guide. I have a question >> regarding contrasts using the glm approach for a paired experimental >> design. >> >> I have three tissues (PG, Liv, and Test) from six subjects (1-6). One of >> my goals is to get a set of genes that are "enriched" for each tissue. To >> this end, I would like to compare each tissue to the average of the other >> two (e.g., PG - (liver + testes)/2), while accounting for the fact that my >> experiment is paired by subject. I understand how to build an appropriate >> design matrix and call contrasts to either compare each tissue to the >> average of the other two (e.g., edgeR section 3.4.3-4) or perform pairwise >> comparisons between tissues while including subject (e.g., edgeR section >> 3.4.1) . However, I start struggling when I try to combine these goals and >> compare one tissue to the average of the other two while pairing by >> subject. I have also considered the multi-level approach (e.g., limma >> section 8.7) but am not doing both independent and paired comparisons, so >> did not think that was an appropriate avenue. >> >> I am pasting my session output with preliminary code below. Specifically, I >> could use advice on whether 1) my design matrix and 2) the way I have >> called my contrasts are appropriate for my question. >> >> >> On a slightly different note, I am getting the following error when using >> topTags (also on session output below): >> >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >> >> I noticed there was a previous post in December about similarly passing a >> single contrast to a glmLRT and retrieving the same error, which was >> corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has >> more to do with how I have set up/called my contrasts, but thought I would >> mention it. >> >> >> >> >> Thank you in advance, >> >> Findley Finseth >> >> >> PhD Candidate >> >> Dept Ecology and Evolutionary Biology >> Cornell University >> Ithaca, NY 14850 >> >> >> >> >> >>> library(edgeR) >> Loading required package: limma >>> >>> # making a matrix of factors called "targets", Tissue = tissue, Ind2 = >>> subject >>> targets <-readTargets() >>> targets >> X Tissue Ind2 >> 1 JQ_112_Liv_TG Liv 1 >> 2 JQ_122-2_PG_A PG 2 >> 3 JQ_179_Liv_AC Liv 3 >> 4 JQ_191_Test_C Test 4 >> 5 JQ_255_PG_AGT PG 5 >> 6 JQ_107_Liv_CC Liv 6 >> 7 JQ_107_PG_AGT PG 6 >> 8 JQ_122-2_Test Test 2 >> 9 JQ_191_Liv_AT Liv 4 >> 10 JQ_107_Test_G Test 6 >> 11 JQ_112_PG_CGA PG 1 >> 12 JQ_112_Test_A Test 1 >> 13 JQ_122-2_Liv_ Liv 2 >> 14 JQ_191_PG_AGT PG 4 >> 15 JQ_179_PG_AGT PG 3 >> 16 JQ_179_Test_A Test 3 >> 17 JQ_255_Liv_GA Liv 5 >> 18 JQ_255_Test_G Test 5 >>> >>> # importing raw data >>> rawdata <- read.delim("controlonly_practiceEdgeR.txt") >>> head(rawdata) >> Gene sample2 sample6 sample7 sample9 sample10 sample12 >> sample13 sample17 >> 1 comp326924_c0_seq1 23 7 3 6 8 16 >> 5 6 >> 2 comp434050_c0_seq1 2 0 0 26 0 0 >> 0 34 >> 3 comp28996_c0_seq1 344 161 191 284 144 354 >> 114 338 >> 4 comp1083897_c0_seq1 1 4 1 0 1 2 >> 0 0 >> 5 comp544783_c0_seq1 3 0 0 4 0 0 >> 0 11 >> 6 comp654539_c0_seq1 0 0 0 9 0 1 >> 0 11 >> sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 >> sample39 >> 1 6 20 4 8 10 2 1 4 >> 32 >> 2 0 49 0 10 0 0 0 19 >> 1 >> 3 154 815 196 163 252 51 74 168 >> 352 >> 4 0 1 4 0 0 2 0 0 >> 4 >> 5 1 19 0 2 2 0 0 3 >> 0 >> 6 1 4 0 5 0 0 0 3 >> 0 >> sample40 >> 1 2 >> 2 18 >> 3 182 >> 4 0 >> 5 2 >> 6 3 >>> >>> # setting groups equal to tissue differences >>> group <- targets$Tissue >>> group <- as.factor(group) >>> group >> [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG >> PG Test Liv >> [18] Test >> Levels: Liv PG Test >>> >>> >>> # making my DGE list >>> y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group) >> Calculating library sizes from column totals. >>> >>> # filtering out lowly expressed genes; since the smallest group size is >>> six, we keeping genes with at least one count per million in at least six >>> samples >>> keep <- rowSums(cpm(y)>1) >= 6 >>> y <- y[keep,] >>> dim(y) # retained 30,140 genes >> [1] 30140 18 >>> >>> >>> #recomputing the library sizes: >>> y$samples$lib.size <- colSums(y$counts) >>> >>> #calculating normalization factors >>> y <- calcNormFactors(y) >>> y$samples # looks good >> group lib.size norm.factors >> sample2 Liv 13003017 1.2319091 >> sample6 PG 12388919 0.7524641 >> sample7 Liv 8045685 0.6490458 >> sample9 Test 9164420 1.5566202 >> sample10 PG 15292051 0.5119163 >> sample12 Liv 13042773 0.8596694 >> sample13 PG 11862830 0.6419963 >> sample17 Test 9472129 1.6400614 >> sample20 Liv 6514347 0.8206096 >> sample22 Test 25926605 1.4096284 >> sample24 PG 12334446 0.7888070 >> sample25 Test 6283137 1.6230035 >> sample29 Liv 19970487 0.8060990 >> sample30 PG 6159314 0.7825644 >> sample36 PG 3897369 0.9081739 >> sample37 Test 5778563 1.6550333 >> sample39 Liv 15619497 0.9618765 >> sample40 Test 5076345 1.7061590 >>> >>> >>> >>> # setting subject equal to individuals to reflect fact that samples are >>> paired >>> subject <- factor(targets$Ind2) >>> >>> ############### building design matrix; could use advice here regarding >>> appropriateness########## >>> design4 <- model.matrix(~subject + group) >>> design4 >> (Intercept) subject2 subject3 subject4 subject5 subject6 groupPG >> groupTest >> 1 1 0 0 0 0 0 0 >> 0 >> 2 1 1 0 0 0 0 1 >> 0 >> 3 1 0 1 0 0 0 0 >> 0 >> 4 1 0 0 1 0 0 0 >> 1 >> 5 1 0 0 0 1 0 1 >> 0 >> 6 1 0 0 0 0 1 0 >> 0 >> 7 1 0 0 0 0 1 1 >> 0 >> 8 1 1 0 0 0 0 0 >> 1 >> 9 1 0 0 1 0 0 0 >> 0 >> 10 1 0 0 0 0 1 0 >> 1 >> 11 1 0 0 0 0 0 1 >> 0 >> 12 1 0 0 0 0 0 0 >> 1 >> 13 1 1 0 0 0 0 0 >> 0 >> 14 1 0 0 1 0 0 1 >> 0 >> 15 1 0 1 0 0 0 1 >> 0 >> 16 1 0 1 0 0 0 0 >> 1 >> 17 1 0 0 0 1 0 0 >> 0 >> 18 1 0 0 0 1 0 0 >> 1 >> attr(,"assign") >> [1] 0 1 1 1 1 1 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$subject >> [1] "contr.treatment" >> >> attr(,"contrasts")$group >> [1] "contr.treatment" >> >>> >>> # estimating dispersions >>> y4 <- estimateGLMCommonDisp(y,design4) >>> y4 <- estimateGLMTrendedDisp(y4,design4) >> Loading required package: splines >>> y4 <- estimateGLMTagwiseDisp(y4,design4) >>> >>> # building contrasts >>> >>> ##############This is where things start to get a bit sticky for me; I am >>> making some preliminary contrasts below, but am unsure of whether I've >>> treated the intercept appropriately. I think it refers to groupLiv, >>> though I have some uncertainty about this, given that subject is the first >>> term in the model########## >>> >>> #########ultimately, I would like to compare (PG - (Liv+Test)/2) and so >>> on for each tissue type########### >>> >>> my.contrast <- makeContrasts( >> + Liv=(Intercept)-(groupPG+groupTest)/2, >> + PG=groupPG-((Intercept)+groupTest)/2, >> + Test=groupTest-(groupPG+(Intercept))/2, >> + levels=design4) >> Warning message: >> In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = groupPG >> - : >> Renaming (Intercept) to Intercept >>> >>> >>> # fitting a glm >>> fit <- glmFit(y4,design4) >>> >>> >>> # likelihood ratio tests >>> >>> lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) >>> lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) >>> lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) >>> >>> #finding top tags; >>> topTags(lrt.Liv) >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >>> topTagslrt.PG) >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >>> topTags(lrt.Testes) >> Error in abs(object$table$logFC) : >> Non-numeric argument to mathematical function >>> >>> >>> #### Note: I am also getting an error after using topTags about passing a >>> non-numeric argument to a mathermatical function. I noticed there was a >>> previous post about similarly passing a single contrast to a glmLRT and >>> retrieving the same error, which was corrected in edgeR_3.0.7. As I am >>> using 3.0.8, I am assuming the error has more to do with how I have set >>> up/called my contrasts, but thought I would mention it. >>> >>> sessionInfo() >> R version 2.15.2 (2012-10-26) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] splines stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] edgeR_3.0.8 limma_3.14.4 >> >> loaded via a namespace (and not attached): >> [1] tools_2.15.2 >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLYlink written 6.0 years ago by Gordon Smyth36k
Hi Gordon and all - In case it helps others, I think there may be one other very minor typo. To test, Liv-(PG+Test)/2, I believe I need the contrast -(groupPG+groupTest)/2 rather than -(groupPG-groupTest)/2 as originally written. The original way was removing the Liv term altogether. In case anyone is interested, I ran analyses with my design matrix and contrasts set up as Gordon suggested here and as Ryan did earlier. As expected, both give the same suite of DE genes for each tissue type. Best regards, Findley Finseth On Wed, Mar 6, 2013 at 8:59 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > I just noticed a typo: To test PG-(Liv+Test)/2 you need the contrast > groupPG-groupTest/2, rather than what I wrote below. > > Gordon > > > On Thu, 7 Mar 2013, Gordon K Smyth wrote: > > Dear Findley, >> >> The fact that you have paired samples actually doesn't change the way >> that contrasts are formed in edgeR. Assuming you use the glm code, the >> same contrast code works as if you had no pairing. >> >> The design matrix is fine: >> >> design4 <- model.matrix(~subject + group) >> >> however your contrasts are incorrect. One never needs to include the >> intercept term in a contrast. >> >> The coefficients of interest in your model are called groupPG and >> groupTest. Both of these are relative to the Liv group, so you should think >> of groupPG as representing PG-Liv and groupTest as representing Test-Liv. >> >> Hence to test Liv-(PG+Test)/2 you want the contrast >> -(groupPG-groupTest)/2. To test PG-(Liv+Test)/2 you want the contrast >> groupPG-groupLiv/2. To test Test-(PG+Liv)/2 you want the contrast >> groupTest-groupPG/2. >> >> Best wishes >> Gordon >> >> >> Date: Tue, 5 Mar 2013 15:21:18 -0500 >>> From: Findley Finseth <findleyransler@gmail.com> >>> To: Bioconductor mailing list <bioconductor@r-project.org> >>> Subject: [BioC] edgeR design matrix and contrast questions >>> >>> Hello all -- >>> >>> >>> I am using edgeR to analyze RNASeq data. Thank you so much for this >>> software and the clear and straightforward user guide. I have a question >>> regarding contrasts using the glm approach for a paired experimental design. >>> >>> I have three tissues (PG, Liv, and Test) from six subjects (1-6). One >>> of my goals is to get a set of genes that are "enriched" for each tissue. >>> To this end, I would like to compare each tissue to the average of the >>> other two (e.g., PG - (liver + testes)/2), while accounting for the fact >>> that my experiment is paired by subject. I understand how to build an >>> appropriate design matrix and call contrasts to either compare each tissue >>> to the average of the other two (e.g., edgeR section 3.4.3-4) or perform >>> pairwise comparisons between tissues while including subject (e.g., edgeR >>> section 3.4.1) . However, I start struggling when I try to combine these >>> goals and compare one tissue to the average of the other two while pairing >>> by subject. I have also considered the multi-level approach (e.g., limma >>> section 8.7) but am not doing both independent and paired comparisons, so >>> did not think that was an appropriate avenue. >>> >>> I am pasting my session output with preliminary code below. >>> Specifically, I could use advice on whether 1) my design matrix and 2) the >>> way I have called my contrasts are appropriate for my question. >>> >>> >>> On a slightly different note, I am getting the following error when >>> using topTags (also on session output below): >>> >>> Error in abs(object$table$logFC) : >>> Non-numeric argument to mathematical function >>> >>> I noticed there was a previous post in December about similarly passing >>> a single contrast to a glmLRT and retrieving the same error, which was >>> corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has >>> more to do with how I have set up/called my contrasts, but thought I would >>> mention it. >>> >>> >>> >>> >>> Thank you in advance, >>> >>> Findley Finseth >>> >>> >>> PhD Candidate >>> >>> Dept Ecology and Evolutionary Biology >>> Cornell University >>> Ithaca, NY 14850 >>> >>> >>> >>> >>> >>> library(edgeR) >>>> >>> Loading required package: limma >>> >>>> >>>> # making a matrix of factors called "targets", Tissue = tissue, Ind2 = >>>> subject >>>> targets <-readTargets() >>>> targets >>>> >>> X Tissue Ind2 >>> 1 JQ_112_Liv_TG Liv 1 >>> 2 JQ_122-2_PG_A PG 2 >>> 3 JQ_179_Liv_AC Liv 3 >>> 4 JQ_191_Test_C Test 4 >>> 5 JQ_255_PG_AGT PG 5 >>> 6 JQ_107_Liv_CC Liv 6 >>> 7 JQ_107_PG_AGT PG 6 >>> 8 JQ_122-2_Test Test 2 >>> 9 JQ_191_Liv_AT Liv 4 >>> 10 JQ_107_Test_G Test 6 >>> 11 JQ_112_PG_CGA PG 1 >>> 12 JQ_112_Test_A Test 1 >>> 13 JQ_122-2_Liv_ Liv 2 >>> 14 JQ_191_PG_AGT PG 4 >>> 15 JQ_179_PG_AGT PG 3 >>> 16 JQ_179_Test_A Test 3 >>> 17 JQ_255_Liv_GA Liv 5 >>> 18 JQ_255_Test_G Test 5 >>> >>>> >>>> # importing raw data >>>> rawdata <- read.delim("controlonly_**practiceEdgeR.txt") >>>> head(rawdata) >>>> >>> Gene sample2 sample6 sample7 sample9 sample10 sample12 >>> sample13 sample17 >>> 1 comp326924_c0_seq1 23 7 3 6 8 16 >>> 5 6 >>> 2 comp434050_c0_seq1 2 0 0 26 0 0 >>> 0 34 >>> 3 comp28996_c0_seq1 344 161 191 284 144 354 >>> 114 338 >>> 4 comp1083897_c0_seq1 1 4 1 0 1 2 >>> 0 0 >>> 5 comp544783_c0_seq1 3 0 0 4 0 0 >>> 0 11 >>> 6 comp654539_c0_seq1 0 0 0 9 0 1 >>> 0 11 >>> sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 >>> sample39 >>> 1 6 20 4 8 10 2 1 >>> 4 32 >>> 2 0 49 0 10 0 0 0 >>> 19 1 >>> 3 154 815 196 163 252 51 74 >>> 168 352 >>> 4 0 1 4 0 0 2 0 >>> 0 4 >>> 5 1 19 0 2 2 0 0 >>> 3 0 >>> 6 1 4 0 5 0 0 0 >>> 3 0 >>> sample40 >>> 1 2 >>> 2 18 >>> 3 182 >>> 4 0 >>> 5 2 >>> 6 3 >>> >>>> >>>> # setting groups equal to tissue differences >>>> group <- targets$Tissue >>>> group <- as.factor(group) >>>> group >>>> >>> [1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG >>> PG Test Liv >>> [18] Test >>> Levels: Liv PG Test >>> >>>> >>>> >>>> # making my DGE list >>>> y <- DGEList(counts=rawdata[,2:19],**genes=rawdata[,1],group=group) >>>> >>> Calculating library sizes from column totals. >>> >>>> >>>> # filtering out lowly expressed genes; since the smallest group size is >>>> six, we keeping genes with at least one count per million in at least six >>>> samples >>>> keep <- rowSums(cpm(y)>1) >= 6 >>>> y <- y[keep,] >>>> dim(y) # retained 30,140 genes >>>> >>> [1] 30140 18 >>> >>>> >>>> >>>> #recomputing the library sizes: >>>> y$samples$lib.size <- colSums(y$counts) >>>> >>>> #calculating normalization factors >>>> y <- calcNormFactors(y) >>>> y$samples # looks good >>>> >>> group lib.size norm.factors >>> sample2 Liv 13003017 1.2319091 >>> sample6 PG 12388919 0.7524641 >>> sample7 Liv 8045685 0.6490458 >>> sample9 Test 9164420 1.5566202 >>> sample10 PG 15292051 0.5119163 >>> sample12 Liv 13042773 0.8596694 >>> sample13 PG 11862830 0.6419963 >>> sample17 Test 9472129 1.6400614 >>> sample20 Liv 6514347 0.8206096 >>> sample22 Test 25926605 1.4096284 >>> sample24 PG 12334446 0.7888070 >>> sample25 Test 6283137 1.6230035 >>> sample29 Liv 19970487 0.8060990 >>> sample30 PG 6159314 0.7825644 >>> sample36 PG 3897369 0.9081739 >>> sample37 Test 5778563 1.6550333 >>> sample39 Liv 15619497 0.9618765 >>> sample40 Test 5076345 1.7061590 >>> >>>> >>>> >>>> >>>> # setting subject equal to individuals to reflect fact that samples are >>>> paired >>>> subject <- factor(targets$Ind2) >>>> >>>> ############### building design matrix; could use advice here regarding >>>> appropriateness########## >>>> design4 <- model.matrix(~subject + group) >>>> design4 >>>> >>> (Intercept) subject2 subject3 subject4 subject5 subject6 groupPG >>> groupTest >>> 1 1 0 0 0 0 0 0 0 >>> 2 1 1 0 0 0 0 1 0 >>> 3 1 0 1 0 0 0 0 0 >>> 4 1 0 0 1 0 0 0 1 >>> 5 1 0 0 0 1 0 1 0 >>> 6 1 0 0 0 0 1 0 0 >>> 7 1 0 0 0 0 1 1 0 >>> 8 1 1 0 0 0 0 0 1 >>> 9 1 0 0 1 0 0 0 0 >>> 10 1 0 0 0 0 1 0 1 >>> 11 1 0 0 0 0 0 1 0 >>> 12 1 0 0 0 0 0 0 1 >>> 13 1 1 0 0 0 0 0 0 >>> 14 1 0 0 1 0 0 1 0 >>> 15 1 0 1 0 0 0 1 0 >>> 16 1 0 1 0 0 0 0 1 >>> 17 1 0 0 0 1 0 0 0 >>> 18 1 0 0 0 1 0 0 1 >>> attr(,"assign") >>> [1] 0 1 1 1 1 1 2 2 >>> attr(,"contrasts") >>> attr(,"contrasts")$subject >>> [1] "contr.treatment" >>> >>> attr(,"contrasts")$group >>> [1] "contr.treatment" >>> >>> >>>> # estimating dispersions >>>> y4 <- estimateGLMCommonDisp(y,**design4) >>>> y4 <- estimateGLMTrendedDisp(y4,**design4) >>>> >>> Loading required package: splines >>> >>>> y4 <- estimateGLMTagwiseDisp(y4,**design4) >>>> >>>> # building contrasts >>>> >>>> ##############This is where things start to get a bit sticky for me; I >>>> am making some preliminary contrasts below, but am unsure of whether I've >>>> treated the intercept appropriately. I think it refers to groupLiv, though >>>> I have some uncertainty about this, given that subject is the first term in >>>> the model########## >>>> >>>> #########ultimately, I would like to compare (PG - (Liv+Test)/2) and >>>> so on for each tissue type########### >>>> >>>> my.contrast <- makeContrasts( >>>> >>> + Liv=(Intercept)-(groupPG+**groupTest)/2, >>> + PG=groupPG-((Intercept)+**groupTest)/2, >>> + Test=groupTest-(groupPG+(**Intercept))/2, >>> + levels=design4) >>> Warning message: >>> In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = >>> groupPG - : >>> Renaming (Intercept) to Intercept >>> >>>> >>>> >>>> # fitting a glm >>>> fit <- glmFit(y4,design4) >>>> >>>> >>>> # likelihood ratio tests >>>> >>>> lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"]) >>>> lrt.PG <- glmLRT(fit, my.contrast[,"PG"]) >>>> lrt.Testes <- glmLRT(fit, my.contrast[,"Test"]) >>>> >>>> #finding top tags; >>>> topTags(lrt.Liv) >>>> >>> Error in abs(object$table$logFC) : >>> Non-numeric argument to mathematical function >>> >>>> topTagslrt.PG) >>>> >>> Error in abs(object$table$logFC) : >>> Non-numeric argument to mathematical function >>> >>>> topTags(lrt.Testes) >>>> >>> Error in abs(object$table$logFC) : >>> Non-numeric argument to mathematical function >>> >>>> >>>> >>>> #### Note: I am also getting an error after using topTags about passing >>>> a non-numeric argument to a mathermatical function. I noticed there was a >>>> previous post about similarly passing a single contrast to a glmLRT and >>>> retrieving the same error, which was corrected in edgeR_3.0.7. As I am >>>> using 3.0.8, I am assuming the error has more to do with how I have set >>>> up/called my contrasts, but thought I would mention it. >>>> >>>> sessionInfo() >>>> >>> R version 2.15.2 (2012-10-26) >>> Platform: x86_64-apple-darwin9.8.0/x86_**64 (64-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.**UTF-8/C/en_US.UTF-8/en_US.UTF-**8 >>> >>> attached base packages: >>> [1] splines stats graphics grDevices utils datasets methods >>> base >>> >>> other attached packages: >>> [1] edgeR_3.0.8 limma_3.14.4 >>> >>> loaded via a namespace (and not attached): >>> [1] tools_2.15.2 >>> >>> >> > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:20}}
ADD REPLYlink written 6.0 years ago by Findley Finseth40
Dear List, I have 4 datasets and am interested in generating a pairs plot for each. However, I'm interested in all the 4 pairs plots to be in a single window. I tried mfrow but that didn't work, then I tried split.screen (code below) but that doesn't work either. # Window With 2 Rows, Top & Bottom Row In 2 Columns #jpeg("multScatter.jpg",height=1000,width=1000) split.screen(c(2,1)) # Makes Screen 1 and 2 split.screen(c(1,2), screen=1) # Makes Screen 3 and 4 split.screen(c(1,2), screen=2) # Makes Screen 5 and 6 # Pairs plots on Screens 3, 4, 5 & 6 screen(3) pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") screen(4) pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") screen(5) pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") screen(6) pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") #dev.off() ############### # Window With 1 Rows and 4 columns #jpeg("multScatter.jpg",height=1000,width=1000) split.screen(c(1,4)) # Makes Screen 1 to 4 screen(1) pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") screen(2) pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") screen(3) pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") screen(4) pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") #dev.off() Any help or alternatives would be highly appreacited! ManyThanks, Natasha
ADD REPLYlink written 6.0 years ago by Natasha440
Hi Natasha, The pairs() function from what package? By 'didn't work' what exactly do you mean? You are not giving us much to go on here. Best, Jim On 3/18/13 7:59 AM, Natasha Sahgal wrote: > Dear List, > > I have 4 datasets and am interested in generating a pairs plot for each. > However, I'm interested in all the 4 pairs plots to be in a single window. > > I tried mfrow but that didn't work, then I tried split.screen (code below) but that doesn't work either. > > # Window With 2 Rows, Top & Bottom Row In 2 Columns > #jpeg("multScatter.jpg",height=1000,width=1000) > split.screen(c(2,1)) # Makes Screen 1 and 2 > split.screen(c(1,2), screen=1) # Makes Screen 3 and 4 > split.screen(c(1,2), screen=2) # Makes Screen 5 and 6 > > # Pairs plots on Screens 3, 4, 5 & 6 > screen(3) > pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") > screen(4) > pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") > screen(5) > pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") > screen(6) > pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") > #dev.off() > > ############### > > # Window With 1 Rows and 4 columns > #jpeg("multScatter.jpg",height=1000,width=1000) > split.screen(c(1,4)) # Makes Screen 1 to 4 > screen(1) > pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") > screen(2) > pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") > screen(3) > pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") > screen(4) > pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") > #dev.off() > > Any help or alternatives would be highly appreacited! > > > ManyThanks, > Natasha > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 6.0 years ago by James W. MacDonald49k
Hi Jim, Sorry, didn't realise. The pairs function from the graphics package. So if I do something like: png("Data4.png") pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") dev.off() it works, and I generate 4 separate scatterplots for each dataset. However I want all 4 plots in a single figure/file, rather than having 4 separate files. (I.e. 1 row with 4 columns, or 2 rows with 2 columns each). Thus, I then tried, png("AllData.png") par(mfrow=c(1,4)) pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") dev.off() But this did not give me a single plot window with all 4 plots, rather just the last plot for Data 4. Then I found the split.screen function (graphics package), the code in my previous email, but that too gave just the final plot for data 4 (and this did look weird!) My Data is basically 4 datasets (e.g. array &/or RNA-Seq) each with 4 samples. Hope that makes sense. Many Thanks, Natasha sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] scatterplot3d_0.3-33 Hmisc_3.10-1 survival_2.36-14 [4] DESeq_1.10.1 lattice_0.20-10 locfit_1.5-8 [7] Biobase_2.18.0 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] annotate_1.36.0 AnnotationDbi_1.20.2 cluster_1.14.3 [4] DBI_0.2-5 genefilter_1.40.0 geneplotter_1.36.0 [7] grid_2.15.2 IRanges_1.16.4 parallel_2.15.2 [10] RColorBrewer_1.0-5 RSQLite_0.11.2 stats4_2.15.2 [13] tools_2.15.2 XML_3.95-0.1 xtable_1.7-0 -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 18 March 2013 20:44 To: Natasha Sahgal Cc: Bioconductor mailing list Subject: Re: [BioC] multiple pairs plot in a single window Hi Natasha, The pairs() function from what package? By 'didn't work' what exactly do you mean? You are not giving us much to go on here. Best, Jim On 3/18/13 7:59 AM, Natasha Sahgal wrote: > Dear List, > > I have 4 datasets and am interested in generating a pairs plot for each. > However, I'm interested in all the 4 pairs plots to be in a single window. > > I tried mfrow but that didn't work, then I tried split.screen (code below) but that doesn't work either. > > # Window With 2 Rows, Top & Bottom Row In 2 Columns > #jpeg("multScatter.jpg",height=1000,width=1000) > split.screen(c(2,1)) # Makes Screen 1 and 2 split.screen(c(1,2), > screen=1) # Makes Screen 3 and 4 split.screen(c(1,2), screen=2) # > Makes Screen 5 and 6 > > # Pairs plots on Screens 3, 4, 5 & 6 > screen(3) > pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") > screen(4) > pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") > screen(5) > pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") > screen(6) > pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") > #dev.off() > > ############### > > # Window With 1 Rows and 4 columns > #jpeg("multScatter.jpg",height=1000,width=1000) > split.screen(c(1,4)) # Makes Screen 1 to 4 > screen(1) > pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") > screen(2) > pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") > screen(3) > pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") > screen(4) > pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") > #dev.off() > > Any help or alternatives would be highly appreacited! > > > ManyThanks, > Natasha > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 6.0 years ago by Natasha440
Hi Natasha, I see. Well, you can't use either split.screen() or par(mfrow()) and pairs() at the same time, as the pairs() function itself internally uses par(mfrow()) to create all the pairs plots. So what happens is you are changing the graphics settings, then the pairs() function saves those parameters, sets them the way it needs them, does the plot, and the resets back to your settings before exiting. Anyway, I am not entirely clear what you want. If you simply want a single file with four pages in it, then you want to use a pdf output. IIRC, none of the other graphics drivers support multiple pages. If you really want them all on a single page, then you might need to write a function of your own based in part on the code in pairs(). Best, Jim On 3/19/13 6:12 AM, Natasha Sahgal wrote: > Hi Jim, > > Sorry, didn't realise. > > The pairs function from the graphics package. So if I do something like: > > png("Data4.png") > pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") > dev.off() > > it works, and I generate 4 separate scatterplots for each dataset. > > However I want all 4 plots in a single figure/file, rather than having 4 separate files. (I.e. 1 row with 4 columns, or 2 rows with 2 columns each). > > Thus, I then tried, > > png("AllData.png") > par(mfrow=c(1,4)) > pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") > pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") > pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") > pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") > dev.off() > > But this did not give me a single plot window with all 4 plots, rather just the last plot for Data 4. Then I found the split.screen function (graphics package), the code in my previous email, but that too gave just the final plot for data 4 (and this did look weird!) > > > My Data is basically 4 datasets (e.g. array &/or RNA-Seq) each with 4 samples. > > > Hope that makes sense. > > Many Thanks, > Natasha > > > sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] scatterplot3d_0.3-33 Hmisc_3.10-1 survival_2.36-14 > [4] DESeq_1.10.1 lattice_0.20-10 locfit_1.5-8 > [7] Biobase_2.18.0 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] annotate_1.36.0 AnnotationDbi_1.20.2 cluster_1.14.3 > [4] DBI_0.2-5 genefilter_1.40.0 geneplotter_1.36.0 > [7] grid_2.15.2 IRanges_1.16.4 parallel_2.15.2 > [10] RColorBrewer_1.0-5 RSQLite_0.11.2 stats4_2.15.2 > [13] tools_2.15.2 XML_3.95-0.1 xtable_1.7-0 > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 18 March 2013 20:44 > To: Natasha Sahgal > Cc: Bioconductor mailing list > Subject: Re: [BioC] multiple pairs plot in a single window > > Hi Natasha, > > The pairs() function from what package? By 'didn't work' what exactly do you mean? You are not giving us much to go on here. > > Best, > > Jim > > > On 3/18/13 7:59 AM, Natasha Sahgal wrote: >> Dear List, >> >> I have 4 datasets and am interested in generating a pairs plot for each. >> However, I'm interested in all the 4 pairs plots to be in a single window. >> >> I tried mfrow but that didn't work, then I tried split.screen (code below) but that doesn't work either. >> >> # Window With 2 Rows, Top & Bottom Row In 2 Columns >> #jpeg("multScatter.jpg",height=1000,width=1000) >> split.screen(c(2,1)) # Makes Screen 1 and 2 split.screen(c(1,2), >> screen=1) # Makes Screen 3 and 4 split.screen(c(1,2), screen=2) # >> Makes Screen 5 and 6 >> >> # Pairs plots on Screens 3, 4, 5 & 6 >> screen(3) >> pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") >> screen(4) >> pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") >> screen(5) >> pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") >> screen(6) >> pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") >> #dev.off() >> >> ############### >> >> # Window With 1 Rows and 4 columns >> #jpeg("multScatter.jpg",height=1000,width=1000) >> split.screen(c(1,4)) # Makes Screen 1 to 4 >> screen(1) >> pairs(a, upper.panel=panel.cor, labels=group, main="Data 1") >> screen(2) >> pairs(b,upper.panel=panel.cor,labels=group,main="Data 2") >> screen(3) >> pairs(c, upper.panel=panel.cor,labels=group,main="Data 3") >> screen(4) >> pairs(d, upper.panel=panel.cor, labels=group, main="Data 4") >> #dev.off() >> >> Any help or alternatives would be highly appreacited! >> >> >> ManyThanks, >> Natasha >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 6.0 years ago by James W. MacDonald49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 350 users visited in the last hour